Structured Streaming in Apache Spark 2

Many sources of data in the real world are available in the form of streams, from self-driving car sensors to weather monitors. Apache Spark 2 is a powerful, distributed analytics engine that offers great support for streaming applications.
Course info
Rating: (21)
Level: Beginner
Updated: Jun 22, 2018
Duration: 2h 11m
Table of contents
Course Overview
Understanding the High Level Streaming API in Spark 2.x
Building Advanced Streaming Pipelines Using Structured Streaming
Integrating Apache Kafka with Structured Streaming
Description

Stream processing applications work with continuously updated data and react to changes in real-time. Data frames in Spark 2.x support infinite data, thus effectively unifying batch and streaming applications. In this course, Structured Streaming in Apache Spark 2, you'll focus on using the tabular data frame API to work with streaming, unbounded datasets using the same APIs that work with bounded batch data. First, you'll start off by understanding how structured streaming works and what makes it different and more powerful than traditional streaming applications, covering the basic streaming architecture and the improvements in structured streaming that allow it to react to data in real-time. Then, you'll create triggers to evaluate streaming results and use output modes to write results out to file or to screen. Next, you'll discover how you can build streaming pipelines using Spark by studying event-time aggregations, grouping and windowing functions, and how to perform join operations between batch and streaming data. You'll even work with real Twitter streams and perform analysis on trending hashtags on Twitter. Finally, you'll see how Spark stream processing integrates with the Kafka distributed publisher-subscriber system by ingesting Twitter data from a Kafka producer and processing it using Spark Streaming. By the end of this course, you'll be comfortable performing analysis of stream data using Spark's distributed analytics engine and its high-level structured streaming API.
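
To make those ideas concrete, here is a minimal PySpark sketch of a structured streaming query showing a streaming read, a trigger, and an output mode; the input directory, schema, and threshold are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("StructuredStreamingSketch").getOrCreate()

# File-based streaming sources need an explicit schema.
schema = StructType([
    StructField("sensor_id", StringType()),
    StructField("reading", IntegerType()),
])

# Spark treats new files arriving in the directory as rows appended
# to an unbounded table (hypothetical path).
readings = (spark.readStream
            .schema(schema)
            .csv("/tmp/sensor_data"))

# Same DataFrame operations as in batch code.
high = readings.where("reading > 100")

# The trigger controls how often results are evaluated; the output mode
# controls what gets written (append = only newly arrived rows).
query = (high.writeStream
         .outputMode("append")
         .format("console")
         .trigger(processingTime="10 seconds")
         .start())

query.awaitTermination()

Swapping format("console") for a file sink such as format("parquet"), together with a checkpoint location, would write the same results to files rather than to the screen.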

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Analyzing Data with Qlik Sense (Intermediate, 2h 11m, Jun 17, 2019)
Using PyTorch in the Cloud: PyTorch Playbook (Intermediate, 2h 21m, Apr 25, 2019)
Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Structured Streaming in Apache Spark 2. A little about myself: I have a master's degree in electrical engineering from Stanford, and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. In this course, we focus on using the tabular data frame API to work with streaming, unbounded datasets using the same APIs that work with bounded batch data. We start off by understanding how structured streaming works and what makes it different and more powerful than traditional streaming applications. We'll understand the basic streaming architecture and the improvements included in structured streaming, allowing it to react to data in real-time. We'll create triggers to evaluate streaming results, and output modes to write results out to file or to screen. We'll then see how we can build streaming pipelines using Spark. We'll study event-time aggregations, grouping and windowing functions, and how we perform join operations between batch and streaming data. We'll work with real Twitter streams, and perform analysis on trending hashtags on Twitter. We'll then see how Spark stream processing integrates with the Kafka distributed publisher-subscriber system. We'll ingest Twitter data from a Kafka producer and process it using Spark Streaming. At the end of this course, you should be comfortable performing analysis of stream data using Spark's distributed analytics engine and its high-level structured streaming API.

Understanding the High Level Streaming API in Spark 2.x
Hi, and welcome to this course on Structured Streaming in Apache Spark 2. Apache Spark today is one of the most popular distributed data processing and analytics engines, and the cool thing is it can work with streaming data as well. Processing streaming data has become extremely important nowadays as we get data in real-time from logs, sensors, and a variety of other sources. We'll start this module off by understanding the need for stream processing, and then move on to identifying the differences between batch and stream architectures. We'll talk briefly about how streaming works in Spark version 1: the core components in streaming are RDDs, which come together to form DStreams. We'll then see how streaming has changed in Spark version 2.x. The latest versions of Spark use the high-performance Tungsten engine and use DataFrames, data represented in rows and columns, as the basic developer API. We'll then talk about structured streaming in Spark 2, which unifies the batch and streaming architectures and uses the same API to work with batch data as well as stream data. This is possible in Spark 2 by modeling streams as unbounded datasets. The sketch below gives a small illustration of that unification.
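
In this PySpark sketch (file paths, format, and columns are hypothetical), the identical transformation runs on a bounded batch DataFrame and on an unbounded streaming one:

from pyspark.sql import SparkSession, DataFrame

spark = SparkSession.builder.appName("UnifiedAPISketch").getOrCreate()

def top_products(df: DataFrame) -> DataFrame:
    # Works unchanged whether df is a batch or a streaming DataFrame.
    return df.groupBy("product").count()

# Batch: a bounded dataset, read once (hypothetical path).
batch_df = spark.read.json("/tmp/orders_history")
top_products(batch_df).show()

# Streaming: another directory treated as an unbounded table,
# reusing the schema inferred from the batch read.
stream_df = spark.readStream.schema(batch_df.schema).json("/tmp/orders_live")

# Streaming aggregations need the complete (or update) output mode.
(top_products(stream_df)
 .writeStream
 .outputMode("complete")
 .format("console")
 .start()
 .awaitTermination())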

Building Advanced Streaming Pipelines Using Structured Streaming
Hi, and welcome to this module on Building Advanced Streaming Pipelines Using Structured Streaming. It's in this module that we'll get really hands-on. We'll see how to perform a variety of operations on streaming data, starting off with selections, projections, and aggregations on streaming entities. We'll also perform ad hoc SQL queries, which run on streaming data as opposed to batch data. We'll then see how we can perform windowing operations, which allow us to operate on a subset of streaming data. We'll work on some real-world streaming data by connecting to the Twitter streaming API using Tweepy. We'll then study what features Spark streaming offers to deal with late data. Lateness is the difference between the event time, that is, the time at which the entity was generated, and the processing time. Dealing with late data involves using a Spark feature called watermarks, which determine how late is actually late, and what to do with data when it comes in after it was supposed to.
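
As a rough PySpark sketch of the windowing and watermark ideas above (the socket source stands in for a real tweet stream; host, port, and window sizes are assumptions):

from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("WindowWatermarkSketch").getOrCreate()

# Each line arriving on the socket stands in for a hashtag; with
# includeTimestamp, Spark attaches a timestamp column to every row.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .option("includeTimestamp", "true")
         .load())

# Count occurrences per 10-minute window, sliding every 5 minutes.
# The watermark tells Spark to keep windows open for up to 5 minutes
# of lateness before finalizing them and dropping later data.
counts = (lines
          .withWatermark("timestamp", "5 minutes")
          .groupBy(window(col("timestamp"), "10 minutes", "5 minutes"),
                   col("value"))
          .count())

(counts.writeStream
 .outputMode("update")
 .format("console")
 .option("truncate", "false")
 .start()
 .awaitTermination())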

Integrating Apache Kafka with Structured Streaming
Hi, and welcome to this module on integrating Apache Kafka with Structured Streaming in Spark. Now if you haven't worked with Kafka before, Kafka is a powerful publisher/subscriber messaging technology which allows various publishers to publish messages to a queue, from which they can then be received by subscribers. When messages are published to Kafka, they are categorized by topic and stored in partitioned, replicated logs. As a subscriber, you'll specify the topic that you're interested in, and receive all messages on that topic. Kafka is a distributed technology, which means it's highly scalable and can process many millions of messages per second. Internally, Kafka uses Apache ZooKeeper for configuration services and synchronization of the servers in its cluster. Using Kafka as a publisher/subscriber message transport system, and using Spark streaming for processing those messages, is a common setup that you're likely to find in organizations.
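
Here is a minimal PySpark sketch of that setup (the broker address and topic name are assumptions, and the spark-sql-kafka-0-10 connector package needs to be on the classpath):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaSketch").getOrCreate()

# Subscribe to a topic; Kafka delivers key and value as binary columns.
tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "tweets")  # hypothetical topic name
          .option("startingOffsets", "latest")
          .load())

# Cast the message payload to a string for downstream processing.
messages = tweets.selectExpr("CAST(value AS STRING) AS tweet")

(messages.writeStream
 .outputMode("append")
 .format("console")
 .start()
 .awaitTermination())

From here, the same windowing, aggregation, and watermark operations from the previous module apply to the Kafka-backed stream unchanged.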