Processing Streaming Data Using Apache Spark Structured Streaming

Structured Streaming is the scalable, fault-tolerant stream processing engine in Apache Spark 2, which can be used to process high-velocity data streams.
Course info
Level
Intermediate
Updated
Nov 11, 2020
Duration
2h 35m
Table of contents
Course Overview
Getting Started with the Spark Standalone Cluster
Integrating Spark with Apache Kafka
Performing Windowing Operations on Streams
Performing Join Operations on Streams
Description

Stream processing applications work with continuously updated data and react to changes in real-time. In this course, Processing Streaming Data Using Apache Spark Structured Streaming, you'll focus on integrating your streaming application with the Apache Kafka reliable messaging service to work with real-world data such as Twitter streams.

First, you'll explore Spark's architecture and how it supports distributed processing at scale. Next, you'll install and work with the Apache Kafka reliable messaging service.
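Standing up the standalone cluster covered in the first module comes down to running Spark's bundled daemon scripts. A minimal sketch, assuming `SPARK_HOME` points at a local Spark 2.x installation (in Spark 3.x the worker scripts were renamed `start-worker.sh` / `stop-worker.sh`):

```shell
# Start the cluster master; its web UI is served on http://localhost:8080
# and it listens for workers on spark://localhost:7077 by default.
$SPARK_HOME/sbin/start-master.sh

# Attach a worker process to the master (Spark 2.x script name).
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077

# Tear the cluster down in the reverse order.
$SPARK_HOME/sbin/stop-slave.sh
$SPARK_HOME/sbin/stop-master.sh
```

Cluster-wide settings such as worker memory and core counts live in `$SPARK_HOME/conf/spark-env.sh`, which these scripts read on startup.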

Finally, you'll perform a number of transformation operations on Twitter streams, including windowing and join operations.
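The core idea behind a tumbling-window aggregation can be pictured without Spark at all: events are bucketed into fixed, non-overlapping time intervals and aggregated per bucket. A plain-Python sketch of that logic (the function name and event format are illustrative, not from the course):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed, non-overlapping windows
    and count occurrences per key, mimicking a tumbling-window aggregation."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Every timestamp maps to the start of exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start][key] += 1
    return {w: dict(k) for w, k in counts.items()}

events = [(0, "#spark"), (5, "#kafka"), (12, "#spark"), (19, "#spark")]
print(tumbling_window_counts(events, 10))
# {0: {'#spark': 1, '#kafka': 1}, 10: {'#spark': 2}}
```

In Spark Structured Streaming the same grouping is expressed declaratively with the `window()` function on an event-time column, and Spark additionally handles late-arriving data via watermarks.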

When you're finished with this course, you'll have the skills and knowledge to work with high-volume, high-velocity data using Spark and to integrate with Apache Kafka to process streaming data.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds four patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi. My name is Janani Ravi, and welcome to this course on Processing Streaming Data Using Apache Spark Structured Streaming. A little about myself. I have a master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

In this course, you'll focus on integrating your streaming applications with the Apache Kafka reliable messaging service to work with real-world data such as Twitter streams. First, you'll explore Spark's architecture to support distributed processing at scale. You'll set up a Spark standalone cluster on your local machine and configure the cluster using Spark configuration files. You'll also use the cluster install scripts to start and stop the master and worker processes. Next, you'll install and work with the Apache Kafka reliable messaging service. You'll understand how Kafka publishers, consumers, and topics work, and you'll integrate your Spark streaming application to read and write data to topics in Kafka. You'll also set up a Twitter developer account, which you will use to stream Twitter messages to a Kafka topic, which can then be processed by your streaming application.

Finally, you will perform a number of transformation operations on Twitter streams, including windowing and join operations. You'll also see how you can perform sentiment analysis on each incoming Twitter message. We'll round this course off by exploring how you can perform unit testing and end-to-end testing of your streaming application. When you're finished with this course, you will have the skills and knowledge to work with high-volume and high-velocity data using Spark and integrate with Apache Kafka to process streaming data.
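The Kafka read-and-write integration described above uses Spark's built-in Kafka source and sink. A hedged PySpark sketch, assuming a broker on `localhost:9092`; the topic names `tweets` and `tweet_counts` and the checkpoint path are placeholders, and running it requires a live Spark cluster and Kafka broker, so it is shown untested:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = (SparkSession.builder
         .appName("TwitterStreamSketch")
         .getOrCreate())

# Read from Kafka: each record arrives with binary `key`/`value` columns
# plus metadata such as `timestamp`.
tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "tweets")          # hypothetical topic name
          .load()
          .selectExpr("CAST(value AS STRING) AS text", "timestamp"))

# One-minute tumbling-window counts of incoming tweets.
counts = tweets.groupBy(window(col("timestamp"), "1 minute")).count()

# Serialize each aggregate row to JSON and write it to another topic.
query = (counts.selectExpr("to_json(struct(*)) AS value")
         .writeStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "tweet_counts")         # hypothetical topic name
         .option("checkpointLocation", "/tmp/tweet_counts_ckpt")
         .outputMode("update")
         .start())

query.awaitTermination()
```

The checkpoint location is mandatory for the Kafka sink: it is where Spark records stream progress so the query can recover exactly where it left off after a failure.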