Exploring the Apache Spark Structured Streaming API for Processing Streaming Data

Structured Streaming is the scalable, fault-tolerant stream processing engine in Apache Spark 2. DataFrames in Spark 2.x support unbounded data, effectively unifying batch and streaming applications.
Course info
Level
Beginner
Updated
Sep 25, 2020
Duration
2h 48m
Table of contents
Course Overview
Exploring Sources and Sinks
Processing Streaming Data Frames
Performing Windowing Operations on Streams
Working with Streaming Joins
Managing and Monitoring Streaming Queries
Description

Stream processing applications work with continuously updated data and react to changes in real time. In this course, Exploring the Apache Spark Structured Streaming API for Processing Streaming Data, you'll focus on using the tabular DataFrame API as well as Spark SQL to work with streaming, unbounded datasets using the same APIs that work with bounded batch data.

First, you’ll explore Spark’s support for different data sources and data sinks, the use cases for each, and the fault-tolerance semantics they offer. You’ll write data out to the console and file sinks, and customize your write logic with the foreach and foreachBatch sinks.

Next, you'll see how you can transform streaming data using operations such as selections, projections, grouping, and aggregations, using both the DataFrame API and Spark SQL. You'll also learn how to perform windowing operations on streams using tumbling and sliding windows. You'll then explore relational join operations between streaming and batch sources, and learn the limitations on streaming joins in Spark.

Finally, you'll explore Spark’s support for managing and monitoring streaming queries using the Spark Web UI and the Spark History server.

When you're finished with this course, you'll have the skills and knowledge to work with different sources and sinks for your streaming data, apply a range of processing operations on input streams, and perform windowing and join operations on streams.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds four patents for its real-time collaborative editing framework.

More from the author
More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Exploring the Apache Spark Structured Streaming API for Processing Streaming Data. A little about myself: I have a master's in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

Stream processing applications work with continuously updated data and react to changes in real time. In this course, you will focus on using the tabular DataFrame API as well as Spark SQL to work with streaming, unbounded datasets. First, you'll explore Spark's support for different data sources and data sinks, understand the use case for each, and also understand the fault-tolerance semantics that they offer. You'll write data out to the console and file sinks, and customize your write logic with the foreach and foreachBatch sinks. Next, you'll see how you can transform streaming data using operations such as selections, projections, grouping, and aggregations, using both the DataFrame API and Spark SQL. You'll also learn how to perform windowing operations on streams using tumbling and sliding windows. You'll then explore relational join operations between streaming and batch sources and also learn the limitations on streaming joins in Spark. Finally, you'll explore Spark's support for managing and monitoring streaming queries using the Spark Web UI and the Spark History Server.

When you're finished with this course, you'll have the skills and knowledge to work with different sources and sinks for your streaming data, apply a range of processing operations on input streams, and perform windowing and join operations on streams.