Stream Processing Frameworks: Apache Spark Structured Streaming
Learn how to process real-time data using Apache Spark Structured Streaming. This course teaches you how Spark handles streams with micro-batches, applies triggers and watermarks, and integrates into modern data architectures.
What you'll learn
Streaming data is everywhere, from transactions to clickstreams, but handling it at scale requires specialized frameworks.
In this course, Stream Processing Frameworks: Apache Spark Structured Streaming, you’ll gain the ability to build reliable, scalable streaming pipelines.
First, you’ll explore the micro-batch model of Spark Structured Streaming, including triggers, watermarks, and latency/throughput characteristics.
Next, you’ll discover how Spark enables real-time workflows, with unbounded DataFrames, incremental execution, checkpoint recovery, streaming operations, and integration with sources like Kafka and sinks like Delta Lake.
Finally, you’ll learn to evaluate Spark’s suitability for different workloads, compare it to other frameworks, and understand its role in modern data architectures.
When you’re finished with this course, you’ll have the skills and knowledge of Apache Spark Structured Streaming needed to design, implement, and evaluate streaming solutions.
About the author
Tejprakash is a software engineer with a background in backend development. He is currently focused on IoT security and data engineering, building secure, scalable, high-performance systems.