Modeling Streaming Data for Processing with Apache Beam

The Apache Beam unified model allows us to process batch as well as streaming data using the same API. Several execution backends such as Google Cloud Dataflow, Apache Spark, and Apache Flink are compatible with Beam.
Course info
Level
Beginner
Updated
Sep 18, 2020
Duration
2h 27m
Description

Streaming data usually needs to be processed in real time or near real time, which means stream processing systems need capabilities that allow them to process data with low latency, high throughput, and fault tolerance. In this course, Modeling Streaming Data for Processing with Apache Beam, you will gain the ability to work with streams and use the Beam unified model to build data-parallel pipelines. First, you will explore the similarities and differences between batch processing and stream processing. Next, you will discover the Apache Beam APIs, which allow you to define pipelines that process batch as well as streaming data. Finally, you will learn how windowing operations can be applied to streaming data. When you are finished with this course, you will have a strong grasp of the models and architectures used with streaming data and be able to work with the Beam unified model to define and run transformations on input streams.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi. My name is Janani Ravi, and welcome to this course on Modeling Streaming Data for Processing with Apache Beam. A little about myself: I have a master's in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. I currently work on my own startup, Loonycorn, a studio for high-quality video content. Streaming data usually needs to be processed in real time or near real time, which means stream processing systems need capabilities that allow them to process data with low latency, high throughput, and fault tolerance. In this course, you will understand the nuances and challenges of working with streams and use the Beam unified model to build data-parallel pipelines. You'll start this course off by understanding the similarities and differences between batch processing and stream processing. We'll discuss the processing models and architectures that stream processing systems use and see the tradeoffs involved in the range of choices available. Next, you'll get started with the Apache Beam APIs, which allow us to define pipelines that process batch as well as streaming data. You'll understand the basic components of a Beam pipeline, PCollections and PTransforms, and define and execute simple pipeline operations using the Beam Direct Runner. Finally, you will see how windowing operations can be applied to streaming data. You will study the different types of windows that Beam supports, that is, fixed windows, sliding windows, session windows, and global windows, and you will see how these windows can be applied to input streams. When you're finished with this course, you will have a strong grasp of the models and architectures used with streaming data and you will be able to work with the Beam unified model to define and run transformations on input streams.