Modeling Streaming Data for Processing with Apache Beam

The Apache Beam unified model allows us to process batch as well as streaming data using the same API. Several execution backends such as Google Cloud Dataflow, Apache Spark, and Apache Flink are compatible with Beam.
Course info
Level: Beginner
Updated: Sep 18, 2020
Duration: 2h 27m
Description

Streaming data usually needs to be processed in real time or near real time, which means stream processing systems need capabilities that let them process data with low latency, high performance, and fault tolerance. In this course, Modeling Streaming Data for Processing with Apache Beam, you will gain the ability to work with streams and use the Beam unified model to build data-parallel pipelines. First, you will explore the similarities and differences between batch processing and stream processing. Next, you will discover the Apache Beam APIs, which let you define pipelines that process batch as well as streaming data. Finally, you will learn how windowing operations can be applied to streaming data. When you are finished with this course, you will have a strong grasp of the models and architectures used with streaming data and be able to work with the Beam unified model to define and run transformations on input streams.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More courses by Janani Ravi
Section Introduction Transcripts

Course Overview
[Autogenerated] Hi, my name is Janani Ravi, and welcome to this course on modeling streaming data for processing with Apache Beam. A little about myself: I have a master's in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. I currently work on my own startup, Loonycorn, a studio for high-quality video content. Streaming data usually needs to be processed in real time or near real time, which means stream processing systems need to have capabilities that allow them to process data with low latency, high performance, and fault tolerance. In this course, you will understand the nuances and challenges of working with streams and use the Beam unified model to build data-parallel pipelines. You'll start this course off by understanding the similarities and differences between batch processing and stream processing. We'll discuss the processing models and architectures that stream processing systems use and see the tradeoffs involved in the range of choices available. Next, you'll get started with the Apache Beam APIs, which allow us to define pipelines that process batch as well as streaming data. You'll understand the basic components of a Beam pipeline, PCollections and PTransforms, and define and execute simple pipeline operations using the Beam Direct Runner. Finally, you will see how windowing operations can be applied to streaming data. You will study the different types of windows that Beam supports, that is, fixed windows, sliding windows, session windows, and global windows, and you will see how these windows can be applied to input streams. When you're finished with this course, you will have a strong grasp of the models and architectures used with streaming data, and you will be able to work with the Beam unified model to define and run transformations on input streams.
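
To make the window types mentioned in the overview more concrete, here is a minimal plain-Python sketch of how a timestamp could be assigned to fixed, sliding, and session windows. It does not use the Beam SDK and is not taken from the course. The function names (`fixed_windows`, `sliding_windows`, `session_windows`), the integer-second timestamps, and the half-open `[start, end)` interval convention are all assumptions made for illustration. A global window would simply place every element in one window, so it is omitted.

```python
from typing import List, Tuple

# Hypothetical representation: a window is a half-open interval [start, end) in seconds.
Window = Tuple[int, int]


def fixed_windows(timestamp: int, size: int) -> List[Window]:
    """Assign a timestamp to the single non-overlapping window that contains it."""
    start = timestamp - (timestamp % size)
    return [(start, start + size)]


def sliding_windows(timestamp: int, size: int, period: int) -> List[Window]:
    """Assign a timestamp to every window of length `size` that starts at a
    multiple of `period` and contains the timestamp (latest window first)."""
    windows = []
    start = timestamp - (timestamp % period)  # latest window start <= timestamp
    while start > timestamp - size:           # window still covers the timestamp
        windows.append((start, start + size))
        start -= period
    return windows


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group timestamps into sessions: each event opens [ts, ts + gap), and
    overlapping intervals are merged into one session."""
    sessions: List[Window] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event falls inside the current session, so extend its end.
            start, end = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap))
        else:
            # Gap of inactivity exceeded: start a new session.
            sessions.append((ts, ts + gap))
    return sessions


if __name__ == "__main__":
    print(fixed_windows(125, size=60))               # [(120, 180)]
    print(sliding_windows(125, size=60, period=30))  # [(120, 180), (90, 150)]
    print(session_windows([10, 20, 100], gap=30))    # [(10, 50), (100, 130)]
```

In the Beam Python SDK, the corresponding window functions are `FixedWindows`, `SlidingWindows`, `Sessions`, and `GlobalWindows`. The sketch above only mirrors the general idea of each and should not be read as how Beam implements them.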