
Handling Streaming Data with GCP Dataflow

Dataflow is a serverless, fully managed service on the Google Cloud Platform for batch and stream processing.
Course info
Level: Advanced
Updated: Dec 11, 2020
Duration: 3h 12m
Table of contents
Course Overview
Executing Pipelines on Cloud Dataflow
Integrating Dataflow with Cloud Pub/Sub
Performing Windowing Operations on Streaming Data
Performing Join Operations on Streaming Data
Description

Dataflow allows developers to process and transform data using easy, intuitive APIs. Dataflow is built on the Apache Beam architecture and unifies batch as well as stream processing of data. In this course, Handling Streaming Data with GCP Dataflow, you will discover that GCP provides a wide range of connectors to integrate the Dataflow service with other GCP services, such as the Pub/Sub messaging service and the BigQuery data warehouse.
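To make that integration concrete, here is a minimal sketch of a streaming Apache Beam pipeline (Python SDK) that reads messages from a Pub/Sub topic and appends rows to a BigQuery table. The project, topic, table, and schema names are illustrative assumptions, not values from the course:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming mode is required for unbounded sources such as Pub/Sub.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     # Hypothetical topic name; Pub/Sub delivers payloads as bytes.
     | 'ReadFromPubSub' >> beam.io.ReadFromPubSub(
           topic='projects/my-project/topics/tweets')
     | 'Decode' >> beam.Map(lambda payload: payload.decode('utf-8'))
     | 'ToRow' >> beam.Map(lambda text: {'text': text})
     # Hypothetical table and schema.
     | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(
           'my-project:tweets_dataset.raw_tweets',
           schema='text:STRING',
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```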

First, you will see how you can integrate your Dataflow pipelines with other services to use as a source of streaming data or as a sink for your final results.
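The same Beam code can run on the local runner during development or on the managed Dataflow service; the switch is made through pipeline options rather than code changes. Below is a sketch of the options you might pass for Dataflow execution, with placeholder project, region, and bucket values:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and Cloud Storage locations.
options = PipelineOptions(
    runner='DataflowRunner',   # execute on the managed Dataflow service
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/temp',
    staging_location='gs://my-bucket/staging',
    streaming=True)

# Pass these to beam.Pipeline(options=options) as usual.
```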

Next, you will stream live Twitter feeds to the Pub/Sub messaging service and implement your pipeline to read and process these Twitter messages. Finally, you will implement pipelines with a side input, and branching pipelines to write your final results to multiple sinks. When you are finished with this course you will have the skills and knowledge to design complex Dataflow pipelines, integrate these pipelines with other Google services, and test and run these pipelines on the Google Cloud Platform.
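As a taste of what those patterns look like, the following sketch (not the course's exact code; all names are illustrative) branches one input collection into two independent transforms, feeds one branch a side input of tracked hashtags, and writes each branch to its own sink:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    tweets = p | 'ReadTweets' >> beam.io.ReadFromText(
        'gs://my-bucket/tweets.txt')  # hypothetical input file

    # A small collection to be used as a side input.
    tracked = p | 'TrackedTags' >> beam.Create(['#dataflow', '#beam'])

    # Branch 1: keep only tweets mentioning a tracked hashtag. The side
    # input is materialized as a list and passed to the filter function.
    matched = tweets | 'FilterTracked' >> beam.Filter(
        lambda tweet, tags: any(tag in tweet for tag in tags),
        tags=beam.pvalue.AsList(tracked))

    # Branch 2: count all tweets, independent of the filter.
    counts = tweets | 'CountAll' >> beam.combiners.Count.Globally()

    # Each branch writes to its own sink.
    matched | 'WriteMatched' >> beam.io.WriteToText('gs://my-bucket/matched')
    (counts
     | 'FormatCount' >> beam.Map(str)
     | 'WriteCounts' >> beam.io.WriteToText('gs://my-bucket/counts'))
```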

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds four patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Handling Streaming Data with GCP Dataflow. A little about myself: I have a master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

Dataflow allows developers to process and transform data using easy, intuitive APIs. Dataflow is built on the Apache Beam architecture and unifies batch as well as stream processing of data. In this course, you will first see how you can integrate your Dataflow pipelines with other GCP services to use as a source of streaming data or as a sink for your final results. You will read data from cloud storage buckets and the Pub/Sub messaging service and write data to the BigQuery data warehouse. You will see how you can use the Dataflow monitoring interface to debug slow stages in your pipeline code.

Next, you will stream live Twitter feeds to the Pub/Sub messaging service and implement your pipeline to read and process these Twitter messages. You will perform transformations such as extracting embedded hashtags and performing sentiment analysis on tweets. You will also perform windowing operations on input streams and learn the right method to extract event time timestamps from your streaming elements.

Finally, you will implement pipelines with a side input and branching pipelines to write your final results to multiple sinks. You will perform join operations on input streams and write unit tests, as well as end-to-end tests, for your pipeline code. When you're finished with this course, you will have the skills and knowledge to design complex Dataflow pipelines, integrate these pipelines with other Google services, and test and run these pipelines on the Google Cloud Platform.
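As a closing illustration of the windowing ideas mentioned above, here is a hedged sketch that attaches event-time timestamps carried inside each element (assumed here to be a 'timestamp' field in epoch seconds) and counts hashtags over fixed one-minute windows. The topic and field names are assumptions, not values from the course:

```python
import json
import apache_beam as beam
from apache_beam import window
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromPubSub(
           topic='projects/my-project/topics/tweets')  # hypothetical topic
     | 'Parse' >> beam.Map(json.loads)
     # Replace the default processing-time timestamp with the event time
     # carried in the element itself (assumed to be epoch seconds).
     | 'WithEventTime' >> beam.Map(
           lambda e: window.TimestampedValue(e, e['timestamp']))
     # Group elements into fixed, non-overlapping 60-second windows.
     | 'FixedWindows' >> beam.WindowInto(window.FixedWindows(60))
     | 'KeyByHashtag' >> beam.FlatMap(
           lambda e: [(tag, 1) for tag in e.get('hashtags', [])])
     # Counts are computed per key, per window.
     | 'CountPerWindow' >> beam.CombinePerKey(sum)
     | 'Print' >> beam.Map(print))
```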