Expanded Library

Handling Streaming Data with GCP Dataflow

by Janani Ravi

Dataflow is a serverless, fully-managed service on the Google Cloud Platform for batch and stream processing.

What you'll learn

Dataflow allows developers to process and transform data using easy, intuitive APIs. Dataflow is built on the Apache Beam architecture and unifies batch as well as stream processing of data. In this course, Handling Streaming Data with GCP Dataflow, you will discover the GCP provides a wide range of connectors to integrate the Dataflow service with other GCP services such as the Pub/Sub messaging service and the BigQuery data warehouse.

First, you will see how you can integrate your Dataflow pipelines with other services to use as a source of streaming data or as a sink for your final results.

Next, you will stream live Twitter feeds to the Pub/Sub messaging service and implement your pipeline to read and process these Twitter messages. Finally, you will implement pipelines with a side input, and branching pipelines to write your final results to multiple sinks. When you are finished with this course you will have the skills and knowledge to design complex Dataflow pipelines, integrate these pipelines with other Google services, and test and run these pipelines on the Google Cloud Platform.

Table of contents

Course Overview

About the author

Janani has a Masters degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providin... more

Ready to upskill? Get started