  • Course

Architecting Serverless Big Data Solutions Using Google Dataflow

Dataflow takes a fundamentally different approach to Big Data processing from compute engines such as Spark. Dataflow is serverless and fully managed, meaning that resource provisioning and scaling can be transparent to the data architect.

Beginner
2h 15m
(19)

Created by Janani Ravi

Last Updated Aug 07, 2024

This course is included in the libraries shown below:

  • Cloud
  • Data

What you'll learn

Dataflow lets developers process and transform data using simple, intuitive APIs. It is built on the Apache Beam architecture and unifies batch and stream processing. In this course, Architecting Serverless Big Data Solutions Using Google Dataflow, you will be exposed to the full potential of Cloud Dataflow and its programming model.

You'll start with a basic understanding of how Dataflow works as a serverless compute service. You'll study the Apache Beam API used to build pipelines and learn what data sources, sinks, and transformations are. You'll study the stages in a Dataflow pipeline and visualize the pipeline as a directed acyclic graph (DAG). Next, you'll use the Apache Beam APIs to build data transformation pipelines in both Java and Python, and execute these pipelines locally and on the cloud. You'll integrate your pipelines with other GCP services such as BigQuery and see how to monitor and debug slow pipeline stages.

Additionally, you'll study different pipeline architectures, such as branching pipelines and pipelines that use side inputs. You'll also see how to apply windowing operations to perform aggregations on your data. Finally, you'll work with Dataflow without writing any code, using the pre-built Dataflow templates that Google offers for common operations. At the end of this course, you should be comfortable using Dataflow pipelines to transform and process your data and to integrate your pipelines with other Google services.
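To make the source, transformation, sink, and windowing vocabulary above more concrete, here is a minimal plain-Python sketch of a pipeline that groups records into fixed windows and sums them per key. It deliberately does not use the Apache Beam API and is not taken from the course. The function names, the sample events, and the 10-second window size are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical input: (event_time_seconds, key, value) records,
# standing in for what a real data source might emit.
EVENTS = [
    (1, "a", 10),
    (4, "b", 5),
    (12, "a", 7),
    (14, "a", 3),
    (21, "b", 8),
]


def read_source(events):
    """Source stage: yield raw records one at a time."""
    yield from events


def assign_fixed_window(records, size):
    """Transform stage: tag each record with the start of its fixed window."""
    for ts, key, value in records:
        window_start = (ts // size) * size
        yield (window_start, key), value


def sum_per_key(pairs):
    """Aggregation stage: sum values for each (window_start, key) pair."""
    totals = defaultdict(int)
    for composite_key, value in pairs:
        totals[composite_key] += value
    return dict(totals)


def run_pipeline(events, window_size=10):
    """Chain source -> windowing -> aggregation; the returned dict plays the sink."""
    return sum_per_key(assign_fixed_window(read_source(events), window_size))


result = run_pipeline(EVENTS)
```

Each stage consumes the output of the previous one, so the stages form a simple linear graph. In a real Beam pipeline the same shape would be expressed with the SDK's own transforms and executed by a runner such as Dataflow.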


About the author
Janani Ravi
192 courses, 4.5 author rating, 6,281 ratings

A problem solver at heart, Janani has a master's degree from Stanford and worked at Google for more than 7 years. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.
