Architecting Serverless Big Data Solutions Using Google Dataflow

by Janani Ravi

Dataflow takes a fundamentally different approach to big data processing than compute engines such as Spark. It is serverless and fully managed, so resource provisioning and scaling can be transparent to the data architect.

Course info

Rating

(19)

Level

Beginner

Updated

Aug 7, 2024

Duration

2h 15m

What you'll learn

Dataflow lets developers process and transform data using easy, intuitive APIs. It is built on the Apache Beam architecture and unifies batch and stream processing. In this course, Architecting Serverless Big Data Solutions Using Google Dataflow, you will be exposed to the full potential of Cloud Dataflow and its radically innovative programming model.

You'll start with a basic understanding of how Dataflow works for serverless compute. You'll study the Apache Beam API used to build pipelines and learn what data sources, sinks, and transformations are. You'll study the stages in a Dataflow pipeline and visualize the pipeline as a directed acyclic graph.

Next, you'll use the Apache Beam APIs to build data transformation pipelines in both Java and Python, and execute them locally and in the cloud. You'll integrate your pipelines with other GCP services such as BigQuery and see how to monitor pipelines and debug slow stages. Additionally, you'll study different pipeline architectures, such as branching pipelines and pipelines that use side inputs. You'll also see how to apply windowing operations to perform aggregations on your data. Finally, you'll work with Dataflow without writing any code, using the pre-built Dataflow templates that Google offers for common operations.

By the end of this course, you should be comfortable using Dataflow pipelines to transform and process your data and to integrate your pipelines with other Google services.
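To give a feel for the windowed aggregations mentioned above, here is a minimal plain-Python sketch of the idea behind fixed windows: each timestamped element is assigned to a non-overlapping window of a set length, and values are then aggregated per window. This is not the Apache Beam API and is not taken from the course; the function names, the 60-second window size, and the sample sales events are all assumptions made for illustration.

```python
from collections import defaultdict


def fixed_window_start(timestamp, size):
    """Return the start of the fixed window of length `size` containing `timestamp`."""
    return timestamp - (timestamp % size)


def sum_per_window(events, size):
    """Group (timestamp, value) events into fixed windows and sum the values per window."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[fixed_window_start(timestamp, size)] += value
    # Sort by window start so the result reads in time order.
    return dict(sorted(totals.items()))


# Hypothetical sales events: (event time in seconds, units sold).
events = [(3, 2), (42, 1), (61, 5), (118, 4), (125, 1)]

# Aggregate into 60-second fixed windows (window size is an assumption).
window_totals = sum_per_window(events, size=60)
print(window_totals)  # {0: 3, 60: 9, 120: 1}
```

In a real Beam pipeline the runner handles window assignment and per-window grouping; the sketch only shows the arithmetic that decides which window an element falls into.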

Course Overview

1min

Course Overview 2m

Introducing Dataflow

49mins

Understanding and Using the Apache Beam APIs

39mins

Module Overview 2m
Create a Java Project Using Maven 3m
Writing a Dataflow Job in Java 7m
Examining Output Files on Cloud Storage 1m
Find the Top Selling Products 6m
Executing a Java Pipeline on Cloud Dataflow 4m
Execute Jobs and Monitor Logs 3m
Scaling Number of Workers 3m
Identifying Slow Pipeline Stages 3m
Integrating Dataflow with BigQuery 4m
Writing Results to BigQuery 3m

Creating and Using PCollections and Side Inputs

28mins

Module Overview 1m
Window Operations and Types of Windows 6m
Windowing Operations 5m
Executing and Monitoring Windowed Pipelines 3m
Branching Operations 6m
Side Inputs 5m
Executing and Monitoring Pipelines with Side Inputs 3m

Creating Pipelines from Google Templates

16mins

Module Overview 1m
Introducing Dataflow Templates 3m
Introducing Pub/Sub 2m
Execute a Pub/Sub to BigQuery Job Using Dataflow Templates 4m
Publishing Messages and Reading from BigQuery 5m
Summary and Further Study 2m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
