Architecting Big Data Solutions Using Google Dataproc

Dataproc is Google's managed Hadoop offering on the cloud. This course teaches you how the separation of storage and compute lets you use clusters more efficiently, purely for processing data rather than for storing it.

What you'll learn

When organizations plan their move to the Google Cloud Platform, Dataproc offers the Hadoop features they already rely on, along with additional paradigms such as the separation of compute and storage. Dataproc allows you to lift-and-shift your Hadoop processing jobs to the cloud and store your data separately in Cloud Storage buckets, which removes the requirement to keep your clusters running at all times. In this course, Architecting Big Data Solutions Using Google Dataproc, you'll learn to work with managed Hadoop on the Google Cloud and the best practices to follow for migrating your on-premises jobs to Dataproc clusters. First, you'll delve into creating a Dataproc cluster and configuring firewall rules so that you can access the cluster manager UI from your local machine. Next, you'll discover how to use the Spark distributed analytics engine on your Dataproc cluster. Then, you'll explore how to write code that integrates your Spark jobs with BigQuery and Cloud Storage buckets using connectors. Finally, you'll learn how to use your Dataproc cluster to perform extract, transform, and load operations using Pig as a scripting language, and how to work with Hive tables. By the end of this course, you'll have the knowledge you need to work with Google's managed Hadoop offering and a sound idea of how to migrate jobs and data from your on-premises Hadoop cluster to the Google Cloud.
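As a rough illustration of why separating storage from compute means a cluster need not stay running, the sketch below assembles the `gcloud` command lines for a create, run, delete cycle in which the job code lives in a Cloud Storage bucket. It only builds and prints the commands; it does not call Google Cloud. The function name, cluster name, region, bucket, and file path are hypothetical placeholders, not taken from the course.

```python
import shlex


def ephemeral_cluster_commands(cluster, region, bucket, job_file, num_workers=2):
    """Build gcloud argv lists for a create -> submit -> delete cycle.

    All argument values are illustrative placeholders. The job file is
    assumed to already sit in the Cloud Storage bucket, so nothing of
    value is lost when the cluster is deleted.
    """
    job_uri = f"gs://{bucket}/{job_file}"
    create = [
        "gcloud", "dataproc", "clusters", "create", cluster,
        f"--region={region}", f"--num-workers={num_workers}",
    ]
    submit = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", job_uri,
        f"--cluster={cluster}", f"--region={region}",
    ]
    delete = [
        "gcloud", "dataproc", "clusters", "delete", cluster,
        f"--region={region}", "--quiet",  # --quiet skips the confirmation prompt
    ]
    return [create, submit, delete]


# Print the three commands in the order they would be run.
for argv in ephemeral_cluster_commands(
    "demo-cluster", "us-central1", "my-bucket", "jobs/wordcount.py"
):
    print(shlex.join(argv))
```

Because both the code and the data stay in Cloud Storage, the delete step discards only compute, which is the point the description above makes about not keeping clusters always running.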

Course Overview

2mins

Course Overview 2m

Introducing Google Dataproc for Big Data on the Cloud

42mins

Running Hadoop MapReduce Jobs on Google Dataproc

49mins

Module Overview 1m
Creating a Dataproc Cluster Using the Web Console 7m
Using SSH to Connect to the Master Node 4m
Creating a Firewall Rule to Enable Access to Dataproc 5m
Accessing the Resource Manager and Name Node UI 2m
Upload Data and MapReduce Code to Cloud Storage 4m
Running MapReduce on Dataproc 4m
Running MapReduce Using the gcloud Command Line Utility 4m
Creating a Cluster with Preemptible Instances Using gcloud 3m
Monitoring Clusters Using Stackdriver 5m
Stackdriver Monitoring Groups and Alerting Policies 5m
Configuring Initialization Actions for Dataproc 4m

Working with Apache Spark on Google Dataproc

24mins

Module Overview 1m
Spark for Distributed Processing 4m
Running a Spark Scala Job Using the Web Console 3m
Executing a Spark Application Using gcloud 3m
Creating a BigQuery Table 4m
PySpark Application Using BigQuery and Cloud Storage Connectors 4m
Executing a Spark Application to Get Results in BigQuery 3m
Monitoring Spark Jobs on Dataproc 3m

Working with Pig and Hive on Google Dataproc

18mins

Module Overview 1m
Pig for Extract Transform Load 4m
Running Pig Scripts on Dataproc 3m
Storing Pig Output to Cloud Storage 3m
Hive to Query Big Data 3m
Executing Hive Queries on Dataproc 3m
Summary and Further Study 2m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani finally decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
