
Data Engineering on Google Cloud
This path provides participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos,...
What you will learn
This path teaches the following skills:
- Design and build data processing systems on Google Cloud Platform
- Lift and shift your existing Hadoop workloads to the cloud using Cloud Dataproc
- Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
- Manage your data pipelines with Cloud Data Fusion and Cloud Composer
- Derive business insights from extremely large datasets using Google BigQuery
- Use pre-built ML APIs on unstructured data and build different kinds of ML models using BigQuery ML
- Enable instant insights from streaming data
Prerequisites
Participants should have experience with one or more of the following:
- A common query language such as SQL
- Extracting, loading, transforming, cleaning, and validating data
- Designing pipelines and architectures for data processing
- Integrating analytics and machine learning capabilities into data pipelines
- Querying datasets, visualizing query results, and creating reports
Beginner
This section introduces participants to the Big Data and Machine Learning capabilities of Google Cloud Platform (GCP). It provides a quick overview of Google Cloud Platform and a deeper dive into its data processing capabilities.
Google Cloud Platform Big Data and Machine Learning Fundamentals
4h 55m
Description
This course introduces participants to the big data capabilities of Google Cloud. Through a combination of presentations, demos, and hands-on labs, participants get an overview of Google Cloud and a detailed view of the data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud.
Table of contents
- Introduction to the Data and Machine Learning on Google Cloud Course
- Introduction to Google Cloud Platform
- Recommending Products using Cloud SQL and Spark
- Predict Visitor Purchases Using BigQuery ML
- Real-time IoT Dashboards with Pub/Sub, Dataflow, and Data Studio
- Deriving Insights from Unstructured Data using Machine Learning
- Summary
Intermediate
This section opens with the two key components of any data pipeline: data lakes and data warehouses. The first course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud Platform in technical detail. It also describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and why data engineering should be done in a cloud environment. Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms, so the second course in this section describes which paradigm to use for batch data, and when. It also covers several Google Cloud Platform technologies for data transformation, including BigQuery, Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Cloud Dataflow.
Modernizing Data Lakes and Data Warehouses with GCP
3h 34m
Description
The two key components of any data pipeline are data lakes and data warehouses. This course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud Platform in technical detail. It also describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and why data engineering should be done in a cloud environment. Learners get hands-on experience with data lakes and warehouses on Google Cloud Platform using Qwiklabs.
Table of contents
- Introduction
- Introduction to Data Engineering
- Building a Data Lake
- Building a Data Warehouse
- Summary
Building Batch Data Pipelines on GCP
2h 42m
Description
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm to use for batch data, and when. It also covers several Google Cloud Platform technologies for data transformation, including BigQuery, Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Cloud Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud Platform using Qwiklabs.
Table of contents
- Introduction
- Introduction to Batch Data Pipelines
- Executing Spark on Cloud Dataproc
- Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
- Serverless Data Processing with Cloud Dataflow
- Summary
Advanced
This section covers two topics: (i) processing streaming data, which is becoming increasingly popular because streaming lets businesses get real-time metrics on their operations, and (ii) incorporating machine learning into data pipelines, which increases the ability of businesses to extract insights from their data. The first course covers how to build streaming data pipelines on Google Cloud Platform. It describes Cloud Pub/Sub for handling incoming streaming data, how to apply aggregations and transformations to streaming data using Cloud Dataflow, and how to store processed records in BigQuery or Cloud Bigtable for analysis. The second course covers several ways to include machine learning in data pipelines on Google Cloud Platform, depending on the level of customization required. For little to no customization, it covers AutoML. For more tailored machine learning capabilities, it introduces AI Platform Notebooks and BigQuery Machine Learning. Finally, it covers how to productionize machine learning solutions using Kubeflow.
Building Resilient Streaming Analytics Systems on GCP
3h 11m
Description
Processing streaming data is becoming increasingly popular because streaming lets businesses get real-time metrics on their operations. This course covers how to build streaming data pipelines on Google Cloud Platform. It describes Cloud Pub/Sub for handling incoming streaming data, how to apply aggregations and transformations to streaming data using Cloud Dataflow, and how to store processed records in BigQuery or Cloud Bigtable for analysis. Learners get hands-on experience building streaming data pipeline components on Google Cloud Platform using Qwiklabs.
Table of contents
- Introduction
- Introduction to Processing Streaming Data
- Serverless Messaging with Cloud Pub/Sub
- Cloud Dataflow Streaming Features
- High-Throughput BigQuery and Bigtable Streaming Features
- Advanced BigQuery Functionality and Performance
- Summary
Smart Analytics, Machine Learning, and AI on GCP
1h 39m
Description
Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. This course covers several ways to include machine learning in data pipelines on Google Cloud Platform, depending on the level of customization required. For little to no customization, it covers AutoML. For more tailored machine learning capabilities, it introduces AI Platform Notebooks and BigQuery Machine Learning. It also covers how to productionize machine learning solutions using Kubeflow. Learners get hands-on experience building machine learning models on Google Cloud Platform using Qwiklabs.
Table of contents
- Introduction
- Introduction to Analytics and AI
- Prebuilt ML model APIs for Unstructured Data
- Big Data Analytics with Cloud AI Platform Notebooks
- Productionizing Custom ML Models
- Custom Model Building with SQL in BigQuery ML
- Custom Model Building with Cloud AutoML
- Summary