Data Engineering on Google Cloud
- 5 courses
- 14 hours
- Skill IQ
This path provides participants a hands-on introduction to designing and building data processing systems on Google Cloud Platform. Through a combination of presentations, demos, and hand-on labs, participants will learn how to design data processing systems, build end-to-end data pipelines, analyze data and derive insights. The courses cover structured, unstructured, and streaming data.
Courses in this path
This section introduces participants to the Big Data and Machine Learning capabilities of Google Cloud Platform (GCP). It provides a quick overview of the Google Cloud Platform and a deeper dive of the data processing capabilities.
This section opens with the two key components of any data pipeline, which are data lakes and warehouses. The first course highlights use-cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud Platform in technical detail. Also, the course describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment. Data pipelines typically fall under one of the Extra-Load, Extract-Load-Transform or Extract-Transform-Load paradigms. Hence, the second course in this section describes which paradigm should be used and when for batch data. Furthermore, this course covers several technologies on Google Cloud Platform for data transformation including BigQuery, executing Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion and serverless data processing with Cloud Dataflow.
This section covers two things: (ii) Processing streaming data, which is becoming increasingly popular as streaming enables businesses to get real-time metrics on business operations, and (ii) Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. The first course covers how to build streaming data pipelines on Google Cloud Platform. Cloud Pub/Sub is described for handling incoming streaming data. The course also covers how to apply aggregations and transformations to streaming data using Cloud Dataflow, and how to store processed records to BigQuery or Cloud Bigtable for analysis. The second course covers several ways machine learning can be included in data pipelines on Google Cloud Platform depending on the level of customization required. For little to no customization, this course covers AutoML. For more tailored machine learning capabilities, this course introduces AI Platform Notebooks and BigQuery Machine Learning. Also, this course covers how to productionalize machine learning solutions using Kubeflow.