Paths

Google: Professional Cloud Data Engineer

Authors: Janani Ravi, Vitthal Srinivasan

This skill path covers all the objectives needed to be a Data Engineer on Google Cloud. You will learn in depth how to use the products and services as well as how to complete the...

What you will learn

  • Dataproc
  • Dataflow and Apache Beam
  • GCP Pub/Sub
  • BigQuery
  • GCP Cloud SQL
  • GCP Cloud Spanner
  • Cloud Datastore
  • Firestore
  • Bigtable
  • Datalab
  • ML Engine
  • Machine Learning APIs
  • Data Architecture on GCP

Prerequisites

Learners should be familiar with cloud computing and the Google Cloud Platform. It is also assumed that learners are already data and ML professionals but are learning to complete their projects on the Google Cloud Platform.

Beginner

In this beginning section of the path you’ll learn how to use Dataproc, Google Composer, Dataflow, and stream processing with Cloud Pub/Sub. You’ll architect solutions and begin to build out the pipelines for your data projects. After this section you’ll be ready for more intermediate topics like incorporating machine learning models.

Architecting Big Data Solutions Using Google Dataproc

by Janani Ravi

Nov 1, 2018 / 2h 17m

Description

When organizations plan their move to the Google Cloud Platform, Dataproc offers the same features as their on-premises Hadoop deployments but adds powerful paradigms such as the separation of compute and storage. Dataproc allows you to lift-and-shift your Hadoop processing jobs to the cloud and store your data separately on Cloud Storage buckets, thus effectively eliminating the requirement to keep your clusters always running. In this course, Architecting Big Data Solutions Using Google Dataproc, you’ll learn to work with managed Hadoop on the Google Cloud and the best practices to follow for migrating your on-premises jobs to Dataproc clusters. First, you'll delve into creating a Dataproc cluster and configuring firewall rules to enable you to access the cluster manager UI from your local machine. Next, you'll discover how to use the Spark distributed analytics engine on your Dataproc cluster. Then, you'll explore how to write code in order to integrate your Spark jobs with BigQuery and Cloud Storage buckets using connectors. Finally, you'll learn how to use your Dataproc cluster to perform extract, transform, and load operations using Pig as a scripting language and work with Hive tables. By the end of this course, you'll have the necessary knowledge to work with Google’s managed Hadoop offering and a sound idea of how to migrate jobs and data from your on-premises Hadoop cluster to the Google Cloud.
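To make the lift-and-shift story concrete, here is a minimal sketch of submitting a PySpark job to an existing cluster with the google-cloud-dataproc Python client; the project, region, cluster, and bucket names are hypothetical placeholders.

```python
from google.cloud import dataproc_v1

# Hypothetical names; substitute your own project, region, and cluster.
project_id, region, cluster_name = "my-project", "us-central1", "my-cluster"

# The job client must target the regional Dataproc endpoint.
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# A PySpark job whose main file lives in a Cloud Storage bucket, so the
# cluster itself can stay stateless.
job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/wordcount.py"},
}

submitted = client.submit_job(
    request={"project_id": project_id, "region": region, "job": job}
)
print("Submitted job:", submitted.reference.job_id)
```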

Table of contents
  1. Course Overview 2m
  2. Introducing Google Dataproc for Big Data on the Cloud 42m
  3. Running Hadoop MapReduce Jobs on Google Dataproc 49m
  4. Working with Apache Spark on Google Dataproc 24m
  5. Working with Pig and Hive on Google Dataproc 18m

Building Pipelines for Workflow Orchestration Using Google Composer

by Vitthal Srinivasan

Jan 15, 2019 / 1h 58m

Description

Cloud Composer is a pipeline orchestration service on the GCP. Based on Apache Airflow, Composer was launched in May 2018 and is fast emerging as a popular and versatile service for building and executing system pipelines. In this course, Building Pipelines for Workflow Orchestration Using Google Composer, you'll learn how Composer allows cloud users to quickly create pipelines with complex interconnected tasks. First, you'll discover where Composer fits in the taxonomy of GCP services and how it compares to Dataflow, which is another service for building and executing pipelines on the GCP. Next, you'll explore what a Composer environment is and how pipelines are specified and run in these environments. Then, you'll develop an understanding of the powerful suite of operators made available for use within Composer pipelines by utilizing Airflow operators for executing shell scripts, executing arbitrary Python code, and implementing complex control flow. Finally, you'll learn how to use Airflow’s GCP-specific operators for sending email, working with BigQuery, and instantiating Dataproc clusters. When you're finished with this course, you'll have the skills and knowledge necessary to build and deploy complex Apache Airflow pipelines using Composer.
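As an illustration of how such pipelines are specified, below is a minimal sketch of an Airflow DAG of the kind you would upload to a Composer environment's DAGs bucket; the DAG ID, schedule, and task logic are hypothetical, and the Airflow 1.x import paths match the Composer releases of this era.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator


def transform():
    print("transforming records")


with DAG(dag_id="example_pipeline",
         start_date=datetime(2019, 1, 1),
         schedule_interval="@daily") as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = PythonOperator(task_id="transform", python_callable=transform)

    # The >> operator encodes control flow: extract runs before transform.
    extract >> load
```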

Table of contents
  1. Course Overview 2m
  2. Introducing Google Composer 35m
  3. Creating, Configuring, and Accessing Environments 46m
  4. Managing and Monitoring Workflows 34m

Architecting Serverless Big Data Solutions Using Google Dataflow

by Janani Ravi

Dec 14, 2018 / 2h 15m

Description

Dataflow allows developers to process and transform data using easy, intuitive APIs. Dataflow is built on the Apache Beam architecture and unifies batch as well as stream processing of data. In this course, Architecting Serverless Big Data Solutions Using Google Dataflow, you will be exposed to the full potential of Cloud Dataflow and its radically innovative programming model. You will start this course off with a basic understanding of how Dataflow works for serverless compute. You’ll study the Apache Beam API used to build pipelines and understand what data sources, sinks, and transformations are. You’ll study the stages in a Dataflow pipeline and visualize them as a directed acyclic graph. Next, you'll use Apache Beam APIs to build pipelines for data transformations in both Java and Python and execute these pipelines locally and on the cloud. You’ll integrate your pipelines with other GCP services such as BigQuery and see how you can monitor and debug slow pipeline stages. Additionally, you'll study different pipeline architectures such as branching and pipelines using side inputs. You’ll also see how you can apply windowing operations to perform aggregations on your data. Finally, you’ll work with Dataflow without writing any code using pre-built Dataflow templates that Google offers for common operations. At the end of this course, you should be comfortable using Dataflow pipelines to transform and process your data and integrating your pipelines with other Google services.
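For a sense of the programming model, here is a minimal Apache Beam word-count pipeline in Python; it runs locally with the DirectRunner, or on Dataflow if you pass --runner=DataflowRunner. The bucket paths are placeholders.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Parses --runner, --project, etc. from the command line.
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
     | "Split" >> beam.FlatMap(lambda line: line.split())
     | "PairWithOne" >> beam.Map(lambda word: (word, 1))
     | "Count" >> beam.CombinePerKey(sum)  # aggregate per key
     | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output"))
```

Each named stage becomes a node in the pipeline's directed acyclic graph, which is what the Dataflow monitoring UI visualizes.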

Table of contents
  1. Course Overview 1m
  2. Introducing Dataflow 49m
  3. Understanding and Using the Apache Beam APIs 39m
  4. Creating and Using PCollections and Side Inputs 28m
  5. Creating Pipelines from Google Templates 16m

Architecting Stream Processing Solutions Using Google Cloud Pub/Sub

by Vitthal Srinivasan

Jan 8, 2019 / 1h 44m

Description

As data warehousing and analytics become more and more integrated into the business models of companies, the need for real-time analytics and data processing has grown. Stream processing has quickly gone from being a nice-to-have to a must-have. In this course, Architecting Stream Processing Solutions Using Google Cloud Pub/Sub, you will gain the ability to ingest and process streaming data on the Google Cloud Platform, including the ability to take snapshots and replay messages. First, you will learn the basics of a Publisher-Subscriber architecture. Publishers are apps that send out messages; these messages are organized into topics. Topics are associated with subscriptions, and subscribers listen in on subscriptions. Each subscription is a message queue, and messages are held in that queue until at least one subscriber per subscription has acknowledged the message. This is why Pub/Sub is said to be a reliable messaging system. Next, you will discover how to create topics, as well as push and pull subscriptions. As their names suggest, push and pull subscriptions differ in who controls the delivery of messages to the subscriber. Finally, you will explore how to leverage advanced features of Pub/Sub such as creating snapshots and seeking to a specific timestamp, either in the past or in the future. You will also learn the precise semantics of creating snapshots and the implications of turning on the “retain acknowledged messages” option on a subscription. When you’re finished with this course, you will have the skills and knowledge of Google Cloud Pub/Sub needed to effectively and reliably process streaming data on the GCP.
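The publish/acknowledge cycle described above looks roughly like this with the google-cloud-pubsub client library; the project, topic, and subscription names are placeholders.

```python
from google.cloud import pubsub_v1

project_id = "my-project"

# Publisher side: messages are sent to a topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "my-topic")
# publish() returns a future; result() blocks until the message ID arrives.
future = publisher.publish(topic_path, data=b"hello", origin="sample")
print("Published message", future.result())

# Subscriber side: messages are pulled from a subscription's queue.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "my-subscription")

def callback(message):
    print("Received:", message.data)
    message.ack()  # acknowledging removes the message from the queue

# Starts an asynchronous streaming pull; keep the process alive
# (e.g. streaming_pull.result()) for messages to keep flowing.
streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
```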

Table of contents
  1. Course Overview 2m
  2. Getting Started with Cloud Pub/Sub 29m
  3. Configuring Publishers, Subscribers, and Topics 38m
  4. Using the Cloud Pub/Sub Client Library 33m

Intermediate

In this section you’ll begin incorporating more intermediate products and functions in Google Cloud such as BigQuery and BigQuery ML, Cloud SQL instances, Datastore, and Bigtable. This is the database-heavy portion of the path, and you’ll also spend time developing repositories. After this section you’ll be ready for the advanced section, which dives deeper into designing machine learning solutions and working with APIs.

Architecting Data Warehousing Solutions Using Google BigQuery

by Janani Ravi

Oct 15, 2018 / 2h 48m

Description

Organizations store massive amounts of data that gets collated from a wide variety of sources. BigQuery supports fast querying at a petabyte scale, with serverless functionality and autoscaling. BigQuery also supports streaming data, works with visualization tools, and interacts seamlessly with Python scripts running from Datalab notebooks. In this course, Architecting Data Warehousing Solutions Using Google BigQuery, you’ll learn how you can work with BigQuery on huge datasets with little to no administrative overhead related to cluster and node provisioning. First, you'll start off with an overview of the suite of storage products on the Google Cloud and the unique position that BigQuery holds. You’ll see how BigQuery compares with Cloud SQL, Bigtable, and Datastore on the GCP and how it differs from Amazon Redshift, the data warehouse on AWS. Next, you’ll create datasets in BigQuery, which are the equivalent of databases in RDBMSes, and create tables within datasets where the actual data is stored. You’ll work with BigQuery using the web console as well as the command line. You’ll load data into BigQuery tables using the CSV, JSON, and Avro formats and see how you can execute and manage jobs. Finally, you'll wrap up by exploring advanced analytical queries which use nested and repeated fields. You’ll run aggregate operations on your data and use advanced windowing functions as well. You’ll programmatically access BigQuery using client libraries in Python and visualize your data using Data Studio. At the end of this course, you'll be comfortable working with huge datasets stored in BigQuery, executing analytical queries, performing analysis, and building charts and graphs for your reports.
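As a quick sketch of programmatic access, here is an analytical query run with the google-cloud-bigquery client against a real public dataset; the client picks up the project from your environment.

```python
from google.cloud import bigquery

client = bigquery.Client()

# An aggregate query over a BigQuery public dataset.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
# result() waits for the job to finish, then streams rows lazily.
for row in client.query(query).result():
    print(row.name, row.total)
```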

Table of contents
  1. Course Overview 2m
  2. Understanding BigQuery in the GCP Service Taxonomy 42m
  3. Using Datasets, Tables, and Views in BigQuery 30m
  4. Getting Data in and out of BigQuery 27m
  5. Performing Advanced Analytical Queries in BigQuery 46m
  6. Programmatically Accessing BigQuery from Client Programs 19m

Building Machine Learning Models in SQL Using BigQuery ML

by Janani Ravi

Nov 20, 2018 / 1h 27m

Description

This course demonstrates how to build and train machine learning models for linear and logistic regression using SQL commands on BigQuery, the Google Cloud Platform’s serverless data warehouse. In this course, Building Machine Learning Models in SQL Using BigQuery ML, you'll learn how to build and train machine learning models and how to employ those models for prediction - all with just simple SQL commands on data stored in BigQuery. First, you'll understand the different choices available on the GCP if you would like to build and train your models and see how you can make the right choice between these services for your specific use case. Then, you'll work with some real-world datasets stored in BigQuery to build linear regression and binary classification models. Because BigQuery allows you to specify training parameters to build and train your model in SQL, machine learning is made accessible to even those who are not familiar with high-level programming languages. Finally, you'll study how to analyze the models that you built using evaluation and feature inspection functions in BigQuery, and run BigQuery commands on Cloud Datalab using a Jupyter notebook that is hosted on the GCP and closely integrated with all of GCP's services. By the end of this course, you'll have a good understanding of how you can use BigQuery ML to extract insights from your data by applying linear and logistic regression models.
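The train-then-evaluate loop can be driven entirely in SQL; here is a minimal sketch issued through the Python client, with hypothetical dataset, table, and column names standing in for your own data.

```python
from google.cloud import bigquery

client = bigquery.Client()

# CREATE MODEL trains a linear regression directly in SQL; the label
# column is named via the input_label_cols option.
client.query("""
    CREATE OR REPLACE MODEL `my_dataset.taxi_fare_model`
    OPTIONS (model_type='linear_reg', input_label_cols=['fare']) AS
    SELECT trip_miles, trip_seconds, fare
    FROM `my_dataset.taxi_trips`
""").result()  # block until training completes

# ML.EVALUATE reports metrics such as mean absolute error.
for row in client.query(
        "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.taxi_fare_model`)"
).result():
    print(dict(row))
```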

Table of contents
  1. Course Overview 2m
  2. Introducing Google BigQuery ML 24m
  3. Building Regression and Classification Models 39m
  4. Analyzing Models Using Evaluation and Feature Inspection Functions 21m

Creating and Administering Google Cloud SQL Instances

by Janani Ravi

Sep 24, 2018 / 2h 29m

Description

An important component of an organization's on-premises solution is the relational database. Cloud SQL is an RDBMS offering on the GCP which makes the operational and administrative aspects of databases very easy to handle. In this course, Creating and Administering Google Cloud SQL Instances, you will learn how to create, work with, and manage Cloud SQL instances on the GCP. First, you will assess the range of data storage services on the GCP and understand when you would choose to use Cloud SQL over other technologies. Then, you will create Cloud SQL instances, connect to them using a simple MySQL client, and configure and administer these instances using the web console as well as the gcloud command line utility. Next, you will focus on how Cloud SQL can work in high-availability mode. After that, you will configure failover replicas for high availability and simulate an outage event to see how the failover replica kicks in. Finally, you will see how to use read replicas for increased read throughput and how data can be migrated into Cloud SQL instances using a SQL dump or from CSV files. At the end of this course, you will be comfortable creating, connecting to, and administering Cloud SQL instances to manage relational databases on the Google Cloud Platform.
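Connecting with a simple MySQL client looks like this sketch, which assumes the Cloud SQL Proxy is listening on localhost and uses placeholder credentials and database names.

```python
import pymysql

# The Cloud SQL Proxy forwards 127.0.0.1:3306 to your instance.
connection = pymysql.connect(host="127.0.0.1",
                             user="root",
                             password="my-password",
                             db="guestbook")
try:
    with connection.cursor() as cursor:
        cursor.execute("SELECT NOW()")  # sanity-check the connection
        print(cursor.fetchone())
finally:
    connection.close()
```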

Table of contents
  1. Course Overview 1m
  2. Understanding Cloud SQL in the GCP Service Taxonomy 40m
  3. Creating Cloud SQL Instances 59m
  4. Replication and Data Management 46m

Creating and Administering Google Cloud Spanner Instances

by Vitthal Srinivasan

Jan 16, 2019 / 1h 59m

Description

Relational Databases have traditionally relied on vertical scaling, but Google’s Cloud Spanner is carefully architected to provide horizontal scaling and global replication with all the rigors of strong consistency. Because Spanner is such a unique product, getting the best out of it does require you to understand its subtleties. In this course, Creating and Administering Google Cloud Spanner Instances, you will gain the ability to identify when Spanner is the right tool for you, and then correctly design your data and configure your instance to get the best out of Spanner’s formidable capabilities. First, you will learn where Cloud Spanner fits in the suite of Google Cloud Platform (GCP) storage technologies and how it compares to BigQuery, Cloud SQL, and others. Next, you will discover Spanner’s data model and how it enables horizontal scaling. Finally, you will explore how to use Spanner in conjunction with other GCP services, notably Dataflow templates, for migrating data into Spanner. When you are finished with this course, you will have the skills and knowledge of Cloud Spanner needed to architect solutions to problems that require global replication, strong consistency, and horizontal scaling in a relational database management system (RDBMS).
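A minimal sketch of reading from a Spanner database with the google-cloud-spanner client follows; the instance and database IDs, and the Singers table, are placeholders.

```python
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")
database = instance.database("my-database")

# Snapshots give strongly consistent reads without blocking writers,
# which is central to Spanner's consistency guarantees.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, FirstName FROM Singers LIMIT 10")
    for row in results:
        print(row)
```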

Table of contents
  1. Course Overview 2m
  2. Getting Started with Cloud Spanner 1h 2m
  3. Creating and Managing Tables in Cloud Spanner 35m
  4. Integrating Cloud Spanner with Other Google Cloud Services 17m

Architecting Schemaless Scalable NoSQL Databases Using Google Datastore

by Vitthal Srinivasan

Jan 10, 2019 / 1h 48m

Description

A suite of big data technologies is considered incomplete unless it includes a solution optimized for document-oriented data and hierarchical queries, one that can provide the blazingly fast lookup that web serving applications need to perform on such data. In this course, Architecting Schemaless Scalable NoSQL Databases Using Google Datastore, you will gain the ability to identify situations when Datastore is right for you, and query it both interactively and programmatically. First, you will learn exactly how Datastore contrasts with other GCP technologies such as BigQuery, Bigtable, and Firestore. Datastore is all about fast reads; Datastore only supports queries whose runtime depends only on the size of the result set, and not on the size of the total data set. This is a remarkable guarantee, and it is achieved via a combination of heavy use of indexes and constraints on the types of queries that are supported. Next, you will discover Datastore’s unique data model, which users often find hard to navigate. Datastore organizes documents into categories called kinds; each individual document is called an entity and belongs to a kind. Finally, you will explore how to perform administrative and backup operations and work with Datastore programmatically. When you’re finished with this course, you will have the skills and knowledge of Google Datastore needed to design and implement a storage solution optimized for fast querying of hierarchical, document-oriented data.
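The kinds-and-entities model maps directly onto the google-cloud-datastore client; the kind and property names below are illustrative.

```python
from google.cloud import datastore

client = datastore.Client()

# An entity of kind 'Task'; Datastore assigns the key's ID on put().
task = datastore.Entity(key=client.key("Task"))
task.update({"description": "Learn Datastore", "done": False})
client.put(task)

# Queries filter only on indexed properties, which is why lookup time
# scales with the result set rather than the total data set.
query = client.query(kind="Task")
query.add_filter("done", "=", False)
for entity in query.fetch():
    print(entity["description"])
```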

Table of contents
  1. Course Overview 2m
  2. Understanding Cloud Datastore in the GCP Service Taxonomy 27m
  3. Querying and Using Cloud Datastore 49m
  4. Administering and Managing Cloud Datastore 28m

Leveraging Fully Managed Redis Datastores Using Google Cloud Memorystore

by Vitthal Srinivasan

Jan 11, 2019 / 1h 31m

Description

Due to its in-memory nature, Memorystore features some of the lowest latencies on the platform, down to sub-millisecond levels. This managed Redis service is hosted on Google’s highly scalable infrastructure, which means that it can support instances up to 300 GB and network throughput of 12 Gbps. Memorystore offers an easy migration path for users of Redis, a technology that is fast gaining popularity, especially for use from within Docker containers running on Kubernetes. In this course, Leveraging Fully Managed Redis Datastores Using Google Cloud Memorystore, you'll examine all of these aspects of working with Memorystore and learn how to get the best out of this powerful managed database service. First, you will explore the suite of storage products that are available on the GCP and where exactly Memorystore fits in. You will be introduced to the capabilities of using Redis to cache data for transactions and as a publisher-subscriber message delivery system, and you will learn about the LRU eviction policies that Memorystore follows. Next, you will implement Memorystore integrations with applications that you host on Compute Engine VMs, App Engine, and Google Kubernetes Engine clusters. These are the current options that the GCP supports for working with managed Redis. Finally, you will dive into how you can configure Memorystore for high availability. Memorystore offers two Redis tiers: basic tier and standard tier instances. Basic tier instances do not support cross-zone replication and failover, while standard tier instances are equipped with both features. In addition, the standard tier offers far lower downtime during scaling. You’ll also see how you can monitor Redis instances using Stackdriver. When you’re done with this course, you will have a good understanding of how you can use Memorystore to cache your data on the cloud and know how you can integrate managed Redis with your applications running on various compute options on the GCP.
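Because Memorystore speaks the standard Redis protocol, integration is just a Redis client pointed at the instance's private IP; this sketch assumes a client (for example, a Compute Engine VM) on the same network, and the IP is a placeholder.

```python
import redis

# Memorystore instances expose a private IP on your VPC network.
r = redis.StrictRedis(host="10.0.0.3", port=6379)

r.set("greeting", "hello", ex=60)  # cache entry that expires in 60 seconds
print(r.get("greeting"))           # b'hello'
```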

Table of contents
  1. Course Overview 2m
  2. Creating and Managing Redis Instances 46m
  3. Configuring and Using Cloud Memorystore 26m
  4. Scaling and Monitoring Cloud Memorystore 15m

Leveraging Google Cloud Firestore for Realtime Database Solutions

by Janani Ravi

Jan 16, 2019 / 2h 13m

Description

Cloud Firestore is a flexible, scalable, realtime database where users can be notified when data changes in the cloud. Cloud Firestore is often used for mobile and web applications where there are multiple clients who need to be kept in sync. In this course, Leveraging Google Cloud Firestore for Realtime Database Solutions, you will study the data model and practical usage of two realtime databases offered as a part of Firebase, Google’s mobile and web application development platform. First, you will explore Cloud Firestore, a highly scalable and performant NoSQL database which allows for low-latency create, read, update, and delete operations. Then, you will understand the basic data model of Firestore, where documents help model hierarchical relationships. Next, you will see how you can secure data stored on Cloud Firestore, focusing on security rules which allow very granular specification of how data can be accessed from mobile and web client applications. Finally, you will delve into the original realtime database offering on Firebase, the Realtime Database. At the end of this course, you will have all the knowledge and skills to leverage the right realtime database for your use case, and structure data based on best practices for low latency and high performance.
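Here is a minimal sketch of the write-and-listen pattern with the google-cloud-firestore client; the collection and document names are placeholders.

```python
from google.cloud import firestore

db = firestore.Client()
doc_ref = db.collection("users").document("alice")
doc_ref.set({"name": "Alice", "score": 42})

def on_change(doc_snapshot, changes, read_time):
    # Fires once immediately and again on every subsequent update,
    # which is how clients stay in sync in realtime.
    for doc in doc_snapshot:
        print("Current data:", doc.to_dict())

watch = doc_ref.on_snapshot(on_change)
```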

Table of contents
  1. Course Overview 2m
  2. Managing and Querying Data Using the Cloud Firestore API 1h 4m
  3. Securing Data in Cloud Firestore Using Security Rules 28m
  4. Using Firebase Realtime Database 38m

Architecting Scalable Web Applications with Firebase on the Google Cloud Platform

by Janani Ravi

Jan 10, 2019 / 1h 50m

Description

Firebase is Google’s comprehensive mobile and app development platform whose features and integrations with the Google Cloud Platform allow developers to build applications quickly without managing infrastructure. In this course, Architecting Scalable Web Applications with Firebase on the Google Cloud Platform, you will explore some of Firebase's features and services and build simple web applications to integrate them into your product. First, you will see how Cloud Functions for Firebase allows you to build event-driven solutions for your applications. Next, you will learn how you can use web hosting on Firebase to deploy and host your web applications with just a few clicks. Finally, you will use Firebase Cloud Messaging to allow your applications to respond to in-app notifications and marketing messages. At the end of this course, you will be comfortable using services on the Firebase platform and harnessing its powerful features as well as its integration with the Google Cloud Platform for your web applications.
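As one small illustration of the messaging piece, here is a sketch that sends a Firebase Cloud Messaging notification from the server side with the Firebase Admin SDK for Python; the topic name is hypothetical and application default credentials are assumed.

```python
import firebase_admin
from firebase_admin import messaging

# Initializes with application default credentials.
firebase_admin.initialize_app()

message = messaging.Message(
    notification=messaging.Notification(
        title="Deploy finished",
        body="Your site is live on Firebase Hosting.",
    ),
    topic="releases",  # delivered to all clients subscribed to this topic
)
response = messaging.send(message)  # returns a message ID string
print("Sent:", response)
```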

Table of contents
  1. Course Overview 2m
  2. Using Cloud Functions for Firebase 53m
  3. Using Firebase Hosting to Deploy Web Applications 36m
  4. Using Firebase Cloud Messaging with Web Apps 18m

Architecting Big Data Solutions Using Google Bigtable

by Janani Ravi

Dec 4, 2018 / 2h 2m

Description

Bigtable is Google’s proprietary storage service that offers extremely fast read and write speeds. It uses a sophisticated internal architecture which learns access patterns and moves around your data to mitigate the issue of hot-spotting. In this course, Architecting Big Data Solutions Using Google Bigtable, you’ll learn both the conceptual and practical aspects of working with Bigtable. You’ll learn how best to design your schema to enable fast read and write speeds and discover how data in Bigtable can be accessed using the command line as well as client libraries. First, you’ll study the internal architecture of Bigtable and how data is stored within it using the 4-dimensional data model. You’ll also discover how Bigtable clusters, nodes, and instances work and how, behind the scenes, Bigtable works with Colossus, Google’s proprietary storage system. Next, you’ll access Bigtable using both the HBase shell as well as cbt, Google’s command line utility. Later, you'll create and manage tables and practice exporting and importing data using sequence files. Finally, you’ll study how manual failovers can be handled when single-cluster routing is enabled. At the end of this course, you’ll be comfortable working with Bigtable using both the command line as well as client libraries.
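The 4-dimensional data model (row key, column family, column qualifier, timestamp) shows up directly in the client library; this sketch with the google-cloud-bigtable Python client uses placeholder instance, table, and column-family names.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("my-table")

# Rows are addressed by key; each cell lives at
# (column family, qualifier, timestamp).
row = table.direct_row(b"user#1234")
row.set_cell("stats", b"visits", b"42")
row.commit()

result = table.read_row(b"user#1234")
print(result.cells["stats"][b"visits"][0].value)  # b'42'
```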

Table of contents
  1. Course Overview 2m
  2. Introducing Cloud Bigtable 57m
  3. Interacting with Cloud Bigtable Using cbt and the HBase API 36m
  4. Managing Cloud Bigtable Instances, Clusters, and Nodes 26m

Developing on the Google Cloud Using Datalab and Cloud Source Repositories

by Vitthal Srinivasan

Jan 9, 2019 / 1h 48m

Description

At the core of cloud development is a thorough knowledge of Cloud Source Repositories. In this course, Developing on the Google Cloud Using Datalab and Cloud Source Repositories, you’ll learn how to use developer tools on the Google Cloud. First, you’ll learn the features that these products offer and see how easily they integrate with other GCP services. Next, you’ll explore the suite of developer tools that you can work with on the Google Cloud and how to pick the best tool for your use case. Finally, you’ll discover how to create your Cloud Datalab instance for data exploration and visualization. When you’re finished with this course, you’ll have a foundational knowledge of Google Cloud Source Repositories which will help you as you move forward in bringing your code, compute, and data all together on one platform.

Table of contents
  1. Course Overview 2m
  2. Understanding Developer Tools on the GCP 20m
  3. Using Cloud Datalab for On-cloud Development 50m
  4. Using Cloud Source Repositories for On-cloud Development 35m

Advanced

In this section you’ll cover machine-learning-heavy topics such as working with AutoML, ML Engine, and designing data architectures that are specific to Google Cloud. After this section you will have learned the critical functions and services you need on Google Cloud to work as a Data Engineer.

Designing and Implementing Solutions Using Google Machine Learning APIs

by Janani Ravi

Oct 19, 2018 / 1h 37m

Description

The Google Cloud Platform makes a wide range of machine learning (ML) services available as a part of Google Cloud AI. The Google Cloud Machine Learning APIs are the most accessible and lightweight of these services, making powerful ML models available to even novice programmers through simple, intuitive APIs. In this course, Designing and Implementing Solutions Using Google Machine Learning APIs, you'll learn how you can use and work with the Google Machine Learning APIs, which expose powerful models pre-trained on Google’s datasets. First, you'll delve into an overview of the machine learning services suite available on the Google Cloud, and understand the features of each so you can make the right choice about what service makes sense for your use case. Next, you'll discover speech-based APIs that allow you to convert speech to text and text to speech, with additional emphasis support using SSML, and how you can call these REST APIs using simple Python libraries. Then, you'll learn about the Natural Language APIs and see how they can be used for sentiment analysis and for language translation. Finally, you'll explore the Vision and Video Intelligence APIs in order to perform face and label detection on images. By the end of this course, you'll have the necessary knowledge to choose the right ML API that fits your use case and use multiple APIs together to build more complex features for your product.
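As one example of how lightweight these APIs are, here is a sentiment analysis sketch with the Cloud Natural Language client library; the enums/types import surface matches the pre-2020 client versions used in this course's era.

```python
from google.cloud import language
from google.cloud.language import enums, types

client = language.LanguageServiceClient()

# Wrap the raw text in a Document and ask for its overall sentiment.
document = types.Document(
    content="The new release is wonderful!",
    type=enums.Document.Type.PLAIN_TEXT,
)
sentiment = client.analyze_sentiment(document=document).document_sentiment
print("Score:", sentiment.score, "Magnitude:", sentiment.magnitude)
```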

Table of contents
  1. Course Overview 2m
  2. Introducing the Google Cloud ML APIs 22m
  3. Working with Speech and Text Using the Cloud ML APIs 31m
  4. Working with Language Using the Cloud ML APIs 18m
  5. Working with Images and Videos Using the Cloud ML APIs 22m

Designing and Implementing Solutions Using Google Cloud AutoML

by Janani Ravi

Oct 12, 2018 / 1h 41m

Description

Most organizations want to harness the power of machine learning in order to improve their products, but they may not always have the expertise available in-house. In this course, Designing and Implementing Solutions Using Google Cloud AutoML, you’ll learn how you can train custom machine learning models on your dataset with just a few clicks on the UI or a few commands in a terminal window. This course will also show how engineers and analysts can harness the power of ML for common use cases by using AutoML to build their own model, trained on their own data, without needing any specific machine learning expertise. First, you'll see an overview of the suite of machine learning services available on the Google Cloud and understand the features of each so you can make the right choice of service for your use case. You’ll learn about the basic concepts underlying AutoML, which uses neural architecture search and transfer learning to find the best neural network for your custom use case. Next, you'll explore AutoML’s translation model and feed in sentence pairs in the TMX format to perform German-English translation. You’ll use your custom model for prediction from the UI, from the command line, and by using Python APIs. You’ll also understand the significance of the BLEU score in analyzing the quality of your translation model. Finally, you'll use the natural language APIs that AutoML offers to build a model for sentiment analysis of reviews and work with AutoML for image classification using the AutoML Vision APIs. You'll finish up by learning the basic requirements of the data needed to train this model and develop a classifier that can identify fruits. At the end of this course, you will be very comfortable choosing the right ML API that fits your use case and using AutoML to build complex neural networks trained on your own dataset for common problems.
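Once a custom model is trained and deployed, calling it from Python looks roughly like this sketch; it assumes the beta-era automl_v1beta1 client surface, and the project and model IDs are hypothetical.

```python
from google.cloud import automl_v1beta1 as automl

client = automl.PredictionServiceClient()
# Fully qualified model name: projects/.../locations/.../models/...
model_name = client.model_path("my-project", "us-central1", "TCN123456789")

# A text payload for an AutoML Natural Language classification model.
payload = {"text_snippet": {"content": "Loved it!",
                            "mime_type": "text/plain"}}
response = client.predict(model_name, payload)

for result in response.payload:
    print(result.display_name, result.classification.score)
```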

Table of contents
  1. Course Overview 2m
  2. Introducing Google Cloud AutoML 20m
  3. Performing Custom Translation Using AutoML Translation 38m
  4. Working with Language Using AutoML Natural Language 23m
  5. Working with Images Using AutoML Vision 16m

Architecting Production-ready ML Models Using Google Cloud ML Engine

by Vitthal Srinivasan

Jan 9, 2019 / 2h 12m

Description

Building machine learning models using Python and a machine learning framework is the first step towards building an enterprise-grade ML architecture, but two key challenges remain: training the model with enough computing firepower to get the best possible model, and then making that model available to users who are not data scientists or even Python users. In this course, Architecting Production-ready ML Models Using Google Cloud ML Engine, you will gain the ability to perform on-cloud distributed training and hyperparameter tuning, as well as learn to make your ML models available for use in prediction via simple HTTP requests. First, you will learn to use ML Engine for models built in XGBoost. XGBoost is an ML framework that utilizes a technique known as ensemble learning to construct a single, strong model by combining several weak learners, as they are known. Next, you will discover how easy it is to port serialized models from on-premises to the GCP. You will build a simple model in scikit-learn, which is the most popular classic ML framework, and then serialize that model and port it over for use on the GCP using ML Engine. Finally, you will explore how to tap the full power of distributed training, hyperparameter tuning, and prediction in TensorFlow, which is one of the most popular libraries for deep learning applications. You will see how a JSON environment variable called TF_CONFIG is used to share state information and optimize the training and hyperparameter tuning process. When you’re finished with this course, you will have the skills and knowledge of the Google Cloud ML Engine needed to get the full benefits of distributed training and make both batch and online prediction available to your client apps via simple HTTP requests.
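The "prediction via simple HTTP requests" piece can be exercised from Python with the Google API discovery client, as in this sketch; the project, model, and version names are placeholders, and the instance shape must match whatever the deployed model expects.

```python
from googleapiclient import discovery

# Build a client for the Cloud ML Engine (ml, v1) REST API.
service = discovery.build("ml", "v1")
name = "projects/my-project/models/my_model/versions/v1"

# Each entry in 'instances' is one input row for the deployed model.
body = {"instances": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
response = service.projects().predict(name=name, body=body).execute()

if "error" in response:
    raise RuntimeError(response["error"])
print(response["predictions"])
```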

Table of contents
  1. Course Overview 2m
  2. Introducing Google Cloud ML Engine 28m
  3. Deploying XGBoost Models to Cloud ML Engine 38m
  4. Deploying Scikit-learn Models to Cloud ML Engine 24m
  5. Deploying TensorFlow Models to Cloud ML Engine 37m

Designing Scalable Data Architectures on the Google Cloud

by Janani Ravi

Jan 16, 2019 / 1h 58m

Description

The Google Cloud Platform offers a very large number of services covering every important aspect of public cloud computing. This array of services and choices can often seem intimidating - even a practitioner who understands several important services might have trouble connecting the dots, as it were, and fitting those services together in meaningful ways. In this course, Designing Scalable Data Architectures on the Google Cloud, you will gain the ability to design lambda and kappa architectures that integrate batch and streaming, plan intelligent migration and disaster-recovery strategies, and pick the right ML workflow for your enterprise. First, you will learn why the right choice of stream processing architecture is becoming key to the entire design of a cloud-based infrastructure. Next, you will discover how the Transfer Service is an invaluable tool in planning both migration and disaster-recovery strategies on the GCP. Finally, you will explore how to pick the right machine learning technology for your specific use case. When you’re finished with this course, you will have the skills and knowledge of the entire cross-section of Big Data and Machine Learning offerings on the GCP to build cloud architectures that are optimized for scalability, real-time processing, and the appropriate use of Deep Learning and AI technologies.

Table of contents
  1. Course Overview 3m
  2. Implementing Integrated Batch and Streaming Architectures on the GCP 39m
  3. Designing Migration and Disaster Recovery Strategies on the GCP 37m
  4. Designing Robust ML Workflows on the GCP 38m