Building Machine Learning Models in SQL Using BigQuery ML

BigQuery ML on the Google Cloud Platform democratizes machine learning by allowing data analysts and engineers to build and use machine learning models directly from SQL, without using a higher-level programming language.
Course info
Level
Beginner
Updated
Nov 20, 2018
Duration
1h 27m
Description

This course demonstrates how to build and train machine learning models for linear and logistic regression using SQL commands on BigQuery, the Google Cloud Platform's serverless data warehouse. In this course, Building Machine Learning Models in SQL Using BigQuery ML, you'll learn how to build and train machine learning models and how to employ those models for prediction, all with just simple SQL commands on data stored in BigQuery. First, you'll understand the different choices available on the GCP for building and training your models, and see how to choose between these services for your specific use case. Then, you'll work with some real-world datasets stored in BigQuery to build linear regression and binary classification models. Because BigQuery allows you to specify training parameters and to build and train your model in SQL, machine learning is made accessible even to those who are not familiar with high-level programming languages. Last, you'll study how to analyze the models you built using evaluation and feature inspection functions in BigQuery, and run BigQuery commands on Cloud Datalab using a Jupyter notebook that is hosted on the GCP and closely integrated with all of GCP's services. By the end of this course, you'll have a good understanding of how you can use BigQuery ML to extract insights from your data by applying linear and logistic regression models.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Using PyTorch in the Cloud: PyTorch Playbook
Intermediate
2h 21m
Apr 25, 2019
Building Clustering Models with scikit-learn
Intermediate
2h 33m
Apr 24, 2019
Section Introduction Transcripts

Course Overview
Hi. My name is Janani Ravi, and welcome to this course on Building Machine Learning Models in SQL Using BigQuery ML. A little about myself. I have a master's degree in electrical engineering from Stanford and have worked with companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. In this course, you will learn how to build and train machine learning models and how to employ those models for prediction, all with just simple SQL commands on data stored in BigQuery. We start off the course with an introduction to machine learning using BigQuery. We'll understand the different choices available on the GCP if you would like to build and train your models and see how you can make the right choice between these services for your specific use case. We'll then work with some real-world datasets stored in BigQuery to build linear regression and binary classification models. BigQuery allows you to specify training parameters to build and train your models in SQL. This has the effect of making machine learning accessible even to those who are not familiar with high-level programming languages. We'll then study how to analyze the models that we build using evaluation and feature inspection functions in BigQuery. We'll also run BigQuery commands on Cloud Datalab, a Jupyter Notebook environment that is hosted on the GCP and closely integrated with all of GCP's services. At the end of this course, you will have a good understanding of how you can use BigQuery ML to extract insights from your data by applying linear and logistic regression models.

Introducing Google BigQuery ML
Hi, and welcome to this course on Building Machine Learning Models in SQL Using BigQuery Machine Learning. Now so far, if you've had to work with machine learning, you've had to use a high-level programming language, such as Python, Java, Scala, etc. But now it's possible to build machine learning models for linear and logistic regression using just SQL queries, that is, if your data lives on BigQuery. BigQuery is a serverless cloud data warehouse on the Google Cloud Platform. The BigQuery data warehouse is one of the most popular technologies on the GCP, and it's widely used by business analysts, as well as data scientists. BigQuery is a structured data store, which can ingest data in multiple formats, such as CSV, Avro, and JSON files. BigQuery supports structured data including complex data types, such as arrays and structs. A brand new feature that has been added to BigQuery this year is the ability to build machine learning models using the SQL query language. You no longer have to retrieve data from BigQuery into a Python program in order to build and test your ML models; there is no need to leave BigQuery at all. This feature brings machine learning right to where data is stored, and it democratizes ML to an unprecedented degree.
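To give a rough sense of what "machine learning in SQL" looks like, here is a minimal sketch that assembles a BigQuery ML CREATE MODEL statement as text. Python is used here only to hold and print the statement; the statement itself is what you would run in BigQuery. The dataset, model, table, and column names are hypothetical placeholders, not datasets from the course, and nothing is sent to BigQuery.

```python
def create_model_sql(model, model_type, label, training_table):
    """Return the text of a BigQuery ML CREATE MODEL statement.

    model_type is 'linear_reg' or 'logistic_reg', the two model
    families this course covers. The label column must be present
    in the training table; every other selected column is a feature.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


# Hypothetical names, chosen only for illustration.
statement = create_model_sql(
    model="demo_dataset.trip_duration_model",
    model_type="linear_reg",
    label="duration_minutes",
    training_table="demo_dataset.trips",
)
print(statement)
```

The printed statement can be pasted into the BigQuery query editor once the placeholder names are replaced with a real dataset and table.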

Building Regression and Classification Models
Hi, and welcome to this module where we'll see how we can use BigQuery ML to build regression, as well as classification, models. As a student of machine learning, these are typically the first machine learning models that you learn and work with. And these are the two that are supported by BigQuery at this point in time. Linear regression tries to find the best-fit straight line through your data so you can then use this linear model in order to make predictions. Binary logistic regression tries to find the best-fit S-curve on your underlying data. Applying a threshold on this curve allows you to use this for binary classification. Is the output true or false, 0 or 1?
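The two ideas above can be sketched in a few lines of plain Python. This is only an illustration of the concepts, not of how BigQuery ML trains its models: the straight line is fitted with the closed-form least-squares formula for a single feature, and the S-curve uses hand-picked coefficients (an assumption, not trained values) so that the thresholding step is easy to see. The data points are made up.

```python
import math


def fit_line(xs, ys):
    """Least-squares best-fit line y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sigmoid(z):
    """The S-curve used by logistic regression; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, weight, bias, threshold=0.5):
    """Turn the S-curve output into a true/false answer via a threshold."""
    return sigmoid(weight * x + bias) >= threshold


# Linear regression on made-up points that lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for an unseen x = 5

# Binary classification with hand-picked (not trained) coefficients.
is_positive_high = classify(3.0, weight=1.5, bias=-3.0)
is_positive_low = classify(1.0, weight=1.5, bias=-3.0)
print(slope, intercept, prediction, is_positive_high, is_positive_low)
```

Raising the threshold above 0.5 makes the classifier more reluctant to answer "true", which is the trade-off that evaluation metrics for classification models measure.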

Analyzing Models Using Evaluation and Feature Inspection Functions
Hi, and welcome to this module where we'll see how we can analyze the models that we built using the evaluation and feature inspection functions. We'll first see how we can run machine learning models on BigQuery using Cloud Datalab. Cloud Datalab is a VM instance on the GCP, which comes preinstalled with all of the tools that you need for data science and analysis. Demos from the last module showed us that ML modeling involves three phases: training, evaluation, and prediction. BigQuery ML has distinct functions that you can use to get information for each phase. We've already seen some of these functions using BigQuery's web console. We saw hands-on examples for the ML.EVALUATE and ML.PREDICT functions. We'll see some additional functions in this module, including ROC curves that we can use to evaluate classification models. ROC curves will also allow us to study the precision and recall metrics in more detail.
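To make the link between ROC curves, precision, and recall concrete, here is a small plain-Python sketch. It does not call BigQuery ML; it only shows how each point on an ROC curve comes from choosing a threshold and counting true and false positives. The labels and predicted scores are made-up values, and treating precision as 0.0 when nothing is predicted positive is a convention chosen for this sketch.

```python
def confusion_counts(labels, scores, threshold):
    """Count true/false positives and negatives at one threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics_at(labels, scores, threshold):
    """Recall (true positive rate), false positive rate, and precision."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "threshold": threshold,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Convention for this sketch: 0.0 when nothing is predicted positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Made-up true labels and predicted probabilities for six rows.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

# Each threshold yields one (false_positive_rate, recall) point on the curve.
roc_points = [metrics_at(labels, scores, t) for t in (0.2, 0.5, 0.85)]
for point in roc_points:
    print(point)
```

A low threshold catches every positive row (high recall) at the cost of more false positives, while a high threshold does the opposite; the ROC curve traces this trade-off across all thresholds.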