
Predictive Analytics Using Apache Spark MLlib on Databricks

This course will teach you to understand and implement important techniques for predictive analytics such as regression and classification using Apache Spark MLlib APIs on Databricks.
Course info
Level: Advanced
Updated: Oct 26, 2021
Duration: 1h 57m
Description

The Spark unified analytics engine is one of the most popular frameworks for big data analytics and processing. Spark offers comprehensive, easy-to-use APIs for machine learning, which you can use to build predictive models for regression and classification and to pre-process the data that feeds these models.

In this course, Predictive Analytics Using Apache Spark MLlib on Databricks, you will learn to implement machine learning models using Spark ML APIs. First, you will learn about the two Spark libraries available for machine learning: the older RDD-based library and the newer DataFrame-based library. You will then explore the range of transformers available in Spark for pre-processing data for machine learning, such as scaling and standardization transformers for numeric data and label encoding and one-hot encoding transformers for categorical data.

Next, you will use linear regression and ensemble models such as random forest and gradient-boosted trees to build regression models. You will use these models for prediction on batch data. You will also see how to use Spark ML Pipelines to chain together transformers and estimators to build a complete machine learning workflow.

Finally, you will implement classification models using logistic regression as well as decision trees. You will train the ML model using batch data but perform predictions on streaming data. You will also use hyperparameter tuning and cross-validation to find the best model for your data.

When you’re finished with this course, you’ll have the skills and knowledge needed to build ML models with Spark MLlib and use them for predictive analytics.

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Predictive Analytics Using Apache Spark MLlib on Databricks. A little about myself: I have a Master's degree in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. I currently work on my own startup, Loonycorn, a studio for high‑quality video content.

The Spark unified analytics engine is one of the most popular frameworks for big data analytics and processing. Spark offers extremely comprehensive and easy‑to‑use APIs for machine learning, which you can use to build predictive models for regression and classification, and also to pre‑process data to feed into these models.

In this course, you will learn to implement machine learning models using Spark ML APIs. You will explore a range of transformers available in Spark for pre‑processing data for ML, such as scaling and standardization transformers for numeric data and label encoding and one‑hot encoding transformers for categorical data.

Next, you will use linear regression and ensemble models, such as random forests and gradient boosted trees, to build regression models. You will use these models for prediction on batch data. In addition, you'll see how you can use Spark ML pipelines to chain together transformers and estimators to build a complete machine learning workflow.

Finally, you will implement classification models using logistic regression as well as decision trees. You will train the ML model using batch data but perform predictions on streaming data. You'll also use hyperparameter tuning and cross‑validation to find the best model for your data.

When you're finished with this course, you will have the skills and knowledge needed to create ML models with Spark MLlib and perform predictive analytics using machine learning.