Predictive Analytics Using Apache Spark MLlib on Databricks

by Janani Ravi

This course will teach you to understand and implement important techniques for predictive analytics such as regression and classification using Apache Spark MLlib APIs on Databricks.

Try for free

Get this course plus top-rated picks in tech skills and other popular topics.

$29.00

per month after 10 day trial

Your 10 day Standard free trial includes

Expert-led courses

Keep up with the pace of change with thousands of expert-led, in-depth courses.

For teams

Give up to 50 users access to our full library including this course free for 30 days

Course info

Rating

(11)

Level

Advanced

Updated

Oct 26, 2021

Duration

1h 57m

What you'll learn

The Spark unified analytics engine is one of the most popular frameworks for big data analytics and processing. Spark offers comprehensive, easy-to-use APIs for machine learning, which you can use to pre-process data and to build predictive models for regression and classification.

In this course, Predictive Analytics Using Apache Spark MLlib on Databricks, you will learn to implement machine learning models using Spark ML APIs. First, you will understand the two Spark libraries available for machine learning: the older RDD-based library and the newer DataFrame-based library. You will then explore the range of transformers available in Spark for pre-processing data for machine learning, such as scaling and standardization transformers for numeric data and label encoding and one-hot encoding transformers for categorical data.
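
To make the pre-processing ideas above concrete, here is a minimal sketch in plain Python, not Spark code. It shows what standardization, label encoding, and one-hot encoding compute on a tiny in-memory example. The function names and sample values are hypothetical, chosen only for illustration. Spark ML's own transformers (for example `StandardScaler`, `StringIndexer`, and `OneHotEncoder` in `pyspark.ml.feature`) operate on DataFrame columns and have their own defaults for category ordering and output format, so treat this as the underlying idea rather than their exact behavior.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale numbers to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each distinct category to an integer index (alphabetical order here)."""
    index = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [index[c] for c in categories], index


def one_hot(indices, size):
    """Turn each integer index into a vector with a single 1.0 entry."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in indices]


# Hypothetical sample data: one numeric column and one categorical column.
ages = [20.0, 30.0, 40.0]
colors = ["red", "green", "red"]

scaled_ages = standardize(ages)
color_ids, color_index = label_encode(colors)
color_vectors = one_hot(color_ids, len(color_index))

print(scaled_ages)
print(color_ids, color_index)
print(color_vectors)
```

Label encoding on its own implies an ordering between categories, which is why it is commonly followed by one-hot encoding before the data is passed to a linear model.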

Next, you will use linear regression and ensemble models such as random forest and gradient boosted trees to build regression models. You will use these models for prediction on batch data. You will also see how to use Spark ML Pipelines to chain together transformers and estimators to build a complete machine learning workflow.

Finally, you will implement classification models using logistic regression as well as decision trees. You will train the ML model using batch data but perform predictions on streaming data. You will also use hyperparameter tuning and cross-validation to find the best model for your data.
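
As a rough illustration of hyperparameter tuning with cross-validation, the sketch below uses plain Python rather than Spark. It runs a grid search over a regularization strength for a toy one-feature ridge regression without an intercept. The model, the data, and every function name are assumptions made for this example, not material from the course. In Spark ML the equivalent workflow is usually built from `ParamGridBuilder` and `CrossValidator` in `pyspark.ml.tuning`.

```python
def fit_ridge(xs, ys, lam):
    """Closed-form weight for y ~ w * x with an L2 penalty of strength lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the predictions w * x against the targets."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(xs, ys, lam, k):
    """Average held-out MSE over k folds for one hyperparameter value."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge(train_x, train_y, lam)
        scores.append(mse(w, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


def grid_search(xs, ys, lams, k=3):
    """Return the lam with the lowest cross-validated MSE, plus all scores."""
    results = {lam: cross_validate(xs, ys, lam, k) for lam in lams}
    return min(results, key=results.get), results


# Hypothetical noise-free data where y = 2x, so no regularization fits best.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]
best_lam, cv_scores = grid_search(xs, ys, [0.0, 1.0, 10.0])
print(best_lam, cv_scores)
```

Each candidate value is scored only on rows the model did not see during fitting, which is what makes the comparison between hyperparameter values fair.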

When you’re finished with this course, you’ll have the skills and knowledge to build ML models with Spark MLlib and use them for predictive analytics.

Course Overview

2mins

Course Overview 2m

Getting Started with Machine Learning with Apache Spark on Databricks

36mins

Performing Regression on Batch Data

43mins

Quick Overview of Linear Regression 5m
Lasso Ridge and Elastic Net Regression 4m
Demo: Exploring the Life Expectancy Dataset 4m
Demo: Building and Evaluating a Linear Regression Model 6m
Demo: Hyperparameter Tuning 4m
Quick Overview of Ensemble Learning 3m
Averaging and Boosting 2m
Machine Learning Pipelines 3m
Demo: Exploring the CO2 Emissions Dataset 4m
Demo: Random Forest Regression 5m
Demo: Gradient Boosted Tree Regression 5m

Implementing Classification on Streaming Data

34mins

Quick Overview of Logistic Regression 6m
Demo: Exploring the Loan Dataset 3m
Demo: Logistic Regression 4m
Demo: Performing Predictions on Streaming Data 5m
Quick Overview of Decision Trees 3m
Demo: Exploring the Bank Marketing Campaign Dataset 3m
Demo: Decision Tree Classifier 7m
Demo: Hyperparameter Tuning with Cross Validation 3m
Summary and Further Study 2m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
