Building Machine Learning Models in Spark 2

Training ML models is a compute-intensive operation and is best done in a distributed environment. This course will teach you how Spark can efficiently perform data exploration, cleaning, and aggregation, and train ML models, all on one platform.
Course info
Rating: (15)
Level: Intermediate
Updated: Jun 19, 2018
Duration: 3h 27m
Table of contents
Course Overview
Machine Learning Packages: spark.mllib vs. spark.ml
Building Classification and Regression Models in Spark ML
Implementing Clustering and Dimensionality Reduction in Spark ML
Building Recommendation Systems in Spark ML
Description

Spark is possibly the most popular engine for big data processing these days. In this course, Building Machine Learning Models in Spark 2, you will learn to build and train machine learning (ML) models such as regression, classification, clustering, and recommendation systems on Spark 2.x's distributed processing environment. The course starts off with an introduction to the two ML libraries available in Spark 2: the older spark.mllib library built on top of RDDs, and the newer spark.ml library built on top of DataFrames. You will see the two compared, to help you know when to pick one over the other. You will see a classification model built using decision trees the old way, and then see how you can implement the same model on the newer spark.ml library. The course covers many features of Spark 2, including a brand-new one: ML pipelines, used to chain your data transformations and ML operations. At the end of this course you will be comfortable using the advanced features that Spark 2 offers for machine learning, and you'll know how to use components such as Transformers, Estimators, and Parameters within your ML pipelines to work with distributed training at scale.
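
To make the pipeline idea concrete, here is a minimal PySpark sketch of chaining a Transformer and an Estimator; it assumes Spark 2.x with pyspark installed, and the data and column names (f1, f2, label) are illustrative, not taken from the course:

```python
# A minimal spark.ml pipeline: a Transformer (VectorAssembler) chained with
# an Estimator (DecisionTreeClassifier). Data and column names are made up.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1.0, 0.5, 0), (0.2, 3.1, 1), (4.5, 0.1, 0), (0.3, 2.8, 1)],
    ["f1", "f2", "label"])

# Transformer: assembles raw columns into a single feature vector
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
# Estimator: fits a decision tree on the assembled features
tree = DecisionTreeClassifier(featuresCol="features", labelCol="label")

# fit() runs each stage in order and returns a reusable PipelineModel
model = Pipeline(stages=[assembler, tree]).fit(df)
model.transform(df).select("features", "prediction").show()
```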

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

More from the author
Using PyTorch in the Cloud: PyTorch Playbook
Intermediate
2h 21m
Apr 25, 2019
Building Clustering Models with scikit-learn
Intermediate
2h 33m
Apr 24, 2019
Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Building Machine Learning Models in Spark 2. A little about myself: I have a master's in electrical engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. In this course, you'll learn to build and train ML models such as regression, classification, clustering, and recommendation systems on Spark 2's distributed processing environment. This course starts off with an introduction to the two ML libraries available in Spark 2: the older spark.mllib library built on top of RDDs, and the newer spark.ml library built on top of DataFrames. We'll compare and contrast the two and talk about when we would choose one library over the other. This course covers both supervised and unsupervised machine learning models, starting off with classification and regression models. We'll cover decision trees and random forests for classification, and Lasso and Ridge models for regression. We'll also see how we can use the confusion matrix and measures such as precision and recall to see how good our classification models are. We'll also cover a brand-new feature in Spark 2, ML pipelines, used to chain our data transformations and ML operations. We'll cover k-means clustering and dimensionality reduction using PCA among the unsupervised learning techniques, before we move on to recommendation systems using the Alternating Least Squares method. We'll implement a recommendations engine using both explicit as well as implicit ratings. At the end of this course, you'll be very comfortable using the advanced features that Spark 2 offers for machine learning. You'll learn to use components such as Transformers, Estimators, and Parameters within your ML pipelines to work with distributed training at scale.
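
As a taste of the evaluation workflow mentioned above, here is a hedged sketch of computing weighted precision, weighted recall, and a confusion matrix in PySpark; the predictions DataFrame is built by hand here for brevity, standing in for real model output:

```python
# A sketch of evaluating classifier output, assuming a DataFrame with
# "label" and "prediction" columns (made-up values, not course data).
from pyspark.sql import SparkSession
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.mllib.evaluation import MulticlassMetrics

spark = SparkSession.builder.appName("eval-sketch").getOrCreate()
predictions = spark.createDataFrame(
    [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)],
    ["label", "prediction"])

# Weighted precision and recall via the spark.ml evaluator
for metric in ("weightedPrecision", "weightedRecall"):
    value = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction",
        metricName=metric).evaluate(predictions)
    print(metric, value)

# The confusion matrix still comes from the older spark.mllib API:
# rows are true labels, columns are predicted labels
pairs = predictions.select("prediction", "label").rdd.map(tuple)
print(MulticlassMetrics(pairs).confusionMatrix())
```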

Machine Learning Packages: spark.mllib vs. spark.ml
Hi, and welcome to this course on Building Machine Learning Models in Spark 2. If you want to build and train your models in a distributed environment, Spark is a great option. Spark can be thought of as a distributed computing engine that is very useful for extracting insights from very large amounts of data, which is why machine learning libraries fit right in with Spark. In fact, machine learning libraries have been a part of Spark since the very beginning. Early versions of Spark in the 1.x line provided powerful ML support using the spark.mllib library. Spark 2 is the current version available, and if you're going to adopt Spark today, this is the version that you'll use. Spark 2 provides many enhancements and improvements over Spark 1, specifically in terms of performance. Spark 2 also offers an entirely new set of APIs for developers to work with: you'll work with DataFrames and not directly with RDDs. The machine learning library has also been enhanced and improved. There are now many more higher-level abstractions, which make it much easier to work with. If you have worked with Spark, you might have heard of Project Tungsten, which completely revamped Spark's distributed computing engine to make it much faster; the execution speedup for certain operations is between 10x and 100x. Spark 2 offers built-in libraries for hyperparameter tuning, which allow you to choose the best model for your use case. The machine learning libraries in Spark 1 and Spark 2 are both powerful; however, for faster execution and more abstraction through higher-level libraries, Spark 2, with its spark.ml library, is the clear winner.
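
To illustrate the API difference described above (a rough sketch with tiny made-up data, not the course's own examples): spark.mllib consumes an RDD of LabeledPoint, while spark.ml consumes a DataFrame with a vector-valued features column.

```python
# Contrasting the RDD-based spark.mllib API with the DataFrame-based
# spark.ml API on the same toy classification data.
from pyspark.sql import SparkSession
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("mllib-vs-ml").getOrCreate()

# Older spark.mllib: RDD of LabeledPoint
rdd = spark.sparkContext.parallelize([
    LabeledPoint(0.0, [1.0, 0.5]), LabeledPoint(0.0, [1.2, 0.4]),
    LabeledPoint(1.0, [0.2, 3.1]), LabeledPoint(1.0, [0.1, 2.8])])
mllib_model = DecisionTree.trainClassifier(
    rdd, numClasses=2, categoricalFeaturesInfo={})

# Newer spark.ml: DataFrame with "label" and vector "features" columns
df = spark.createDataFrame(
    [(0.0, Vectors.dense(1.0, 0.5)), (0.0, Vectors.dense(1.2, 0.4)),
     (1.0, Vectors.dense(0.2, 3.1)), (1.0, Vectors.dense(0.1, 2.8))],
    ["label", "features"])
ml_model = DecisionTreeClassifier().fit(df)
print(ml_model.toDebugString)
```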

Building Classification and Regression Models in Spark ML
Hi, and welcome to this module where we'll build classification and regression models in Spark ML. This is the new library, which runs on top of DataFrames in Spark 2. In addition to taking advantage of the faster execution speeds offered by Spark 2, Spark ML has high-level abstractions for machine learning, such as Estimators and Transformers. Estimators and Transformers can be chained together in a pipeline, which forms a machine learning workflow. Spark ML also has special libraries which help us with feature engineering: extracting, transforming, and selecting only those features that we're interested in are all made easy. In the previous module, we saw an example of classification using the decision tree machine learning model. In this module, we'll dive deeper into classification. We'll learn how we can evaluate classifiers using the confusion matrix. We'll revisit the decision tree model once again, but this time we'll build it using the Spark 2 APIs in spark.ml. We'll also implement a classification problem using random forests. Then we'll move on to regression and see how we can build specialized regression models, such as Lasso and Ridge regression.
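
As an illustration of how Lasso and Ridge look in spark.ml (a sketch with made-up data, not the course's dataset): LinearRegression exposes regParam for the regularization strength and elasticNetParam to select the penalty, where 1.0 gives L1 (Lasso) and 0.0 gives L2 (Ridge).

```python
# Lasso vs. Ridge in spark.ml via LinearRegression's elastic-net knob.
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("lasso-ridge-sketch").getOrCreate()
# Tiny illustrative dataset; the second feature is collinear with the first
train = spark.createDataFrame(
    [(1.0, Vectors.dense(0.5, 1.0)),
     (2.0, Vectors.dense(1.0, 2.0)),
     (3.0, Vectors.dense(1.5, 3.0))],
    ["label", "features"])

# elasticNetParam=1.0 -> L1 (Lasso); elasticNetParam=0.0 -> L2 (Ridge)
lasso = LinearRegression(regParam=0.1, elasticNetParam=1.0).fit(train)
ridge = LinearRegression(regParam=0.1, elasticNetParam=0.0).fit(train)
print(lasso.coefficients)  # L1 tends to drive weak coefficients to zero
print(ridge.coefficients)  # L2 shrinks coefficients without zeroing them
```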

Implementing Clustering and Dimensionality Reduction in Spark ML
Hi, and welcome to this module where we'll focus on unsupervised learning techniques in Spark ML. We'll implement clustering and dimensionality reduction. Unsupervised learning techniques are typically used with data where we don't have a large number of labeled instances; unsupervised learning finds patterns within the data itself rather than relying on training labels. In this module, we'll study k-means clustering, a widely used unsupervised learning technique for finding logical groupings in data: documents or records that are similar to one another belong to the same group. The most important hyperparameter in k-means clustering is k, the number of logical groups into which we want to divide our data. The elbow and silhouette methods are used to find the best value for this hyperparameter. Machine learning models perform well when they're trained on huge datasets with many disparate instances. However, it might be that a lot of the input features don't really have much significance, which is why it's common to apply dimensionality reduction to the input dataset in order to discover latent factors in the underlying data. In this module, we'll look at Spark ML libraries to perform principal component analysis, or PCA, a very commonly used method for dimensionality reduction.
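
A hedged sketch of both techniques in PySpark, with illustrative data; note that ClusteringEvaluator, used here for the silhouette score, requires Spark 2.3 or later:

```python
# K-means with silhouette-based selection of k, followed by PCA.
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
from pyspark.ml.feature import PCA

spark = SparkSession.builder.appName("clustering-sketch").getOrCreate()
df = spark.createDataFrame(
    [(Vectors.dense(0.0, 0.1),), (Vectors.dense(0.2, 0.0),),
     (Vectors.dense(9.0, 8.8),), (Vectors.dense(9.2, 9.1),)],
    ["features"])

# Try several values of k and keep the one with the best silhouette score
evaluator = ClusteringEvaluator()  # silhouette is the default metric
for k in (2, 3):
    model = KMeans(k=k, seed=42).fit(df)
    print(k, evaluator.evaluate(model.transform(df)))

# PCA: project the features onto the top principal component
pca = PCA(k=1, inputCol="features", outputCol="pca_features").fit(df)
pca.transform(df).select("pca_features").show()
```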

Building Recommendation Systems in Spark ML
Hi, and welcome to this module where we'll learn how to build recommendation systems using Spark ML. There are a number of different types of algorithms that can be used to make recommendations to users. One of the most popular techniques used to build recommendation systems is collaborative filtering. Collaborative filtering algorithms use information from other users to generate rules like "people who buy X will also buy Y." We'll first understand the intuition behind collaborative filtering using the Alternating Least Squares, or ALS, method. When you implement it in Spark, though, you don't need to know the math or the logic involved: Spark offers Estimators which abstract away all the details of ALS. In this module, we'll study two different kinds of recommendation systems, one which uses explicit ratings and another which uses implicit ratings.
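
Here is a minimal ALS sketch with made-up ratings, assuming Spark 2.2+ for coldStartStrategy and recommendForAllUsers; the implicitPrefs flag is what switches the estimator from explicit ratings to implicit feedback:

```python
# ALS for explicit vs. implicit feedback in spark.ml (illustrative data).
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("als-sketch").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0), (1, 2, 1.0), (2, 1, 3.0)],
    ["userId", "itemId", "rating"])

# Explicit ratings: values are treated as actual scores
explicit = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
               rank=4, coldStartStrategy="drop")
# Implicit feedback (clicks, plays): values are treated as confidence
implicit = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
               rank=4, implicitPrefs=True, coldStartStrategy="drop")

model = explicit.fit(ratings)
model.recommendForAllUsers(2).show(truncate=False)  # top 2 items per user
```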