Course

Scalable Machine Learning with PySpark MLlib

In this course, you’ll learn how to build scalable ML pipelines, perform large-scale feature engineering, and train models on massive datasets.

Intermediate

1h 9m

(1)

Created by Warner Chaves

Last Updated Apr 25, 2025


This course is included in the Data library.


What you'll learn

PySpark MLlib brings Apache Spark's distributed machine learning library to Python. In this course, Scalable Machine Learning with PySpark MLlib, you’ll learn to use Spark’s distributed computing framework for your machine learning workloads. First, you’ll explore the fundamentals of Spark MLlib and the Spark ML Pipeline API, and learn how they differ from single-machine solutions.

Next, you’ll discover how to perform feature engineering and build classification and regression models that handle large datasets efficiently.

Finally, you’ll learn how to tune hyperparameters and optimize performance so your pipelines run efficiently.

When you’re finished with this course, you’ll have the PySpark MLlib skills and knowledge needed to implement and scale out your own machine learning solutions on large datasets.


Table of contents

Understanding PySpark MLlib (11m)

Large-scale Feature Engineering (19m)

Training and Evaluating ML Models (21m)

Building a PySpark MLlib Pipeline (17m)

About the author

Warner Chaves

38 courses

4.6 author rating

708 ratings

Warner is a SQL Server Certified Master, MVP, and Principal Consultant at Pythian. He manages clients in many industries and leads a talented team that maintains and innovates with their data solutions. When he's not working in Ottawa, Ontario, he can be found in his home country of Costa Rica.
