Scalable Machine Learning with PySpark MLlib

In this course, you’ll learn how to build scalable ML pipelines, perform large-scale feature engineering, and train models on massive datasets.

Warner Chaves
What you'll learn

PySpark MLlib brings Apache Spark’s distributed machine learning library to Python. In this course, Scalable Machine Learning with PySpark MLlib, you’ll learn to use Spark’s distributed computing framework for your machine learning workloads. First, you’ll explore the fundamentals of Spark MLlib and the Spark ML Pipeline API, and learn how they differ from single-machine solutions.

Next, you’ll discover how to perform feature engineering and build classification and regression models that handle large datasets efficiently.

Finally, you’ll learn how to tune hyperparameters and optimize performance so that your pipelines run quickly and reliably.

When you’re finished with this course, you’ll have the PySpark MLlib skills and knowledge needed to implement your own machine learning solutions and scale them out to large datasets.

About the author
Warner Chaves

Warner is a SQL Server Certified Master, MVP, and Principal Consultant at Pythian. He manages clients in many industries and leads a talented team that maintains and innovates with their data solutions. When he's not working in Ottawa, Ontario, he can be found in his home country of Costa Rica.
