Course

Scalable Data Processing with Python

Learn to process large-scale data efficiently with Python. This course will teach you to leverage PySpark and Dask for scalable, parallel, and distributed data processing, and to optimize performance and handle real-world scaling challenges.

Intermediate

55m

Created by Yasir Khan

Last Updated Apr 09, 2025

This course is included in the libraries shown below:

Data

What you'll learn

Scalable data processing is essential for handling large datasets efficiently, yet many practitioners struggle to optimize performance.

In this course, Scalable Data Processing with Python, you’ll gain the ability to process and manage large-scale data using PySpark and Dask.

First, you’ll explore the fundamentals of scalability, including parallel, distributed, and batch processing.

Next, you’ll discover how to use PySpark to process massive datasets with transformations, caching, and optimizations.

Finally, you’ll learn how to leverage Dask for parallel computation, optimizing execution with task graphs and lazy evaluation.

When you’re finished with this course, you’ll have the skills and knowledge to efficiently process large datasets and handle performance challenges in scalable data processing.
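To make the batch and parallel processing ideas above a little more concrete, here is a minimal sketch that is not taken from the course materials. It uses only the Python standard library, and every name in it (`make_batches`, `sum_of_squares`, `parallel_sum_of_squares`) is hypothetical. Threads stand in for the worker processes or cluster nodes that PySpark or Dask would normally manage, so the sketch shows the split, map, and combine shape rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence


def make_batches(values: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def sum_of_squares(batch: Sequence[int]) -> int:
    """Per-batch work; each batch is processed independently of the others."""
    return sum(v * v for v in batch)


def parallel_sum_of_squares(
    values: Sequence[int], batch_size: int = 1000, workers: int = 4
) -> int:
    """Split the input into batches, process them concurrently, then combine."""
    # Assumption: a thread pool is used only to illustrate the pattern.
    # CPU-bound workloads would typically need processes or a cluster instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials: List[int] = list(
            pool.map(sum_of_squares, make_batches(values, batch_size))
        )
    # Combining partial results is the "reduce" step of the pattern.
    return sum(partials)
```

Calling `parallel_sum_of_squares(list(range(10)), batch_size=3)` splits the input into four batches and combines their partial sums. Frameworks such as PySpark and Dask apply the same partition, process, and combine idea, with the partitions distributed across cores or machines.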

About the author

Yasir Khan

33 courses

Dr. Yasir Khan is a global tech consultant and 38Labs founder. He's passionate about digital transformation, data & AI, and regularly shares technology insights on Pluralsight.

Table of contents

Fundamentals of Scalable Data Processing 14m

PySpark for Large-scale Data Processing 11m

Dask for Parallel Processing 15m

Handle Performance and Scaling 14m
