Featured resource
2026 Tech Forecast
2026 Tech Forecast

Stay ahead of what’s next in tech with predictions from 1,500+ business leaders, insiders, and Pluralsight Authors.

Get these insights
  • Course

Apache Spark 3 Fundamentals

Learn the Fundamentals of Apache Spark 3: process data, set up the environment, use RDDs & DataFrames, optimize apps, build pipelines with Databricks and Azure Synapse. Familiarize yourself with Spark's ecosystem here in this course.

Beginner
6h 19m
(114)

Created by Mohit Batra

Last Updated Mar 23, 2023

Course Thumbnail
  • Course

Apache Spark 3 Fundamentals

Learn the Fundamentals of Apache Spark 3: process data, set up the environment, use RDDs & DataFrames, optimize apps, build pipelines with Databricks and Azure Synapse. Familiarize yourself with Spark's ecosystem here in this course.

Beginner
6h 19m
(114)

Created by Mohit Batra

Last Updated Mar 23, 2023

Get started today

Access this course and other top-rated tech content with one of our business plans.

Try this course for free

Access this course and other top-rated tech content with one of our individual plans.

This course is included in the libraries shown below:

  • Core Tech
What you'll learn

Apache Spark is one of the most widely used analytics engines. It performs distributed data processing and can handle petabytes of data. Spark can work with a variety of data formats, process data at high speeds, and support multiple use cases. Version 3 of Spark brings a whole new set of features and optimizations. In this course, Apache Spark 3 Fundamentals, you'll learn how Apache Spark can be used to process large volumes of data, whether batch or streaming data, and about the growing ecosystem of Spark. First, you'll learn what Apache Spark is, its architecture, and its execution model. You'll then see how to set up the Spark environment. Next, you'll learn about two Spark APIs – RDDs and DataFrames – and see how to use them to extract, analyze, clean, and transform batch data. Then, you'll learn various techniques to optimize your Spark applications, as well as the new optimization features of Apache Spark 3. After that, you'll see how to reliably store data in a Data Lake using the Delta Lake format and build streaming pipelines with Spark. Finally, you'll see how to use Spark in cloud services like Databricks and Azure Synapse Analytics. By the end of this course, you'll have the knowledge and skills to work with Apache Spark and use its capabilities and ecosystem to build large-scale data processing pipelines. So, let's get started!

Apache Spark 3 Fundamentals
Beginner
6h 19m
(114)
Table of contents

About the author
Mohit Batra - Pluralsight course - Apache Spark 3 Fundamentals
Mohit Batra
14 courses 4.7 author rating 871 ratings

Mohit is a Data Engineer, a Microsoft Certified Trainer (MCT) and a consultant. Mohit has 15+ years of extensive experience in architecting large scale Business Intelligence, Data Warehousing and Big Data solutions with companies like Microsoft and some leading investment banks.

Get started with Pluralsight