  • Course

Getting Started with Spark 2

The 2.x releases of Spark introduce significantly different and upgraded features. This course focuses on all of these changes, in both theory and practice.

Beginner
2h 16m
(173)

Created by Janani Ravi

Last Updated Feb 28, 2025

This course is included in the libraries shown below:

  • Data

What you'll learn

Spark is possibly the most popular engine for big data processing these days, and the 2.x releases add several features that make Spark more powerful and easier to work with. In this course, Getting Started with Spark 2, you will get up and running with Spark 2 and understand the similarities and differences between version 2.x and older versions.

First, you will look at the basic Spark architecture and the details of Project Tungsten, which brought significant performance improvements to Spark 2. You will go over the new developer APIs built on DataFrames and see how they interoperate with RDDs from Spark 1.x.

Next, you will move on to big data processing, where you will load and clean datasets, remove invalid rows, execute transformations to extract insights, and perform grouping, sorting, and aggregations using the new DataFrame APIs. You will also study how and where to use broadcast variables and accumulators.

Finally, you will work with Spark SQL, which lets you use SQL commands for big data processing. The course also covers advanced SQL support in the form of windowing operations.

By the end of this course, you should be comfortable working with Spark DataFrames and Spark SQL, and better equipped to make technical choices based on the performance trade-offs between older versions of Spark and Spark 2.

Software required: Apache Spark 2.2, Python 2.7.

About the author
Janani Ravi
192 courses, 4.5 author rating, 6,281 ratings

A problem solver at heart, Janani has a master's degree from Stanford and worked at Google for 7+ years. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.
