Getting Started with Spark 2

The 2.x releases of Spark introduce significantly different and upgraded features. This course covers all of these changes, in both theory and practice.
Course info
Rating: (46)
Level: Beginner
Updated: May 16, 2018
Duration: 2h 16m
Table of contents
Course Overview
Understanding Differences Between Spark 2.x and Spark 1.x
Exploring and Analyzing Data with DataFrames
Querying Data Using Spark SQL
Description

Spark is possibly the most popular engine for big data processing these days, and the 2.x release has several new features which make Spark more powerful and easier to work with. In this course, Getting Started with Spark 2, you will get up and running with Spark 2 and understand the similarities and differences between version 2.x and older versions. First, you will see the basic Spark architecture and the details of Project Tungsten, which brought great performance improvements to Spark 2. You will go over the new developer APIs using DataFrames and see how they interoperate with RDDs from Spark 1.x. Next, you will move on to big data processing, where you will load and clean datasets, remove invalid rows, execute transformations to extract insights, and perform grouping, sorting, and aggregations using the new DataFrame APIs. You will also study how and where to use broadcast variables and accumulators. Finally, you will work with Spark SQL, which allows you to use SQL commands for big data processing. The course also covers advanced SQL support in the form of windowing operations. At the end of this course, you should be very comfortable working with Spark DataFrames and Spark SQL. You will be better equipped to make technical choices based on the performance trade-offs of older versions of Spark vs. Spark 2. Software required: Apache Spark 2.2, Python 2.7.
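As a taste of the DataFrame workflow described above, here is a minimal PySpark sketch of loading, cleaning, and aggregating a dataset. The file name and column names are placeholders for illustration, not the course's actual datasets.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# SparkSession is the single entry point introduced in Spark 2.x.
spark = SparkSession.builder.appName("GettingStartedWithSpark2").getOrCreate()

# Load a CSV file into a DataFrame, letting Spark infer the schema.
# "crime_data.csv" and its columns are hypothetical examples.
df = spark.read.csv("crime_data.csv", header=True, inferSchema=True)

# Clean the data: drop rows with missing values in key columns.
clean_df = df.dropna(subset=["borough", "value"])

# Group, aggregate, and sort using the DataFrame API.
(clean_df.groupBy("borough")
         .agg(F.sum("value").alias("total_incidents"))
         .orderBy(F.desc("total_incidents"))
         .show(10))
```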

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on Getting Started with Spark 2. A little about myself: I have a master's degree in electrical engineering from Stanford, and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. In this course, you'll get up and running with Spark 2, and understand the similarities and differences between version 2.x and older versions. You'll understand the basic Spark architecture and the details of Project Tungsten, which brought great performance improvements to Spark 2. The course will cover the new developer APIs using DataFrames, and you'll see how they interoperate with RDDs from Spark 1.x. We'll then move on to big data processing, where we'll load and clean datasets, remove invalid rows, execute transformations to extract insights, and perform grouping, sorting, and aggregations using the new DataFrame APIs. We'll also study how and where to use broadcast variables and accumulators. We'll then work with Spark SQL, which allows us to use SQL commands for big data processing. Datasets loaded into Spark can be used to retrieve information using familiar SQL constructs. The course also covers advanced SQL support in the form of windowing operations. At the end of this course, you should be very comfortable working with Spark DataFrames and Spark SQL. You should be able to make technical choices based on the performance trade-offs of older versions of Spark versus Spark 2.
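The DataFrame/RDD interoperability mentioned above can be shown with a short sketch; the data and names below are illustrative, not taken from the course.

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("DataFrameRddInterop").getOrCreate()

# A DataFrame built from an in-memory list of Row objects.
df = spark.createDataFrame([Row(name="alice", age=34), Row(name="bob", age=29)])

# Every DataFrame exposes its underlying RDD of Rows, so RDD-style
# transformations from Spark 1.x remain available.
rdd = df.rdd
print(rdd.map(lambda row: row.age).sum())

# An RDD of Rows can be promoted back into a DataFrame.
df_again = spark.createDataFrame(rdd)
df_again.show()
```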

Exploring and Analyzing Data with DataFrames
Hi, and welcome to this module on Exploring and Analyzing Data in Spark 2 using DataFrames. Developers in Spark 2 use RDDs only if their use case specifically demands it. If you want low-level transformations, actions, and control over your dataset, you'll use RDDs. If your data is unstructured, such as media streams or strings of text, those also call for RDDs. In all other situations, you'll use DataFrames. In this module, we'll work on real-world datasets, one for London crime and the other for soccer player statistics. We'll see how we can use built-in aggregate functions. We'll also get hands-on practice using DataFrames for sampling, grouping, and ordering data. And finally, we'll cover broadcast variables and accumulators. Broadcast variables allow processes to access shared data in an optimized fashion. Accumulators allow multiple processes to update shared variables.
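A rough sketch of how broadcast variables and accumulators can be combined with DataFrame grouping and ordering; the lookup table, player records, and column names are invented for illustration and are not the module's actual data.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("BroadcastAndAccumulators").getOrCreate()
sc = spark.sparkContext

# Broadcast variable: a read-only lookup shipped to every executor once.
country_names = sc.broadcast({"ENG": "England", "ESP": "Spain"})

# Accumulator: a counter that tasks running on executors can add to.
unknown_codes = sc.accumulator(0)

players = sc.parallelize([("Kane", "ENG", 28), ("Ramos", "ESP", 6), ("Neuer", "GER", 1)])

def resolve(record):
    name, code, goals = record
    country = country_names.value.get(code)
    if country is None:
        unknown_codes.add(1)   # aggregated back on the driver after an action runs
        country = "Unknown"
    return (name, country, goals)

# Supplying the column names avoids an extra schema-inference pass.
df = spark.createDataFrame(players.map(resolve), ["player", "country", "goals"])

df.groupBy("country").agg(F.sum("goals").alias("total_goals")).orderBy("country").show()
print("Codes missing from the broadcast lookup: %d" % unknown_codes.value)
```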

Querying Data Using Spark SQL
Hi, and welcome to this module on Querying Data Using Spark SQL. Spark tries to make it very easy for you to work with data. One way it achieves this is by allowing you to query your DataFrames as though they were tables in a relational database. Before you start querying your DataFrames, you need to register them as SQL tables. You can register them as a temporary table, which is available on a per-session basis, or as a global table, which is available across all SparkSessions. Spark 2 has a special optimizer for SQL queries called the Catalyst optimizer, which makes executing SQL queries very, very fast. Spark is capable of inferring the schema of your DataFrame objects, and thus of your SQL tables. It's also possible for you to explicitly specify the schema. Spark also allows you to specify windowing operations on your DataFrames. If you've used window functions in SQL, you know exactly what they are. They are a way to group and order your data, and then apply a ranking or analytic function within each group.
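A brief sketch of registering a DataFrame for SQL queries and applying a window function; the data here is made up for illustration rather than drawn from the course's datasets.

```python
from pyspark.sql import Row, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("SparkSqlAndWindows").getOrCreate()

# Illustrative data only.
df = spark.createDataFrame([
    Row(player="Kane",  team="ENG", goals=28),
    Row(player="Vardy", team="ENG", goals=20),
    Row(player="Ramos", team="ESP", goals=6),
])

# Register the DataFrame so it can be queried with SQL.
df.createOrReplaceTempView("players")        # per-SparkSession
df.createGlobalTempView("players_global")    # shared: query as global_temp.players_global

spark.sql("SELECT team, SUM(goals) AS total FROM players GROUP BY team").show()

# Windowing: rank players within each team without collapsing the rows.
team_window = Window.partitionBy("team").orderBy(F.desc("goals"))
df.withColumn("rank_in_team", F.rank().over(team_window)).show()
```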