Apache Spark Fundamentals

This course will teach you how to use Apache Spark to analyze your big data at lightning-fast speeds, leaving Hadoop in the dust! For a deep dive on SQL and Streaming, check out the sequel, Handling Fast Data with Apache Spark SQL and Streaming.

What you'll learn

Our ever-connected world is creating data faster than Moore's law can keep up, so we have to be smarter about how we analyze it. Previously, we had Hadoop's MapReduce framework for batch processing, but modern big data processing demands have outgrown it. That's where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop MapReduce and setting the world record in large-scale sorting. Spark's general abstraction lets it expand beyond simple batch processing, making it capable of blazing-fast iterative algorithms and exactly-once streaming semantics. In this course, you'll learn Spark from the ground up, starting with its history and then building a Wikipedia analysis application as a way to explore a wide range of its core API. That core knowledge will make it easier to explore Spark's other libraries, such as the streaming and SQL APIs. Finally, you'll learn how to avoid a few commonly encountered rough edges of Spark. You will leave this course with a tool belt for creating your own performance-maximized Spark applications.

Getting Started

39mins

Spark Core: Part 1

55mins

Spark Core: Part 2

28mins

Intro 1m
Implicit Conversions 4m
Key Value Methods 9m
Caching Data 6m
Accumulating Data 4m
Java in Spark 3m
Resources 1m
Summary 1m

Distribution and Instrumentation

47mins

Intro 1m
Spark Submit 8m
Cluster Management 8m
Standalone Cluster Scripts 4m
AWS Setup 5m
Spark on Yarn in EMR 8m
Spark UI 11m
Resources 2m
Summary 1m

Spark Libraries

63mins

Intro 2m
Spark SQL 10m
Spark SQL Demo 13m
Spark SQL Demo - The SQL Side 2m
Streaming 5m
Streaming Demo 10m
Machine Learning 4m
Machine Learning Demo 6m
GraphX 4m
GraphX Demo 4m
Resources 3m
Summary 2m

Optimizations and the Future

21mins

Intro 1m
Closures 6m
Broadcasting 5m
Optimizing Partitioning 5m
Spark's Future 4m
Resources 1m
Summary 1m

About the author

Justin Pihony

Justin is a software journeyman, continuously learning and honing his skills. Most of his early professional career was spent in C# and MSSQL, but he loves learning about many different languages, especially Scala. This passion for Scala led him to join the Lightbend (formerly Typesafe) team, diving even deeper into the Scala ecosystem. As much as he loves to learn, he also loves to spread his knowledge through teaching and helping others. He is a very active answerer on StackOverflow...
