Handling Fast Data with Apache Spark SQL and Streaming

by Justin Pihony

Apache Spark is a leader in enabling quick and efficient data processing. This course will teach you how to use Spark's SQL, Streaming, and even the newer Structured Streaming APIs to create applications able to handle data as it arrives.

Course info

Rating: (38)
Level: Intermediate
Updated: Aug 4, 2017
Duration: 4h 34m

What you'll learn

Analyzing data used to be something you did once a night. Now you need to process data on the fly so you can provide up-to-the-minute insights. But how do you accomplish in real time what used to take hours, without a complicated code base? In this course, Handling Fast Data with Apache Spark SQL and Streaming, you'll learn to use the Apache Spark Streaming and SQL libraries to handle this new world of real-time, fast data processing. First, you'll dive into Spark SQL. Next, you'll explore how to catch potential fraud by analyzing streams with Spark Streaming. Finally, you'll discover the newer Structured Streaming API. By the end of this course, you'll have a deeper understanding of these APIs, along with a number of the streaming concepts that have driven their design.

Course Overview

2mins

Course Overview 2m

Introduction

21mins

Querying Data with the DataFrames (Part 1)

43mins

Introduction 1m
Spark SQL FTW 5m
Digging into the DataFrame API 11m
Reviewing the Rest of the DataFrame API 8m
Querying with SQL 2m
Extending Querying via Functions 11m
Flattening Data with Explode 5m
Working with Windows 1m

Querying Data with the DataFrames (Part 2)

41mins

Introduction 1m
Working with Windows 10m
Functions, Functions... 7m
And More Functions 6m
Still Not Enough Functions? User Defined Functions 5m
Understanding Joins 8m
Advanced SQL Monitoring with the Spark UI 2m
Resources 1m
Summary 1m

Improving Type Safety with Datasets

41mins

Introduction 1m
Why Not Just DataFrames? 6m
Adding Type Safety with Datasets 10m
Even Aggregation Can Be Type Safe 5m
Digging into Datasets 2m
Joins with Datasets 2m
Beyond Native Datasources with Cassandra 8m
More Datasources 4m
Resources 2m
Summary 1m

Processing Data with the Streaming API

67mins

Introduction 2m
The Streaming Landscape 5m
Introducing Kafka 10m
Understanding Spark Streaming's Mechanics 4m
Streaming in Action 8m
More of the Streaming API 4m
The DStream 'RDD' API 4m
About Stateful Streaming: Windows and Checkpoints 7m
Utilizing State for Speedy Fraud Detection 12m
An Improved Stateful Stream via mapWithState 7m
The Streaming UI 2m
Resources 2m
Summary 1m

Optimizing, Structured Streaming, and Spark 2.x

58mins

Introduction 1m
Increasing Stream Resiliency 7m
Optimizing to Boost Performance: Streaming 5m
Optimizing to Boost Performance: SQL 4m
Introduction to Structured Streaming 5m
A Deeper Dive into Structured Streaming 8m
Structured Streaming: Watermarks and Output Models 8m
Structured Streaming Demo 11m
The Future: Spark 2.x 8m
Resources 2m
Summary 1m

About the author

Justin Pihony

Justin is a software journeyman, continuously learning and honing his skills. Most of his early professional career was spent in C# and MSSQL, but he loves learning about many different languages, especially Scala. This passion for Scala led him to join the Lightbend (formerly Typesafe) team, diving even deeper into the Scala ecosphere. And, as much as he loves to learn, he also loves to spread his knowledge through teaching and helping others. He is a very active answerer on StackOverflow...
