Handling Fast Data with Apache Spark SQL and Streaming

Apache Spark is a leader in enabling quick and efficient data processing. This course will teach you how to use Spark's SQL, Streaming, and even the newer Structured Streaming APIs to create applications that can handle data as it arrives.
Course info
Rating: (25)
Level: Intermediate
Updated: Aug 4, 2017
Duration: 4h 35m
Table of contents
Course Overview
Introduction
Querying Data with the DataFrames (Part 1)
Querying Data with the DataFrames (Part 2)
Improving Type Safety with Datasets
Processing Data with the Streaming API
Optimizing, Structured Streaming, and Spark 2.x
Description

Analyzing data used to be something you did once a night. Now you need to process data on the fly so you can provide up-to-the-minute insights. But how do you accomplish in real time what used to take hours, without a complicated code base? In this course, Handling Fast Data with Apache Spark SQL and Streaming, you'll learn to use the Apache Spark Streaming and SQL libraries to handle this new world of real-time, fast data processing. First, you'll dive into Spark SQL. Next, you'll explore how to catch potential fraud by analyzing streams with Spark Streaming. Finally, you'll discover the newer Structured Streaming API. By the end of this course, you'll have a deeper understanding of these APIs, along with a number of streaming concepts that have driven their design.
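To make that concrete, here is a minimal Spark SQL sketch in Scala of the kind of batch query the course starts from. The input path and the user/amount column names are hypothetical, chosen only to illustrate the DataFrame API, and are not taken from the course exercises.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object QuickInsights {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for experimentation
    val spark = SparkSession.builder()
      .appName("QuickInsights")
      .master("local[*]")
      .getOrCreate()

    // Read a hypothetical CSV of transactions into an untyped DataFrame
    val transactions = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/transactions.csv")

    // Total spend per user, largest first -- an "up-to-the-minute insight"
    transactions
      .groupBy("user")
      .agg(sum("amount").as("total_spent"))
      .orderBy(desc("total_spent"))
      .show()

    spark.stop()
  }
}

The same query could also be expressed against a typed Dataset for compile-time safety, which is the contrast the Datasets module explores.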

About the author

Justin is a software journeyman, continuously learning and honing his skills.

More from the author
Apache Spark Fundamentals (Intermediate, 4h 27m, Oct 27, 2015)
More courses by Justin Pihony
Section Introduction Transcripts

Course Overview
Hi, my name is Justin Pihony, and welcome to my course, Handling Fast Data with Apache Spark SQL and Streaming. Being a top contributor of Apache Spark answers on Stack Overflow, as well as the developer support manager at Lightbend, has given me a lot of insight into how to maximize Spark's power while sidestepping possible pitfalls. Fast data is the next big thing in the world of data. Nowadays, we want valuable business insights now, not after waiting for batch jobs to complete, and we're at a point where we can build systems able to reactively handle our needs at scale. In this course, we're going to see how to use Spark's SQL and streaming capabilities to build these fast data applications without breaking a sweat. Some of the major topics that we'll cover include a deep dive into Spark's SQL library, learning both the untyped side via DataFrames and the type-safe side via Datasets, as well as Spark's take on streaming via both the older, more stable Spark Streaming library and its modernized, up-and-coming Structured Streaming library. By the end of this course, you'll have extensive knowledge of Spark's SQL and streaming APIs, knowing how to utilize them to create a fast data application capable of pulling out business insights in no time at all. Before beginning the course, you should have a basic understanding of Apache Spark, which you can get from my other course, Apache Spark Fundamentals. I hope you'll join me on this journey to learn about Spark's SQL and streaming libraries, and how they can be used in the new architecture overtaking the big data world, with the Handling Fast Data with Apache Spark SQL and Streaming course at Pluralsight.
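As a taste of the streaming side the transcript describes, here is a minimal Structured Streaming sketch in Scala. It assumes a socket source on localhost:9999 emitting one "user,amount" line per transaction; the host, port, and fraud threshold are illustrative assumptions, not details from the course.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of "user,amount" lines from a hypothetical socket source
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Parse each line and flag unusually large amounts as potential fraud
    val flagged = lines
      .select(split($"value", ",").as("parts"))
      .select(
        $"parts".getItem(0).as("user"),
        $"parts".getItem(1).cast("double").as("amount"))
      .where($"amount" > 10000)  // hypothetical threshold

    // Print flagged transactions to the console as they arrive
    val query = flagged.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}

Structured Streaming treats this as an incremental query over an unbounded table, which is the conceptual shift from the older DStream-based Spark Streaming library mentioned in the transcript.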