Developing Spark Applications with Python & Cloudera

Apache Spark is one of the fastest and most efficient general engines for large-scale data processing. In this course, you will learn how to develop Spark applications for your Big Data using Python and a stable Hadoop distribution, Cloudera CDH.
Course info
Rating
(25)
Level
Beginner
Updated
Feb 27, 2018
Duration
5h 42m
Table of contents
Course Overview
Why Spark with Python and Cloudera?
Getting an Environment & Data: CDH + StackOverflow
Refreshing Your Knowledge: Python Fundamentals for This Course
Understanding Spark: An Overview
Getting Technical with Spark
Learning the Core of Spark: RDDs
Going Deeper into Spark Core
Increasing Proficiency with Spark: DataFrames & Spark SQL
Continuing the Journey on DataFrames and Spark SQL
Understanding a Typed API: Datasets Works with Scala, Not Python
Final Takeaway and Continuing the Journey with Spark
Description

At the core of working with large-scale datasets is a thorough knowledge of Big Data platforms like Apache Spark and Hadoop. In this course, Developing Spark Applications with Python & Cloudera, you’ll learn how to process data at scales you previously thought were out of your reach. First, you’ll learn all the technical details of how Spark works. Next, you’ll explore the RDD API, the original core abstraction of Spark. Finally, you’ll discover how to become more proficient using Spark SQL and DataFrames. When you’re finished with this course, you’ll have a foundational knowledge of Apache Spark with Python and Cloudera that will help you as you move forward to develop large-scale data applications that enable you to work with Big Data in an efficient and performant way.

About the author

Xavier is passionate about teaching and about helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, and trainer. He holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, and he is a Microsoft MVP.

More from the author
T-SQL Data Manipulation Playbook
Intermediate
2h 54m
Sep 27, 2019
Programming Python Using an IDE
Intermediate
2h 0m
Jun 26, 2019
Section Introduction Transcripts

Course Overview
Hello, and welcome to this Pluralsight course, Developing Spark Applications with Python and Cloudera. I am Xavier Morera, and I help developers understand Enterprise Search and Big Data. Did you know that Spark, as a big data processing engine, can be 10 to 100 times faster than Hadoop MapReduce, and that on top of that it is easier to learn, widely adopted, and used for a diverse range of applications? In this course we're going to learn how to create Spark applications in a very popular and easy-to-use language, Python. And because infrastructure is important, we will leverage the first and one of the most widely used Hadoop distributions, CDH, which stands for Cloudera's Distribution including Hadoop.

Some of the major topics that we will cover include getting an environment set up with Spark and some interesting data, namely CDH plus StackOverflow, followed by an overview of Spark and a more technical look at how it works. Then we will learn how to work with the original core abstraction of Spark, the RDD, or resilient distributed dataset. Next we will cover DataFrames and Spark SQL, which help us become proficient with Spark more quickly. Finally, we will talk at a high level about Datasets, a typed API that is available in Scala but not in Python, because Python is dynamically typed, and we'll also cover a few related topics.

By the end of this course you will be able to create Spark applications with Python and Cloudera. Before beginning the course you should be familiar with programming, preferably with Python, but I also include a small refresher module in case you need a jumpstart. Additionally, you will need a cluster, and I will explain several different ways to get your infrastructure set up. I hope you will join me on this journey to learn about Spark with the Developing Spark Applications with Python and Cloudera course at Pluralsight.