Apache Spark is one of the fastest and most efficient general engines for large-scale data processing. In this course, you'll learn how to develop Spark applications for your Big Data using Scala and a stable Hadoop distribution, Cloudera CDH.
At the core of working with large-scale datasets is a thorough knowledge of Big Data platforms like Apache Spark and Hadoop. In this course, Developing Spark Applications Using Scala & Cloudera, you’ll learn how to process data at scales you previously thought were out of your reach. First, you’ll learn all the technical details of how Spark works. Next, you’ll explore the RDD API, the original core abstraction of Spark. Then, you’ll discover how to become more proficient using Spark SQL and DataFrames. Finally, you'll learn to work with Spark's typed API: Datasets. When you’re finished with this course, you’ll have a foundational knowledge of Apache Spark with Scala and Cloudera that will help you as you move forward to develop large-scale data applications that enable you to work with Big Data in an efficient and performant way.
Xavier is very passionate about teaching and helping others understand search and Big Data. He is also an entrepreneur, project manager, technical author, and trainer, and he holds a few certifications with Cloudera, Microsoft, and the Scrum Alliance, along with being a Microsoft MVP.
Course Overview
Hello and welcome to this Pluralsight course, Developing Spark Applications Using Scala and Cloudera. I am Xavier Morera and I help developers understand enterprise search and big data. Did you know that Spark, as a big data processing engine, can be 10 to 100 times faster than Hadoop MapReduce? And that on top of that it is easier to learn, widely adopted, and used for a diverse range of applications?
In this course we're going to learn how to create Spark applications in the very language in which Spark was created: Scala, a language in which most Spark samples are available. Also, because infrastructure is important, we will leverage the first and one of the most widely used Hadoop distributions, CDH, which stands for Cloudera's Distribution including Hadoop.
Some of the major topics that we will cover include getting an environment set up with Spark and some interesting data, namely CDH plus StackOverflow, an overview of Spark, and getting technical with Spark. We will also learn how to work with RDDs, or resilient distributed datasets, the original core abstraction of Spark. Then we will cover DataFrames and Spark SQL, which help us become proficient with Spark more quickly, and finally we will learn about Datasets, the higher-level typed API in Spark.
By the end of this course you will be able to create Spark applications using Scala and Cloudera. Before beginning the course, you should be familiar with programming, preferably with Scala, but I include a refresher module to jumpstart your journey. Additionally, you will need a cluster, but I will explain several ways to get your infrastructure set up. I hope you will join me on this journey to learn about Spark with the Developing Spark Applications Using Scala and Cloudera course at Pluralsight.