Work with RDDs, DataFrames, and Datasets in Apache Spark
Understanding RDDs in Apache Spark shows practitioners how data is represented on the platform. Learn the fundamental differences between RDDs and the higher-level DataFrame and Dataset APIs when working with datasets and transformations.
What you'll learn
Understanding RDDs and their immutability is essential to seeing why these building blocks work so well for processing large amounts of data in a parallel environment.
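The idea behind immutability can be sketched in plain Python (this is an analogy, not actual Spark code): each transformation produces a new collection and never mutates its input, which is what lets an engine like Spark safely recompute any step in parallel or replay it after a failure.

```python
# Plain-Python analogy of immutable, RDD-style transformations.
numbers = [1, 2, 3, 4]

# A "transformation" builds a new collection instead of mutating the source.
doubled = [n * 2 for n in numbers]
evens = [n for n in doubled if n > 4]

# The original data is untouched, so any step can be replayed from it.
assert numbers == [1, 2, 3, 4]
```

Because no step overwrites its input, losing an intermediate result is never fatal: it can always be rebuilt from the data that produced it.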
In this course, Work with RDDs, DataFrames, and Datasets in Apache Spark, you'll learn the differences between RDDs and DataFrames, when to use each to represent your data, and how Apache Spark processes them under the hood.
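The core contrast can be sketched in plain Python (a simplified analogy, not Spark's actual APIs): RDD-style code passes opaque functions over raw records, while DataFrame-style code works with named columns, which is what lets an engine inspect and optimize the query before running it.

```python
# Plain-Python analogy contrasting the two programming styles.

# RDD style: arbitrary functions over raw tuples. The engine cannot look
# inside the lambdas, so it must run them exactly as written.
records = [("alice", 34), ("bob", 29), ("carol", 41)]
rdd_style = [rec[1] for rec in records if rec[1] > 30]

# DataFrame style: data is organized by named column, and operations refer
# to columns declaratively, so an optimizer could prune or reorder them.
columns = {"name": ["alice", "bob", "carol"], "age": [34, 29, 41]}
df_style = [age for age in columns["age"] if age > 30]

assert rdd_style == df_style
```

Both produce the same answer; the difference is how much the engine can see, and therefore how much it can optimize on your behalf.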
You'll see how these abstractions work, giving you a better grasp of how big data processing is done on a platform such as Apache Spark and what that means for efficiency when transforming raw data into something meaningful.
When you're finished with this course, you'll have a better understanding of how RDDs represent data in Apache Spark and when to prefer DataFrames over them for big data processing.
About the author
Developer. Entrepreneur. Pianist. Guitarist. Raphael has a passion for bringing software to the masses and equipping people with the right mindset for using programming to solve real-world problems. Aside from programming and teaching, Raphael does research and development in academia in the field of computer science, specifically machine learning.