Getting Started with Apache Spark on Databricks

by Janani Ravi

This course will introduce you to analytical queries and big data processing using Apache Spark on Azure Databricks. You will learn how to work with Spark transformations, actions, visualizations, and functions using the Databricks Runtime.

Course info

Rating: (65)
Level: Beginner
Updated: Oct 25, 2021
Duration: 1h 52m

What you'll learn

Azure Databricks lets you run big data processing and queries using the Apache Spark unified analytics engine. With Azure Databricks, you can set up your Apache Spark environment in minutes, autoscale your processing, and collaborate and share projects in an interactive workspace.

In this course, Getting Started with Apache Spark on Databricks, you will learn the components of the Apache Spark analytics engine, which allows you to process batch as well as streaming data using a unified API. First, you will learn how the Spark architecture is configured for big data processing. You will then learn how the Databricks Runtime on Azure makes it easy to work with Apache Spark on the Azure cloud platform, and you will explore the basic concepts and terminology for the technologies used in Azure Databricks.

Next, you will learn the workings and nuances of Resilient Distributed Datasets (RDDs), the core data structure used for big data processing in Apache Spark. You will see that RDDs are the data structures on top of which Spark DataFrames are built. You will study the two types of operations that can be performed on DataFrames, namely transformations and actions, and understand the difference between them. You'll also learn how Databricks allows you to explore and visualize your data using the display() function, which leverages native Python libraries for visualizations.
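The difference between transformations and actions can be sketched with a toy model in plain Python. This is not Spark's API: the `LazyDataset` class and its methods are hypothetical names chosen to mirror the idea that a transformation only records work to be done, while an action runs the recorded work and returns a result.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark code)."""

    def __init__(
        self,
        source: Iterable[int],
        steps: Optional[List[Callable[[Iterator], Iterator]]] = None,
    ) -> None:
        self._source = list(source)
        self._steps = steps or []

    # Transformations: return a new dataset with one more recorded step.
    # No element is processed here.
    def map(self, fn: Callable) -> "LazyDataset":
        step = lambda it: (fn(x) for x in it)
        return LazyDataset(self._source, self._steps + [step])

    def filter(self, pred: Callable) -> "LazyDataset":
        step = lambda it: (x for x in it if pred(x))
        return LazyDataset(self._source, self._steps + [step])

    # Actions: run every recorded step and return a concrete result.
    def collect(self) -> list:
        it: Iterator = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)

    def count(self) -> int:
        return len(self.collect())


calls = []  # records each element that square() actually processes


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))

# Two chained transformations: nothing has been computed yet.
evens_squared = numbers.map(square).filter(lambda v: v % 2 == 0)
calls_before_action = list(calls)

# The action triggers the whole pipeline.
result = evens_squared.collect()

print(calls_before_action)  # []
print(result)               # [4, 16]
```

In this sketch, `square` is never called until `collect()` runs, which is the behavior the course describes for transformations versus actions. Real Spark adds distribution, partitioning, and fault tolerance, none of which this toy attempts to model.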

Finally, you will get hands-on experience with big data processing operations such as projection, filtering, and aggregation. Along the way, you will learn how to read data from an external source such as Azure Storage and how to use built-in functions in Apache Spark to transform your data.

When you are finished with this course, you will have the skills to work with basic transformations, visualizations, and aggregations using Apache Spark on Azure Databricks.

Course Overview (2 mins)

Course Overview 2m

Overview of Apache Spark on Databricks (34 mins)

Transformations, Actions, and Visualizations (41 mins)

RDDs and Data Frames 7m
Spark APIs 2m
Demo: dbutils 3m
Demo: Transformations and Actions on RDDs 5m
Demo: Transformations and Actions on Data Frames 3m
Demo: Uploading a Dataset to DBFS Using Notebooks 4m
Demo: Basic Selection and Filtering Operations 4m
Demo: Writing CSV Files out to DBFS 4m
Demo: Creating a Table Using the Databricks UI 2m
Demo: Visualizing Data Using the Display Command 3m
Demo: Exploring Databricks Visualizations 5m

Modify Data Using Spark Functions (34 mins)

Demo: Reading and Parsing JSON Data 6m
Demo: Accessing Nested Fields and List Elements 5m
Demo: Setting up an Azure Storage Account 3m
Demo: Storing Secrets in the Azure Key Vault 2m
Demo: Reading from Azure Data Storage 6m
Demo: Basic SQL Transformations 5m
Demo: Built-in Functions 6m
Summary and Next Steps 1m

About the author

Janani Ravi

Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing ...
