Beginning Data Exploration and Analysis with Apache Spark

by Swetha Kolalapudi

Data preparation is often said to make up about 80% of a data scientist's job. This course is all about data preparation, i.e., cleaning, transforming, and summarizing data using Spark.

Course info

Rating: (125)
Level: Beginner
Updated: Jul 22, 2024
Duration: 1h 57m

What you'll learn

Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated Machine Learning models. Spark is an engine that makes this intuitive, using functional constructs that shield the user from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll go through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs (Resilient Distributed Datasets) and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to transform data in whatever way you need.
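
As a rough illustration of the functional constructs mentioned above, here is a minimal plain-Python sketch, not Spark code and not taken from the course. It chains `filter`, `map`, and `reduce` over a small in-memory list; Spark RDDs expose methods with the same names that apply the same idea to distributed data. The sample values are made up for illustration.

```python
from functools import reduce

# Hypothetical sample data: lengths of a few text records (0 means empty).
record_lengths = [12, 0, 7, 30, 0, 5]

# filter: keep only the elements that satisfy a predicate.
non_empty = filter(lambda n: n > 0, record_lengths)

# map: apply a function to every remaining element.
doubled = map(lambda n: n * 2, non_empty)

# reduce: fold all elements into a single value.
total = reduce(lambda a, b: a + b, doubled)

print(total)
```

Each step takes a function as its argument and leaves the input list unchanged, which is what lets an engine like Spark distribute the same pipeline across many machines.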

Course Overview

1min

Course Overview 2m

Getting Started with Spark's Resilient Distributed Datasets

27mins

Transforming and Cleaning Unstructured Data

32mins

Analyzing Crime in New York City 5m
Programming in the Functional Paradigm 4m
Applying Functional Constructs to Transform Datasets 5m
Filtering Rows 2m
Transforming Records to Extract Fields 4m
Identifying and Filtering Missing Values 4m
Identifying and Filtering Anomalies 5m
Summarizing and Visualizing Crime in NYC 5m
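
The cleaning steps named in this module (filtering rows, extracting fields, dropping missing values and anomalies) can be sketched in plain Python as follows. This is only an assumption-laden illustration: the column layout, the sample rows, and the rule that a latitude of 0.0 is a placeholder are all hypothetical and do not come from the course dataset.

```python
# Hypothetical raw lines; the column layout (id, date, offense, latitude) is assumed.
raw_lines = [
    "id,date,offense,latitude",
    "1,2015-01-03,BURGLARY,40.71",
    "2,2015-01-03,,40.65",
    "3,2015-01-04,ROBBERY,0.0",
    "4,2015-01-05,ROBBERY,40.80",
]

header = raw_lines[0]

# Filter rows: drop the header line.
rows = [line for line in raw_lines if line != header]

# Transform records: split each line into its fields.
records = [line.split(",") for line in rows]

# Filter missing values: keep records in which every field is non-empty.
complete = [r for r in records if all(field != "" for field in r)]

# Filter anomalies: assume a latitude of 0.0 is a placeholder, not a real location.
clean = [r for r in complete if float(r[3]) != 0.0]

clean_ids = [r[0] for r in clean]
print(clean_ids)
```

In Spark, each list comprehension would correspond to a `filter` or `map` call on an RDD of lines.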

Summarizing Data Along Dimensions

30mins

Representing Data Using Pair RDDs 5m
Creating a Pair RDD 3m
Summarizing Pair RDDs 3m
Computing a Daily Trend 3m
Merging Pair RDDs 3m
Adding a Dimension to an RDD 4m
Computing Averages with Pair RDDs 6m
Comparing Daily Averages 2m
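
To make the idea of averaging with key-value pairs concrete, here is a minimal plain-Python sketch. The helper `reduce_by_key` is a hypothetical stand-in written for this example; it mimics what a pair RDD's `reduceByKey` computes, but on a local list. The (day, value) data is invented.

```python
# Hypothetical (day, value) pairs, e.g. one count per observation.
pairs = [("Mon", 4), ("Tue", 10), ("Mon", 6), ("Tue", 2), ("Mon", 5)]

def reduce_by_key(pairs, combine):
    """Merge all values that share a key using `combine` (local stand-in for reduceByKey)."""
    result = {}
    for key, value in pairs:
        result[key] = combine(result[key], value) if key in result else value
    return result

# Turn each value into (sum, count) so the combine step is a simple pairwise addition.
sum_count = reduce_by_key(
    [(k, (v, 1)) for k, v in pairs],
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)

# Divide the per-key sum by the per-key count to get the average.
averages = {k: s / c for k, (s, c) in sum_count.items()}
print(averages)
```

Carrying a (sum, count) pair rather than a running average keeps the combine function associative, which matters when partial results are merged in arbitrary order.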

Modeling Relationships in the Marvel Social Universe

25mins

Representing Datasets as Networks 5m
Finding the Most Influential Characters 5m
Building a Co-occurrence Network 8m
Finding the Most Important Relationships 8m
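
A co-occurrence network can be sketched in a few lines of plain Python: every pair of characters that appears in the same publication gets an edge, and the edge weight counts how often the pair appears together. The issue names and character names below are hypothetical placeholders, and this is one simple way to build such a network, not necessarily the approach the course takes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: which characters appear in which issue.
appearances = {
    "issue-1": ["Hero A", "Hero B", "Hero C"],
    "issue-2": ["Hero A", "Hero B"],
    "issue-3": ["Hero A", "Hero B", "Hero D"],
}

# Edge weight = number of issues in which both characters appear.
edge_weights = Counter()
for characters in appearances.values():
    # Sort so that each pair is always stored in the same order.
    for pair in combinations(sorted(set(characters)), 2):
        edge_weights[pair] += 1

# The heaviest edge is the pair that co-occurs most often.
strongest = edge_weights.most_common(1)[0]
print(strongest)
```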

About the author

Swetha Kolalapudi

Swetha loves playing with data and crunching numbers to get cool insights. She is an alumna of top schools like IIT Madras and IIM Ahmedabad. She was the first member of Flipkart’s elite Analytics team and was instrumental in scaling it to 100+ employees. Swetha has always had an entrepreneurial bent and a love for teaching. She now has the chance to do both as the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn ...
