Coping with Missing, Invalid, and Duplicate Data in R

by Martin Burger

Learn the essential steps of data preparation: missing value imputation, outlier detection, and duplicate removal.

Course info

Rating

(10)

Level

Intermediate

Updated

Oct 11, 2019

Duration

2h 1m

What you'll learn

Data preparation is part of nearly every data analytics project, so these skills are highly valuable. In this course, Coping with Missing, Invalid, and Duplicate Data in R, you will learn the main steps of data preparation. First, you will learn how to handle duplicate data. Next, you will discover that missing values prevent many R functions from working properly, which limits your R toolset until you have dealt with every NA. Finally, you will explore how to detect outliers and invalid data, and how they can introduce bias into your analysis. When you’re finished with this course, you will understand why missing values, outliers, and duplicates are problematic, how to detect them, and how to remove them from the dataset.
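To make the three problems concrete, here is a minimal base R sketch. It is not taken from the course material: the data frame `df` and its values are made up for illustration, and median imputation and the boxplot rule are just simple stand-ins for the methods the course covers.

```r
# Hypothetical toy data (an assumption for illustration, not course data):
# one duplicated row, one missing value, one suspiciously large value.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4, 5),
  value = c(10, 12, 12, NA, 11, 95)
)

# Duplicates: unique() keeps the first copy of each repeated row.
deduped <- unique(df)

# Missing values: mean() returns NA unless told to skip NAs.
mean_with_na    <- mean(deduped$value)
mean_without_na <- mean(deduped$value, na.rm = TRUE)

# A quick and simple fix: replace each NA with the median of observed values.
imputed <- deduped
imputed$value[is.na(imputed$value)] <- median(imputed$value, na.rm = TRUE)

# Outliers: boxplot.stats() reports points beyond 1.5 times the
# hinge spread from the box (the rule behind boxplot whiskers).
outliers <- boxplot.stats(imputed$value)$out
```

In this toy case a single extreme value pulls the NA-free mean well above the typical observations, which is the kind of bias the outlier module addresses.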

Course Overview

1min

Course Overview 2m

Managing Duplicate Data

28mins

Managing Missing Data

37mins

Intro 2m
Understanding Missing Values 6m
Quick and Simple Methods for Missing Values 5m
Imputation Methods 4m
Using visdat for NA Visualizations 2m
MICE for Missing Values 3m
Machine Learning for Missing Values 8m
Working on the Carparts Dataset 6m
Summary 1m

Outlier and Invalid Data Detection

37mins

Intro 1m
Understanding Statistical Outliers 5m
Methods for Outlier Detection 5m
The 6 Sigma Rule 6m
The Boxplot Method 4m
Hypothesis Tests for Outliers 5m
Outliers in High Dimensionality 5m
Plausibility Checks and Replacement 5m
Summary 2m

Further Resources and Summary

14mins

Intro 1m
Reproducibility in Pseudo Random Processes 3m
Data Pre-processing Task Views 5m
Course Summary 5m

About the author

Martin Burger

Martin studied biostatistics and worked for several pharmaceutical companies before becoming a data science consultant and author. He has published over 15 courses on R, Tableau 9, and other data science subjects. His main focus is analytics software such as R and SPSS, but he is also interested in modern data visualization tools like Tableau. When he is not busy coding, blogging, or working out new teaching concepts, you may find him skiing or hiking in the Alps.
