Course info
Rating
(10)
Level
Intermediate
Updated
Oct 11, 2019
Duration
2h 48s
Description

Data preparation is part of nearly every data analytics project, so these skills are highly valuable. In this course, Coping with Missing, Invalid, and Duplicate Data in R, you will learn the main steps of data preparation. First, you will learn how to handle duplicate data. Next, you will discover that missing values prevent many R functions from working properly, which limits your R toolset until you take care of all the NAs. Finally, you will explore outlier and invalid data detection, and how such data can introduce bias into your analysis. When you’re finished with this course, you will understand why missing values, outliers, and duplicates are problematic, how to detect them, and how to remove them from the dataset.
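To make the first two points concrete, here is a minimal base R sketch. It is not taken from the course material: the toy data frame and its column names (`id`, `score`) are hypothetical, and the functions shown are only one common way to spot duplicates and work around NA values.

```r
# Hypothetical toy data frame; the column names are illustrative only.
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  score = c(10, 12, 12, NA, 15)
)

# duplicated() flags every row that exactly repeats an earlier row
# (here the third row repeats the second).
dup_flags <- duplicated(df)
df_unique <- df[!dup_flags, ]

# A single NA makes mean() return NA ...
mean_with_na <- mean(df_unique$score)

# ... unless the function is told to skip missing values.
mean_without_na <- mean(df_unique$score, na.rm = TRUE)

# complete.cases() keeps only the rows that contain no NA at all.
df_complete <- df_unique[complete.cases(df_unique), ]
```

Dropping incomplete rows is the bluntest option. Whether removal or replacement is appropriate depends on how much data is missing and why.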

About the author

Martin is a trained biostatistician, programmer, consultant and data science enthusiast. His main objective: Explaining data science in a straightforward way. You can find his latest work over at: r-tutorials.com

Section Introduction Transcripts

Course Overview
Welcome to Coping with Missing, Invalid, and Duplicate Data in R. This is Martin Burger for Pluralsight. In this intermediate-level course, we're looking at three very important parts of data preprocessing: the handling of duplicate data, missing value imputation, and outlier and invalid data detection. These processes are crucial for the initial preparation of your data. Missing values prevent a lot of R functions from working properly, so you are limited in your R toolset as long as you do not take care of all of these NAs. Duplicates, as well as outliers and invalid data, introduce bias into your analysis. In fact, if you are not aware of these, you might come to totally wrong conclusions. Now, in this course, I will show you multiple methods to identify, remove, and replace the affected values for each of these three issues. It is my goal to give you quick and simple, as well as very advanced, methods to choose from. To fully benefit from this course, I recommend you already know how to handle data frames in R. You should be familiar with things like queries, data types, package management, data import, and the RStudio interface. Now, I really hope you use this course to improve your data preparation skills. The concepts taught here are very versatile and can be applied to nearly any type of data.
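As a rough illustration of the identify, remove, and replace idea, the sketch below uses base R only. It is an assumption about what a quick and simple approach might look like, not the course's own code: the vector `x` is made up, median imputation is just one simple replacement strategy, and the 1.5 × IQR rule is one common outlier convention among several.

```r
# Hypothetical measurements with one missing value and one extreme value.
x <- c(12, 14, 13, NA, 15, 14, 13, 95)

# Replace: fill the NA with the median of the observed values
# (one simple imputation choice, assumed here for illustration).
x_imputed <- x
x_imputed[is.na(x_imputed)] <- median(x, na.rm = TRUE)

# Identify: flag values outside 1.5 * IQR from the quartiles.
q <- quantile(x_imputed, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- x_imputed < lower | x_imputed > upper

# Remove: keep only the values that were not flagged.
x_clean <- x_imputed[!is_outlier]
```

A flagged value is not automatically an error. It is worth checking whether it is invalid data or a genuine extreme observation before dropping it.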