Cleaning Data with R

Authors: Martin Burger, Chase DeHan

Cleaning data accounts for 70-80% of an analyst’s time. This skill teaches you how to understand the nature of your data, identify problem areas, and then clean the data set to...

What You Will Learn

  • This skill covers R-based solutions for the most common data cleaning operations encountered in the analytics workflow. The courses are structured around data-centric topics rather than language or package features.

Pre-requisites

  • Data Literacy
  • R for Data Analysts

Beginner

Alter data types to enable later analytics, and alter and rename columns in a dataframe to produce tidy data sets.

Querying and Converting Data Types in R

by Martin Burger

Aug 15, 2019 / 2h 6m

Description

Have you ever wanted to learn how R handles the most common data types? The knowledge you will gain here is foundational for any aspiring R programmer. In this course, Querying and Converting Data Types in R, you will develop an understanding of these data types and how they are processed, converted, and filtered. First, you will explore general data analysis concepts and take a look at the data frame and its main alternatives. Next, you will learn the major data types in R: numeric, integer, factor, character, logical (boolean), and date and time. Finally, you will discover how these data types are used in a data frame, where the available query and filtering methods largely depend on the data types of the columns. When you are finished with this course, you will have a first set of skills that will be invaluable on your further learning path. In fact, the skills taught here are so central to data science that most of them carry over to other languages (Python, MATLAB) and programs (SPSS).
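To make the idea of converting and filtering by data type concrete, here is a minimal base R sketch. It is not taken from the course; the vectors, column names, and values are hypothetical and chosen only for illustration.

```r
# Hypothetical raw values, as they might arrive from a text file
raw <- c("10", "20", "thirty")

# Character to numeric: strings that are not numbers become NA
# (R also emits a warning, suppressed here to keep the output clean)
nums <- suppressWarnings(as.numeric(raw))

# A factor whose labels look like numbers
f <- factor(c("100", "250", "100"))
codes  <- as.integer(f)                 # internal level codes, not the labels
values <- as.numeric(as.character(f))   # the labels converted to numbers

# Character to date (ISO format is parsed by default)
d <- as.Date("2019-08-15")

# Filtering a data frame depends on the column type:
# a numeric comparison only makes sense once the column is numeric
df  <- data.frame(id = 1:3, amount = nums)
big <- df[!is.na(df$amount) & df$amount > 15, ]
```

Converting a factor through as.character() first is the usual way to avoid silently getting level codes instead of the values.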

Table of contents
  1. Course Overview
  2. Understanding Dataset Structures and Formats
  3. Selecting and Converting Data Types
  4. Querying and Filtering Data
  5. Course Summary and Further Resources

Manipulating Dataframes in R

by Chase DeHan

Aug 9, 2019 / 1h 24m

Description

Data preparation is one of the most difficult and time-consuming tasks for data professionals. In this course, Manipulating Dataframes in R, you will learn foundational knowledge of the R dataframe. First, you will explore the basics of the data frame. Next, you will discover how to access certain fields in your data. Finally, you will learn how to do these same tasks with the powerful dplyr package. When you’re finished with this course, you will have the skills and knowledge of data manipulation in R needed to succeed at getting your data into the proper form for analysis.

Table of contents
  1. Course Overview
  2. Understanding Dataframe Basics
  3. Slicing Dataframes
  4. Filtering Rows by Criteria
  5. Manipulating Dataframes with dplyr

Intermediate

Clean string data, and manage missing values, duplicate rows, and invalid data.

Manipulating String Data in R

by Martin Burger

Jan 16, 2020 / 2h 1m

Description

Do you want to learn how to use R to efficiently clean and analyze string data? In this course, Manipulating String Data in R, you will learn multiple techniques that are crucial for string-related work. First, you will learn the functions available for handling string or character data, drawing on the best of both base R and the stringr add-on package. Next, you will see all common tasks demonstrated, including replacing, removing, counting, and searching for string patterns via regex syntax. Then, you will come to understand regular expressions (regex), which are invaluable for defining specific patterns in a string. Finally, you will see how sentiment analysis is performed in R, along with the sentiment lexicons available for fast and efficient sentiment analysis. When you’re finished with this course, you’ll have the skills and knowledge of string handling needed to work on all sorts of text-based data, including books, social media feeds, and tweets.

Table of contents
  1. Course Overview
  2. Understanding String Data
  3. Working on Strings with R Base and Regex
  4. Using the Package stringr
  5. Using Advanced String Tools and Course Summary

Coping with Missing, Invalid, and Duplicate Data in R

by Martin Burger

Oct 11, 2019 / 2h 48s

Description

Data preparation is part of nearly every data analytics project, so the skills involved are highly valuable. In this course, Coping with Missing, Invalid, and Duplicate Data in R, you will learn the main steps of data preparation. First, you will learn how to handle duplicate data. Next, you will discover that missing values prevent many R functions from working properly, which limits your R toolset until you have dealt with all the NA values. Finally, you will explore how to detect outliers and invalid data, and how they can introduce bias into your analysis. When you’re finished with this course, you will understand why missing values, outliers, and duplicates are problematic, how to detect them, and how to remove them from the dataset.
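As a small illustration of why missing values and duplicates get in the way, the following base R sketch shows a summary function returning NA and one common way to drop duplicate and incomplete rows. It is not taken from the course; the data frame and its column names are hypothetical.

```r
# Hypothetical data: row 3 repeats row 2, and the last score is missing
df <- data.frame(
  id    = c(1, 2, 2, 3),
  score = c(4, 5, 5, NA)
)

# Many summary functions return NA as soon as one value is missing
mean_with_na <- mean(df$score)                 # NA
mean_without <- mean(df$score, na.rm = TRUE)   # NAs dropped before averaging

# duplicated() flags rows that repeat an earlier row
dup_flags <- duplicated(df)
deduped   <- df[!dup_flags, ]

# complete.cases() flags rows with no missing values in any column
complete <- deduped[complete.cases(deduped), ]
```

Dropping rows is only one option; whether to remove, flag, or impute missing values depends on the analysis at hand.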

Table of contents
  1. Course Overview
  2. Managing Duplicate Data
  3. Managing Missing Data
  4. Outlier and Invalid Data Detection
  5. Further Resources and Summary