Splitting and Combining Data with R

If you’ve struggled to categorize dates, clean strings, or order bars in ggplot, this course is for you. Learn the basics of splitting and combining data, variable cleaning and creation, grouping and summarizing data, and creating visualizations.
Course info
Level
Beginner
Updated
Oct 25, 2019
Duration
1h 57m
Description

Summarizing statistics across groups is invaluable for comparing categories of observations. In this course, Splitting and Combining Data with R, you'll explore splitting data into groups based on some criteria, applying functions or calculations to each group independently, and combining the results into a data structure. To begin, you’ll learn how to create custom categorical variables for grouping, and custom numeric variables to which you can apply functions. Next, with the grouping criteria created, you will split the data, apply functions, and combine the results into a data structure. Finally, with the raw data transformed, you’ll discover how a grouped dataframe can be ungrouped while keeping its summary statistics, or kept intact and passed to plotting functions to visualize variation between groups. By the end of this course, you’ll have a better understanding of how to use R to build data pipelines with dplyr, manipulate strings and dates for feature engineering, and create customized ggplot charts.
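
As a rough illustration of the split, apply, and combine steps described above, here is a minimal sketch using dplyr. It is not taken from the course materials: the built-in mtcars dataset, the 20 mpg cutoff, and the names efficiency, n_cars, mean_hp, and cars_summary are all assumptions chosen for the example.

```r
library(dplyr)

# Illustrative only: mtcars ships with R; the "efficiency" bands and the
# 20 mpg cutoff are hypothetical choices, not from the course.
cars_summary <- mtcars %>%
  # Create a custom categorical variable to group on
  mutate(efficiency = if_else(mpg >= 20, "high", "low")) %>%
  # Split: one group per combination of cylinder count and efficiency band
  group_by(cyl, efficiency) %>%
  # Apply and combine: one summary row per group
  summarize(n_cars = n(), mean_hp = mean(hp)) %>%
  # Drop any remaining grouping; the summary columns are kept
  ungroup()

print(cars_summary)
```

The explicit ungroup() at the end returns a plain data frame of summary statistics. Leaving it out keeps the remaining grouping in place for further per-group operations.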

About the author

Mariah Weatherford is a data scientist and software biomathematician for a molecular diagnostic company. Trained as a statistician with a background in economics, data is the driver of all her work. Her primary tool for wrangling, modeling and visualizing data is R.

Section Introduction Transcripts

Course Overview
Hi everyone. My name is Mariah Weatherford, and welcome to my course, Splitting and Combining Data with R. I'm a data scientist with a background in statistics and economics. Many data projects require information to be grouped and summarized. By splitting data into categories, creating summary metrics, and combining this aggregated data into a single data table, it's possible to create grouped datasets that can be used in a variety of analyses. It's rare to work with a source of data that doesn't require any cleaning or feature engineering before it's ready for analysis, so this course begins with variable creation. We then go into the building of grouped datasets with dplyr's group_by and summarize functions. Finally, grouped data will be used to create visualizations with ggplot. By the end of this course, you'll know how to build data pipelines with dplyr, manipulate strings and dates for feature engineering, and create customized ggplot charts. Before beginning this course, you should be familiar with some introductory R, including installing packages and importing datasets. I hope you'll join me on this journey to learn how to generate custom datasets for groups with my Splitting and Combining Data with R course, at Pluralsight.