Data management and data preparation are a very important yet widely overlooked part of data analysis. Importing, selecting a proper class, cleaning, and filtering are all part of data preparation and will be taught in this course.
Have you ever encountered problems in data analysis just because the data was not clean, had the wrong format, or was simply messy? Data preparation is an immensely important yet overlooked field in data science. Most of a data professional's time is not spent analyzing or visualizing; it is spent getting data as clean and well-structured as possible. R is a widely used open source tool with an active user community. This community has created high-quality add-on packages for data preparation. In this course, Data Management and Preparation Using R, you will not only learn about data preparation in base R, you will also learn about the add-on packages that make R so powerful. First, you'll learn about data importing, cleaning, and structuring (selecting the right class). Next, you'll explore data querying. Finally, you will learn about dplyr, tidyr, reshape2 and data.table. At the end of this course, you will be able to select the right tools and efficiently perform data import, formatting, cleaning, and querying.
Martin is a trained biostatistician, programmer, consultant and data science enthusiast. His main objective: Explaining data science in a straightforward way. You can find his latest work over at: r-tutorials.com
Course Overview Hi guys, this is Martin Burger, and I welcome you to my course, Data Management and Preparation Using R. As a data scientist and biostatistician, I know first-hand how important clean and well-formatted data is. R is widely used for data analytics. It offers great data management and data preparation tools. In this course, you will learn techniques to solve the most common problems of the data preparation step. This is basically the first step in the whole data analytics process, which means you cannot escape it. No matter your industry or analytics approach, you have to prepare your data first. In the course, you will see how to use different import tools to get standard as well as exotic file formats into R. You will learn which object classes are best suited to your datasets. You will use the tidyr add-on package to clean and format your data, and you will use the data.table package, as well as standard tools, to filter or query even large datasets. By the end of this course, you will be able to select suitable add-on packages and use the best functions for data preparation. I would categorize this course as a beginner-plus course. If you're familiar with basic R code, you will be able to fully benefit from this course. Alright guys, I hope you will enjoy this course, I'll see you inside.
Introduction Hi guys, this is Martin Burger for Pluralsight. In this module, I will show you where this course is located on the data science landscape. We'll briefly talk about the main terminology behind this course. I will tell you which packages you will need to follow along, and how to best prepare yourself to get the most out of the course. And I will tell you how the course is structured, and what you can and cannot expect from it. Basically, this module will give you all the info needed to fully benefit from this course.
Selecting Suitable Classes and Importing Data Hi guys, this is Martin Burger for Pluralsight. In this module, you will learn how to import your data and how to select a proper class. We'll take a look at standard import methods, which come preinstalled with RStudio. We'll take a look at the fread function of data.table, and we'll learn how to use the package foreign for non-CSV file formats. I will also show you alternative classes to data.frames. These are data.table and data_frame. Learning how to properly import your data is often overlooked. In a scripting language like R, even an otherwise trivial task like data import can have some pitfalls. Data.frames have some limitations, so it is great to have some alternatives at hand in case you have larger datasets.
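To make the difference between the import methods a bit more concrete, here is a minimal sketch, not taken from the course demos. It assumes the data.table package is installed, and the small CSV file and its column names are hypothetical, written to a temporary location only so that the example is self-contained.

```r
library(data.table)

# Hypothetical sample file, created in a temp location so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,anna,3.5", "2,ben,4.0", "3,cleo,2.5"), csv_path)

# Standard import: read.csv() returns a plain data.frame
df <- read.csv(csv_path, stringsAsFactors = FALSE)

# fread() from data.table detects the separator and column types
# and returns a data.table (which is also a data.frame)
dt <- fread(csv_path)

class(df)
class(dt)
str(dt)
```

Comparing the two class() results shows which object type each import function gives you, which is the starting point for choosing a suitable class.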
Cleaning Data with tidyr Hi guys, this is Martin Burger for Pluralsight. In this module, we'll discuss the specifics of properly formatted data. You will first learn what a tidy dataset looks like, then you will learn how to fix common data formatting problems via four demos. For that, we're going to use three R packages and some ad-hoc datasets. In this module, you'll also learn about filtering and mutating joins to connect two data frames.
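As a rough illustration of what tidying and the two kinds of joins can look like, here is a small sketch. It is not one of the four course demos. It assumes tidyr (version 1.0 or later, for pivot_longer) and dplyr are installed, and the datasets wide and regions are hypothetical ad-hoc examples.

```r
library(tidyr)
library(dplyr)

# Hypothetical untidy dataset: one column per year instead of one row per observation
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 22),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Tidy form: each row is one country-year observation
long <- pivot_longer(wide, cols = c(`2019`, `2020`),
                     names_to = "year", values_to = "value")

# Hypothetical lookup table; country "B" is deliberately missing
regions <- data.frame(
  country = c("A", "C"),
  region = c("North", "South"),
  stringsAsFactors = FALSE
)

# Mutating join: adds the region column, unmatched rows get NA
with_region <- left_join(long, regions, by = "country")

# Filtering join: keeps only rows of long that have a match, adds no columns
matched <- semi_join(long, regions, by = "country")
```

The mutating join changes the columns of the result, while the filtering join only changes which rows are kept.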
Data Filtering and Querying with dplyr and data.table Hi there, this is Martin Burger for Pluralsight. This module is all about data filtering, or running queries on your datasets. We will start out with some theory on queries and the appropriate packages, including the great data.table package. To put the theory into practice, we will do some demos using two different datasets. The first one is a simple ad-hoc dataset, while the other one is the diamonds dataset of ggplot2 with over 50,000 rows. You will learn step by step how to query on a row and on a column level. I will show you how to group your queries for more detailed outputs. And I will show you how to use keys in data.table to make queries faster.
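The row-level, column-level, grouped, and keyed queries mentioned here could look roughly like the following sketch. It is not the course's own demo code. It assumes data.table and ggplot2 are installed, and the price threshold and result names are arbitrary choices for illustration. The cut column is converted to character first, purely to keep the keyed lookup simple.

```r
library(data.table)

# diamonds ships with ggplot2; copy it into a data.table
dt <- as.data.table(ggplot2::diamonds)

# Assumption for this sketch: work with cut as plain character instead of a factor
dt[, cut := as.character(cut)]

# Row-level query: filter rows in the first argument (i)
expensive <- dt[price > 15000 & cut == "Ideal"]

# Column-level query: select columns in the second argument (j)
subset_cols <- dt[, .(carat, cut, price)]

# Grouped query: mean price and row count per cut
by_cut <- dt[, .(mean_price = mean(price), n = .N), by = cut]

# Keyed query: setkey() sorts the table by cut so lookups can use binary search
setkey(dt, cut)
ideal <- dt[.("Ideal")]
```

Once the key is set, the lookup in the last line no longer needs to scan the whole column, which is where the speed-up on larger tables comes from.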
Course Recap and Your Next Steps Hi guys, it's Martin Burger for Pluralsight. In this last module, I will show you some great resources to identify suitable libraries, how to get help, and where to find some practice opportunities. This is crucial info, since you will encounter various problems in your daily work. Knowing where to get help is the first step toward solving these issues. Finally, we'll also summarize what we did in this course.