Course info
Level
Beginner
Updated
Nov 26, 2019
Duration
2h 5m
Description

Do you want to learn how data exploration can be implemented in R? Without data exploration, the whole data analysis process becomes inefficient and slow, but a good data exploration process will guide you to valuable insights. In this course, Exploring Your First Data Set with R, you will learn how to explore and analyze new datasets quickly and efficiently. First, you will learn a logically ordered set of methods that apply to most standard data frames. Then, you will discover how the process is divided into three steps: summary statistics, distribution checks, and relationship analysis. These steps build on each other, and you will find out which variables are worth further analysis and where dependencies between variables exist. Finally, you will have laid the groundwork for machine learning and final data presentation.

When you’re finished with this course, you’ll have the skills to properly structure and conduct data exploration in R.

About the author

Martin is a trained biostatistician, programmer, consultant and data science enthusiast. His main objective: explaining data science in a straightforward way. You can find his latest work at r-tutorials.com.

Section Introduction Transcripts

Course Overview
Welcome to Exploring Your First Data Set in R. This is Martin Burger for Pluralsight. Data exploration is a very important part of the data analysis process. It guides the way to further in-depth analysis. The insights gained here are also required for building accurate machine learning models. Data exploration is usually performed right after or even during data cleaning; therefore, it is one of the first things you do once you receive a new dataset. In this course, I'm outlining a blueprint, a structure, that you can use in most scenarios. Essentially, we first summarize the data and check it for completeness and accuracy. After that, we check the distributions of the variables. Most importantly, I will show you several ways to check whether your data is normally distributed. And in the last step, we perform some tests to see whether there are relationships between variables. These relationships are the ultimate goal of exploratory analysis, because machine learning algorithms and other prediction tools rely on them. And of course, data visualizations make the most sense for variables that are related to each other. To fully benefit from this course, you need only very basic R and statistics skills. After all, this is a beginner's course. I really hope that you will enjoy this course and that you can use the blueprint in your daily work.