Course info
Level
Beginner
Updated
Nov 26, 2019
Duration
2h 5m
Description

Do you want to learn how to implement data exploration in R? Without data exploration, the whole data analysis process becomes inefficient and slow, but a good data exploration process guides you to valuable insights. In this course, Exploring Your First Data Set with R, you will learn how to explore and analyze new datasets quickly and efficiently. First, you will learn the methods, presented in a logical succession, which are applicable to most standard data frames. Then, you will discover how the process is divided into three steps: summary statistics, distribution checks, and relation analysis. These steps build on each other, and you will find out which variables are worth further analysis and where variable dependencies exist. Finally, you will learn the groundwork for machine learning and final data presentation. When you’re finished with this course, you’ll have the skills to properly structure and conduct data exploration in R.
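As a rough illustration of the three steps named above, here is a minimal sketch in base R. It is not taken from the course: the built-in iris data frame and the choice of functions (summary, hist, cor) are assumptions made only for this example.

```r
# Assumption: the built-in iris data frame stands in for a newly received data set.
df <- iris

# Step 1: summary statistics and a completeness check.
str(df)                                  # structure: column types and sizes
summary(df)                              # per-column summary statistics
missing_per_column <- colSums(is.na(df)) # number of missing values per column
print(missing_per_column)

# Step 2: distribution check for one numeric variable.
hist(df$Sepal.Length, main = "Sepal.Length", xlab = "Length (cm)")

# Step 3: relation analysis across the numeric variables.
numeric_cols <- df[sapply(df, is.numeric)]
cor_matrix <- cor(numeric_cols)          # pairwise Pearson correlations
print(round(cor_matrix, 2))
```

Large absolute values in the correlation matrix point to variable pairs that may be worth a closer look in later analysis.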

About the author

Martin is a trained biostatistician, programmer, consultant and data science enthusiast. His main objective: Explaining data science in a straightforward way. You can find his latest work over at: r-tutorials.com

More from the author
Annotating ggplot2 Visualizations in R
Intermediate
2h 1m
Dec 11, 2019
Formatting ggplot2 Visualization Elements in R
Intermediate
2h 1m
Nov 13, 2019
More courses by Martin Burger
Section Introduction Transcripts

Course Overview
[Autogenerated] Welcome to Exploring Your First Data Set with R. This is Martin Burger for Pluralsight. Data exploration is a very important part of the data analysis process. It guides the way to further in-depth analysis. The insights gained here are also required for accurate machine learning algorithms. Data exploration is usually performed right after or even during data cleaning. Therefore, it is one of the first things you do once you receive new data sets. In this course, I'm outlining a blueprint, a structure that you can use for most scenarios. Essentially, we are first summarizing the data, and we'll check for completeness and accuracy. After that, we check the distributions of the variables. Most importantly, I will show you several ways in which you can see if your data has a normal distribution or not. And in the last step, we perform some tests to see if there are relationships between variables. These relationships are the ultimate goal of exploratory analysis because machine learning algorithms and other prediction tools rely on these relationships. And, of course, data visualizations make the most sense on variables that have a relationship to each other. To fully benefit from this course, you just need very basic R and statistics skills. After all, this is a beginner's course. Now I really hope that you will enjoy this course and that you can use the blueprint in your daily work.
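The transcript does not name specific functions for the normality check or the relationship tests, so the following is only a sketch of one common way to do both in base R. The iris data frame, the chosen columns, and the use of shapiro.test and cor.test are assumptions for illustration.

```r
# Assumption: a numeric column from the built-in iris data is used as example data.
x <- iris$Sepal.Width

# Visual normality check: points close to the line suggest a normal distribution.
qqnorm(x)
qqline(x)

# Formal normality check: Shapiro-Wilk test.
# A small p-value is evidence against normality.
sw <- shapiro.test(x)
print(sw$p.value)

# Relationship test between two numeric variables: Pearson correlation test.
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)
print(ct$estimate)  # estimated correlation coefficient
print(ct$p.value)   # p-value for the null hypothesis of zero correlation
```

The visual and the formal check complement each other: the plot shows where a distribution deviates from normality, while the test gives a single number to compare against a chosen significance level.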