Merging Data Sources with R

Learn how to merge data with R. How do you merge values into vectors? How do you merge vectors into data frames? How do you join data frames? See how to use base R and dplyr to do left, right, and full outer joins with plenty of examples.
Course info
Level
Beginner
Updated
Jul 31, 2019
Duration
1h 29m
Description

In R data science projects, you often need to work with data that is spread across multiple sources. For example, given two separate data sets on products and their sales, how can you merge them into a new data set? In this course, Merging Data Sources with R, you will gain the ability to merge data from different sources in a controlled way that lets you keep only the data you need. First, you will learn to merge vectors, which includes using the paste() and append() functions. Next, you will discover how to join data sets with the merge() function, which includes left, inner, right, and full outer joins on data frames that can have one-to-one, one-to-many, or many-to-many relationships. Finally, you will explore how to join data sets with the dplyr package, which covers the previous joins plus anti and semi joins. When you are finished with this course, you will have the skills and knowledge to merge data from different sources, as needed for data wrangling with R.
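As a rough illustration of the base R functions named above, the following sketch merges vectors with paste() and append(), then joins two small data frames with merge(). The products and sales data, including every column name and value, are made up for this example and are not taken from the course.

```r
# Hypothetical sample data; all names and values are illustrative only.
ids <- c(1, 2, 3)
product_names <- c("pen", "ink", "pad")

# paste() combines vectors element by element into character strings.
labels <- paste(ids, product_names, sep = "-")

# append() inserts values into a vector, here after position 1.
more_ids <- append(ids, c(10, 11), after = 1)

products <- data.frame(product_id = c(1, 2, 3),
                       name = c("pen", "ink", "pad"))

# One-to-many relationship: product 1 has two sales.
# Product 3 has no sales, and sale product_id 4 has no matching product.
sales <- data.frame(product_id = c(1, 1, 2, 4),
                    amount = c(5, 7, 3, 9))

# merge() defaults to an inner join; the all* arguments widen it.
inner <- merge(products, sales, by = "product_id")                # matches only
left  <- merge(products, sales, by = "product_id", all.x = TRUE)  # keep all products
right <- merge(products, sales, by = "product_id", all.y = TRUE)  # keep all sales
full  <- merge(products, sales, by = "product_id", all = TRUE)    # keep everything

print(full)
```

Rows without a match on the other side are filled with NA, which is why the left, right, and full results contain more rows than the inner join.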

About the author

As a software engineer and lifelong learner, Dan wrote a PhD thesis and many highly cited publications on decision making and knowledge acquisition in software architecture. Dan used Microsoft technologies for many years, but moved gradually to Python, Linux, and AWS to gain different perspectives on the computing world.

Section Introduction Transcripts

Course Overview
Hello. My name is Dan Tofan, and welcome to my course, Merging Data Sources with R. I'm a Senior Software Engineer with a PhD in Software Architecture, and I like sharing knowledge and getting things done. The R language is very popular in data science because R is built and optimized by statisticians to work with data. In most data science projects, data arrives from different sources, and it needs merging. This course offers you a gentle introduction to merging data with R, and it only requires very basic R programming skills. The course includes these topics: merging vectors, understanding relationships between data sets, joining data sets, and how to remember the differences between left, inner, and right joins. By the end of this short, fast-paced course, you will know the basics of merging data with R, including joining data sets with various types of joins. Your effort to watch this course is a great knowledge investment that will pay off again and again in your future R data science projects. Furthermore, it will also pay off when joining data in other languages, such as SQL or Python, since the core ideas about joining data are easily transferable. So, let's get started.
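One way to keep the differences between left, inner, and right joins straight is to run each on the same pair of tables and compare which rows survive. The sketch below does this with the dplyr package, assuming it is installed, and also shows the semi and anti joins from the course description. The data frames and their values are hypothetical and not taken from the course.

```r
library(dplyr)

# Hypothetical sample data; all names and values are illustrative only.
products <- data.frame(product_id = c(1, 2, 3),
                       name = c("pen", "ink", "pad"))
sales <- data.frame(product_id = c(1, 1, 2, 4),
                    amount = c(5, 7, 3, 9))

# Left join: every row of the left table (products), matched where possible.
left <- left_join(products, sales, by = "product_id")

# Inner join: only rows with a match in both tables.
inner <- inner_join(products, sales, by = "product_id")

# Right join: every row of the right table (sales), matched where possible.
right <- right_join(products, sales, by = "product_id")

# Full join: every row of both tables.
full <- full_join(products, sales, by = "product_id")

# Semi join: products that have at least one sale; no sales columns are added.
with_sales <- semi_join(products, sales, by = "product_id")

# Anti join: products that have no sale at all.
without_sales <- anti_join(products, sales, by = "product_id")

print(left)
print(without_sales)
```

Semi and anti joins only filter the first table, so unlike the other joins they never duplicate rows or add columns.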