How to create data visualizations with R
The R programming language is a popular tool for creating data visualizations. Its interactive programming capabilities and powerful data visualization features often make it the first choice for experts in the field. In this article, you’ll learn about data visualization with R, the three main plotting systems in R and where to learn more.
Why choose R for data visualization?
There are several reasons why R has become so popular for data visualization.
First, the R language was designed by experts specifically for data analysis, and data visualization is a key component of data analysis. As a result, the language makes it straightforward to transform raw data into professional data visualizations using industry best practices.
Second, R is modular and extensible. We can extend its data visualization capabilities with just a few lines of code: a call to install.packages() downloads and installs a third-party package, and a call to library() loads it into the running session. As a result, we have a vast number of options available when creating our data visualizations with R.
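As a minimal sketch of that extension workflow, the snippet below installs a package only when it is missing and then loads it. The helper name ensure_package is hypothetical (it is not part of R), and the CRAN mirror URL is an assumption you may want to change.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then attach it to the current session.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from the given mirror; needs a network connection.
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("graphics")    # ships with R, so nothing is downloaded
# ensure_package("ggplot2")   # would download from the mirror on first use
```

Checking with requireNamespace() first keeps repeated runs of a script from re-downloading packages that are already installed.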
Third, R provides several ways to publish and deploy data visualizations into production. For example, we can programmatically export data visualizations as JPG, PNG, PDF or SVG files. We can embed data visualizations in R Markdown documents. We can also use server-side scripts or web services to produce data visualizations for reports and applications. Additionally, we can create web-based interactive data visualizations using frameworks like Shiny.
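Programmatic export usually follows an open-plot-close pattern: open a file-based graphics device, draw, then close the device. The sketch below uses the built-in cars data set and writes a PDF to a temporary folder; the file name is illustrative only.

```r
# Illustrative output path inside R's per-session temporary directory.
out_file <- file.path(tempdir(), "speed_vs_distance.pdf")

pdf(out_file, width = 7, height = 5)   # open a PDF device (size in inches)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance by speed")
dev.off()                              # close the device so the file is written

file.exists(out_file)
```

The png(), jpeg() and svg() devices follow the same pattern, although whether they are available can depend on how your copy of R was built.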
Best of all, R is free, open-source software, so creating our data visualizations is free of charge. On top of that, we can view the source code, modify it and redistribute it under the GNU General Public License. It’s for these compelling reasons, and many more, that R has become such a popular option for data visualization.
R’s three main plotting systems
There are three main plotting systems in R. Each of these takes a different approach to data visualization and comes with its own set of pros and cons.
First, we have the base graphics system. This is the plotting system that comes with the standard downloadable distribution of R. It uses a series of high-level plotting functions, each with a set of parameters to create a standard array of data visualizations. You can also extend your existing plots or compose entirely new plots by combining a series of low-level drawing commands. This makes the base plotting system simple, yet flexible, for a wide variety of data visualization scenarios.
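To make the high-level versus low-level distinction concrete, here is a small sketch using the built-in cars data set. The choice of a linear fit as the added layer is just an illustration.

```r
# High-level call: creates the whole plot (axes, labels, points) in one step.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance by speed",
     pch = 19, col = "grey40")

# Low-level calls: draw on top of the plot that already exists.
fit <- lm(dist ~ speed, data = cars)     # simple linear model for the trend line
abline(fit, col = "red", lwd = 2)        # add the fitted line
legend("topleft", legend = "Linear fit", col = "red", lwd = 2)
```

Parameters such as pch, col and lwd are the kind of per-function arguments that control how each element is drawn.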
Next, we have lattice. Lattice was designed to simplify the creation of multivariate data visualizations. It’s formula-based, so we define each data visualization with a formula that describes the data to be visualized. For example, if we wanted to visualize the age of a group of people on the x-axis and their height on the y-axis, we would describe this data visualization using the formula “height ~ age”, which we read as “height as a function of age”.
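The “height ~ age” formula can be tried out with a short sketch like the one below. The people data frame is made up purely for illustration, and the group column is an assumed third variable added to show lattice's conditioning syntax.

```r
library(lattice)

# Made-up data for illustration only.
set.seed(42)
people <- data.frame(
  age   = rep(2:16, times = 2),
  group = rep(c("A", "B"), each = 15)
)
people$height <- 85 + 6 * people$age + rnorm(nrow(people), sd = 4)

# "height as a function of age"
p <- xyplot(height ~ age, data = people,
            xlab = "Age (years)", ylab = "Height (cm)")
print(p)

# Adding "| group" draws one panel per group, which is where lattice's
# multivariate focus shows.
p_by_group <- xyplot(height ~ age | group, data = people)
print(p_by_group)
```

Lattice functions return a plot object rather than drawing immediately, so print() is needed when the code runs inside a script or function.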
Finally, we have ggplot2. This plotting system is based on the Grammar of Graphics, a framework for describing data visualizations using a language that is concise, yet flexible. We construct our data visualizations in layers using aesthetics, geometries, scales and facets. This makes it easy to compose complex data visualizations from simple building blocks.
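A rough sketch of how those four pieces fit together is shown below, using the built-in mtcars data set. It assumes the ggplot2 package is already installed; the colour choices are arbitrary.

```r
library(ggplot2)  # assumed to be installed, e.g. via install.packages("ggplot2")

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # aesthetics
  geom_point(size = 2) +                                           # geometry
  scale_colour_manual(                                             # scale
    name   = "Cylinders",
    values = c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick")
  ) +
  facet_wrap(~ gear) +                                             # facets
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Each component is added with +, so a layer can be swapped or removed without touching the rest of the plot definition.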
Because each of these three plotting systems takes a different approach to data visualization, I recommend that you become familiar with each of them. Knowing their strengths and weaknesses will help you choose which system to use for each type of data visualization you create.
If you’re interested in learning how to create data visualizations with R, do yourself a favor and take the following five steps.
1.) Head to the R website and download a free copy of R. This way you’ll have it installed on your machine and you can begin experimenting with basic data visualizations.
2.) Download either RStudio or R Tools for Visual Studio (RTVS). RStudio is a free IDE (integrated development environment) that makes it much easier to work with R than the console that ships with R. If you prefer using Visual Studio as your primary IDE, however, RTVS provides a developer experience similar to RStudio’s.
3.) Find an appropriate source of training materials to learn the basics. For example, Pluralsight has a three-part series on data visualization with R. These three courses cover everything from creating and interpreting basic data visualizations to more advanced topics like multivariate and interactive data visualization.
4.) Start using R to create your day-to-day data visualizations. Practice is the best way to become proficient. It will take some time to become as productive in R as you are with your usual data visualization tool, but don’t get discouraged: once you get the basics down, you’ll find R to be significantly more powerful than most other data visualization tools.
5.) Keep up with the latest in data visualization practices and technology. There are excellent resources available on the internet for this, and Nathan Yau’s website FlowingData is one of them.
Whether you need to create simple data visualizations to answer your own questions, professional data visualizations to communicate to a wider audience or automated reporting systems to support your knowledge workers, R has everything you need to get started. All it takes is the initial simple step of downloading a copy of R today and you’ll be a master at data visualizations in no time.
Learn more about data visualization with R in this course: Beginning Data Visualization with R.