ggplot2 is a package in the tidyverse collection dedicated to creating graphics. It is a well-known R library based on the layered grammar of graphics. The grammar of graphics lets you concisely describe the components of a chart, and the layered approach adds those components one layer at a time, which keeps the code easy to read and understand.
Apart from building graphs, it is widely used in exploratory data analysis, because visualizing a dataset is often the quickest way to understand it and to spot relationships between variables.
To install the ggplot2 package, run one of the following code snippets.

    # To install the entire tidyverse collection, which includes ggplot2
    install.packages("tidyverse")

    # To install ggplot2 alone
    install.packages("ggplot2")

There are a few basic components of ggplot2:
ggplot(): This creates a new plot object.
aes(): This creates the aesthetic mapping, which describes how the variables in the data are mapped to visual properties.
+: This allows you to add layers while creating any plot.
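As a minimal sketch of how these three components fit together, the snippet below builds a scatter plot in two steps. It assumes the mpg dataset that ships with ggplot2; the displ and hwy columns, the geom_point() layer, and the variable name p are illustrative choices only.

```r
library(ggplot2)

# ggplot() on its own creates an empty plot object bound to the data
p <- ggplot(data = mpg)

# aes() maps columns to visual properties, and `+` adds the layer to the plot
p <- p + geom_point(mapping = aes(x = displ, y = hwy))

# Printing the object draws the plot
p
```

Because the plot is an ordinary R object, it can be stored in a variable and extended later with further `+` calls.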
In terms of layers, ggplot2 has four primary ones: data, aesthetics, geometries, and theme.
Let's look at the basic graphing template and use it to create a few graphs.

    ggplot(data = <DATA>) +
      <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

ggplot(data = <DATA>) is the first layer. Use the mpg dataset, which is included in the ggplot2 library and becomes available when you load it. Below are the columns of the mpg dataset.

    library(ggplot2)
    data(mpg)
    colnames(mpg)
    # [1] "manufacturer" "model" "displ" "year" "cyl" "trans"
    # [7] "drv" "cty" "hwy" "fl" "class"

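Before plotting, it can help to check the size of the dataset and the type of each column, since the column type usually suggests whether a variable suits an axis or a grouping. The snippet below is a small sketch using base R functions only; the name col_types is illustrative.

```r
library(ggplot2)

# Number of rows and columns in mpg
dim(mpg)

# Class of each column: character columns such as drv and class are categorical
col_types <- sapply(mpg, class)
col_types

# Distinct values of two columns that take only a few values
sort(unique(mpg$cyl))
sort(unique(mpg$drv))
```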
Let's create a histogram of the hwy column.

    # theme_classic() is optional; without it, the plot uses the default theme.
    ggplot(mpg, aes(hwy)) +
      geom_histogram(binwidth = 5) +
      theme_classic()

The code above uses all four layers. The data and aesthetics are supplied in ggplot(mpg, aes(hwy)), the geom_histogram() function adds the geometry of the histogram, and theme_classic() applies an optional theme.
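The binwidth argument sets how wide each bar of the histogram is, so it is worth trying more than one value. The sketch below compares two widths; the value 2 is an arbitrary choice, and layer_data() is a ggplot2 helper that returns the data computed for a layer.

```r
library(ggplot2)

wide   <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 5)
narrow <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# A smaller binwidth produces more, narrower bars
nrow(layer_data(wide))
nrow(layer_data(narrow))

# The theme layer can be added to either plot
narrow + theme_classic()
```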
ggplot2 provides many geom functions for creating different kinds of plots. A geom is the geometric object that a plot uses to represent data: for example, a bar chart uses the bar geom, a line chart uses the line geom, and so on. The geom functions are named accordingly, such as geom_bar() and geom_line().
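To illustrate that the geom alone decides how the data is drawn, the sketch below reuses one data and aesthetic specification with two different geoms. The class and hwy columns and the variable names are illustrative choices.

```r
library(ggplot2)

# Shared data and aesthetic mapping
base <- ggplot(mpg, aes(x = class, y = hwy))

# One point per car
p_points <- base + geom_point()

# One box per vehicle class, summarizing the same values
p_boxes <- base + geom_boxplot()

p_points
p_boxes
```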
In addition to the primary layers, ggplot2 has a few other useful features, such as coordinate systems, faceting, and statistical transformations, which we will explore in the remainder of this guide.
In faceting, you split a plot into multiple subplots using a categorical column or variable of the dataset. To divide your plot by one variable, use the facet_wrap() function; to divide it by two variables, use the facet_grid() function.
Let's apply both functions to the histogram you created earlier.

    # Save the histogram plot in a variable
    a <- ggplot(mpg, aes(hwy)) +
      geom_histogram(binwidth = 5) +
      theme_classic()

    # Create subplots using the "cyl" column
    a + facet_wrap(~cyl)

As you can see, there are now four subplots, one for each unique value of the cyl column.
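The layout of the panels can be adjusted through arguments of facet_wrap(). The sketch below is one possible variation: nrow = 1 places all panels in a single row and scales = "free_y" gives each panel its own y axis. Both settings are illustrative choices.

```r
library(ggplot2)

a <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 5) +
  theme_classic()

# The number of panels equals the number of distinct cyl values
sort(unique(mpg$cyl))

# One row of panels, each with an independent y axis
wrapped <- a + facet_wrap(~cyl, nrow = 1, scales = "free_y")
wrapped
```

A free y axis makes small groups easier to read, at the cost of making bar heights harder to compare across panels.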
Now for the facet_grid() function. In this function, you will use two categorical columns from the dataset.

    # Reuse the histogram stored in `a` to avoid repeating code
    a + facet_grid(drv ~ cyl)

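facet_grid() draws a panel for every combination of the row and column variables, including combinations that have no records, so some panels can be empty. The sketch below cross-tabulates the two columns to show which combinations exist, and writes the same layout with vars(), an alternative to the formula notation. The variable names are illustrative.

```r
library(ggplot2)

# Cross-tabulate the two faceting columns: rows are drv, columns are cyl
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
combos

# vars() expresses the same layout as facet_grid(drv ~ cyl)
grid_plot <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 5) +
  facet_grid(rows = vars(drv), cols = vars(cyl))
grid_plot
```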
The default coordinate system in ggplot2 is the Cartesian coordinate system, where the x and y positions act independently to locate a data point.
ggplot2 provides several coordinate system functions for different occasions. The most commonly used ones are coord_flip() and coord_polar().
Let's look at a few examples to understand their use.

    # First create a bar chart. Try to work out for yourself why the fill,
    # show.legend, and width arguments are used.
    bar <- ggplot(data = mpg) +
      geom_bar(
        aes(x = manufacturer, fill = manufacturer),
        show.legend = FALSE,
        width = 1
      )
    bar

    # Let's use coord_flip()
    bar + coord_flip()

Next is coord_polar(). Let's apply it to the same bar chart to use polar coordinates.

    bar + coord_polar()

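For bar charts, a flipped layout can also be obtained without a coordinate function by mapping the categorical column to y. The sketch below compares the two approaches; it assumes ggplot2 3.3.0 or later, where geoms detect their orientation from the mapping. The variable names are illustrative.

```r
library(ggplot2)

# Vertical bars flipped with coord_flip()
flipped <- ggplot(mpg) +
  geom_bar(aes(x = manufacturer)) +
  coord_flip()

# Horizontal bars obtained by mapping the category to y
# (assumes ggplot2 >= 3.3.0)
horizontal <- ggplot(mpg) +
  geom_bar(aes(y = manufacturer))

flipped
horizontal
```

Both plots are built from the same counts; only the way the horizontal orientation is requested differs.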
If you look at the bar chart we created previously, you can see that it shows information that is not stored in the dataset: the count of records for each manufacturer. Some graphs plot raw values, while others calculate new values and plot those instead. The algorithm used to calculate new values for a graph is called a statistical transformation, or stat.
A bar graph can be created using the stat_count() function instead of geom_bar(). Every geom function has a default stat that can be overridden. See the example below.

    # Create a summarized dataset
    library(dplyr)
    a <- mpg %>%
      group_by(manufacturer) %>%
      summarise(Count = n())

    # Override the stat of geom_bar() from "count" to "identity"
    ggplot(data = a) +
      geom_bar(mapping = aes(x = manufacturer, y = Count), stat = "identity")

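The relationship between geoms and stats can be checked directly. The sketch below first compares the counts computed by geom_bar() and stat_count(), then draws pre-computed counts with geom_col(), which plots the supplied values as-is in the same way as geom_bar(stat = "identity"). The counts table is built with base R here only to keep the example self-contained, and the variable names are illustrative.

```r
library(ggplot2)

# geom_bar() uses stat_count() by default, so both layers compute the same counts
p_geom <- ggplot(mpg) + geom_bar(aes(x = manufacturer))
p_stat <- ggplot(mpg) + stat_count(aes(x = manufacturer))
all.equal(layer_data(p_geom)$count, layer_data(p_stat)$count)

# Pre-computed counts: one row per manufacturer, with the count in the Freq column
counts <- as.data.frame(table(manufacturer = mpg$manufacturer))

# geom_col() draws the values in the data without transforming them
p_col <- ggplot(counts) + geom_col(aes(x = manufacturer, y = Freq))
p_col
```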
In the previous sections, you learned the foundations of creating a graph with ggplot2, along with facets, coordinate systems, and statistical transformations.
Now let's apply all of these using the template below.

    ggplot(data = <DATA>) +
      <GEOM_FUNCTION>(
        mapping = aes(<MAPPINGS>),
        stat = <STAT>
      ) +
      <COORDINATE_FUNCTION> +
      <FACET_FUNCTION>

Use stat = "count", which is the default for geom_bar(), and stick with the mpg dataset.

    ggplot(data = mpg) +
      geom_bar(
        mapping = aes(x = manufacturer, fill = manufacturer),
        stat = "count",
        show.legend = FALSE,
        width = 1
      ) +
      coord_flip() +
      facet_wrap(~year)

This guide explains the basics of creating a graph using ggplot2 in R. You can apply the same approach to your own data visualization requirements or in any data analysis project.
There is a lot of flexibility when you are creating graphs with ggplot2. Each function has a set of arguments that you can use to tweak the graph.
As part of open source development, new features are added continually. This guide gives you a push to explore more in ggplot2.
For more information, visit this repo.