Introduction

`ggplot2` is a package in the `tidyverse` collection dedicated to creating graphics. It is a well-known R library based on the concept of a layered grammar of graphics. The grammar of graphics lets you concisely describe the components of a chart, and the layered approach applies those components one layer at a time, which makes the code easy to read and understand.
Apart from building finished graphs, it is widely used in exploratory data analysis, since visualizing a dataset is often the best way to understand it and to spot relationships between variables.

To install the `ggplot2` package, run one of the following code snippets.

```r
# To install the entire tidyverse collection, which includes ggplot2
install.packages("tidyverse")
```

```r
# To install ggplot2 alone
install.packages("ggplot2")
```

There are a few basic components of `ggplot2`:

- `ggplot()`: This creates a new `ggplot2` object.
- `aes()`: This creates the aesthetic mapping, which describes how the variables in the data are mapped to visual properties.
- `+`: This allows you to add layers while creating any plot.
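As a minimal sketch of how these three components fit together, the snippet below builds a plot step by step using the `mpg` dataset that ships with `ggplot2`. The variable names `base` and `scatter`, and the choice of `displ` and `hwy` as the mapped columns, are assumptions made for illustration.

```r
library(ggplot2)

# ggplot() creates the plot object; aes() maps columns to the x and y axes.
# Printing this object on its own draws empty axes, because no layer exists yet.
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# `+` adds a layer: here, points drawn using the mappings declared above.
scatter <- base + geom_point()

length(base$layers)     # number of layers before adding the geom
length(scatter$layers)  # number of layers after adding it
```

Keeping the base object in a variable means the same data and mappings can be reused with different layers.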

In terms of layers, `ggplot2` has four primary ones:

- Data: The dataset, or subset of a dataset, used to create the plot.
- Aesthetics: The mappings of variables to visual properties of the plot.
- Geometrics: The geom function used to represent data points.
- Theme: Different visual styles for the plot.

Let's look at the basic graphing template and use it to create a few graphs.

```r
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```

`ggplot(data = <DATA>)` is the first layer. Use the mpg dataset, which is included in the `ggplot2` library and becomes available when you load it. Below are the columns of the mpg dataset.

```r
library(ggplot2)
data(mpg)
colnames(mpg)
# "manufacturer" "model" "displ" "year" "cyl" "trans"
# "drv" "cty" "hwy" "fl" "class"
```

Let's create a histogram of the `hwy` column.

```r
# theme_classic() is optional; without it, the plot uses the default theme.
ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 5) +
  theme_classic()
```

The code above uses all four layers. The data and aesthetics are supplied in `ggplot(mpg, aes(hwy))`, the `geom_histogram()` function adds the geometrics of the histogram, and finally `theme_classic()` sets the theme, which is optional.

`ggplot2` provides many geom functions, each suited to a different kind of plot. A geom is the geometric object that a plot uses to represent data: a bar chart uses the bar geom, a line chart uses the line geom, and so on. The function names reflect this, such as `geom_line()`, `geom_bar()`, etc.
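To illustrate how the geom alone changes the chart type, the sketch below reuses one set of data and mappings with three different geom functions. The column choices and the variable name `p` are assumptions made for illustration.

```r
library(ggplot2)

# Shared data and aesthetic mappings
p <- ggplot(mpg, aes(x = class, y = hwy))

p + geom_point()    # one point per car
p + geom_boxplot()  # summary of the hwy distribution for each class
p + geom_violin()   # density of hwy for each class
```

Only the last term changes between the three plots; the data and aesthetics stay the same.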

In addition to the primary layers, `ggplot2` has a few other useful features, such as coordinate systems, faceting, and statistical transformations, which we will explore in the remainder of this guide.

In faceting, you split a plot into multiple subplots using any categorical column or variable of the dataset. If you want to divide your plot using one variable, use the `facet_wrap()` function, and if you want to divide it using two variables, use the `facet_grid()` function.

Let's apply both functions to the histogram you created earlier.

```r
# Save the histogram plot in a variable
a <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 5) +
  theme_classic()

# Create subplots using the "cyl" column
a + facet_wrap(~cyl)
```

This produces four subplots, one for each unique value of the `cyl` column.

Now for the `facet_grid()` function. With this function, you will use two categorical columns from the dataset.

```r
# Reuse the histogram stored in 'a' to avoid repeating code
a + facet_grid(drv ~ cyl)
```

The default coordinate system in `ggplot2` is the Cartesian coordinate system, in which the x and y positions act independently to locate each data point.
`ggplot2` provides other coordinate system functions for different occasions. Two commonly used ones are `coord_flip()` and `coord_polar()`.

Let's look at a few examples to understand the use of `coord_flip()` and `coord_polar()`.

```r
# First create a bar chart. As an exercise, try to work out why the
# fill, show.legend, and width arguments are used.
bar <- ggplot(data = mpg) +
  geom_bar(
    aes(x = manufacturer, fill = manufacturer),
    show.legend = FALSE,
    width = 1
  )
bar
```

```r
# Let's use coord_flip()
bar + coord_flip()
```

As for `coord_polar()`, let's apply it to the same bar chart to use polar coordinates.

```r
bar + coord_polar()
```

If you look at the bar chart we created previously, you can see it shows additional information, including the count of records against each manufacturer, but the count is not available in the dataset. Some graphs show raw values, while others calculate new values and add them to the plot. The algorithm used to calculate new values for a graph is called a statistical transformation, or stat.
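One way to see the values that a stat computes is to inspect the data behind a built plot. The sketch below uses `layer_data()` from `ggplot2` on a bar chart of `manufacturer`; the variable names `p` and `computed` are illustrative assumptions.

```r
library(ggplot2)

p <- ggplot(mpg) + geom_bar(aes(x = manufacturer))

# layer_data() returns the data frame that the layer actually draws,
# including the `count` column computed by the layer's default stat.
computed <- layer_data(p)
head(computed[, c("x", "count")])

# Every row of mpg falls into exactly one bar, so the counts
# add up to the number of rows in the dataset.
sum(computed$count) == nrow(mpg)
```

The `count` column does not exist in `mpg` itself; it is produced when the plot is built.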

A bar graph can be created using the `stat_count()` function instead of `geom_bar()`. Every geom function has a default stat that can be overridden. See the example below.

```r
# Create a summarised dataset
library(dplyr)
a <- mpg %>%
  group_by(manufacturer) %>%
  summarise(Count = n())

# Change the stat of geom_bar() from count to identity
ggplot(data = a) +
  geom_bar(mapping = aes(x = manufacturer, y = Count), stat = "identity")
```
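The first point above, that a bar chart can be built from the stat side, can be sketched as follows. The snippet draws the chart once with `geom_bar()` and once with `stat_count()`, then compares the computed counts. The names `by_geom` and `by_stat` are hypothetical.

```r
library(ggplot2)

by_geom <- ggplot(mpg) + geom_bar(aes(x = manufacturer))
by_stat <- ggplot(mpg) + stat_count(aes(x = manufacturer))

# Both layers pair the bar geom with the count stat,
# so the computed counts should match.
identical(layer_data(by_geom)$count, layer_data(by_stat)$count)
```

For pre-summarised data, `geom_col()` is a shorthand for `geom_bar(stat = "identity")`.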

In the previous sections, you learned the foundations of creating a graph using `ggplot2`, along with facets, coordinate systems, and statistical transformations.
Now let's apply all of these using the template below.

```r
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

Keep `stat = "count"`, which is the default, and use only the mpg dataset.

```r
ggplot(data = mpg) +
  geom_bar(
    mapping = aes(x = manufacturer, fill = manufacturer),
    stat = "count",
    show.legend = FALSE,
    width = 1
  ) +
  coord_flip() +
  facet_wrap(~year)
```

This guide explains the basics of creating a graph using `ggplot2` in R. You can also use `ggplot2` for your own data visualization requirements or in any data analysis project.
There is a lot of flexibility when you create graphs with `ggplot2`. Each function has a set of arguments that you can use to tweak the graph.
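As a small illustration of this flexibility, the sketch below adjusts a few arguments on the histogram from earlier in the guide. The variable name `styled`, the colours, and the label text are arbitrary choices made for illustration.

```r
library(ggplot2)

styled <- ggplot(mpg, aes(hwy)) +
  # fill sets the bar colour, colour sets the bar outline
  geom_histogram(binwidth = 5, fill = "steelblue", colour = "white") +
  # labs() sets the title and axis labels
  labs(
    title = "Highway mileage",
    x = "Miles per gallon (highway)",
    y = "Number of cars"
  ) +
  theme_classic()
styled
```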
Because `ggplot2` is developed as open source, new features are added continually. Use this guide as a starting point to explore more of `ggplot2`.

For more information, visit this repo.
