Nishant Kumar Singh

Data Visualization with ggplot2

  • Oct 27, 2020
  • 5 Min read
  • 52 Views
Data
Data Analytics
ggplot2
Data Visualization

Introduction

R is a powerful open-source platform for data visualization and can produce almost any type of chart. Before creating a chart, however, you should know what you want it to show, and choose the chart type from there. For example, if you have a dataset of a company's stock value over a period of time, you would want a chart that compares values over time. Based on the data and what you want it to convey, charts can be divided into four types:

  1. Comparison Charts
  2. Distribution Charts
  3. Composition Charts
  4. Relationship Charts

Comparison Charts

These charts compare data across categories or over time. To compare categories, you can use a bar chart or a column chart. To compare data points over time, you can use a line chart or a circular area chart.

Distribution Charts

These charts show how data is distributed across one or more variables. For a single variable, you can use a histogram or a density plot. The joint distribution of two variables can be plotted with a scatter plot.

Composition Charts

These charts show either how a composition changes over time or a static composition. To show changes in composition over time, you can use a stacked column chart or a stacked area chart. A static composition can be shown with a pie chart or a waterfall chart.
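The sketch below builds the two over-time composition charts mentioned above, a stacked column chart and a stacked area chart. It uses a small made-up dataset: the `sales` data frame and its values are hypothetical. The code assumes ggplot2 is installed (see the installation step below).

```r
library(ggplot2)

# Hypothetical data: units sold for three product lines over four years
sales <- data.frame(
  year    = rep(2017:2020, times = 3),
  product = rep(c("A", "B", "C"), each = 4),
  units   = c(10, 12, 15, 18,  8, 9, 7, 6,  5, 7, 9, 12)
)

# Stacked column chart: geom_col() stacks bars that share an x position by default
p_stack_col <- ggplot(sales, aes(x = factor(year), y = units, fill = product)) +
  geom_col() +
  labs(x = "year")

# Stacked area chart: geom_area() also stacks the groups by default
p_stack_area <- ggplot(sales, aes(x = year, y = units, fill = product)) +
  geom_area()

print(p_stack_col)
print(p_stack_area)
```

In both charts, the top of each stack shows the yearly total, and each colored band shows one product's share of it.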

Relationship Charts

These types of charts are used to show the relationship between two or more variables. For showing the relationship between two variables you can use a scatter plot. If you want to show the relationship between three variables you can use a bubble chart.
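Here is a minimal sketch of a two-variable relationship chart, assuming ggplot2 is installed. It plots car weight against fuel economy from the built-in `mtcars` dataset and adds a straight-line (linear model) fit with `geom_smooth()`. The linear fit is an illustrative choice; other smoothing methods may suit other data better.

```r
library(ggplot2)

# Scatter plot of weight vs. miles per gallon, plus a fitted linear trend line
p_rel <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)  # se = TRUE draws a confidence band

print(p_rel)
```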

Plotting Charts with ggplot2

In this section you will plot different types of charts using ggplot2 in R. First, install and load the package as shown below. For more information about the package, see the ggplot2 documentation.

# Install ggplot2 (only needed once)
install.packages("ggplot2")

# Loading ggplot2
library(ggplot2) 
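If you run the same script on several machines, a common pattern is to install ggplot2 only when it is missing. The sketch below does that check with base R's `requireNamespace()`. It is one convenient approach, not a requirement.

```r
# Install ggplot2 only if it is not already available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Show the installed version, which helps when behavior differs between releases
print(packageVersion("ggplot2"))
```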

Creating Comparison Charts

# Creating a dataframe 
df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# Plotting a bar plot
ggplot(df, aes(trt, outcome)) +
  geom_bar(stat = "identity")

BarPlot
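`geom_bar(stat = "identity")` can also be written as `geom_col()`, which uses the values in the data as bar heights. The sketch below shows that form. It also draws a horizontal bar chart with `coord_flip()`, sorting the categories by value with `reorder()`, which can make comparisons across categories easier to read. The object names `p_col` and `p_bar` are illustrative.

```r
library(ggplot2)

df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# Column chart: geom_col() is shorthand for geom_bar(stat = "identity")
p_col <- ggplot(df, aes(x = trt, y = outcome)) +
  geom_col()

# Horizontal bar chart, with categories ordered by their outcome value
p_bar <- ggplot(df, aes(x = reorder(trt, outcome), y = outcome)) +
  geom_col() +
  coord_flip() +
  labs(x = "trt")

print(p_col)
print(p_bar)
```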

# Creating a data frame for the line plot
x <- seq(0.01, 0.99, length.out = 100)
df <- data.frame(
  x = rep(x, 2),
  y = c(qlogis(x), 2 * qlogis(x)),
  group = rep(c("a", "b"), each = 100)
)

# Creating a line plot
ggplot(df, aes(x = x, y = y, group = group)) +
  geom_line()

LinePlot
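To tell the two lines apart, you can map `group` to `colour` rather than only to `group`, which also adds a legend. The sketch below shows that change. It also covers a comparison over actual dates, like the stock-value example in the introduction, by plotting the `economics` dataset that ships with ggplot2. In that dataset, `date` is a `Date` column and `unemploy` is the number of unemployed people in thousands.

```r
library(ggplot2)

# Same data as above; colour = group draws each line in its own color with a legend
x <- seq(0.01, 0.99, length.out = 100)
df <- data.frame(
  x = rep(x, 2),
  y = c(qlogis(x), 2 * qlogis(x)),
  group = rep(c("a", "b"), each = 100)
)
p_lines <- ggplot(df, aes(x = x, y = y, colour = group)) +
  geom_line()

# A comparison over time using a real Date axis
p_ts <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  labs(x = "Date", y = "Unemployed (thousands)")

print(p_lines)
print(p_ts)
```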

Creating Distribution Charts

# Using the diamonds dataset (included with ggplot2) to create a density plot
ggplot(diamonds, aes(carat)) +
  geom_density()

DensityPlot

A density plot does not divide the data into bins; instead, it draws a smoothed estimate of the distribution. It works well when there are many data points. When there are only a few data points, a histogram is usually a better choice.
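For comparison, here is a histogram of a small dataset: the 32 cars in `mtcars`. The bin width of 2 mpg is an arbitrary choice for illustration. With this few points, the shape can change noticeably as you adjust `binwidth` (or `bins`).

```r
library(ggplot2)

# Histogram of fuel economy; each bar counts the cars falling in a 2 mpg-wide bin
p_hist <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(binwidth = 2)

print(p_hist)
```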

# geom_point() is used to plot a scatter plot
ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

PointPlot

A scatter plot shows the distribution of two variables at once, with one point per observation.
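When a scatter plot has many points, they overlap and hide where the data is concentrated. One simple remedy is to make the points semi-transparent with the `alpha` parameter, sketched below on the roughly 54,000 rows of `diamonds`. The value 0.05 is only a starting point to tune.

```r
library(ggplot2)

# Semi-transparent points: dense regions appear darker because many points overlap there
p_dense <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05)

print(p_dense)
```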

Creating Composition Charts

ggplot2 has no dedicated geom for pie charts. However, a stacked bar chart can be converted into a pie chart with the coord_polar() function, which maps the bar heights to angles.

# Creating a dataframe 
df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))
# Plotting a pie chart
ggplot(df, aes(x = "", y = outcome, fill = trt)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)

Pie
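With the default theme, the pie is surrounded by axis labels and gridlines that carry little meaning. A possible refinement, assuming the same `df`, is sketched below. It uses `geom_col()`, which is equivalent to `geom_bar(stat = "identity")`, and adds `theme_void()` to remove those elements.

```r
library(ggplot2)

df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# One stacked bar drawn in polar coordinates, with axes and background removed
p_pie <- ggplot(df, aes(x = "", y = outcome, fill = trt)) +
  geom_col(width = 1) +
  coord_polar(theta = "y", start = 0) +
  theme_void()

print(p_pie)
```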

Creating a Relationship Chart

The bubble chart is an extension of a scatter plot. In a scatter plot, you can show the relationship between two variables. In the bubble chart, you can show the relationship between three variables at a time.

# Creating a bubble chart: point size shows a third variable (qsec)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = qsec))

Bubble
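Bubbles often overlap, and the default size range can make differences hard to see. The sketch below makes several adjustments:

- moves `size` into the main mapping
- adds transparency
- widens the bubble size range with `scale_size()`
- labels the axes

The range `c(1, 10)` and the label text are illustrative choices. `scale_size()` scales bubble area (not radius) with the value, which is generally what readers expect from a bubble chart.

```r
library(ggplot2)

# Weight vs. fuel economy, with bubble area showing the quarter-mile time (qsec)
p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = qsec)) +
  geom_point(alpha = 0.5) +        # transparency keeps overlapping bubbles visible
  scale_size(range = c(1, 10)) +   # smallest and largest bubble sizes
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", size = "1/4 mile time (s)")

print(p_bubble)
```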

Conclusion

There are many types of charts, each suited to revealing particular patterns in data. You don't need to learn every one of them, but a few are essential for data analysis. Data visualization is a broad topic, and many R packages have been developed for it. To learn more, explore the ggplot2 documentation.
