R is a popular open source platform for data visualization, and it can produce a wide range of chart types. Before creating a chart, decide what you want to show, then select the chart type that fits. For example, if you have a dataset that shows the fluctuation of a company's stock value over a period of time, you would use a chart that compares values over time. Based on the dataset and what you want to convey, charts can be divided into four types:
Comparison charts compare data across categories or over time. For comparing across categories you can use a bar chart or column chart. For data points that need to be compared over time you can use a line chart or circular area chart.
Distribution charts show how data is distributed over one or more variables. For the distribution of one variable, you can use a histogram or density plot. The distribution of two variables can be plotted with a scatter plot.
Composition charts show either a static composition or how a composition changes over time. For showing the change in composition over time you can use a stacked column chart or stacked area chart. A static composition can be shown with a pie chart or waterfall chart.
Relationship charts show the relationship between two or more variables. For the relationship between two variables you can use a scatter plot. For the relationship between three variables you can use a bubble chart.
In this section you will plot different types of charts using ggplot2 in R. The prerequisites for using ggplot2 are installing and loading the package. More information is available in the ggplot2 package documentation.
# Install ggplot2 in RStudio (only needed once)
install.packages("ggplot2")

# Load ggplot2
library(ggplot2)
# Creating a data frame
df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# Plotting a bar plot
ggplot(df, aes(trt, outcome)) +
  geom_bar(stat = "identity")
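As a side note, recent versions of ggplot2 also provide geom_col(), which takes the bar heights directly from the y values and so behaves like geom_bar(stat = "identity"). The sketch below reuses the same small data frame; the variable name p_col is only illustrative.

```r
library(ggplot2)

# Same small data frame as in the bar plot example
df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# geom_col() uses the y values as bar heights, so no stat argument is needed
p_col <- ggplot(df, aes(trt, outcome)) +
  geom_col()

print(p_col)
```

Either form produces the same chart; geom_col() is simply shorter when the data already holds the values to plot.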
# Creating a data frame for the line plot
x <- seq(0.01, 0.99, length.out = 100)
df <- data.frame(
  x = rep(x, 2),
  y = c(qlogis(x), 2 * qlogis(x)),
  group = rep(c("a", "b"), each = 100)
)

# Creating a line plot, one line per group
ggplot(df, aes(x = x, y = y, group = group)) +
  geom_line()
# Using the diamonds dataset for creating a density plot
ggplot(diamonds, aes(carat)) +
  geom_density()
A density plot does not use bins; it draws a smoothed estimate of the distribution. This kind of plot is useful when there is a large number of data points. When there are only a few data points you can use a histogram.
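For comparison, a histogram can be sketched with geom_histogram(). The example below assumes the built-in mtcars dataset (32 rows) as a small dataset, and bins = 10 is an arbitrary illustrative choice, not a recommended setting.

```r
library(ggplot2)

# mtcars has only 32 rows, so a binned histogram is a reasonable choice
# bins = 10 is an illustrative value; adjust it to suit the data
p_hist <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 10)

print(p_hist)
```

Changing the number of bins changes how coarse the picture is, so it is worth trying a few values before settling on one.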
# geom_point() is used to plot a scatter plot
ggplot(mtcars, aes(wt, mpg)) +
  geom_point()
The scatter plot is used to show the distribution of two variables at a time.
There is no geom function available to create a pie chart. However, a bar chart can be converted into a pie chart by switching to polar coordinates with the coord_polar() function.
# Creating a data frame
df <- data.frame(trt = c("a", "b", "c"), outcome = c(2.3, 1.9, 3.2))

# Plotting a pie chart: a single stacked bar drawn in polar coordinates
ggplot(df, aes(x = "", y = outcome, fill = trt)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)
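The pie chart shows a static composition. For a composition that changes over time, a stacked column chart can be sketched with the same geom_bar() call by mapping a time variable to x. The data frame below is hypothetical (the names sales, year, product and revenue, and all values, are made up for illustration).

```r
library(ggplot2)

# Hypothetical data: revenue of three products over three years
sales <- data.frame(
  year    = rep(c("2019", "2020", "2021"), each = 3),
  product = rep(c("a", "b", "c"), times = 3),
  revenue = c(2.3, 1.9, 3.2, 2.8, 2.1, 3.0, 3.1, 2.6, 2.7)
)

# Bars sharing an x value are stacked by default, one segment per fill group
p_stack <- ggplot(sales, aes(x = year, y = revenue, fill = product)) +
  geom_bar(stat = "identity")

print(p_stack)
```

Adding position = "fill" to geom_bar() rescales every column to the same height, which makes the shares easier to compare than the totals.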
The bubble chart is an extension of a scatter plot. In a scatter plot, you can show the relationship between two variables. In the bubble chart, you can show the relationship between three variables at a time.
# Creating a bubble chart: point size encodes a third variable (qsec)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(size = qsec))
There are a huge number of graphs available, each suited to revealing particular patterns hidden in data. You don't need to learn every one of them, but a few are essential if you want to do data analysis. Data visualization is a huge topic, and many packages have been developed for visualization purposes.