This guide will help you determine best practices for selecting visualizations that fit your needs. The approach is simple: we answer questions about the dataset you are working with and generate a visualization in R for each scenario. The charts in this guide are kept simple, usually without color or dimension adjustments, to show how a single line of code can generate a chart quickly. We will use R for all the visualizations, so RStudio or a compatible program should be installed.
Quantitative data is information about quantities and, generally speaking, is something that can be measured. Qualitative data, on the other hand, is information that cannot be measured numerically and is also known as categorical data. Data collected through focus group discussions, one-on-one interviews, or case studies is usually qualitative. Data can be further classified as below:
Binomial: Binomial data is not the same as a single binary value (0 or 1, True or False); it is generally the result of repeated trials, each of which ends in one of two possible outcomes. Scenario: A coin is flipped 100 times and the results are recorded. A scatter plot is used to show the distribution.
    # Generate the sample numbers
    myseq <- seq(1, 100, by = 1)
    print(myseq)
    # Binomial distribution: probability of each number of heads in 100 flips
    BD <- dbinom(myseq, 100, 0.5)
    print(BD)
    # Make a scatter plot
    plot(x = myseq, y = BD, xlab = "Number of heads", ylab = "Probability")
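To connect the theoretical curve with an actual experiment, the sketch below simulates one run of 100 flips and marks the result on the distribution. It assumes a fair coin; the names `n_flips`, `heads`, `flips`, and `observed_heads` are illustrative, and the seed is arbitrary. It also starts the sequence at 0, since zero heads is a possible outcome.

```r
# Assumption: a fair coin (prob = 0.5) flipped 100 times.
n_flips <- 100
heads <- 0:n_flips  # every possible number of heads, including 0

# Theoretical probability of seeing exactly k heads, for each k.
prob <- dbinom(heads, size = n_flips, prob = 0.5)

# Simulate one run of flips; the seed only makes the run repeatable.
set.seed(42)
flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)
observed_heads <- sum(flips == "H")

# Plot the distribution and mark the simulated result with a dashed line.
plot(heads, prob, xlab = "Number of heads", ylab = "Probability")
abline(v = observed_heads, lty = 2)

# The most likely number of heads is where the probability peaks.
most_likely <- heads[which.max(prob)]
```

The dashed line should usually fall near the peak of the curve, which is one way to see that the plotted values are probabilities of outcomes rather than individual observations.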
Nominal: Values are assigned to distinct categories that have no meaningful order. For example, data about smoking habits can be categorized but not ordered. Scenario: Random data is generated to categorize gender. A pie chart is used to show the share of each category.
    # Generate a dataset of gender values
    Gender <- c("Male", "Female", "No Answer")
    Gendervals <- table(sample(Gender, 25, replace = TRUE, prob = c(0.25, 0.50, 0.25)))
    # Make a basic pie chart; table() sorts the categories, so take the labels from the table itself
    pie(Gendervals, labels = names(Gendervals))
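Since the pie chart is meant to convey percentages, it can help to print them on the slices. The following is a minimal sketch under the same assumptions as the scenario (25 random responses with the same probabilities); the names `responses`, `counts`, `pct`, and `slice_labels` are illustrative, and the seed is arbitrary.

```r
# Same three categories as in the scenario above.
gender <- c("Male", "Female", "No Answer")

set.seed(1)  # arbitrary seed, only for repeatability
responses <- factor(
  sample(gender, 25, replace = TRUE, prob = c(0.25, 0.50, 0.25)),
  levels = gender  # fixed levels keep every category, even one with zero responses
)

# Count responses per category and convert the counts to percentages.
counts <- table(responses)
pct <- round(100 * prop.table(counts), 1)

# Build labels such as "Female (52%)" from the table's own names.
slice_labels <- paste0(names(counts), " (", pct, "%)")
pie(counts, labels = slice_labels)
```

Fixing the factor levels keeps the counts and labels aligned regardless of which categories happen to appear in the sample.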
Ordinal: Categorical data that is ordered, but with no scale to measure the difference between values. The order or rank of the data is the distinctive feature of ordinal data. Scenario: Let us create a dummy survey dataset for patients who were asked how often they get a headache and whether the pain is high, medium, or low. We will make a spine plot to show the results.
    # Build a survey dataset: frequency of headaches vs. level of pain
    frequencyh <- rep(1:5, times = c(20, 38, 16, 72, 40))
    levelpain <- c(rep(1:5, times = c(25, 17, 18, 2, 1)),   # High
                   rep(1:5, times = c(20, 4, 21, 1, 2)),    # Medium
                   rep(1:5, times = c(22, 3, 30, 3, 17)))   # Low
    levelp <- data.frame(frequencyh, levelpain)
    # Plotting one factor against another produces a spine plot
    plot(factor(levelpain) ~ factor(frequencyh), data = levelp,
         xlab = "Frequency of headaches", ylab = "Level of pain")
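R can also record the ordering of ordinal data explicitly through ordered factors. The sketch below is one possible way to do that, not a replacement for the survey data above: the category labels (`Rarely`, `Sometimes`, `Often`, `Low`, `Medium`, `High`), the sample size of 60, and the seed are all assumptions made for illustration.

```r
# Hypothetical category labels, listed from lowest to highest rank.
freq_levels <- c("Rarely", "Sometimes", "Often")
pain_levels <- c("Low", "Medium", "High")

set.seed(7)  # arbitrary seed, only for repeatability
survey <- data.frame(
  frequency = factor(sample(freq_levels, 60, replace = TRUE),
                     levels = freq_levels, ordered = TRUE),
  pain      = factor(sample(pain_levels, 60, replace = TRUE),
                     levels = pain_levels, ordered = TRUE)
)

# Ordered factors support rank comparisons, unlike nominal factors.
n_below_high <- sum(survey$pain < "High")

# spineplot() draws the same kind of chart as plot() on two factors.
spineplot(pain ~ frequency, data = survey,
          xlab = "Frequency of headaches", ylab = "Level of pain")
```

With ordered factors, the axis categories follow the declared rank instead of alphabetical order, which is usually what a reader of an ordinal chart expects.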
Continuous: Data that can take any value within a range and can be subdivided into finer levels. In short, it cannot be counted and has infinitely many possible values, but it is still measurable. For example, temperature, humidity, height, and weight are all measured within a given range.
Scenario: Let us take the built-in beaver1 dataset and build a histogram of the temperature values.
    # The y-axis of a histogram shows how many readings fall in each temperature bin
    hist(beaver1$temp, xlab = "Temperature", ylab = "Frequency", main = "Beaver 1")
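A histogram groups continuous values into bins, and `hist()` returns those bins so they can be inspected. The following sketch shows this with the same dataset; the variable names `temps` and `h` are illustrative, and `breaks = 10` is only a suggestion that R may adjust to get round bin edges.

```r
# Body temperature readings from the built-in beaver1 dataset.
temps <- beaver1$temp

# Draw the histogram and keep the returned object for inspection.
h <- hist(temps, breaks = 10,
          xlab = "Temperature", ylab = "Frequency", main = "Beaver 1")

# Bin edges chosen by R and the number of readings in each bin.
print(h$breaks)
print(h$counts)
```

Changing `breaks` is the quickest way to check whether the shape of the distribution is robust or just a result of the bin width.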
Discrete: Readings or counts made up of a limited set of whole numbers. In simple terms, discrete data involves integers, has a limited number of possible values, and cannot be divided into parts. For example, 50 employees, 3 laptops, and 5 friends are all whole numbers. Scenario: Let us take the mtcars dataset, where the numbers of cylinders and gears are whole numbers, so a bar plot can depict the relation between the two.
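    # Count cars for each combination of gears (rows) and cylinders (columns)
    x <- table(mtcars$gear, mtcars$cyl)
    # Stacked bar plot: one bar per cylinder count, one segment per gear count
    barplot(x, xlab = "No. of cylinders", ylab = "No. of cars, stacked by gears")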
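A stacked bar plot can be hard to read without a legend. As an alternative, the sketch below draws the same counts as grouped bars with a legend, using the `beside`, `legend.text`, and `args.legend` arguments of `barplot()`. The variable name `counts` and the axis labels are illustrative choices.

```r
# Cross-tabulate gears (rows) against cylinders (columns).
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# beside = TRUE draws one bar per gear count within each cylinder group;
# legend.text = TRUE labels the bars with the row names of the table.
barplot(counts, beside = TRUE, legend.text = TRUE,
        args.legend = list(title = "Gears"),
        xlab = "No. of cylinders", ylab = "No. of cars")
```

Grouped bars make it easier to compare gear counts within a cylinder group, while stacked bars emphasize the total per group.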
Univariate: As the name suggests, Uni (one) and Variate (variable), the data has a single variable or, in statistical terms, is composed of a single vector component. For example, temperature changes at a single point on different dates. Scenario: Let us see how the built-in dataset USAccDeaths (US accidental deaths) is represented in a line graph. It is a simple dataset with year, month, and the number of accidental deaths, and is a good example of univariate data.
    # Univariate data
    print(USAccDeaths)
    # type = "l" draws a line graph
    plot(USAccDeaths, type = "l")
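Because USAccDeaths is stored as a monthly time series, the single variable can also be summarized at a coarser level before plotting. The sketch below, offered as one possible approach, totals the deaths per year with `aggregate()`; the names `deaths` and `yearly` are illustrative.

```r
# USAccDeaths is a built-in monthly time series (class "ts").
deaths <- USAccDeaths

# Collapse the twelve monthly values of each year into one yearly total.
yearly <- aggregate(deaths, nfrequency = 1, FUN = sum)

# type = "b" draws both points and connecting lines.
plot(yearly, type = "b", xlab = "Year", ylab = "Accidental deaths per year")
```

The monthly plot shows the seasonal pattern, while the yearly totals show the longer-term trend of the same variable.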
Multivariate: On the other hand, Multi (many) and Variate (variables) means the data has multiple variables in varied states. For example, the study of temperature, humidity, and wind speed at different points and dates. Scenario: For this scenario, let us take the iris dataset and use a scatter plot matrix.
    # iris data
    plot(iris)
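Calling `plot()` on a data frame produces a scatter plot matrix, which `pairs()` also draws directly. The sketch below restricts the matrix to the four numeric columns and colors the points by species; the color choices and the names `measurements`, `species_colors`, and `cor_matrix` are illustrative assumptions.

```r
# Keep only the four numeric measurement columns.
measurements <- iris[, 1:4]

# One color per species; indexing by the factor uses its integer codes.
species_colors <- c("black", "red", "blue")
pairs(measurements, col = species_colors[iris$Species],
      main = "Iris measurements by species")

# Pairwise correlations give a numeric companion to the scatter plot matrix.
cor_matrix <- cor(measurements)
print(round(cor_matrix, 2))
```

Reading the plot matrix alongside the correlation matrix helps confirm which pairs of variables move together.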
It is also important to stay focused on what you want the visualization to accomplish. For example, you can use an area chart to study how variables relate and behave, and a correlation plot to understand how variables affect each other. Similarly, to compare the values of variables against a criterion, consider scatter plots and bubble charts. If the purpose is to study the composition of data, you can use pie charts or percentage charts.
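As an illustration of a comparison chart, a bubble chart can be drawn in base R with `symbols()`. The sketch below is a minimal example using the built-in mtcars dataset; the choice of variables (weight, fuel economy, horsepower) and the names `cars` and `radius` are assumptions made for the example.

```r
# Built-in dataset; a copy keeps the example self-contained.
cars <- mtcars

# Scale the radius so that bubble area is proportional to horsepower.
radius <- sqrt(cars$hp / pi)

# x and y give the bubble positions; inches caps the largest bubble's radius.
symbols(cars$wt, cars$mpg, circles = radius, inches = 0.2,
        fg = "black", bg = "grey80",
        xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
        main = "Bubble size shows horsepower")
```

A bubble chart is a scatter plot with a third variable encoded as size, which is why it suits comparisons across more than two variables.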
To summarize, the first step in visualization is to understand the data. The table below gives a quick summary for selecting a chart based on your data. If you are looking for a simple, quick solution, use plot(), as it is intuitive and can handle common chart types.
| Line Charts | Heat Maps |
| Dotted Line | Stacked Column Charts |
| Candlestick | Box Plot |
| Bar Charts | Bubble Charts |
| Bubble Charts | Waterfall Charts |
| Heat Maps | Line Charts |
| Pie Chart | Density Plot |
| Spatial Maps | Time Series Graphs |

| Scatter Plot / Bubble | Scatter Plot | Area Charts | Histograms |
| Dot Plot | Bar Plot / Candlestick | Density Plots | Area Charts |
| Sankey Plot | Axis Line / Dual Line | Waterfall Charts | Scatter Plot |
| Correlation Plot | Cohort Diagrams | Column Chart | Heat Maps |
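The summary recommends plot() for quick results. One reason it works well is that plot() picks a chart type based on the kind of data it receives. The sketch below shows four such cases with built-in datasets; the variable names `weight`, `mpg`, `cylinders`, and `deaths` are illustrative.

```r
# Inputs of different kinds, all taken from built-in datasets.
weight    <- mtcars$wt           # numeric
mpg       <- mtcars$mpg          # numeric
cylinders <- factor(mtcars$cyl)  # categorical
deaths    <- USAccDeaths         # monthly time series

plot(weight, mpg)     # two numeric vectors: scatter plot
plot(cylinders)       # a single factor: bar plot of counts
plot(cylinders, mpg)  # factor against numeric: box plots per category
plot(deaths)          # time series: line chart
```

Knowing the type of each variable, as described in the earlier sections, is therefore often enough to predict what plot() will draw.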