Data Visualizations: Approaches to Select the Right Chart for Your Data

This guide will help you determine the best visualizations for your needs and then build them using R.

By Jaya Trivedi

May 7, 2019 • 7 Minute Read

Introduction

This guide will help you determine the best practices for selecting visualizations for your needs. The approach is simple: we start by answering questions relevant to the dataset you are working with, then generate a visualization using R code for each scenario. The charts used in this guide are simple and usually have no color or dimension adjustments. This helps show how a single line of code can generate a chart quickly. We will be using R for all the visualizations, so RStudio or a compatible program should be installed.

Know Your Data

Is it Qualitative or Quantitative?

Quantitative data is information about quantities and, generally speaking, is something that can be measured. Qualitative data, on the other hand, describes qualities that cannot be measured numerically and is also known as categorical data. Data collected through focus group discussions, one-on-one interviews, or case studies is usually qualitative. Data can be further classified as follows:

Binomial Data

Binomial data is not the same as binary data (0 or 1, True or False); generally, it is the result of probability outcomes. For example, it counts how often one of two possible outcomes occurs in a series of trials. Scenario: A coin is flipped 100 times and the results are recorded. A scatter plot is used to show the distribution of the number of heads.

    # Generate the possible numbers of heads in 100 flips
    myseq <- seq(0, 100, by = 1)
    print(myseq)

    # Binomial probability of each count for a fair coin
    BD <- dbinom(myseq, 100, 0.5)
    print(BD)

    # Make a scatter plot: counts on the x-axis, probabilities on the y-axis
    plot(x = myseq, y = BD, xlab = "Number of heads", ylab = "Probability")
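
The scenario describes recording 100 actual flips, while `dbinom()` returns theoretical probabilities. As a minimal sketch, assuming a fair coin and an arbitrary seed, the flips themselves could be simulated with `sample()` and tallied with `table()`. The names `flips` and `counts` are illustrative and not part of the original guide.

```r
# Simulate 100 flips of a fair coin (the seed is arbitrary, for reproducibility)
set.seed(42)
flips <- sample(c("Heads", "Tails"), size = 100, replace = TRUE)

# Tally the recorded outcomes
counts <- table(flips)
print(counts)

# Bar plot of the observed counts
barplot(counts, xlab = "Outcome", ylab = "Number of flips")
```

Repeating the simulation many times and counting the heads in each run would approximate the binomial distribution.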
    

Nominal Data

Values are allocated to distinct categories and have no meaningful order. For example, data about smoking habits can be categorized but not ordered. Scenario: Random data is generated to categorize gender. Pie charts are used to show the percentage values.

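    # Generate a dataset of 25 random gender values
    Gender <- c("Male", "Female", "No Answer")
    Gendervals <- table(sample(Gender, 25, replace = TRUE, prob = c(0.25, 0.50, 0.25)))

    # Make a basic pie chart. table() sorts categories alphabetically,
    # so take the labels from the table itself to keep them matched.
    pie(Gendervals, labels = names(Gendervals))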
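
To put the percentage values on the chart itself, the counts can be converted to percentages and pasted into the slice labels. The following is a minimal sketch that uses hypothetical counts (not taken from the guide) so that the result is reproducible. `counts`, `pct`, and `pie_labels` are illustrative names.

```r
# Illustrative counts per category (hypothetical values)
counts <- c(Female = 12, Male = 7, "No Answer" = 6)

# Convert counts to percentages of the total
pct <- round(100 * counts / sum(counts), 1)

# Build labels that combine the category name and its percentage
pie_labels <- paste0(names(counts), " (", pct, "%)")
print(pie_labels)

# Pie chart labeled with percentages
pie(counts, labels = pie_labels, main = "Gender")
```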
    

Ordinal Data

Ordinal data is categorical data whose values have a meaningful order, but there is no scale to measure the difference between them. The order or rank of the data is the distinctive feature of ordinal data. Scenario: Let us create a dummy survey dataset for patients who were asked how often they get a headache and whether the pain is high, medium, or low. We will make a spine plot to show the results.

    # Build a survey dataset of headache frequency vs. level of pain
    frequecyh <- rep(1:5, times = c(20, 38, 16, 72, 40))
    levelpain <- c(rep(1:5, times = c(25, 17, 18, 2, 1)),  # High
                   rep(1:5, times = c(20, 4, 21, 1, 2)),   # Medium
                   rep(1:5, times = c(22, 3, 30, 3, 17)))  # Low
    levelp <- data.frame(frequecyh, levelpain)
    # Plotting a factor against a factor draws a spine plot
    plot(factor(levelpain) ~ factor(frequecyh), data = levelp, xlab = "Frequency of pain", ylab = "Level of Pain")
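
Because order is what defines ordinal data, it can help to declare that order explicitly with an ordered factor. The sketch below uses a handful of hypothetical survey answers. The category names and the variables `pain` and `freq` are assumptions made for illustration.

```r
# Hypothetical survey answers (illustrative values only)
pain <- c("Low", "High", "Medium", "Low", "High", "Medium", "Low", "Low")
freq <- c("Rarely", "Often", "Sometimes", "Rarely",
          "Often", "Often", "Sometimes", "Rarely")

# Declare the order explicitly; otherwise R sorts levels alphabetically
pain <- factor(pain, levels = c("Low", "Medium", "High"), ordered = TRUE)
freq <- factor(freq, levels = c("Rarely", "Sometimes", "Often"), ordered = TRUE)

# Ordered factors support rank comparisons such as "Low" < "High"
print(pain[1] < pain[2])

# Spine plot of pain level against headache frequency
spineplot(pain ~ freq, xlab = "Frequency of headaches", ylab = "Level of pain")
```

With the levels declared, the plot axes follow the intended ranking instead of alphabetical order.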
    

Continuous Data

Continuous data represents measurements that can take any value within a range. In other words, it cannot be counted and has infinitely many possible values, but it is still measurable and can be subdivided. For example, temperature, humidity, height, and weight are all measured within a given range.
Scenario: Let us take the built-in beaver1 dataset and build a histogram of the temperature values.

    # The y-axis of a histogram shows how many readings fall into each bin
    hist(beaver1$temp, xlab = "Temperature", ylab = "Frequency", main = "Beaver 1")
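
Since continuous values must be grouped into bins before they can be counted, the choice of bins shapes the histogram. As a rough sketch, the example below asks for about 10 bins and then draws a kernel density estimate as a smoothed alternative. The variable names `temps` and `h` are illustrative.

```r
# beaver1 ships with R; temp holds body temperature readings
temps <- beaver1$temp

# Summarize the range of the continuous variable
print(summary(temps))

# Histogram with a suggested number of bins (R treats breaks as a hint)
h <- hist(temps, breaks = 10, xlab = "Temperature", main = "Beaver 1")

# A smoothed alternative: kernel density estimate of the same values
plot(density(temps), xlab = "Temperature", main = "Beaver 1 temperature density")
```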

Discrete Data

Discrete data consists of counts in whole numbers. In simple terms, discrete data involves integers, has a limited number of possible values, and cannot be divided into parts. For example, 50 employees, 3 laptops, and 5 friends are all whole numbers. Scenario: Let us take the mtcars dataset, where the number of cylinders and the number of gears are whole numbers, so a bar plot can depict the relationship between them.

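    # Cross-tabulate gears (rows) by cylinders (columns)
    x <- table(mtcars$gear, mtcars$cyl)
    # Stacked bars: height is the number of cars, segments are gear counts
    barplot(x, xlab = "No of cylinders", ylab = "Number of cars", legend.text = TRUE)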
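
Discrete values can be counted directly, so `table()` is a natural first step before plotting. The following sketch counts cars per cylinder value and then draws grouped rather than stacked bars. `cyl_counts` and `gear_by_cyl` are illustrative names.

```r
# Count how many cars have each number of cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Cross-tabulate gears (rows) against cylinders (columns)
gear_by_cyl <- table(Gears = mtcars$gear, Cylinders = mtcars$cyl)
print(gear_by_cyl)

# Grouped bars: one bar per gear count within each cylinder group
barplot(gear_by_cyl, beside = TRUE, legend.text = TRUE,
        xlab = "Number of cylinders", ylab = "Number of cars")
```

Setting `beside = TRUE` places the bars side by side, which makes individual counts easier to compare than in a stacked chart.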
    

Is Your Data Univariate or Multivariate?

Univariate

As the name suggests, uni (one) and variate (variable), univariate data has a single variable or, in statistical terms, is composed of a single vector component. For example, temperature changes at a single point on different dates. Scenario: Let us see how the built-in dataset USAccDeaths is represented in a line graph. It is a simple dataset with the year, month, and number of accidental deaths, and it is a good example of univariate data.

    # Univariate data
    print(USAccDeaths)
    # type = "l" draws a line graph
    plot(USAccDeaths, type = "l")
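
USAccDeaths is stored as a time series object, which is why a single call can draw it against time. The sketch below inspects that structure and adds descriptive labels. The label text is an assumption chosen for illustration.

```r
# USAccDeaths is a monthly time series (class "ts") that ships with R
print(class(USAccDeaths))
print(start(USAccDeaths))      # first year and month in the series
print(frequency(USAccDeaths))  # observations per year

# Line graph with descriptive labels
plot(USAccDeaths, type = "l", xlab = "Year", ylab = "Accidental deaths",
     main = "Accidental deaths in the US")
```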
    

Multivariate

Multivariate data, on the other hand, as in multi (many) and variate (variables), involves multiple variables measured together. For example, the study of temperature, humidity, and wind speed at different points and dates. Scenario: Let us take the iris dataset and use a scatter plot matrix.

    # iris data: plotting a data frame draws a scatter plot matrix
    plot(iris)
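
With several variables, it is often useful to separate the numeric measurements from the categorical label and use the label for color. A minimal sketch, assuming the first four columns of iris are the numeric measurements, with `measurements` and `cor_matrix` as illustrative names:

```r
# Keep the four numeric measurements; Species is a categorical label
measurements <- iris[, 1:4]

# Scatter plot matrix, with one color per species
pairs(measurements, col = as.integer(iris$Species), main = "Iris measurements")

# Pairwise correlations between the numeric variables
cor_matrix <- cor(measurements)
print(round(cor_matrix, 2))
```

The correlation matrix gives a numeric summary of the same pairwise relationships that the scatter plot matrix shows visually.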
    

What Do You Want to Accomplish?

It is also important to stay focused on what you want the visualization to accomplish. For example, you can make an area chart to study the relationship between variables and their behaviors, and you can use a correlation plot to understand how variables affect each other. Similarly, you can compare the values of variables against a criterion; for this, scatter plots and bubble charts can be considered. If the purpose is to study the composition of data, you can use pie charts or percentage charts.
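
As a rough illustration of matching the chart to the goal, the sketch below uses the built-in mtcars dataset for three goals: correlating, comparing, and showing composition. The choice of columns and the variable names are assumptions made for this example.

```r
# Correlate: how do weight and fuel efficiency move together?
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)

# Compare: average fuel efficiency for each cylinder count
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
barplot(mpg_by_cyl, xlab = "Number of cylinders", ylab = "Mean miles per gallon")

# Composition: share of cars by number of gears
gear_share <- prop.table(table(mtcars$gear))
pie(gear_share, main = "Share of cars by gears")
```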

In Conclusion

To summarize, the first step in visualization is to understand the data. The tables below give a quick summary of how to select a chart based on your data. If you are looking for simple, quick solutions, use plot(), as it is intuitive and can handle common chart types.
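
One reason plot() handles many chart types is that it is a generic function: it picks a chart based on the kind of object it receives. The sketch below passes it several built-in objects; `cyl_factor` is an illustrative name.

```r
# A numeric vector produces an index plot of points
plot(mtcars$mpg)

# A factor produces a bar chart of category counts
cyl_factor <- factor(mtcars$cyl)
plot(cyl_factor)

# A data frame produces a scatter plot matrix
plot(iris[, 1:4])

# A time series produces a line graph
plot(USAccDeaths)

# A formula with a factor on the right-hand side produces box plots
plot(mpg ~ factor(cyl), data = mtcars)
```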

Univariate      Multivariate
Line Charts     Heat Maps
Dotted Line     Stacked Column Charts
Candlestick     Box Plots
Bar Charts      Bubble Charts

Qualitative     Quantitative
Bubble Charts   Waterfall Charts
Heat Maps       Line Charts
Sankey Plots    Histograms
Pie Charts      Density Plots
Spatial Maps    Time Series Graphs

Correlate               Compare                  Composition        Distribution
Scatter Plot / Bubble   Scatter Plot             Area Charts        Histograms
Dot Plot                Bar Plot / Candlestick   Density Plots      Area Charts
Sankey Plot             Axis Line / Dual Line    Waterfall Charts   Scatter Plot
Correlation Plot        Cohort Diagrams          Column Chart       Heat Maps
