Jaya Trivedi

Data Visualizations: Approaches to Select the Right Chart for Your Data

  • May 7, 2019
  • 7 Min read
  • 181 Views

Introduction

This guide will help you choose the right visualization for your data. The approach is simple: we answer a series of questions about the dataset you are working with and generate a visualization using R code for each scenario. The charts used in this guide are simple, usually without color or dimension adjustments, to show how a single line of code can generate a chart quickly. We will be using R for all the visualizations, so RStudio or a compatible program should be installed.

Know Your Data

Is it Qualitative or Quantitative?

Quantitative data is information about quantities and, generally speaking, is something that can be measured or counted. Qualitative data, on the other hand, describes qualities that cannot be measured numerically and is also known as categorical data. Data collected through focus group discussions, one-on-one interviews, or case studies is usually qualitative. Data can be further classified as below:

Classification
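As a first check before picking a chart, it can help to see how R stores each column. The sketch below labels the columns of the built-in iris data frame; the labels "quantitative" and "qualitative" are shorthand chosen for this example, not something R reports itself. A numeric column can still hold categories (for example, codes such as 1, 2, 3), so treat the result as a starting point only.

```r
# Classify each column of the built-in iris data frame by how R stores it.
# The two labels are this sketch's own shorthand.
column_kind <- sapply(iris, function(col) {
  if (is.numeric(col)) "quantitative" else "qualitative"
})
print(column_kind)
```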

Binomial Data

Binomial data is not the same as binary data. A binary value is a single 0 or 1 (True or False), while binomial data counts how many times one of two possible outcomes occurs in a series of trials, and is described by a probability distribution. Scenario: A coin is flipped 100 times and the results are recorded. A scatter plot is used to show the probability of each possible number of heads.

    # Possible numbers of heads in 100 flips
    myseq <- seq(0, 100, by = 1)
    print(myseq)

    # Binomial probability of each count for a fair coin
    BD <- dbinom(myseq, 100, 0.5)
    print(BD)

    # Make a scatter plot
    plot(x = myseq, y = BD, xlab = "Number of heads", ylab = "Probability")

Scatter Plot
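The plot above shows theoretical probabilities rather than recorded flips. A minimal sketch of actually simulating the coin, assuming a fair coin and an arbitrary seed, could look like this; the variable names are illustrative.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# One experiment: 100 fair coin flips, recorded as 1 (heads) or 0 (tails).
flips <- rbinom(100, size = 1, prob = 0.5)
heads <- sum(flips)

# Repeat the whole experiment 1000 times and count the heads in each run.
heads_per_run <- rbinom(1000, size = 100, prob = 0.5)

# The simulated counts can be compared with the shape dbinom() predicts.
hist(heads_per_run, xlab = "Number of heads in 100 flips",
     main = "1000 simulated experiments")
```

Here `rbinom()` draws random outcomes, while `dbinom()` returns the probability of a given count.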

Nominal Data

Values are allocated to distinct categories that have no meaningful order. For example, data about smoking habits can be categorized but not ordered. Scenario: Random data is generated to represent gender categories. A pie chart is used to show the share of each category.

    # Generate a dataset with gender values
    Gender <- c("Male", "Female", "No Answer")
    Gendervals <- table(sample(Gender, 25, replace = TRUE, prob = c(0.25, 0.50, 0.25)))

    # Make a basic pie chart; label each slice from the table's own names,
    # because table() sorts the categories alphabetically
    pie(Gendervals, labels = names(Gendervals))

Pie
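To report actual percentages for nominal data, the counts can be converted to shares with `prop.table()`. The sketch below uses a small, hypothetical vector of answers (fixed rather than random, so the result is repeatable) and draws the shares as a bar chart, which is one common alternative to a pie.

```r
# Hypothetical survey answers, fixed here so the result is reproducible.
answers <- c("Female", "Male", "Female", "No Answer", "Female",
             "Male", "Female", "No Answer")

# Fixing the levels keeps every category in the table, even with a zero count.
counts <- table(factor(answers, levels = c("Male", "Female", "No Answer")))
shares <- round(100 * prop.table(counts), 1)
print(shares)

# A bar chart of the same shares can be easier to compare than pie slices.
barplot(shares, ylab = "Share of answers (%)")
```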

Ordinal Data

Ordinal data is categorical data whose categories have a meaningful order, but there is no scale to measure the difference between them. The order or rank of the data is the distinctive feature of ordinal data. Scenario: Let us create a dummy survey dataset for patients who were asked how often they get a headache and whether the pain is high, medium, or low. We will make a spine plot to show the results.

    # Build a dummy survey dataset: headache frequency (rated 1 to 5) by level of pain
    frequency <- c(rep(1:5, times = c(25, 17, 18, 2, 1)),   # High
                   rep(1:5, times = c(20, 4, 21, 1, 2)),    # Medium
                   rep(1:5, times = c(22, 3, 30, 3, 17)))   # Low
    # One pain label per answer; the counts are the sizes of the three blocks above
    levelpain <- rep(c("High", "Medium", "Low"), times = c(63, 48, 75))
    levelp <- data.frame(frequency = factor(frequency),
                         levelpain = factor(levelpain, levels = c("Low", "Medium", "High")))

    # Plotting one factor against another draws a spine plot
    plot(levelpain ~ frequency, data = levelp, xlab = "Frequency of headaches", ylab = "Level of pain")

Spine Plot
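In R, ordinal data is usually stored as an ordered factor so that the rank survives sorting, comparison, and plotting. A minimal sketch with hypothetical pain ratings:

```r
# Hypothetical pain ratings; the level order carries the meaning.
pain <- factor(c("Low", "High", "Medium", "Low", "High"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)

# Comparisons respect the declared order, not the alphabet.
pain[1] < pain[2]   # compares "Low" with "High"
max(pain)           # highest rating observed

# Tables and plots list the categories in level order.
table(pain)
```

Without `levels`, R would sort the categories alphabetically (High, Low, Medium), which would scramble the axis of any chart built from them.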

Continuous Data

Continuous data can take any value within a range. It cannot be counted, because infinitely many values are possible, but it can be measured and subdivided as finely as needed. For example, temperature, humidity, height, and weight are all measured within a given range.
Scenario: Let us take the built-in beaver1 dataset and build a histogram of the temperature values.

    # The y-axis of a histogram is the number of observations in each bin
    hist(beaver1$temp, xlab = "Temperature", ylab = "Frequency", main = "Beaver 1")

Histogram
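Because continuous values must be grouped into bins before they can be counted, the shape of a histogram depends on the bins chosen. The sketch below inspects the bins that `hist()` computes and draws a density curve as an alternative view; `breaks = 10` is only a suggestion that R may adjust.

```r
temps <- beaver1$temp

# Numeric summary of the continuous variable.
summary(temps)

# Compute the bins without drawing; breaks = 10 is a suggestion to hist().
bins <- hist(temps, breaks = 10, plot = FALSE)
bins$breaks   # bin edges
bins$counts   # observations per bin

# A density curve is an alternative view that avoids choosing bins.
plot(density(temps), main = "Beaver 1", xlab = "Temperature")
```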

Discrete Data

Discrete data consists of counts: whole numbers with a limited set of possible values. In simple terms, discrete data involves integers and cannot be divided into parts. For example, 50 employees, 3 laptops, and 5 friends are all whole numbers. Scenario: Let us take the mtcars dataset, where the numbers of cylinders and gears are whole numbers, so a bar plot can show how many cars fall into each combination.

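    # Count cars by number of gears (rows) and cylinders (columns)
    x <- table(mtcars$gear, mtcars$cyl)
    # Bars are stacked by gear; the bar height is the number of cars
    barplot(x, xlab = "Number of cylinders", ylab = "Number of cars", legend.text = TRUE, args.legend = list(title = "Gears"))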

Barplot
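Stacked bars make totals easy to read, but the individual groups are harder to compare. As a sketch of one alternative, `beside = TRUE` draws the same counts as grouped bars:

```r
# Count cars for each combination of gears (rows) and cylinders (columns).
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# beside = TRUE draws grouped bars instead of stacking them.
barplot(counts, beside = TRUE, legend.text = TRUE,
        args.legend = list(title = "Gears"),
        xlab = "Number of cylinders", ylab = "Number of cars")
```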

Is Your Data Univariate or Multivariate?

Univariate

As the name suggests (uni means one, variate means variable), univariate data has a single variable or, in statistical terms, is composed of a single vector component. For example, temperature changes at a single point on different dates. Scenario: Let us see how the built-in dataset USAccDeaths is represented in a line graph. It is a simple dataset that records the number of accidental deaths by year and month, and it is a good example of univariate data.

    # Univariate data
    print(USAccDeaths)
    # type = "l" draws a line graph
    plot(USAccDeaths, type = "l")

Line graph
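USAccDeaths is stored as a time series object, which is why a single variable can still be drawn against time. The sketch below checks that and then sums the months of each year to get a coarser yearly line; the yearly aggregation is an illustration added here, not part of the original scenario.

```r
# USAccDeaths is a monthly time series, so it carries its own time index.
class(USAccDeaths)
frequency(USAccDeaths)   # observations per year

# Sum the months of each year to get one value per year.
yearly <- aggregate(USAccDeaths, FUN = sum)

plot(yearly, type = "b", xlab = "Year", ylab = "Accidental deaths")
```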

Multivariate

Multivariate data, on the other hand (multi means many, variate means variables), has several variables observed together. For example, the study of temperature, humidity, and wind speed at different points and dates. Scenario: For this scenario, let us take the iris dataset and use a scatter plot matrix.

    # iris data
    plot(iris)

Scatter Matrix
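A scatter plot matrix pairs naturally with a correlation matrix, which puts a number on each panel. A minimal sketch, restricted to the four numeric columns of iris and using `pairs()` directly so the points can be colored by species:

```r
measurements <- iris[, 1:4]   # the four numeric columns

# Pairwise Pearson correlations between the measurements.
cor_matrix <- cor(measurements)
print(round(cor_matrix, 2))

# Scatter plot matrix, with one color per species.
pairs(measurements, col = as.integer(iris$Species))
```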

What Do You Want to Accomplish?

It is also important to stay focused on what you want the visualization to accomplish. For example, you can use a correlation plot to understand how variables relate to and affect each other, or an area chart to study how variables behave together. Similarly, to compare the values of variables against some criterion, consider scatter plots and bubble charts. If the purpose is to study the composition of the data, you can use pie charts or percentage charts.
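As a rough sketch of how the goal changes the chart, the snippet below draws four views of the built-in mtcars dataset. The choice of columns and of chart for each goal is an assumption made for illustration; other charts can serve the same goals.

```r
# Correlate: how does weight relate to fuel economy?
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Compare: fuel economy across cylinder counts.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")

# Composition: share of cars by number of gears.
gear_share <- prop.table(table(mtcars$gear))
pie(gear_share)

# Distribution: spread of fuel economy values.
hist(mtcars$mpg, xlab = "Miles per gallon", main = "")
```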

In Conclusion

To summarize, the first step in visualization is to understand the data. The tables below give a quick summary of how to select a chart based on your data. If you are looking for a simple, quick solution, use plot(): it is intuitive and can handle common chart types.

| Univariate | Multivariate |
| --- | --- |
| Line Charts | Heat Maps |
| Dotted Line | Stacked Columns charts |
| Candle Stick | Box plot |
| Bar Charts | Bubble Charts |

| Qualitative | Quantitative |
| --- | --- |
| Bubble Charts | Waterfall Charts |
| Heat Maps | Line charts |
| Sankey Plots | Histograms |
| Pie Chart | Density Plot |
| Spatial Maps | Time Series Graphs |

| Correlate | Compare | Composition | Distribution |
| --- | --- | --- | --- |
| Scatter Plot / Bubble | Scatter Plot | Area Charts | Histograms |
| Dot Plot | Bar Plot / Candlestick | Density Plots | Area Charts |
| Sankey Plot | Axis Line / Dual Line | Waterfall Charts | Scatter Plot |
| Correlation Plot | Cohort Diagrams | Column Chart | Heat Maps |
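Part of what makes plot() a quick solution is that it is a generic function: it chooses a chart from the kind of object it receives. The sketch below shows four such cases using built-in datasets.

```r
# A factor: plot() draws a bar chart of category counts.
plot(iris$Species)

# Two numeric vectors: plot() draws a scatter plot.
plot(iris$Sepal.Length, iris$Petal.Length)

# A time series: plot() draws a line graph.
plot(USAccDeaths)

# A numeric variable against a factor: plot() draws box plots.
plot(Sepal.Length ~ Species, data = iris)
```

When the default is not the chart you need, a dedicated function such as hist(), barplot(), or pie() gives more direct control.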
