Author avatar

Chhaya Wagmi

Visualizing Data with PivotChart: Part 2

Chhaya Wagmi

  • Mar 1, 2020
  • 12 Min read
  • 468 Views
  • Mar 1, 2020
  • 12 Min read
  • 468 Views
Business Professional
Microsoft Excel

Introduction

In Part 1 of this series, we learned to create basic charts using PivotTable data. In this guide, we will continue where we left off and learn to build the following charts:

  1. Histogram
  2. Box and Whisker
  3. Treemap and Sunburst
  4. Combo

1. Histogram

A histogram chart organizes data points into ranges and looks similar to a bar graph. It's easy to interpret visually because it groups many data points into logical ranges or bins.

You can control the number of bins represented in a histogram. When you pass data directly to any statistical software (Excel, R, MATLAB, etc.), they create a defined number of bins based on how the data is distributed. However, we can always control the final output. Let's learn with an example.

Create a PivotTable with Program under the Rows section and Misc Charges under the Values section. This will result in the following PivotTable:

Row LabelsSum of Misc Charges
Master's (Biz)43000
Master's (Engg)19000
Ph.D.6000
Ph.D.12000
Grand Total80000

If you try to build a histogram on the numerical column, Excel will show the following message:

You can't create this chart type with data inside a PivotTable. Please select a different chart type, or copy the data outside the PivotTable.

So we copy the values 43000, 19000, 6000, and 12000 in a separate column and select them. Then click on Insert Statistics Chart icon under Charts section and select the histogram icon, as shown below:

Imgur

Right after you click on the Histogram icon, you will get the following chart (the chart title has been renamed Misc Charges):

Imgur

You may observe that Excel has automatically created two bins with ranges 6000, 42000 and 42000, 78000. Here the first bin has a bar with a value of three, meaning that there are three items in this range. The next bin has a value of one, which means that it consists of only one value in its range.

To modify the range of bins, click on Chart Elements > Axes > More Axis Options..., as shown below:

Imgur

This opens up the Format Axis sidebar. To change the bin size, you can either go with the Number of bins or Bin width. Here, we choose 6000 as the parameter for Bin width.

Imgur

Once you enter the value, press Enter, and the changes will reflect in the chart, as shown below:

Imgur

As you can observe, this time there are only two values that lie in the range of 6000, 12000. Moving ahead, in our current case we have only four values and three bars, which makes it easy to analyze the skewness (to be discussed thoroughly in the next section) in the data. However, when we have hundreds of bars in a histogram, it is recommended to draw a Pareto Line* over the bars, which can give a general idea of how the data is distributed. To draw a histogram with a Pareto Line, choose the following icon:

Imgur

2. Box and Whisker

A box plot (or box-and-whisker plot) helps you to visualize the distribution of a numerical data. A box plot consists of a box (which describes different quartiles), whiskers (extending to show the rest of the distribution) and several points (implying outliers in the data).

Let's quickly create box plot of Misc charges values, which are copied in a different column than the PivotTable table.

Imgur

The key takeaways from the above plot are:

  • The data is skewed towards higher positive numbers.
  • The median lies around 15000.
  • There are no outliers in the data.

Working with Multiple Box Plots

So far in the guide, you have built charts where there is only one numerical column (Misc Charges). Now, we will consider a case where we have two numerical columns (Misc Charges and Tuition Fee) in our PivotTable, as shown:

Row LabelsSum of Misc ChargesSum of Tuition Fee
Master's (Biz)430001223000
Master's (Engg)19000291555
Ph.D.600068000
Ph.D.1200086000
Grand Total800001668555

When you copy the values of these two columns (Sum of Misc Charges and Sum of Tuition Fee) in separate columns and create a box plot on them, you may receive a plot like this:

Imgur

Can you see the problem with the above plot?

Since the range of Misc Charges is way smaller than that of Tuition Fee, we cannot observe the actual Misc Charges box plot. To overcome this problem, we can go for Min-Max Normalization.

A Min-Max Normalization always brings the result into a defined range 0, 1. So if you normalize both of these columns and then create their box plot, you will have both of these box plots in the range of 0, 1.

Mathematically,

X_i = (X_i - X_min)/(X_max - X_min)

If you have a dummy data with values 5, 2, 8, 6, 12, then the normalized value of 5 is calculated as (5-2)/(12-2) = 0.3.

The normalized values of the Misc Charges and Tuition Fee along with the final box plot is shown below:

Normalized Misc ChargesNormalized Tuition Fee
(43000 - 6000) / (43000 - 6000) = 1(1223000 - 68000) / (1223000 - 68000) = 1
(19000 - 6000) / (43000 - 6000) = 0.351351(291555 - 68000) / (1223000 - 68000) = 0.193554
(6000 - 6000/ (43000 - 6000) = 0(68000 - 68000/ (1223000 - 68000) = 0
(12000 - 6000/ (43000 - 6000) = 0.162162(86000 - 68000/ (1223000 - 68000) = 0.015584

Imgur

As we can observe, normalization has made it easy to understand the data distribution in each of the numerical columns.

Note: A grouped box plot can alter the actual data distribution a bit. It is suggested to use grouped box plot only for a general overview.

3. Treemap and Sunburst

When you need to show relative proportion in various categories, you can use either a Treemap plot (based on rectangles) or a Sunburst plot (based on rings). These plots are pretty self-explanatory.

To learn how to create them in Excel, consider the following PivotTable:

Row LabelsSum of Tuition Fee
Brown University72000
California Institute of Technology80000
Harvard University1010000
Indian Institute of Technology40000
Kyoto University56000
London Business School72000
London School of Economics85000
Massachusetts Institute of Technology99555
The University of Tokyo68000
University of Cambridge86000
Grand Total1668555

First, copy the university names along with their tuition fees in two separate columns. Select the values and click on the Treemap icon as shown below:

Imgur

This will create the following Treemap:

Imgur

As you can clearly observe, the rectangle representing Harvard University is largest in size, making it the most expensive university among these 10 universities. The rectangles follow a hierarchy, so after Harvard University, the next most expensive university is Massachusetts Institute of Technology, followed by University of Cambridge, and so on.

Similar to a Treemap, a Sunburst plot communicates the same information, but visualizes the items in rings as shown:

Imgur

Here again, hierarchy has been followed based on Tuition Fee.

4. Combo

So far, we have covered a lot of charts in this series of two guides. But have you noticed that all the charts that we have created aren't mixed with other charts to form a combo?

In this section, you will learn how to mix multiple charts at visualize them on a same plot. To start, we consider the given PivotTable:

Row LabelsSum of Tuition FeeSum of Accommodation FeeSum of Misc Charges
Brown University72000220003000
California Institute of Technology80000200005000
Harvard University10100004200018000
Indian Institute of Technology40000150002000
Kyoto University56000320008000
London Business School72000200008000
London School of Economics85000180009000
Massachusetts Institute of Technology99555340009000
The University of Tokyo68000550006000
University of Cambridge860005600012000
Grand Total166855531400080000

If you try to build a column chart on the above PivotTable, you get this:

Imgur

You can observe how difficult it is to analyze the smaller values due to outliers in our data. Again, you can solve this problem either by normalizing the data, using stacked charts, or using a Combo chart.

What if you created column charts for only Accommodation Fee and Misc Charges, while creating a line chart for Tuition Fee?

Let us see what that looks like!

Copy all three numerical columns of data in three separate columns, select these values, click the Combo icon and select Create Custom Combo Chart.... This opens up the following dialog box:

Imgur

You need to make sure that Tuition Fee is plotted on the secondary axis (check the tick) as a line chart. Therefore, select a Line chart against Tuition Fee. Next, we leave Accommodation Fee untouched and change Misc Charges to a Column chart and remove the tick from the secondary axis in-front of it. The final modification should look like this:

Imgur

Once done, click OK and you will get a Combo Chart as shown:

Imgur

As you can observe, the Combo chart has clearly identified values corresponding to each of the three data columns.

Conclusion

In this series of two guides, you have learned to create various types of PivotCharts, their use-cases, and their implementations.

2