In Part 1 of this series, we learned to create basic charts using PivotTable data. In this guide, we will continue where we left off and learn to build the following charts:
A histogram chart organizes data points into ranges and looks similar to a bar graph. It's easy to interpret visually because it groups many data points into logical ranges or bins.
You can control the number of bins represented in a histogram. When you pass data directly to any statistical software (Excel, R, MATLAB, etc.), they create a defined number of bins based on how the data is distributed. However, we can always control the final output. Let's learn with an example.
Create a PivotTable with Program under the Rows section and Misc Charges under the Values section. This will result in the following PivotTable:
|Row Labels||Sum of Misc Charges|
If you try to build a histogram on the numerical column, Excel will show the following message:
You can't create this chart type with data inside a PivotTable. Please select a different chart type, or copy the data outside the PivotTable.
So we copy the values 43000, 19000, 6000, and 12000 in a separate column and select them. Then click on Insert Statistics Chart icon under Charts section and select the histogram icon, as shown below:
Right after you click on the Histogram icon, you will get the following chart (the chart title has been renamed Misc Charges):
To modify the range of bins, click on Chart Elements > Axes > More Axis Options..., as shown below:
This opens up the Format Axis sidebar. To change the bin size, you can either go with the Number of bins or Bin width. Here, we choose 6000 as the parameter for Bin width.
Once you enter the value, press Enter, and the changes will reflect in the chart, as shown below:
As you can observe, this time there are only two values that lie in the range of 6000, 12000. Moving ahead, in our current case we have only four values and three bars, which makes it easy to analyze the skewness (to be discussed thoroughly in the next section) in the data. However, when we have hundreds of bars in a histogram, it is recommended to draw a Pareto Line* over the bars, which can give a general idea of how the data is distributed. To draw a histogram with a Pareto Line, choose the following icon:
A box plot (or box-and-whisker plot) helps you to visualize the distribution of a numerical data. A box plot consists of a box (which describes different quartiles), whiskers (extending to show the rest of the distribution) and several points (implying outliers in the data).
Let's quickly create box plot of Misc charges values, which are copied in a different column than the PivotTable table.
The key takeaways from the above plot are:
So far in the guide, you have built charts where there is only one numerical column (Misc Charges). Now, we will consider a case where we have two numerical columns (Misc Charges and Tuition Fee) in our PivotTable, as shown:
|Row Labels||Sum of Misc Charges||Sum of Tuition Fee|
When you copy the values of these two columns (Sum of Misc Charges and Sum of Tuition Fee) in separate columns and create a box plot on them, you may receive a plot like this:
Can you see the problem with the above plot?
Since the range of Misc Charges is way smaller than that of Tuition Fee, we cannot observe the actual Misc Charges box plot. To overcome this problem, we can go for Min-Max Normalization.
X_i = (X_i - X_min)/(X_max - X_min)
If you have a dummy data with values
5, 2, 8, 6, 12, then the normalized value of 5 is calculated as
(5-2)/(12-2) = 0.3.
The normalized values of the Misc Charges and Tuition Fee along with the final box plot is shown below:
|Normalized Misc Charges||Normalized Tuition Fee|
|(43000 - 6000) / (43000 - 6000) = 1||(1223000 - 68000) / (1223000 - 68000) = 1|
|(19000 - 6000) / (43000 - 6000) = 0.351351||(291555 - 68000) / (1223000 - 68000) = 0.193554|
|(6000 - 6000/ (43000 - 6000) = 0||(68000 - 68000/ (1223000 - 68000) = 0|
|(12000 - 6000/ (43000 - 6000) = 0.162162||(86000 - 68000/ (1223000 - 68000) = 0.015584|
As we can observe, normalization has made it easy to understand the data distribution in each of the numerical columns.
Note: A grouped box plot can alter the actual data distribution a bit. It is suggested to use grouped box plot only for a general overview.
When you need to show relative proportion in various categories, you can use either a Treemap plot (based on rectangles) or a Sunburst plot (based on rings). These plots are pretty self-explanatory.
To learn how to create them in Excel, consider the following PivotTable:
|Row Labels||Sum of Tuition Fee|
|California Institute of Technology||80000|
|Indian Institute of Technology||40000|
|London Business School||72000|
|London School of Economics||85000|
|Massachusetts Institute of Technology||99555|
|The University of Tokyo||68000|
|University of Cambridge||86000|
First, copy the university names along with their tuition fees in two separate columns. Select the values and click on the Treemap icon as shown below:
This will create the following Treemap:
As you can clearly observe, the rectangle representing Harvard University is largest in size, making it the most expensive university among these 10 universities. The rectangles follow a hierarchy, so after Harvard University, the next most expensive university is Massachusetts Institute of Technology, followed by University of Cambridge, and so on.
Similar to a Treemap, a Sunburst plot communicates the same information, but visualizes the items in rings as shown:
Here again, hierarchy has been followed based on Tuition Fee.
So far, we have covered a lot of charts in this series of two guides. But have you noticed that all the charts that we have created aren't mixed with other charts to form a combo?
In this section, you will learn how to mix multiple charts at visualize them on a same plot. To start, we consider the given PivotTable:
|Row Labels||Sum of Tuition Fee||Sum of Accommodation Fee||Sum of Misc Charges|
|California Institute of Technology||80000||20000||5000|
|Indian Institute of Technology||40000||15000||2000|
|London Business School||72000||20000||8000|
|London School of Economics||85000||18000||9000|
|Massachusetts Institute of Technology||99555||34000||9000|
|The University of Tokyo||68000||55000||6000|
|University of Cambridge||86000||56000||12000|
If you try to build a column chart on the above PivotTable, you get this:
You can observe how difficult it is to analyze the smaller values due to outliers in our data. Again, you can solve this problem either by normalizing the data, using stacked charts, or using a Combo chart.
What if you created column charts for only Accommodation Fee and Misc Charges, while creating a line chart for Tuition Fee?
Let us see what that looks like!
Copy all three numerical columns of data in three separate columns, select these values, click the Combo icon and select Create Custom Combo Chart.... This opens up the following dialog box:
You need to make sure that Tuition Fee is plotted on the secondary axis (check the tick) as a line chart. Therefore, select a Line chart against Tuition Fee. Next, we leave Accommodation Fee untouched and change Misc Charges to a Column chart and remove the tick from the secondary axis in-front of it. The final modification should look like this:
Once done, click OK and you will get a Combo Chart as shown:
As you can observe, the Combo chart has clearly identified values corresponding to each of the three data columns.
In this series of two guides, you have learned to create various types of PivotCharts, their use-cases, and their implementations.