Tableau is one of the most popular interactive data visualization tools today. It provides a wide variety of charts for exploring your data easily and effectively. This series of guides, Tableau Playbook, introduces the common chart types in Tableau. This guide focuses on the Pareto chart.
In this guide, we will start with an example chart and introduce its concept and characteristics. Then, by analyzing a real-life dataset, Top Baby Names in the US, we will build a Pareto chart step by step. Along the way, we will draw some conclusions from the visualization.
Here is a Pareto chart example from Lean Manufacturing and Six Sigma Definitions. It illustrates various types of medication errors. The first four error types account for 80% of the total errors. Those four make up about 33% of all error types, which is close to the Pareto principle (80/20 rule).
Before we learn the Pareto Chart, we need to know what the Pareto principle is.
The Pareto principle, also known as the 80/20 rule, states that, for many events, roughly 80% of the effects come from 20% of the causes. It is named after the Italian economist Vilfredo Pareto.
For example, computer science has rules of thumb such as "20% of the code has 80% of the errors" or "the hardest 20% of the code takes 80% of the time".
The definition of Pareto chart from Wikipedia:
A Pareto chart is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line.
In fact, in a broad sense, the Pareto chart is not limited to bars. It can be extended to other visual elements, such as areas and dots.
Specifically, in Tableau, a Pareto chart is a composite chart that visualizes the Pareto principle. A line, which we can call the Pareto curve, shows the cumulative percentage. The dual axis technique overlays that curve on the distribution, sorted from largest to smallest impact. On top of that, reference lines show what percentage of the dimension items contributes what percentage of the overall measure.
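To make the definition concrete, here is a minimal Python sketch of the numbers behind a Pareto chart: values sorted in descending order (the bars) and their cumulative share of the total (the line). The category names and counts are hypothetical and are not taken from the medication errors example or the baby names dataset.

```python
from itertools import accumulate

# Hypothetical counts per category (illustrative data only).
counts = {"A": 50, "B": 30, "C": 10, "D": 6, "E": 4}


def pareto_table(counts):
    """Return (category, value, cumulative share) rows, largest value first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ordered)
    running = accumulate(value for _, value in ordered)
    return [
        (name, value, cum / total)
        for (name, value), cum in zip(ordered, running)
    ]


table = pareto_table(counts)
for name, value, share in table:
    # The value column feeds the bars; the cumulative share feeds the line.
    print(f"{name}: {value:>3}  cumulative {share:.0%}")
```

In this made-up data, the first two of five categories already reach 80% of the total, which is the kind of pattern the chart is meant to expose.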
In this guide, we use the Top Baby Names in the US dataset. Thanks to the Social Security Administration for this dataset.
This dataset contains the most popular baby names in each state for each year from 1910 to 2012.
We will analyze whether the distributions of baby names obey the Pareto principle.
In this section, we will build a feature-rich Pareto chart. Building it is somewhat complex. The key step is using table calculations to compute the running total as a percent of total for both the measure and the distinct count of the dimension.
This walkthrough was inspired by this official video and this Pluralsight course.
Our first task is to build a Pareto curve. We start with a bar chart:
Next, we want to calculate the running percent of total "Occurences":
Then we need to change "Top Name" to display the running percent of total as well:
Now we have completed the Pareto curve part. Usually, we also need to show the distribution for comparison. Here we use a bar chart and combine it with the curve using the dual axis technique.
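The two running percent of total calculations can be sketched outside Tableau as follows. This is only an illustration of the arithmetic, assuming one row per distinct name; the names and counts are hypothetical, and `pareto_curve` is a helper invented for this example rather than anything Tableau provides.

```python
from itertools import accumulate

# Hypothetical (name, occurrences) rows, one row per distinct name.
rows = [("Emma", 40), ("Olivia", 25), ("Ava", 15), ("Mia", 12), ("Zoe", 8)]


def pareto_curve(rows):
    """Return (x, y) points of the Pareto curve.

    x = running share of distinct names (the "Top Name" axis)
    y = running share of occurrences (the "Occurences" axis)
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    total = sum(value for _, value in ordered)
    n = len(ordered)
    running = accumulate(value for _, value in ordered)
    return [((i + 1) / n, cum / total) for i, cum in enumerate(running)]


points = pareto_curve(rows)
for x, y in points:
    print(f"{x:.0%} of names -> {y:.0%} of occurrences")
```

Both axes end at 100%, which is what lets the 80% and 20% reference lines be read directly off the chart.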
Add reference lines to illustrate the 80/20 rule. We fix "Occurences" at 80% and calculate what percentage of "Top Name" values makes up that 80%.
Switch to the Analytics tab and drag Constant Line onto Table - "SUM(Occurences)".
Set the value to 0.8, which places the constant line at 80%.
To be consistent with the Pareto curve, format this reference line as orange and dashed.
To calculate the "Top Name" percentage, we need to create a calculated field named "Pareto Parameter". The formula is as follows:
WINDOW_MIN(
    IF RUNNING_SUM(SUM([Occurences])) / TOTAL(SUM([Occurences])) >= 0.8
    THEN RUNNING_SUM(COUNTD([Top Name])) / TOTAL(COUNTD([Top Name]))
    END)
Drag "Pareto Parameter" to Marks - Detail.
Right-click on "Pareto Parameter" in Detail and choose Compute Using -> Top Name.
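The logic of "Pareto Parameter" can be sketched in Python as shown here. This is an approximation of what the table calculation does, not Tableau's implementation: for every position where the running share of occurrences has reached the threshold, it records the running share of names, then takes the minimum, just as WINDOW_MIN ignores the rows where the IF yields no value. The function name and sample numbers are hypothetical, and the input is assumed to be sorted in descending order with a positive total.

```python
from itertools import accumulate


def pareto_parameter(values, threshold=0.8):
    """Smallest share of items whose running share of the total reaches `threshold`.

    `values` must already be sorted in descending order, mirroring the
    sort applied to the view. Returns None if no position qualifies.
    """
    total = sum(values)
    n = len(values)
    candidates = [
        (i + 1) / n                      # running share of distinct items
        for i, cum in enumerate(accumulate(values))
        if cum / total >= threshold      # rows that pass the IF condition
    ]
    return min(candidates) if candidates else None


# Hypothetical occurrence counts, largest first.
occurrences = [50, 25, 10, 8, 7]
share_of_names = pareto_parameter(occurrences)
print(f"{share_of_names:.0%} of names account for at least 80% of occurrences")
```

Because the result depends only on the rows currently in the view, recomputing it after filtering gives a new value, which is why the reference line built from it behaves dynamically.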
Now we can add the dynamic vertical reference line. Right-click on the x-axis and click Add Reference Line.
Right-click on this reference line and further format it.
Finally, to make the chart more powerful and take advantage of its dynamic computation, we add some filters.
In the last step, let's polish this chart:
Here is the final chart:
Analysis:
To figure out whether the occurrences of baby names satisfy the Pareto principle, we need to prepare a good sample of data:
With that sample, the percentage of "Top Name" exactly matches 20% from 2000 to 2012. Even if we adjust "Year", "State", and "Gender" slightly, the result stays close to 20%. We can therefore conclude that occurrences of baby names in the US obey the 80/20 rule.
In this guide, we have learned about one of the composite charts in Tableau, the Pareto chart.
First, we introduced the concept and characteristics of a Pareto chart. Then we learned how to build a feature-rich Pareto chart, including the Pareto curve, a dual axis with a bar chart, dynamic reference lines, and filters.
You can download this example workbook Composite Charts from Tableau Public.
In conclusion, I have drawn a mind map to help you organize and review the knowledge in this guide.
I hope you enjoyed it. If you have any questions, you're welcome to contact me at [email protected]
If you want to dive deeper into the topic or learn more comprehensively, there are many professional Tableau Training Classes on Pluralsight, such as Tableau Desktop Playbook: Building Common Chart Types.
I made a complete list of the guides in my Tableau charts series, in case you are interested:
Categories | Guides and Links |
---|---|
Bar Chart | Bar Chart, Stacked Bar Chart, Side-by-side Bar Chart, Histogram, Diverging Bar Chart |
Text Table | Text Table, Highlight Table, Heat Map, Dot Plot |
Line Chart | Line Chart, Dual Axis Line Chart, Area Chart, Sparklines, Step Lines and Jump Lines |
Standard Chart | Pie Chart |
Derived Chart | Funnel Chart, Waffle Chart |
Composite Chart | Lollipop Chart, Dumbbell Chart, Pareto Chart, Donut Chart |