Tableau is one of the most popular interactive data visualization tools. It provides a wide variety of charts for exploring your data easily and effectively. This series of guides, Tableau Playbook, introduces the common chart types in Tableau. This guide focuses on the scatter plot.
In this guide, we will learn about the scatter plot in the following steps:
Here is a scatter plot example of the 2018 Developer Survey results from Stack Overflow. It shows the relationship between salary and experience by programming languages.
We can mine a lot of information from this scatter plot:
According to the Wikipedia entry on the scatter plot:
A scatter plot is a type of plot using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed.
Scatter plots are commonly used in statistical analysis. They are an extremely effective way to compare multiple measures for a dimension with many distinct values. The basic case is to compare two measures with x and y axes. More measures can be added by Tableau's visual elements, such as size and color.
It is important to understand the strengths and weaknesses of a scatter plot before you use one.
Scatter plots have the following strengths:
The biggest disadvantage of a scatter plot is the risk of overplotting. Although a scatter plot can hold a lot of data, overlapping marks become hard to read when the plot is dense. We can reduce this problem by lowering the opacity of the marks or by highlighting.
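To see why lowering the opacity helps with dense areas, it can be useful to look at how stacked semi-transparent marks combine. The sketch below assumes standard "over" alpha compositing, where n identical overlapping marks with opacity a give a combined opacity of 1 - (1 - a)^n. Whether Tableau renders marks exactly this way is an assumption, and the function name is only illustrative.

```python
def stacked_opacity(alpha: float, layers: int) -> float:
    """Combined opacity of `layers` identical marks drawn on top of each other.

    Assumes plain "over" alpha compositing; this is an illustration,
    not a description of Tableau's renderer.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return 1.0 - (1.0 - alpha) ** layers


if __name__ == "__main__":
    # With 30% opacity, dense regions get visibly darker as marks pile up.
    for n in (1, 2, 5, 10):
        print(n, round(stacked_opacity(0.3, n), 3))
```

With a lower opacity, a single mark stays faint while heavily overlapped regions darken, so density becomes readable instead of collapsing into one solid blob.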
In this guide, we'll use the Boston Housing dataset from Kaggle Datasets. Thanks to the U.S. Census Service and Kaggle for this dataset. The data was collected in 1978, and each of the 506 entries represents aggregated data about 14 features for homes from various suburbs in Boston, Massachusetts.
In this guide, we will analyze how the following factors affect house price:
Let's start by creating a basic scatter plot step by step.
Each record needs a unique ID to identify it. The easiest way is to add the ID column in Excel. At present, it is hard to create unique identifiers in Tableau. If you insist on doing that, please refer to this post.

We can generate a basic chart automatically by using Show Me. This is the easiest way to build a scatter plot. Click on Show Me and you will see these instructions:
For scatter plots, try 0 or more Dimensions, 2 to 4 Measures.
In this example, we need two measures, RM and MEDV. Hold down the Control key (Command key on Mac) while clicking to select both RM and MEDV, then choose scatter plots in Show Me.
Now we notice there is only one point in the chart. That is because all the records are aggregated together. Here we can split the data by ID, which we created before.
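The single point appears because both measures are summed over every record before they are plotted. Here is a rough Python sketch of the difference between the two views. The rows are made-up values rather than records from the dataset, and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows, not taken from the Boston Housing data.
rows = [
    {"ID": 1, "RM": 6.0, "MEDV": 24.0},
    {"ID": 2, "RM": 5.5, "MEDV": 18.5},
    {"ID": 3, "RM": 7.0, "MEDV": 33.0},
]


def aggregate_all(rows):
    """No dimension in the view: one mark at (SUM(RM), SUM(MEDV))."""
    return [(sum(r["RM"] for r in rows), sum(r["MEDV"] for r in rows))]


def aggregate_by(rows, key):
    """A dimension on Detail: one mark per distinct value of `key`."""
    groups = defaultdict(lambda: [0.0, 0.0])
    for r in rows:
        groups[r[key]][0] += r["RM"]
        groups[r[key]][1] += r["MEDV"]
    return [tuple(totals) for totals in groups.values()]


if __name__ == "__main__":
    print(aggregate_all(rows))       # a single aggregated mark
    print(aggregate_by(rows, "ID"))  # one mark per record
```

Because every ID is unique, grouping by ID leaves each sum with exactly one record in it, which is why the chart then shows the raw values.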
Move ID into Dimension, then drag ID into Marks - Detail.

For a more attractive chart, edit visual elements such as shape and color:
Add a trend line to identify the correlation between RM and MEDV.
In the last step, let's polish this chart:
Analysis:
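In this basic scatter plot, we analyze the correlation between the number of rooms and house price. We model the relationship with a linear trend line. From the statistics provided by Tableau, we can see that the p-value is less than 0.001 and R-squared is 0.471. The small p-value indicates that the relationship is statistically significant, and the R-squared value means the linear model explains about 47% of the variance in house price, which is a moderately strong linear correlation.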
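For readers who want to see what these numbers mean, here is a minimal sketch of an ordinary least squares fit with R-squared, written in plain Python. It is an illustration of the general technique rather than Tableau's own implementation, the function name is hypothetical, and the sample values are invented. For a fit with one predictor, R-squared is the square of the correlation coefficient, so an R-squared of 0.471 corresponds to a correlation of roughly 0.69.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Invented (rooms, price) pairs, only to exercise the function.
    rooms = [5.0, 5.8, 6.2, 6.5, 7.1]
    price = [15.0, 20.5, 21.0, 24.0, 33.5]
    slope, intercept, r2 = linear_fit(rooms, price)
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
```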
When we focus on the points, we can dig out some other information. Most points fall where the average number of rooms is between 5.5 and 6.8 and the house price is between 15,000 and 25,000. We can also clearly distinguish the outliers and examine them in more detail.
In this section, we will add more advanced features to enhance the scatter plot.
First, let's build a scatter plot as we did previously.
Drag LSTAT into the Columns Shelf and MEDV into the Rows Shelf. Drag ID into Marks - Detail.
Add more visual elements to convey information. Here we show the measure CRIM by size.

Drag CRIM into Marks - Size.
We can see that the clustering is calculated from the three measures we added before. We remove CRIM and see what happens with only the two axis variables.
Notice that Tableau found four clusters. Each cluster reflects a potential class of houses with a different price and LSTAT range. We can dig more information out of these clusters, but we will stop here and focus on the scatter plot.
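Tableau's clustering feature is documented as being based on k-means. To give a feel for how such clusters form, here is a minimal k-means sketch in plain Python. It is a simplification under stated assumptions: it uses the first k points as starting centers, does no scaling of the variables, and takes k as an input instead of choosing it automatically, so it will not reproduce Tableau's exact result. The sample points are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on 2-D points; initial centers are the first k points."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Invented (LSTAT, MEDV)-like pairs forming two loose groups.
    sample = [(5, 30), (25, 10), (6, 28), (24, 12), (4, 33), (27, 9)]
    labels, centers = kmeans(sample, k=2)
    print(labels)
    print(centers)
```

Since k-means relies on distances, the result depends on the scale of each variable, which is one reason adding or removing a measure such as CRIM changes the clusters.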
Let's add another measure, RM, as color.

Drag RM into Marks - Color.
Add more quantitative indicators to the scatter plot:

Add reference lines for SUM(LSTAT) and SUM(MEDV). Left click on them and change the type to Median.

Put on the finishing touches:
Analysis:
With these advanced features, the scatter plot becomes more powerful. With the color elements, we can see that generally the more rooms a house has, the higher the house price will be. With the size elements, we find out that the higher the crime rate is, the lower the house price will be.
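The Median indicators on SUM(LSTAT) and SUM(MEDV) split the view into four quadrants, which makes it easier to see where the points concentrate. As a rough illustration of what those lines represent, the sketch below computes the median of each axis and counts the points in each quadrant. The function name and the sample values are hypothetical.

```python
from statistics import median


def quadrant_counts(points):
    """Count points in each quadrant formed by the x-median and y-median lines.

    Points that lie exactly on a median line are counted on the "low" side.
    """
    med_x = median(x for x, _ in points)
    med_y = median(y for _, y in points)
    counts = {
        "low_x_low_y": 0,
        "low_x_high_y": 0,
        "high_x_low_y": 0,
        "high_x_high_y": 0,
    }
    for x, y in points:
        x_side = "high_x" if x > med_x else "low_x"
        y_side = "high_y" if y > med_y else "low_y"
        counts[f"{x_side}_{y_side}"] += 1
    return med_x, med_y, counts


if __name__ == "__main__":
    # Invented (LSTAT, MEDV)-like pairs.
    sample = [(5, 30), (10, 22), (15, 18), (20, 12)]
    print(quadrant_counts(sample))
```

When most points land in the "low x, high y" and "high x, low y" quadrants, as in this invented sample, the two measures move in opposite directions.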
In this guide, we have learned about one of the standard charts in Tableau: the scatter plot.
First, we introduced the concept and characteristics of a scatter plot. And then we learned the basic process to create a scatter plot. Finally, we enhanced the scatter plot with clustering, size, color, and quantitative indicators.
You can download an example workbook of Standard Charts from Tableau Public.
In conclusion, I have drawn a mind map to help you organize and review the knowledge in this guide.
I hope you enjoyed it. If you have any questions, you're welcome to contact me at [email protected].
If you want to dive deeper into this topic, there are many professional Tableau Training Classes on Pluralsight, such as Tableau Desktop Playbook: Building Common Chart Types.
Here is a complete list of guides in this series about common Tableau charts:
Categories | Guides and Links |
---|---|
Bar Chart | Bar Chart, Stacked Bar Chart, Side-by-side Bar Chart, Histogram, Diverging Bar Chart |
Text Table | Text Table, Highlight Table, Heat Map, Dot Plot |
Line Chart | Line Chart, Dual Axis Line Chart, Area Chart, Sparklines, Step Lines and Jump Lines |
Standard Chart | Pie Chart, Tree Map, Scatter Plot, Box and Whisker Plot, Gantt Chart, Bullet Chart, Bubble Chart, Map |
Derived Chart | Funnel Chart, Waterfall Chart, Waffle Chart, Slope Chart, Bump Chart, Sankey Chart, Radar Chart, Connected Scatter Plot, Time Series, Word Cloud |
Composite Chart | Lollipop Chart, Dumbbell Chart, Pareto Chart, Donut Chart, Radial Chart, Burn Down Chart |