Introduction


Tableau is one of the most popular interactive data visualization tools. It provides a wide variety of charts to explore your data easily and effectively. This series of guides, Tableau Playbook, introduces the common chart types in Tableau. This guide focuses on the *scatter plot*.

In this guide, we will learn about the scatter plot in the following steps:

- We will start with an example chart to introduce the concept and characteristics of the scatter plot.
- By analyzing real-life datasets, we will learn to build a scatter plot step by step. We will then optimize and polish the chart with advanced features.

Here is a scatter plot example built from the 2018 Stack Overflow Developer Survey results. It shows the relationship between salary and experience by programming language.

We can mine a lot of information from this scatter plot:

- As a whole, we use the trend line to fit the linear relationship between experience and salary.
- From a horizontal perspective, we can compare the experience distribution of the developers who work in various languages. From a vertical perspective, we can compare the median salary distribution for developers who work in each language.
- Additional visual elements, such as size and color, allow a scatter plot to convey more information. In this example, the size of the circles intuitively expresses the popularity of each language.
- With the help of the scatter plot, we can dig out useful information. For example, developers in Go, Clojure, and F# sit above the trend line, so they are paid more than their experience alone would suggest. Developers using languages below the line, like PHP and Visual Basic 6, are paid less than their years of experience would suggest.

According to the Wikipedia entry about the scatter plot:

A scatter plot is a type of plot using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed.

Scatter plots are commonly used in statistical analysis. They are an extremely effective way to compare multiple measures for a dimension with many distinct values. The basic case is to compare two measures with x and y axes. More measures can be added by Tableau's visual elements, such as size and color.

It is important to understand the strengths and weaknesses of scatter plots if you are going to use them.

Scatter plots have the following strengths:

- **Scalability - can hold a large number of points**: Scatter plots give us an option to display a lot of data in a small area with relatively low confusion rates.
- **Analyze correlation**: A typical use of a scatter plot is to determine whether two measures are correlated. Tableau provides statistical variables such as the P-value and R-squared. But it's important to note that we need to treat correlation objectively. When two variables are correlated, it does not mean that one variable caused the other.
- **Observe data intuitively**: In a scatter plot, you can visually observe outliers, data ranges, or specified areas. What's more, with the interactive operation provided by Tableau, we can further analyze these points in detail.

The biggest disadvantage of a scatter plot is the possibility of *overplotting*. While a scatter plot can hold plenty of data, points start to overlap and hide each other when the plot is dense. We can reduce this problem by adjusting the opacity or by highlighting.

In this guide, we'll use the Boston Housing dataset from Kaggle. Thanks to the U.S. Census Service and Kaggle for this dataset. The data was collected in 1978, and each of the 506 entries represents aggregated data about 14 features for homes from various suburbs in Boston, Massachusetts.

In this guide, we will analyze how the following factors affect house price:

- MEDV: Median value of owner-occupied homes in $1000's
- RM: average number of rooms per dwelling
- CRIM: per capita crime rate by town
- LSTAT: % lower status of the population

Let's start by creating a basic scatter plot step by step.

Before drawing, we need to do some data preprocessing with the help of external tools such as Excel. To display each row as a point, we need to add an `ID` field that uniquely identifies each row. The easiest way is to add the **ID** column in Excel. At present, it is hard to create unique identifiers in Tableau. If you insist on that, please refer to this post.
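If you would rather script the preprocessing than do it in Excel, the same `ID` column can be added with a few lines of Python. This is only a minimal sketch: the inline sample stands in for the real CSV file, its values are illustrative, and the helper name `add_id_column` is hypothetical.

```python
import csv
import io


def add_id_column(rows, id_name="ID"):
    """Return new rows with a 1-based sequential ID placed first in each record."""
    return [{id_name: i, **row} for i, row in enumerate(rows, start=1)]


# Tiny inline sample standing in for the real CSV file (illustrative values).
sample_csv = """RM,LSTAT,CRIM,MEDV
6.575,4.98,0.00632,24.0
6.421,9.14,0.02731,21.6
7.185,4.03,0.02729,34.7
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
rows_with_id = add_id_column(rows)

# Write the result back out as CSV text, ready to be saved and loaded into Tableau.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows_with_id[0].keys()))
writer.writeheader()
writer.writerows(rows_with_id)
print(out.getvalue())
```

For a real file, you would replace the two `io.StringIO` objects with files opened for reading and writing.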

We can generate a basic chart automatically by using **Show Me**. This is the easiest way to build a scatter plot. Click on **Show Me** and you will see these instructions: "For scatter plots, try 0 or more Dimensions, 2 to 4 Measures."

In this example, we need two measures, `RM` and `MEDV`. Hold down the **Control** key (**Command** key on Mac) while clicking to multi-select `RM` and `MEDV`, then choose **scatter plots** in **Show Me**.

Now we notice there is only one point in the chart. That is because all the records are aggregated together. Here we can split the data by `ID`, which we created before.

- Convert `ID` into **Dimension**.
- Drag `ID` into **Marks** - **Detail**.
- Switch to **Entire View** for a nicer visualization.
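The single point appears because measures are summed over every row unless a dimension splits them. The sketch below imitates that behavior outside Tableau; the records are made-up values and `aggregate_marks` is a hypothetical helper, not a Tableau function.

```python
from collections import defaultdict

# Made-up records for illustration only.
records = [
    {"ID": 1, "RM": 6.5, "MEDV": 24.0},
    {"ID": 2, "RM": 6.0, "MEDV": 21.5},
    {"ID": 3, "RM": 7.0, "MEDV": 34.5},
]


def aggregate_marks(records, detail=None):
    """Sum RM and MEDV per group; with no detail field, everything is one group."""
    groups = defaultdict(lambda: {"RM": 0.0, "MEDV": 0.0})
    for rec in records:
        key = rec[detail] if detail else "all"
        groups[key]["RM"] += rec["RM"]
        groups[key]["MEDV"] += rec["MEDV"]
    return dict(groups)


one_mark = aggregate_marks(records)               # nothing on Detail
per_row = aggregate_marks(records, detail="ID")   # ID on Detail

print(len(one_mark), "mark without ID")
print(len(per_row), "marks with ID on Detail")
```

Because every `ID` is unique, each group contains exactly one row, so the summed values equal the original row values.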

At the top of the scatter plot, sixteen data points have a `MEDV` value of 50.0. They are outliers that have been clamped at the upper bound. For a more accurate analysis, we should remove these outliers. Multi-select them and click **Exclude** in the pop-up dialog. Tableau will add the exclusion to **Filters**.
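The same exclusion can be expressed as a simple filter if you clean the data before loading it. This sketch assumes the cap is exactly 50.0, as observed in the chart; the records and the helper name `exclude_capped` are hypothetical.

```python
CAP = 50.0  # assumed upper bound at which MEDV is clamped

# Made-up records for illustration only.
homes = [
    {"ID": 1, "RM": 6.5, "MEDV": 24.0},
    {"ID": 2, "RM": 8.0, "MEDV": 50.0},
    {"ID": 3, "RM": 7.0, "MEDV": 34.5},
    {"ID": 4, "RM": 7.5, "MEDV": 50.0},
]


def exclude_capped(records, field="MEDV", cap=CAP):
    """Keep only records whose value is strictly below the cap."""
    return [r for r in records if r[field] < cap]


kept = exclude_capped(homes)
print([r["ID"] for r in kept])
```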

For a more attractive chart, edit visual elements such as shape and color:

- Expand the **Shape** card in **Marks** and replace the empty circle with the solid circle or any other shape that makes sense for your readers.
- To reduce the impact of the overlay, expand the **Color** card in **Marks** and slide the **Opacity** to semitransparent.

Add a trend line to identify the correlation between `RM` and `MEDV`.

- Right-click on the chart and choose **Trend Lines** -> **Show Trend Lines**.
- Right-click on the trend line and click **Edit Trend Lines...**
- Choose **Linear** as the Model type.
- Check **Show Confidence Bands**.

In the last step, let's polish this chart:

- Edit title to "Relationship between Room Number and House Price".
- Rename the x-axis as "Room Number" and y-axis as "House Price".

**Analysis**:

In this basic scatter plot, we analyze the correlation between the number of rooms and house price, modeling the relationship with a linear trend line. From the statistics Tableau provides, the P-value is less than 0.001 and R-squared is 0.471. The relationship is therefore statistically significant, and the linear model explains about 47% of the variation in house price, which indicates a fairly strong linear correlation.
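To make the R-squared figure less abstract, here is a minimal sketch of an ordinary least-squares line fit and its R-squared value. It illustrates the general technique rather than Tableau's internals; the sample numbers are invented and `linear_fit` is a hypothetical helper.

```python
def linear_fit(xs, ys):
    """Fit y = slope * x + intercept by least squares and return R-squared too."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented sample values: rooms per dwelling and price in $1000's.
rooms = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
prices = [14.0, 19.5, 20.0, 25.5, 33.0, 38.0]

slope, intercept, r_squared = linear_fit(rooms, prices)
print(f"price = {slope:.2f} * rooms + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

The function needs at least two distinct x values and some variation in y, otherwise the divisions are undefined.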

When focusing on the points, we can dig out some other information. Most points fall where the average number of rooms is between 5.5 and 6.8 and the house price is between $15,000 and $25,000. We can also clearly distinguish the outliers and examine them in detail.

In this section, we will add more advanced features to enhance the scatter plot.

First, let's build a scatter plot as we did previously.

- This time we'll build it manually. Drag `LSTAT` into **Columns Shelf** and `MEDV` into **Rows Shelf**.
- Drag `ID` into **Marks** - **Detail**.
- Right-click and choose **Hide Indicator** for null values.
- Multi-select the top outliers and click **Exclude** in the pop-up dialog.
- Switch to **Entire View** for a better view.

Add more visual elements to convey information. Here we show measure `CRIM` by size.

- Drag `CRIM` into **Marks** - **Size**.
- Adjust size by expanding the **Size** card or the size legend on the right side.

The clustering technique is useful for analyzing the characteristics of a scatter plot. Tableau has built-in clustering algorithms, such as k-means. Let's try out Tableau's clustering capabilities to look for the common properties of points.

- Switch to the **Analytics** pane. Drag **Cluster** into the view and drop it on the **Create Clusters** box that appears. We can see it is calculated from the three measures we added before. We remove `CRIM` and see what happens with only the two axis variables.
- Notice Tableau found four clusters. Each cluster reflects a potential class of houses with different price and `LSTAT` ranges. We can dig more information out of these clusters, but we will stop here and focus on the scatter plot.
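For readers curious about what k-means does, the sketch below implements the basic version (Lloyd's algorithm) on 2-D points. It is a teaching sketch only: it does not reproduce Tableau's implementation, it does not scale the variables, and the number of clusters is passed in by hand. The sample points are invented.

```python
import math


def kmeans(points, k, iterations=100):
    """Basic Lloyd's algorithm; returns (centroids, labels) for a list of tuples."""
    # Deterministic start: spread the initial centroids across the sorted points.
    ordered = sorted(points)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Invented (LSTAT, MEDV)-like points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice the axes should be put on comparable scales first, because a variable with a large range otherwise dominates the distance calculation.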

Let's add another measure, `RM`, as color.

- Drag `RM` into **Marks** - **Color**.
- The default color configuration is not good enough. Let's make it better. Click the **inverted triangle** in color legend, then choose **Edit Colors...**
- To distinguish the color more clearly, we change the single color to diverging colors. Choose **Red-Green-Gold Diverging** in **Palette**.
- In order to group the points by number of rooms, we check **Stepped Color** and set **Steps** to 5.
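Stepped color turns a continuous measure into a small number of color bands. The sketch below shows the idea, assuming equal-width steps across the value range; whether Tableau places its step boundaries exactly this way is an assumption, and `stepped_bin` is a hypothetical helper.

```python
def stepped_bin(value, lo, hi, steps=5):
    """Map a value in [lo, hi] to a step index 0..steps-1 using equal-width bins."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    # The maximum value would land on index `steps`, so clamp it into the last bin.
    return min(int(fraction * steps), steps - 1)


# Invented room counts for illustration only.
rooms = [3.6, 5.0, 6.2, 7.1, 8.8]
lo, hi = min(rooms), max(rooms)
bands = [stepped_bin(v, lo, hi) for v in rooms]
print(bands)
```

Each index would then be mapped to one of the five colors in the chosen palette.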

Add more quantitative indicators to the scatter plot:

- Switch to the **Analytics** tab, and drag **Average Line** into **Table** - **Cell** of `SUM(LSTAT)` and `SUM(MEDV)`. Left-click on each line and change the type to **Median**.
- Add a trend line as we did before. The only difference is that this time, we choose **Logarithmic** as the model type.
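The two indicators above can be sketched in a few lines: the median lines are plain medians of each axis, and a logarithmic trend line has the form y = a * ln(x) + b, which can be fitted by least squares on ln(x). This is an illustration of the technique, not Tableau's implementation; the sample values are illustrative and `log_fit` is a hypothetical helper.

```python
import math
import statistics


def log_fit(xs, ys):
    """Least-squares fit of y = a * ln(x) + b; every x must be greater than 0."""
    log_xs = [math.log(x) for x in xs]
    n = len(log_xs)
    mean_lx = sum(log_xs) / n
    mean_y = sum(ys) / n
    a = sum((u - mean_lx) * (y - mean_y) for u, y in zip(log_xs, ys)) / sum(
        (u - mean_lx) ** 2 for u in log_xs
    )
    b = mean_y - a * mean_lx
    return a, b


# Illustrative values: % lower status and median home value in $1000's.
lstat = [4.98, 9.14, 4.03, 2.94, 5.33]
medv = [24.0, 21.6, 34.7, 33.4, 36.2]

median_lstat = statistics.median(lstat)  # position of the vertical median line
median_medv = statistics.median(medv)    # position of the horizontal median line
a, b = log_fit(lstat, medv)
print(f"medians: LSTAT={median_lstat}, MEDV={median_medv}")
print(f"MEDV = {a:.2f} * ln(LSTAT) + {b:.2f}")
```

A logarithmic model suits data where the effect of x is steep at low values and flattens out as x grows.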

Put on the finishing touches:

- Edit title to "Factors which affect House Price in Boston".
- Rename the x-axis "Lower Status of Population" and y-axis "House Price".
- Rename the size legend "Crime Rate" and color legend "Room Number".
- If you think these grid lines are too distracting, you can remove them to make the chart cleaner: navigate to **Format** -> **Lines...** and set **Grid Lines** to **None** in **Lines**.

**Analysis**:

With these advanced features, the scatter plot becomes more powerful. With the color elements, we can see that generally the more rooms a house has, the higher the house price will be. With the size elements, we find out that the higher the crime rate is, the lower the house price will be.

In this guide, we have learned about one of the standard charts in Tableau: the scatter plot.

First, we introduced the concept and characteristics of a scatter plot. And then we learned the basic process to create a scatter plot. Finally, we enhanced the scatter plot with clustering, size, color, and quantitative indicators.

You can download an example workbook of Standard Charts from Tableau Public.

In conclusion, I have drawn a mind map to help you organize and review the knowledge in this guide.

I hope you enjoyed it. If you have any questions, you're welcome to contact me at [email protected]

If you want to dive deeper into this topic, there are many professional Tableau Training Classes on **Pluralsight**, such as Tableau Desktop Playbook: Building Common Chart Types.

Here is a complete list of guides in this series about common Tableau charts:

Categories | Guides and Links |
---|---|
Bar Chart | Bar Chart, Stacked Bar Chart, Side-by-side Bar Chart, Histogram, Diverging Bar Chart |
Text Table | Text Table, Highlight Table, Heat Map, Dot Plot |
Line Chart | Line Chart, Dual Axis Line Chart, Area Chart, Sparklines, Step Lines and Jump Lines |
Standard Chart | Pie Chart, Tree Map, Scatter Plot, Box and Whisker Plot, Gantt Chart, Bullet Chart, Bubble Chart, Map |
Derived Chart | Funnel Chart, Waterfall Chart, Waffle Chart, Slope Chart, Bump Chart, Sankey Chart, Radar Chart, Connected Scatter Plot, Time Series, Word Cloud |
Composite Chart | Lollipop Chart, Dumbbell Chart, Pareto Chart, Donut Chart, Radial Chart, Burn Down Chart |
