Tableau Playbook - Word Cloud

Feb 26, 2020 • 15 Minute Read

Introduction

Tableau is one of the most popular interactive data visualization tools. It provides a wide variety of charts to explore and display your data easily and effectively. This series of guides, Tableau Playbook, introduces all kinds of common charts in Tableau. This particular guide focuses on word clouds.

In this guide, we'll learn about word clouds in the following steps:

  1. We will start with an example chart and discuss its characteristics.
  2. We'll use real-life datasets to build a word cloud step by step. Meanwhile, we will draw some conclusions from Tableau visualization.

Getting Started

Example

Here is a word cloud example from Kaggle. This word cloud displays the most common words in the top 1% most upvoted comments on the New York Times website. It uses size to represent the frequency of words, with larger word size indicating greater frequency.

What makes it even more attractive is that, because the topic is the most upvoted comments, the word cloud itself is shaped like the upvote (thumbs up) icon.

Concept and Characteristics

The word cloud, called a tag cloud more broadly, is a novel visual representation of text data. The basic unit is usually called a tag and in most cases is displayed as a word. The tags form the shape of a cloud, which is the origin of the name.

The visual elements include size and color. Usually we use size to represent the frequency of a tag and color to represent categories or another measure. For some advanced word clouds, we can create a custom shape to convey more information or enhance visual appeal. Unfortunately, Tableau does not currently support custom shapes.
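To make the size encoding concrete, here is a minimal sketch that maps word counts to font sizes with a simple linear scale. The function name `font_sizes`, the point-size range, and the sample counts are all assumptions made for illustration; Tableau's own sizing logic is not documented in this guide and may differ.

```python
def font_sizes(counts, min_pt=10, max_pt=48):
    """Linearly map each word's count to a font size between min_pt and max_pt."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (count - lo) / span * (max_pt - min_pt)
        for word, count in counts.items()
    }


# Made-up counts, for illustration only.
sample_counts = {"game": 120, "world": 60, "new": 30}
sizes = font_sizes(sample_counts)
print(sizes)
```

The most frequent word gets the largest size and the least frequent the smallest, with everything else placed proportionally in between.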

Alternatives to a word cloud are a treemap or a sorted bar chart. The treemap is a chart recommended automatically by Tableau. The sorted bar chart is the old-school solution. It may look less exciting, but ranking and comparison are more accurate in a sorted bar chart than in the other two.

It is important to understand the strengths and weaknesses of word clouds if you are going to use them.

Compared with alternatives, word clouds have the following strengths:

  • Scalability. Word clouds can hold a large number of tags. If you need to display dozens or hundreds of tags and highlight contributing members, you can consider a word cloud. Whereas in a treemap the label cannot be displayed for a small rectangle, the word cloud comes with labels.
  • Eye-catching visualization. Word clouds have good visual appeal, especially when rendered as a custom shape like the above example. They are suitable for an infographic or presentation and may attract more attention than the common chart.
  • More intuitive and conspicuous. With a quick glance, you can find the most frequent tags. This is because the text of a word cloud itself is a visual element and is more significant than in other graphs. It provides some kind of first insight.

Meanwhile, word clouds also have many weaknesses:

  • Difficult to make accurate comparisons. People are better at comparing length or position than the area of text. Another problem is that there is no common baseline for the tags you want to compare. Sorted bar charts provide more quantitative and accurate comparisons. For this reason, word clouds are a poor fit for serious business data analysis and visualization.
  • Representing relatively little information. Although a word cloud can support three dimensions or measures and a large number of tags, the main information users can extract is limited to the few most frequent tags. The smaller ones may be overlooked.
  • Messy arrangement. Tags are distributed based on the word cloud algorithm. They are out of order and crowded together. Tags belonging to the same category may lie far apart from each other. By contrast, sorted bar charts put data in descending order. A treemap is also basically arranged in order from big to small.
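The comparison problem can be illustrated with some rough arithmetic. The sketch below approximates the area a word occupies as a bounding box; the function `approx_area` and the character width ratio of 0.6 are assumptions chosen for illustration, not a measurement of any real font or of Tableau's rendering.

```python
def approx_area(word, font_size, char_width_ratio=0.6):
    """Rough bounding-box area: width ~ characters * font_size * ratio, height ~ font_size."""
    width = len(word) * font_size * char_width_ratio
    height = font_size
    return width * height


# Same font size (same frequency), different word lengths.
short_area = approx_area("art", 20)
long_area = approx_area("technology", 20)
print(short_area, long_area)

# Doubling the font size roughly quadruples the area.
ratio = approx_area("art", 40) / approx_area("art", 20)
print(ratio)
```

Under these assumptions, a longer word takes up more space than a shorter one with the same frequency, and area grows with the square of the font size, so judging frequency by visual area is unreliable.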

If you need further reading, here is a good article about word clouds.

Dataset

In this guide, we'll use the dataset Kickstarter Project Statistics from Kaggle Dataset. Thanks to Kickstarter and Kaggle for this dataset.

This dataset contains the top 4000 most backed projects ever on Kickstarter. You can download "most_backed.csv" on the home page. In this guide, we will analyze the popular keywords in project titles and blurbs.

Data Preparation

Before we create a word cloud, we need to do some data preprocessing with the help of external tools. Although we can do simple data processing in Tableau, its capabilities are limited and usually cannot meet our needs.

So why do we include this process? Because it's not simple data wrangling, but an important step in creating a word cloud.

Here we choose Python to demonstrate. You can also use other data processing tools like Excel or R.

To achieve our goal, we will use pandas, which is a powerful Python data analysis toolkit.

  1. First, we need to install the pandas library (for example, with pip install pandas) and import it.
      import pandas as pd

  2. Then we load the dataset "most_backed.csv".
      df = pd.read_csv("most_backed.csv")

  3. Do some data wrangling. Convert all text to lowercase and remove numbers and punctuation. For the blurb, we also need to replace line breaks with spaces.
      # regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
      df['title'] = df['title'].str.lower().str.replace(r'[0-9]|[^\w\s]', '', regex=True)
      df['blurb'] = df['blurb'].str.lower().str.replace(r'\n', ' ', regex=True).str.replace(r'[0-9]|[^\w\s]', '', regex=True)

  4. Split the title and blurb into words. Now that we've removed punctuation and line breaks, we can split the text on whitespace. Then we stack the split words into a single column. For the title, we list all the words in one column. For the blurb, we list words and their counts in two columns.
      df_title = df['title'].str.split(expand=True).stack().reset_index(name='Word')
      df_blurb = df['blurb'].str.split(expand=True).stack().value_counts().rename_axis('Word').reset_index(name='Count')

  5. As we observe the data, we find that many high-frequency words are common words without significant meaning, such as "the" or "about". These are called stop words, and they are useless for our analysis, so we need to filter them out. Here is a stop words list from xpo6. As an alternative, we can also control these stop words in Tableau by linking fields.
      # Inspect the first 100 rows to spot stop words
      df_title.head(100)

      # Load the stop words list, one word per line
      with open("stop-word-list.txt", "r") as f:
          stop_words = f.read().split('\n')

  6. Next, remove stop words and single letters.
      df_title = df_title[~df_title['Word'].isin(stop_words)]
      df_title = df_title[df_title['Word'].str.len() > 1]

      df_blurb = df_blurb[~df_blurb['Word'].isin(stop_words)]
      df_blurb = df_blurb[df_blurb['Word'].str.len() > 1]

  7. Keep only the columns we need (Word for the title, Word and Count for the blurb) and export the data.
      # level_0 and level_1 are index columns left over from stack().reset_index()
      df_title = df_title.drop(columns=['level_0', 'level_1'])

      df_title.to_csv('title.csv', index=False)
      df_blurb.to_csv('blurb.csv', index=False)
    

If you want to skip this step, you can download the title.csv and blurb.csv data files directly.
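To see what the preparation steps produce without downloading anything, here is a minimal sketch of the same idea using only the Python standard library. The sample blurbs, the tiny stop words set, and the helper `tokenize` are made up for illustration; they stand in for the real dataset and the full stop words list.

```python
import re
from collections import Counter

# Hypothetical blurbs standing in for rows of most_backed.csv.
blurbs = [
    "The BEST travel jacket\nwith 15 features!",
    "A new travel game about the world.",
]
stop_words = {"the", "with", "about"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, turn line breaks into spaces, strip digits and punctuation, split on whitespace."""
    text = text.lower().replace("\n", " ")
    text = re.sub(r"[0-9]|[^\w\s]", "", text)
    return text.split()


# One flat list of words, like the single Word column in title.csv.
words = [w for blurb in blurbs for w in tokenize(blurb)]

# Drop stop words and single letters.
kept = [w for w in words if w not in stop_words and len(w) > 1]

# Word/count pairs, like the two columns in blurb.csv.
counts = Counter(kept)
print(counts.most_common(3))
```

The list `kept` corresponds to the "list of all words" format, and `counts` corresponds to the "list of words and count" format used in the next section.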

Process

This process is inspired by posts from Towards Data Science and Clearly and Simply.

We have two types of data input. We will demonstrate them both.

List of Words and Count

In this example, we connect to the data source "blurb.csv".

  1. When we drag our data into the view, Tableau will recommend a treemap by default.

    1. Drag Word into Marks - Label.
    2. Drag Count into Marks - Size.

    [Figure: word cloud - words and count 1]

  2. Before we change it into a word cloud, we need to filter by count. Because there are so many words, generating the word cloud without a filter would take a long time.

    1. Drag Count into Filters. A dialog will pop up automatically. Choose All values and click Next.
    2. Edit the range to start at 60 by typing the value or dragging the slider.
    3. Right-click on the filter and check Show Filter.
  3. Now we can convert the marks type to Text. Tableau has built-in support for word clouds, which are automatically arranged into cloud shapes. But we cannot customize the shape.

    [Figure: word cloud - words and count 2]

  4. Assign colors to the words. We will color by Count. We can also color by Word. This would be more colorful but meaningless. If we have another dimension, such as text categories, it is a good choice to display it in color.

    1. Drag Count into Marks - Color.
    2. Expand the Color card and click the Edit Colors... button.
    3. Choose a diverging color palette such as "Sunrise-Sunset Diverging" to distinguish words by count.
    4. To reduce the number of colors, check Stepped Color and set Steps to 12.
  5. In the last step, let's polish this chart:

    1. Edit title to "Popular Words in Kickstarter's Blurb".
    2. Rename the color legend to "Word Count".
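The Stepped Color setting from step 4 can be thought of as sorting each count into one of a fixed number of bins, each with its own color. Here is a minimal sketch of that idea using equal-width bins; the function `color_step` and the sample counts are assumptions for illustration, and Tableau's exact binning may differ.

```python
def color_step(value, lo, hi, steps=12):
    """Return the 0-based index of the equal-width step that value falls into."""
    if hi == lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    # Clamp so that the maximum value lands in the last step.
    return min(int(fraction * steps), steps - 1)


# Made-up counts, with 60 as the lower bound used by the filter above.
sample_counts = {"game": 300, "world": 180, "new": 60}
step_index = {w: color_step(c, 60, 300) for w, c in sample_counts.items()}
print(step_index)
```

Words that fall into the same step share a color, which is why reducing the number of steps makes the chart easier to read.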

Here is the final chart:

List of All Words

Now we connect to Data Source "title.csv".

  1. In this dataset, we have to count the words ourselves using the COUNT aggregation.

    1. Drag Word into Marks - Label.
    2. Drag Word into Marks - Size.
    3. Right-click on the second Word and choose Measure -> Count.

    [Figure: word cloud - all words 1]

  2. As before, we need to filter on "Word" before converting to a word cloud. This time we use a parameter.

    1. Right-click in the Data pane and choose Create Parameter...

      1. Name it "Word Count Lower Bound".
      2. Choose Integer as the Data type.
      3. Set Current Value to 20.
      4. Choose Range in Allowable values. Set Minimum to 1 and Maximum to 100.
    2. Right-click on this parameter and check Show Parameter Control.

    3. Drag Word into Filters. A dialog will pop up automatically.

    4. Switch to the Condition tab. Choose By formula and input COUNT([Word]) > [Word Count Lower Bound].

  3. Tableau provides a native feature to create a word cloud. So we convert the marks type to Text.

  4. Color it by word count. Hold down the Control key (Command key on Mac), which makes a copy of the field, and drag CNT(Word) onto Marks - Color.

    [Figure: word cloud - all words 2]

  5. Put on the finishing touches:

    1. Edit title to "Popular Words in Kickstarter's Title".
    2. Rename the color legend to "Word Count".
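The condition filter in step 2, COUNT([Word]) > [Word Count Lower Bound], aggregates the raw word list and then keeps only the words above the threshold. The same logic can be sketched in Python as follows; the sample words and the variable names are made up for illustration.

```python
from collections import Counter

# Made-up rows of the Word column in title.csv.
words = ["game", "game", "game", "world", "world", "new"]
word_count_lower_bound = 1  # plays the role of the Tableau parameter

# Equivalent of COUNT([Word]) computed per word.
counts = Counter(words)

# Keep only the words whose count exceeds the lower bound.
visible = {w: c for w, c in counts.items() if c > word_count_lower_bound}
print(visible)
```

Raising the lower bound shrinks the set of visible words, which is what moving the parameter control does in the Tableau view.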

Analysis:

In these two examples, a glance at the word clouds tells us which words are the most popular. The sizes reflect their popularity. The colors also help us highlight the high-frequency words.

Although Tableau does not support custom shapes, these basic word clouds are still more intuitive and visually appealing than other charts.

Conclusion

In this guide, we have learned about one of the derived charts in Tableau: the word cloud.

First, we introduced the concept and characteristics of a word cloud and analyzed its pros and cons compared with alternative charts. Then we learned how to preprocess the data with the help of external tools. Finally, we learned the process of creating word clouds from two formats of data.

You can download this example workbook, Word Cloud, from Tableau Public.

In conclusion, I have also drawn a mind map to help you organize and review the knowledge in this guide.

More Information

If you want to dive deeper into this topic or learn more comprehensively, there are many professional Tableau Training Classes on Pluralsight, such as Tableau Desktop Playbook: Building Common Chart Types.

In case you are interested, here is a complete list of the guides in this series on common Tableau charts:

Categories | Guides and Links
Bar Chart | Bar Chart, Stacked Bar Chart, Side-by-side Bar Chart, Histogram, Diverging Bar Chart
Text Table | Text Table, Highlight Table, Heat Map, Dot Plot
Line Chart | Line Chart, Dual Axis Line Chart, Area Chart, Sparklines, Step Lines and Jump Lines
Standard Chart | Pie Chart, Tree Map, Scatter Plot, Box and Whisker Plot, Gantt Chart, Bullet Chart, Bubble Chart, Map
Derived Chart | Funnel Chart, Waterfall Chart, Waffle Chart, Slope Chart, Bump Chart, Sankey Chart, Radar Chart, Connected Scatter Plot, Time Series, Word Cloud
Composite Chart | Lollipop Chart, Dumbbell Chart, Pareto Chart, Donut Chart, Radial Chart

I hope you enjoyed this guide. If you have any questions, you're welcome to contact me at recnac@foxmail.com.