Introduction

In this guide, we are going to learn the fundamentals of plotting categorical data using the Seaborn library.

By the end of this guide you will be able to implement the following concepts:

- Categorical scatter plots
- Categorical distribution plots
- Categorical estimate plots

In this guide we are going to use the following libraries:

**Syntax**

```python
# Importing necessary libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```

Consider the energy consumption of 600 houses over three consecutive days. To plot the quantitative variable (energy consumed, in kWh) against the qualitative variable (day: Mon, Tue, Wed), we can use the `catplot` method.

By default, `catplot` draws a strip plot, which applies a small random jitter along the categorical axis, but points with equal or close values can still overlap. To see how many records share a value, we can pass the argument `kind='swarm'` to the `catplot` method, which adjusts the point positions so that they do not overlap.

Let us understand both scenarios using the following code:

```python
# Generating random energy data
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)

# Building a dataset
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})
data.head()
```

| | Days | Energy |
|---|---|---|
| 0 | Mon | 143 |
| 1 | Mon | 171 |
| 2 | Mon | 71 |
| 3 | Mon | 132 |
| 4 | Mon | 123 |
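Because the same 200 energy values are concatenated three times, many records share a value. The following is a minimal sketch, not part of the original walkthrough, that counts how many records within each day repeat an energy value already seen on that day. The names `counts_overall`, `dup_mask`, and `dup_per_day` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset exactly as in the guide
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})

# How often each distinct energy value occurs across the whole dataset
counts_overall = data['Energy'].value_counts()

# Flag records whose (day, energy) pair already appeared earlier in the frame
dup_mask = data.duplicated(subset=['Days', 'Energy'])

# Number of repeated records per day (illustrative summary)
dup_per_day = dup_mask.groupby(data['Days']).sum()
print(dup_per_day)
```

Records flagged here are the ones that would sit exactly on top of another point if no jitter or swarm adjustment were applied.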

**Case I: Keeping the Duplicate Records**

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, aspect=2)
plt.title('With duplicate records', weight='bold', fontsize=16)
plt.show()

# ===============
# Alternative: stripplot
# ===============
# sns.stripplot(x='Days', y='Energy', data=data)
```

**Case II: Separating the Duplicate Records**

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='swarm', aspect=2)
plt.title('Separating duplicate records', weight='bold', fontsize=16)
plt.show()

# ===============
# Alternative: swarmplot
# ===============
# sns.swarmplot(x='Days', y='Energy', data=data)
```
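To compare the two behaviors directly, the axes-level functions named in the comments above can be drawn side by side. This is a rough sketch under a few assumptions: the helper name `compare_strip_and_swarm`, the figure size, and the marker size are illustrative choices, and `jitter=False` is set on the strip plot to make the overlap visible.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Rebuild the dataset as in the guide
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})

def compare_strip_and_swarm(df):
    """Draw a strip plot without jitter next to a swarm plot (illustrative helper)."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
    # Without jitter, equal values are drawn exactly on top of each other
    sns.stripplot(x='Days', y='Energy', data=df, jitter=False, ax=axes[0])
    axes[0].set_title('stripplot, jitter=False')
    # The swarm plot shifts points sideways so that they do not overlap
    sns.swarmplot(x='Days', y='Energy', data=df, size=3, ax=axes[1])
    axes[1].set_title('swarmplot')
    return fig, axes

fig, axes = compare_strip_and_swarm(data)
```

Call `plt.show()` afterwards to display the figure. With many points per category, seaborn may warn that some points could not be placed in the swarm; reducing `size` is one way to address that.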

We can also split the visualization by an additional criterion. For instance, say that 280 of the 600 houses have more than five family members. We can distinguish those 280 houses in the plot using the `hue` argument, as shown:

```python
# Allocating 280 houses randomly with more than five members
members = np.array(['Yes']*280 + ['No']*320)
np.random.seed(42)
np.random.shuffle(members)

# Adding data back to the dataset
data['MembersMoreThanFive'] = members

# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='swarm', hue='MembersMoreThanFive', aspect=2)
plt.title('Describing Hue parameter', weight='bold', fontsize=16)
plt.show()
```
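Before reading a plot that uses `hue`, it can help to check how many records fall into each day and hue combination. The sketch below uses `pd.crosstab` for that; the variable name `group_sizes` is illustrative and the check is not part of the original walkthrough.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset as in the guide
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})

# Randomly mark 280 of the 600 records as "more than five members"
members = np.array(['Yes']*280 + ['No']*320)
np.random.seed(42)
np.random.shuffle(members)
data['MembersMoreThanFive'] = members

# Count the records for every (day, hue level) pair
group_sizes = pd.crosstab(data['Days'], data['MembersMoreThanFive'])
print(group_sizes)
```

Each row sums to the 200 records of that day, and the `Yes` column sums to 280 across the three days.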

In the previous section, we visualized a quantitative variable against a qualitative variable. Now, let us look at how the distribution of quantitative data varies across the levels of a qualitative variable. Such distributions can be shown using any of the following plots:

1. Box Plot
2. Boxen Plot
3. Violin Plot

Let us implement each of them in turn.

A box plot shows how quantitative data is spread around its 25th, 50th (median), and 75th percentiles. By default, the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and any points beyond them are drawn individually as outliers.
Let us take our previous dataset and draw a box plot for each of the three days. This can again be achieved with the `catplot` method, as shown:

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='box', aspect=2)
plt.title('Box Plot', weight='bold', fontsize=16)
plt.show()

# ==============
# Alternative: boxplot
# ==============
# sns.boxplot(x='Days', y='Energy', data=data)
```
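The quantities that a box plot draws can also be computed directly. The following sketch assumes the common 1.5 × IQR rule for flagging outliers, which matches seaborn's default `whis` setting. The helper `box_stats` and its output keys are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset as in the guide
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})

def box_stats(values):
    """Quartiles, IQR, outlier fences, and outlier count for one group."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # values below this count as outliers
    upper_fence = q3 + 1.5 * iqr   # values above this count as outliers
    outliers = int(((values < lower_fence) | (values > upper_fence)).sum())
    return {'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
            'lower_fence': lower_fence, 'upper_fence': upper_fence,
            'outliers': outliers}

# One row of statistics per day
summary = pd.DataFrame({day: box_stats(group)
                        for day, group in data.groupby('Days')['Energy']}).T
print(summary)
```

An `outliers` count of zero for a day means that no point lies beyond the whiskers of that day's box.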

We can also split the box plot by an additional variable. For instance, let us draw a box plot for each day, separated by whether the house has more than five members.

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='box', hue='MembersMoreThanFive', aspect=2)
plt.title('Box Plot with Hue parameter', weight='bold', fontsize=16)
plt.show()
```

From the above figures, we can see that there are no outliers in the data and that the data has little skewness.

The boxen plot works much like the box plot. However, it shows more percentiles and is therefore better suited to large datasets.
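To give a feel for the extra percentiles, the sketch below computes a few of them by repeatedly halving the tail fraction: the median, then the quartiles, then the eighths, and so on. It only illustrates the idea. Seaborn chooses the number of levels with its own rule, and the helper `letter_values` and its `depth` parameter are hypothetical.

```python
import numpy as np

def letter_values(values, depth=5):
    """Return (tail_fraction, lower, upper) for successively halved tails."""
    values = np.asarray(values, dtype=float)
    rows = []
    for k in range(1, depth + 1):
        tail = 0.5 ** k  # 1/2 (median), 1/4 (quartiles), 1/8 (eighths), ...
        lower, upper = np.quantile(values, [tail, 1 - tail])
        rows.append((tail, lower, upper))
    return rows

# Evenly spaced example values (illustrative data)
energy = np.linspace(100, 100000, 51000)
rows = letter_values(energy)
for tail, lower, upper in rows:
    print(f"tail={tail:.5f}  lower={lower:10.1f}  upper={upper:10.1f}")
```

Each additional level describes a thinner slice of the tails, which is only reliable when the dataset is large enough to populate those slices.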

Let us rebuild our dataset with 51,000 records and draw a boxen plot on it:

```python
# Generating energy data (evenly spaced values; unlike the first dataset, they are not shuffled)
np.random.seed(42)
energy = np.linspace(100, 100000, 51000)

# Building a 51,000 records dataset
data = pd.DataFrame({'Days': ['Mon']*17000 + ['Tue']*17000 + ['Wed']*17000,
                     'Energy': energy})

# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='boxen', aspect=2)
plt.title('Boxen Plot', weight='bold', fontsize=16)
plt.show()
```

As the above figure shows, the boxen plot presents more information through its additional percentiles than a box plot does. We can also use the `hue` parameter with the boxen plot, as shown:

```python
# Allocating 30000 houses randomly with more than five members
members = np.array(['Yes']*30000 + ['No']*21000)
np.random.seed(42)
np.random.shuffle(members)

# Adding data back to the dataset
data['MembersMoreThanFive'] = members

# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='boxen', hue='MembersMoreThanFive', aspect=2)
plt.title('Boxen Plot with Hue parameter', weight='bold', fontsize=16)
plt.show()
```

The violin plot combines a box plot with a kernel density estimate. Therefore, a single plot gives us two kinds of information: the summary statistics and the shape of the distribution.
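For readers new to kernel density estimation, here is a minimal NumPy-only sketch of a one-dimensional Gaussian KDE. It is not seaborn's implementation. The function name `gaussian_kde_1d`, the sample data, and the bandwidth choice (sample standard deviation times n^(-1/5), a Scott-style rule) are assumptions made for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate on the points in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott-style bandwidth: sample standard deviation scaled by n^(-1/5)
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the curve integrates to one
    return kernel.sum(axis=1) / (n * bandwidth), bandwidth

# Illustrative sample, not the guide's dataset
rng = np.random.default_rng(0)
samples = rng.normal(loc=200, scale=50, size=500)

# Evaluate the density on a grid padded well beyond the data range
pad = 4 * samples.std(ddof=1)
grid = np.linspace(samples.min() - pad, samples.max() + pad, 400)
density, bandwidth = gaussian_kde_1d(samples, grid)

# Trapezoidal estimate of the area under the curve (should be close to 1)
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"bandwidth={bandwidth:.2f}, area={area:.4f}")
```

A violin is, in essence, such a density curve drawn once and mirrored around the category axis.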

Let us use the energy dataset of 51,000 houses to draw the violin plot:

**Case I: Without the Hue Parameter**

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='violin', aspect=2)
plt.title('Violin Plot', weight='bold', fontsize=16)
plt.show()
```

In the center of each violin, you can see a miniature box plot, and the left and right outlines show the kernel density estimate, mirrored symmetrically.

**Case II: With the Hue Parameter**

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='violin', hue='MembersMoreThanFive', aspect=2)
plt.title('Violin Plot with the Hue parameter', weight='bold', fontsize=16)
plt.show()
```

**Case III: With the Hue and the Split Parameters**

We can also combine the two `hue` violins of each day into a single violin, with one half per hue level, using the `split` argument, as shown:

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='violin', hue='MembersMoreThanFive', split=True, aspect=2)
plt.title('Violin Plot with the Hue and Split parameters', weight='bold', fontsize=16)
plt.show()
```

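If we want a summary estimate of the quantitative variable for each category, such as its mean, rather than its full distribution, we can use estimate plots instead of distribution plots. The bar plot and the point plot are two such estimate plots; by default, both show the mean of each group along with an error bar for its confidence interval.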
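To make the idea of an estimate with an error bar concrete, the sketch below computes a group mean together with a percentile bootstrap interval using NumPy. It illustrates the general technique rather than seaborn's exact procedure. The helper `mean_with_bootstrap_ci` and its parameters `n_boot`, `ci`, and `seed` are hypothetical names.

```python
import numpy as np
import pandas as pd

def mean_with_bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    half = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [half, 100 - half])
    return values.mean(), lower, upper

# Small example dataset built the same way as the guide's first dataset
np.random.seed(42)
energy = np.linspace(50, 400, 200, dtype=int)
np.random.shuffle(energy)
energy = np.concatenate((energy, energy, energy))
np.random.shuffle(energy)
data = pd.DataFrame({'Days': ['Mon']*200 + ['Tue']*200 + ['Wed']*200,
                     'Energy': energy})

# One (mean, lower, upper) triple per day
estimates = {day: mean_with_bootstrap_ci(group)
             for day, group in data.groupby('Days')['Energy']}
for day, (mean, lower, upper) in estimates.items():
    print(f"{day}: mean={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

The mean corresponds to the bar height or point position, and the interval corresponds to the error bar.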

Let us implement the bar plot using the `catplot` method on the energy dataset of 51,000 houses:

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='bar', hue='MembersMoreThanFive', palette="ch:10.75")
plt.title('Bar Plot', weight='bold', fontsize=16)
plt.show()
```

Similarly, a point plot marks each estimate with a point and connects the estimates that share a hue level, as shown on the same dataset:

```python
# Plotting
sns.catplot(x='Days', y='Energy', data=data, kind='point', hue='MembersMoreThanFive',
            markers=["^", "o"], linestyles=["-", "--"])
plt.title('Point Plot', weight='bold', fontsize=16)
plt.show()
```

Here, the two lines overlap, suggesting that the estimates for both hue levels are nearly the same.
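The overlap can also be checked numerically by comparing the group means that the point plot is based on. This sketch rebuilds the 51,000-record dataset as in the guide; the names `means` and `relative_gap` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the 51,000-record dataset as in the guide
energy = np.linspace(100, 100000, 51000)
data = pd.DataFrame({'Days': ['Mon']*17000 + ['Tue']*17000 + ['Wed']*17000,
                     'Energy': energy})
members = np.array(['Yes']*30000 + ['No']*21000)
np.random.seed(42)
np.random.shuffle(members)
data['MembersMoreThanFive'] = members

# Mean energy for every (day, hue level) pair: days as rows, hue levels as columns
means = data.groupby(['Days', 'MembersMoreThanFive'])['Energy'].mean().unstack()

# Gap between the two hue levels, relative to the day's average of the two means
relative_gap = (means['Yes'] - means['No']).abs() / means.mean(axis=1)
print(means)
print(relative_gap)
```

Because the hue labels are assigned at random, the two means for each day differ only by sampling noise, which is why the lines nearly coincide.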

In this guide, you have learned the fundamental visualizations for categorical data: scatter plots, distribution plots, and estimate plots.
