
Statistical Analysis with Matplotlib Hands-on Practice
In this lab, you'll practice data analysis and visualization in Jupyter Notebook. You'll start by generating random numbers with NumPy, then plot them with Matplotlib. The tasks cover initializing a random number generator, sampling from several distributions, and visualizing the results with histograms, boxplots, violin plots, and subplot mosaics.

Generating Random Numbers and Creating Histograms
Jupyter Guide
To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (Ctrl/Cmd (⌘) + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. When you are ready to move on to the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.
To review the concepts covered in this step, please refer to the Exploring Variable Distributions with Plots module of the Statistical Analysis with Matplotlib course.
Understanding how to generate random numbers and create histograms is important because these skills form the foundation of data analysis and visualization. Random number generation lets us simulate data, and histograms let us see how that data is distributed.
In this step, we will practice generating random numbers using NumPy and creating histograms using Matplotlib. The goal is to familiarize ourselves with the process of generating data and visualizing its distribution. We will use a generator created with numpy.random.default_rng to generate the data and the matplotlib.pyplot.hist function to create the histogram. We'll look at a few different distributions.
Task 1.1: Importing Required Libraries
Import the required libraries for this lab. We will need NumPy for generating random numbers and Matplotlib for creating histograms.
Hint
Use the import keyword to import NumPy and Matplotlib. Remember to use the pyplot module from Matplotlib and use aliases for simplicity.
Solution
import numpy as np
import matplotlib.pyplot as plt
Task 1.2: Initialize Random Number Generator
Initialize the random number generator using the numpy.random.default_rng function.
Hint
Call the numpy.random.default_rng function and assign the result to a variable. This will be our random number generator.
Solution
rng = np.random.default_rng()
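Called without arguments, default_rng seeds itself from the operating system, so every run of the notebook produces different numbers. As an optional aside, the sketch below shows that passing an integer seed makes the draws repeatable. The seed value 42 and the names rng_a, rng_b, sample_a, and sample_b are illustrative and not part of the lab solution.

```python
import numpy as np

# The seed value 42 is an arbitrary illustration, not part of the lab solution.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Two generators built from the same seed produce identical draws.
sample_a = rng_a.normal(0, 1, 5)
sample_b = rng_b.normal(0, 1, 5)
print(np.array_equal(sample_a, sample_b))  # prints True
```

Seeding is useful when you want a plot to look the same each time you rerun a cell. Leaving the seed out, as the lab does, is fine for exploration.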
Task 1.3: Generate Normally Distributed Random Numbers
Generate a set of random numbers using the random number generator. Let's generate 1000 numbers from a normal distribution with mean 0 and standard deviation 1.
Hint
Use the normal method of the random number generator. Pass in the mean, standard deviation, and number of samples as arguments.
Solution
numbers = rng.normal(0, 1, 1000)
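A quick sanity check on the generated sample can help confirm that the arguments were passed in the intended order. The following is a minimal sketch that assumes the same call as the solution and simply prints the sample's shape, mean, and standard deviation.

```python
import numpy as np

rng = np.random.default_rng()
numbers = rng.normal(0, 1, 1000)  # mean 0, standard deviation 1, 1000 samples

# Sample statistics should land near the requested parameters,
# though not exactly on them because the sample is finite.
print(numbers.shape)
print(round(numbers.mean(), 3))
print(round(numbers.std(), 3))
```

The printed mean and standard deviation change from run to run because the generator is unseeded, but they should stay close to 0 and 1.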
Task 1.4: Create a Histogram of the Normal Random Numbers
Create a histogram of the generated numbers using the matplotlib.pyplot.hist function.
Hint
Call the hist function from the pyplot module. Pass in the generated numbers as the argument.
Solution
plt.hist(numbers, bins=20)
plt.show()
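Besides drawing the bars, plt.hist also returns the values it plotted, which can be handy for checking what the picture shows. The sketch below captures those return values; the names counts, edges, and patches are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
numbers = rng.normal(0, 1, 1000)

# plt.hist returns the bar heights, the bin edges, and the drawn bar artists.
counts, edges, patches = plt.hist(numbers, bins=20)

print(counts.sum())  # every sample falls in exactly one bin, so this totals 1000
print(len(edges))    # 20 bins are delimited by 21 edges
plt.show()
```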
Task 1.5: Generate Uniformly Sampled Random Numbers
Now let's generate 1000 numbers from a uniform distribution between 0 and 1.
Hint
Use the uniform method of the random number generator. Pass in the lower bound, upper bound, and number of samples as arguments.
Solution
numbers_uniform = rng.uniform(0, 1, 1000)
Task 1.6: Create a Histogram of the Uniform Random Sample
Create a histogram of the generated uniformly sampled numbers using the matplotlib.pyplot.hist function.
Hint
Call the hist function from the pyplot module. Pass in the generated numbers as the argument.
Solution
plt.hist(numbers_uniform, bins=20)
plt.show()
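To see why the uniform histogram looks roughly flat, the bin counts can be inspected directly without drawing anything. The sketch below uses np.histogram for that. The range=(0, 1) argument is an illustrative choice that pins the bin edges to the sampling interval; it is not part of the lab solution.

```python
import numpy as np

rng = np.random.default_rng()
numbers_uniform = rng.uniform(0, 1, 1000)

# np.histogram computes bin counts without drawing a plot. range=(0, 1) fixes
# the bin edges to the sampling interval (an illustrative choice).
counts, edges = np.histogram(numbers_uniform, bins=20, range=(0, 1))

# With 1000 samples in 20 equal-width bins, each bin is expected to hold
# about 50 samples, with some random variation around that value.
print(counts)
print(counts.mean())
```

A normal sample inspected the same way would show large counts in the middle bins and small counts at the edges.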
Exploring Histogram Parameters
To review the concepts covered in this step, please refer to the Exploring Variable Distributions with Plots module of the Statistical Analysis with Matplotlib course.
Exploring the parameters of a histogram is important because it allows us to customize the visualization to better understand the data. It can help us reveal patterns in the data that might not be apparent with the default settings.
In this step, we will practice customizing the bin size and visual elements of a histogram. The goal is to understand how these parameters affect the visualization and what they can tell us about the data. We will use a generator created with numpy.random.default_rng to generate the data and the matplotlib.pyplot.hist function to generate the plots. We'll explore parameters such as bins, color, edgecolor, and rwidth.
Task 2.1: Generate Random Data
Use a generator created with numpy.random.default_rng to generate a set of random data. This data will be used to create the histogram.
Hint
Use the rng.normal method to generate a set of normally distributed random data. You can specify the size of the data set as an argument to this method.
Solution
import numpy as np
rng = np.random.default_rng()
data = rng.normal(size=1000)
Task 2.2: Create Basic Histogram
Use the matplotlib.pyplot.hist function to create a basic histogram of the data.
Hint
Use the plt.hist function and pass in the data as an argument. Then, use plt.show to display the histogram.
Solution
import matplotlib.pyplot as plt
plt.hist(data)
plt.show()
Task 2.3: Customize Bin Size
Customize the bin size of the histogram to better understand the distribution of the data. To do this, recreate the histogram from Task 2.2 with a custom value for the bins parameter.
Hint
Use the bins parameter of the plt.hist function to specify the number of bins in the histogram.
Solution
plt.hist(data, bins=50)
plt.show()
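The bins parameter also accepts an explicit sequence of bin edges instead of a bin count, which gives direct control over the bin width. In the sketch below, the edges (16 bins of width 0.5 between -4 and 4) are arbitrary illustrative values; any samples outside that range are left out of the plot.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
data = rng.normal(size=1000)

# 17 evenly spaced edges from -4 to 4 define 16 bins, each 0.5 wide
# (illustrative values, not part of the lab solution).
edges = np.linspace(-4, 4, 17)
counts, used_edges, _ = plt.hist(data, bins=edges)

print(len(counts))  # one count per bin
plt.show()
```

Fixed edges are useful when comparing histograms of different datasets, because every plot then uses the same bins.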
Task 2.4: Customize Visual Elements
Now let's customize the color, edge color, and relative width of the bars in the histogram to improve its readability.
Hint
Use the color, edgecolor, and rwidth parameters of the plt.hist function to customize the visual elements of the histogram.
Solution
plt.hist(data, bins=50, color='red', edgecolor='black', rwidth=0.5)
plt.show()
Creating and Customizing Boxplots
To review the concepts covered in this step, please refer to the Using Different Chart Types for Distributions module of the Statistical Analysis with Matplotlib course.
Creating and customizing boxplots is important because they provide a summary of the data's distribution. They can help us identify outliers, skewness, and other characteristics of the data.
In this step, we will practice creating and customizing boxplots using Matplotlib. The goal is to understand how to interpret a boxplot and how to customize it to better represent the data. We will use the matplotlib.pyplot.boxplot function and parameters such as notch, whis, and showmeans. We will use a generator created with numpy.random.default_rng to generate the data for the plots.
Task 3.1: Importing Required Libraries
Import the required libraries for this lab. We will need matplotlib.pyplot for creating the boxplots and numpy for generating the random data.
Hint
Use the import keyword to import libraries. For example, to import matplotlib.pyplot, you would write import matplotlib.pyplot as plt.
Solution
import matplotlib.pyplot as plt
import numpy as np
Task 3.2: Generating Random Data
Generate a random dataset using a generator created with numpy.random.default_rng. Create an array of 100 random numbers.
Hint
First, call default_rng to create a generator. Then, use the uniform method of this generator to produce the random numbers. For example, rng = np.random.default_rng() and data = rng.uniform(low=0, high=1, size=100).
Solution
rng = np.random.default_rng()
data = rng.uniform(low=0, high=1, size=100)
Task 3.3: Creating a Basic Boxplot
Create a basic boxplot of the data using the matplotlib.pyplot.boxplot function.
Hint
Use the boxplot function of matplotlib.pyplot to create the boxplot. For example, plt.boxplot(data).
Solution
plt.boxplot(data)
plt.show()
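To interpret the boxplot, it helps to compute the numbers it summarizes. The following is a minimal sketch, assuming the default whisker setting of 1.5 times the interquartile range (IQR); the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
data = rng.uniform(low=0, high=1, size=100)

# The box spans the first to third quartile; the line inside is the median.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# With the default whis=1.5, each whisker ends at the most extreme data point
# within 1.5 * IQR of the box; points beyond that are drawn as outliers.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
whisker_low = data[data >= lower_limit].min()
whisker_high = data[data <= upper_limit].max()
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(q1, median, q3)
print(whisker_low, whisker_high, len(outliers))
```

For uniform data on the interval from 0 to 1, the IQR is about 0.5, so the whisker limits usually fall outside the data range and the plot typically shows no outliers.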
Task 3.4: Customizing the Boxplot
Customize the boxplot by adding a notch, changing the whisker length, and showing the means. Use the notch, whis, and showmeans parameters of the boxplot function.
Hint
Use the notch, whis, and showmeans parameters of the boxplot function. For example, plt.boxplot(data, notch=True, whis=1.5, showmeans=True).
Solution
plt.boxplot(data, notch=True, whis=1.5, showmeans=True)
plt.show()
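Note that whis=1.5 is also Matplotlib's default, so the whiskers only change visibly with a different value. The sketch below uses whis=1.0 as an illustrative alternative and inspects the dictionary of artists that plt.boxplot returns; the names result, median_y, and mean_y are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
data = rng.uniform(low=0, high=1, size=100)

# whis=1.0 is an illustrative value; 1.5 is Matplotlib's default.
result = plt.boxplot(data, notch=True, whis=1.0, showmeans=True)

# boxplot returns a dict of the drawn artists, keyed by element type.
print(sorted(result.keys()))

# The median line and the mean marker sit at the sample median and mean.
median_y = result["medians"][0].get_ydata()[0]
mean_y = result["means"][0].get_ydata()[0]
print(median_y, np.median(data))
print(mean_y, data.mean())
plt.show()
```

Comparing the drawn positions with np.median and data.mean is a quick way to confirm which marker is which when reading the plot.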
Creating Violin Plots and Subplot Mosaics
To review the concepts covered in this step, please refer to the Using Different Chart Types for Distributions module of the Statistical Analysis with Matplotlib course.
Creating violin plots and subplot mosaics is important because they provide more detailed visualizations of the data. Violin plots combine the benefits of histograms and boxplots, while subplot mosaics allow us to compare multiple plots side by side.
In this step, we will practice creating violin plots and subplot mosaics using Matplotlib. The goal is to understand how these visualizations can provide more insights into the data. We will use the plt.violinplot function to create some violin plots and the Figure.subplot_mosaic method to create subplots with keys. We'll display two different datasets side by side, with data generated randomly using numpy.random.default_rng.
Task 4.1: Importing Necessary Libraries
Import the necessary libraries for creating violin plots and subplot mosaics.
Hint
You will need to import matplotlib.pyplot as plt and numpy as np.
Solution
import matplotlib.pyplot as plt
import numpy as np
Task 4.2: Generating Random Data
Generate two sets of normally distributed random data using a generator created with numpy.random.default_rng, each with 500 samples. Choose a different mean and standard deviation for each dataset.
Hint
Call numpy.random.default_rng to create a generator, then use its normal method to generate each set of normally distributed random data.
Solution
rng = np.random.default_rng()
data1 = rng.normal(loc=1, scale=0.5, size=500)
data2 = rng.normal(loc=2, scale=2, size=500)
Task 4.3: Creating Violin Plots
Create violin plots for the two sets of data using the plt.violinplot function.
Hint
Use the plt.violinplot function of the Matplotlib library to create the violin plots. Pass the two datasets in a list to the violinplot function.
Solution
plt.violinplot([data1, data2])
plt.show()
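By default a violin plot shows only the density outline and the extrema, so it can be hard to tell the two datasets apart at a glance. As a possible refinement, the sketch below adds median lines with showmedians=True and labels the x positions. The tick labels and the name parts are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
data1 = rng.normal(loc=1, scale=0.5, size=500)
data2 = rng.normal(loc=2, scale=2, size=500)

fig, ax = plt.subplots()

# showmedians=True adds a horizontal line at each dataset's median.
parts = ax.violinplot([data1, data2], showmedians=True)

# Violins are placed at x = 1, 2, ... by default, so label those positions.
ax.set_xticks([1, 2])
ax.set_xticklabels(["data1", "data2"])
plt.show()
```

The wider spread of data2 (scale=2) should appear as a taller violin than the one for data1 (scale=0.5).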
Task 4.4: Constructing Subplot Mosaics for Violin Plots and Histograms
Create a subplot mosaic that arranges violin plots and histograms for two separate datasets in a specific grid pattern. The grid should be organized as follows:
- The top row displays the first dataset: a violin plot followed by a histogram.
- The bottom row shows the second dataset in a similar fashion: a violin plot and then a histogram.
The desired layout is:
[[d1violin, d1hist], [d2violin, d2hist]]
Hint
Begin by initializing a figure with plt.figure. Use the subplot_mosaic method on this figure, passing in a nested list that represents your grid layout, with each inner list denoting a row.
Solution
fig = plt.figure()
axes = fig.subplot_mosaic([
    ['d1violin', 'd1hist'],
    ['d2violin', 'd2hist']
])
axes['d1violin'].violinplot(data1)
axes['d1hist'].hist(data1)
axes['d2violin'].violinplot(data2)
axes['d2hist'].hist(data2)
plt.show()
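An equivalent way to build the same grid is plt.subplot_mosaic, which creates the figure and the dictionary of axes in one call. The sketch below is one possible variation on the solution, not a replacement for it: it titles each panel with its mosaic key to make the key-to-position mapping visible. The layout and datasets variables, the loop, and bins=20 are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()
data1 = rng.normal(loc=1, scale=0.5, size=500)
data2 = rng.normal(loc=2, scale=2, size=500)

layout = [["d1violin", "d1hist"],
          ["d2violin", "d2hist"]]

# plt.subplot_mosaic creates the figure and the dict of axes in one call.
fig, axes = plt.subplot_mosaic(layout)

datasets = {"d1": data1, "d2": data2}
for prefix, values in datasets.items():
    axes[prefix + "violin"].violinplot(values)
    axes[prefix + "hist"].hist(values, bins=20)

# Title each panel with its mosaic key to show how keys map to positions.
for key, ax in axes.items():
    ax.set_title(key)

fig.tight_layout()
plt.show()
```

Because the axes are addressed by name rather than by row and column index, the layout list can be rearranged without changing the plotting code.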