Statistical Analysis with Matplotlib Hands-on Practice

In this lab, you'll use Jupyter Notebook to practice data analysis and visualization. You'll start by generating random numbers with NumPy, then plot them as histograms and boxplots with Matplotlib. The tasks cover initializing a random number generator, sampling from several distributions, and visualizing the samples with histograms, boxplots, violin plots, and subplot mosaics.

Level: Beginner
Duration: 27m
Published: Dec 12, 2023

Table of Contents

  1. Challenge

    Generating Random Numbers and Creating Histograms

    Jupyter Guide

    To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember to run the cells (Ctrl/Cmd(⌘) + Enter) for each task before moving on to the next task. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.


    Generating Random Numbers and Creating Histograms

    To review the concepts covered in this step, please refer to the Exploring Variable Distributions with Plots module of the Statistical Analysis with Matplotlib course.

    Generating random numbers and creating histograms are foundational skills in data analysis and visualization. Random number generation lets us simulate data, and histograms let us see how that data is distributed.

    In this step, we will practice generating random numbers using NumPy and creating histograms using Matplotlib. The goal is to familiarize ourselves with the process of generating data and visualizing its distribution. We will use a generator created by the numpy.random.default_rng function to generate the data and the matplotlib.pyplot.hist function to create the histogram. We'll look at a few different distributions.


    Task 1.1: Importing Required Libraries

    Import the required libraries for this lab. We will need NumPy for generating random numbers and Matplotlib for creating histograms.

    🔍 Hint

    Use the import keyword to import NumPy and Matplotlib. Remember to use the pyplot module from Matplotlib and use aliases for simplicity.

    🔑 Solution
    import numpy as np
    import matplotlib.pyplot as plt
    

    Task 1.2: Initialize Random Number Generator

    Initialize the random number generator using the numpy.random.default_rng function.

    🔍 Hint

    Call the numpy.random.default_rng function and assign the result to a variable. This will be our random number generator.

    🔑 Solution
    rng = np.random.default_rng()
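
Called with no arguments, default_rng seeds itself from fresh operating-system entropy, so each run of the notebook produces different numbers. As a small sketch (the seed value 42 is an arbitrary choice for illustration, not part of the lab), passing an integer seed makes the output repeatable:

```python
import numpy as np

# Two generators built from the same seed produce identical streams.
# The seed 42 is an arbitrary illustrative value, not part of the lab.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(0, 1, 5)
sample_b = rng_b.normal(0, 1, 5)

print(np.array_equal(sample_a, sample_b))  # True
```

A fixed seed is handy when you want plots to look the same every time you rerun a cell.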
    

    Task 1.3: Generate Normally Distributed Random Numbers

    Generate a set of random numbers using the random number generator. Let's generate 1000 numbers from a normal distribution with mean 0 and standard deviation 1.

    🔍 Hint

    Use the normal method of the random number generator. Pass in the mean, standard deviation, and number of samples as arguments.

    🔑 Solution
    numbers = rng.normal(0, 1, 1000)
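
One quick way to sanity-check the sample is to compare its statistics with the parameters you asked for. The sketch below assumes a seeded generator (seed 0 is an arbitrary choice) only so the printed values are repeatable:

```python
import numpy as np

# Seed 0 is an illustrative assumption; the lab uses an unseeded generator.
rng = np.random.default_rng(0)
numbers = rng.normal(0, 1, 1000)

# With 1000 draws, the sample statistics land near the requested
# mean (0) and standard deviation (1), but not exactly on them.
print(numbers.shape)
print(round(numbers.mean(), 3), round(numbers.std(), 3))
```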
    

    Task 1.4: Create a Histogram of the Normal Random Numbers

    Create a histogram of the generated numbers using the matplotlib.pyplot.hist function.

    🔍 Hint

    Call the hist function from the pyplot module. Pass in the generated numbers as the argument.

    🔑 Solution
    plt.hist(numbers, bins=20)
    plt.show()
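
Besides drawing the bars, plt.hist returns the bin counts, the bin edges, and the bar objects. The following sketch (with an arbitrary seed so it is repeatable) captures those return values to show how the 1000 samples are spread over the 20 bins:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
numbers = rng.normal(0, 1, 1000)

counts, edges, patches = plt.hist(numbers, bins=20)

print(len(counts), len(edges))  # 20 bins are bounded by 21 edges
print(int(counts.sum()))        # every sample falls in exactly one bin
plt.show()
```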
    

    Task 1.5: Generate Uniformly Sampled Random Numbers

    Now let's generate 1000 numbers from a uniform distribution between 0 and 1.

    🔍 Hint

    Use the uniform method of the random number generator. Pass in the lower bound, upper bound, and number of samples as arguments.

    🔑 Solution
    numbers_uniform = rng.uniform(0, 1, 1000)
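
The uniform method samples from the half-open interval [low, high), so every value is at least 0 and strictly below 1. A minimal check, again assuming an arbitrary seed for repeatability:

```python
import numpy as np

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
numbers_uniform = rng.uniform(0, 1, 1000)

# All samples lie in [0, 1), and their mean sits near the midpoint 0.5.
print(numbers_uniform.min() >= 0, numbers_uniform.max() < 1)
print(round(numbers_uniform.mean(), 3))
```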
    

    Task 1.6: Create a Histogram of the Uniform Random Sample

    Create a histogram of the generated uniformly sampled numbers using the matplotlib.pyplot.hist function.

    🔍 Hint

    Call the hist function from the pyplot module. Pass in the generated numbers as the argument.

    🔑 Solution
    plt.hist(numbers_uniform, bins=20)
    plt.show()
    
  2. Challenge

    Exploring Histogram Parameters

    To review the concepts covered in this step, please refer to the Exploring Variable Distributions with Plots module of the Statistical Analysis with Matplotlib course.

    Exploring the parameters of a histogram is important because it allows us to customize the visualization to better understand the data. It can help us reveal patterns in the data that might not be apparent with the default settings.

    In this step, we will practice customizing the number of bins and the visual elements of a histogram. The goal is to understand how these parameters affect the visualization and what they can tell us about the data. We will generate the data with a generator created by the numpy.random.default_rng function and plot it with the matplotlib.pyplot.hist function, exploring its bins, color, edgecolor, and rwidth parameters.


    Task 2.1: Generate Random Data

    Use the numpy.random.default_rng object to generate a set of random data. This data will be used to create the histogram.

    🔍 Hint

    Use the rng.normal method to generate a set of normally distributed random data. You can specify the size of the data set as an argument to this method.

    🔑 Solution
    import numpy as np
    
    rng = np.random.default_rng()
    
    data = rng.normal(size=1000)
    

    Task 2.2: Create Basic Histogram

    Use the matplotlib.pyplot.hist function to create a basic histogram of the data.

    🔍 Hint

    Use the plt.hist function and pass in the data as an argument. Then, use plt.show to display the histogram.

    🔑 Solution
    import matplotlib.pyplot as plt
    
    plt.hist(data)
    plt.show()
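
When bins is not given, plt.hist falls back to the hist.bins setting in Matplotlib's rcParams, which is 10 unless it has been changed. This short sketch (seeded arbitrarily for repeatability) confirms how many bins the basic call produced:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

counts, edges, _ = plt.hist(data)

# The default bin count comes from the "hist.bins" rcParam.
print(len(counts), plt.rcParams["hist.bins"])
plt.show()
```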
    

    Task 2.3: Customize Bin Size

    Customize the binning of the histogram to better understand the distribution of the data. The bins parameter sets the number of bins, so a larger value gives narrower bins. To do this, recreate the histogram from Task 2.2 with a custom value for the bins parameter.

    🔍 Hint

    Use the bins parameter of the plt.hist function to specify the number of bins in the histogram.

    🔑 Solution
    plt.hist(data, bins=50)
    plt.show()
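
The bins parameter also accepts a sequence of explicit bin edges or a string naming a NumPy bin-estimation rule such as "auto". The sketch below is one way to compare the two; the edge range of -4 to 4, the figure size, and the seed are illustrative assumptions, and any samples outside the explicit edges are simply left out of that histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

fig, (ax_edges, ax_auto) = plt.subplots(1, 2, figsize=(8, 3))

# Explicit edges: 17 edges give 16 bins of width 0.5 covering -4 to 4.
edges = np.arange(-4, 4.5, 0.5)
ax_edges.hist(data, bins=edges)
ax_edges.set_title("bins=explicit edges")

# A string asks NumPy to estimate a suitable bin count from the data.
counts_auto, edges_auto, _ = ax_auto.hist(data, bins="auto")
ax_auto.set_title(f"bins='auto' ({len(counts_auto)} bins)")

plt.show()
```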
    

    Task 2.4: Customize Visual Elements

    Now let's customize the color, edge color, and relative width of the bars in the histogram to improve its readability.

    🔍 Hint

    Use the color, edgecolor, and rwidth parameters of the plt.hist function to customize the visual elements of the histogram.

    🔑 Solution
    plt.hist(data, bins=50, color='red', edgecolor='black', rwidth=0.5)
    plt.show()
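
The rwidth parameter is a fraction of the bin width, so rwidth=0.5 draws each bar at half the width of its bin and leaves gaps between bars. As a rough check (the seed is an arbitrary assumption), you can compare the width of the first bar with the width of the first bin:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

counts, edges, patches = plt.hist(
    data, bins=50, color='red', edgecolor='black', rwidth=0.5
)

bin_width = edges[1] - edges[0]      # width of one bin on the x-axis
bar_width = patches[0].get_width()   # width of the drawn rectangle
print(round(bar_width / bin_width, 2))
plt.show()
```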
    
  3. Challenge

    Creating and Customizing Boxplots

    To review the concepts covered in this step, please refer to the Using Different Chart Types for Distributions module of the Statistical Analysis with Matplotlib course.

    Creating and customizing boxplots is important because they provide a summary of the data's distribution. They can help us identify outliers, skewness, and other characteristics of the data.

    In this step, we will practice creating and customizing boxplots using Matplotlib. The goal is to understand how to interpret a boxplot and how to customize it to better represent the data. We will use the matplotlib.pyplot.boxplot function and its parameters like notch, whis, showmeans, etc. We will use the numpy.random.default_rng object to generate the data for the plots.


    Task 3.1: Importing Required Libraries

    Import the required libraries for this lab. We will need matplotlib.pyplot for creating the boxplots and numpy for generating the random data.

    🔍 Hint

    Use the import keyword to import libraries. For example, to import matplotlib.pyplot, you would write import matplotlib.pyplot as plt.

    🔑 Solution
    import matplotlib.pyplot as plt
    import numpy as np
    

    Task 3.2: Generating Random Data

    Generate a random dataset using the numpy.random.default_rng object. Create an array of 100 random numbers.

    🔍 Hint

    First, create a default_rng object. Then, use the uniform method of this object to generate the random numbers. For example, rng = np.random.default_rng() and data = rng.uniform(low=0, high=1, size=100).

    🔑 Solution
    rng = np.random.default_rng()
    data = rng.uniform(low=0, high=1, size=100)
    

    Task 3.3: Creating a Basic Boxplot

    Create a basic boxplot of the data using the matplotlib.pyplot.boxplot function.

    🔍 Hint

    Use the boxplot function of matplotlib.pyplot to create the boxplot. For example, plt.boxplot(data).

    🔑 Solution
    plt.boxplot(data)
    plt.show()
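
To connect the picture to the numbers, it can help to compute the quartiles yourself. In a boxplot the box spans the first to third quartile and the line inside it marks the median. The sketch below (seeded arbitrarily for repeatability) reads the drawn median back from the dictionary that plt.boxplot returns and compares it with NumPy's value:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data = rng.uniform(low=0, high=1, size=100)

# boxplot returns a dict of the drawn artists, keyed by part name.
parts = plt.boxplot(data)

q1, median, q3 = np.percentile(data, [25, 50, 75])
drawn_median = parts["medians"][0].get_ydata()[0]

print(round(q1, 3), round(median, 3), round(q3, 3))
print(np.isclose(drawn_median, median))
plt.show()
```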
    

    Task 3.4: Customizing the Boxplot

    Customize the boxplot by adding a notch, changing the whisker length, and showing the means. Use the notch, whis, and showmeans parameters of the boxplot function.

    🔍 Hint

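    Use the notch, whis, and showmeans parameters of the boxplot function. For example, plt.boxplot(data, notch=True, whis=1.5, showmeans=True). Because whis=1.5 is Matplotlib's default, try another value such as whis=1.0 to see the whisker length change.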

    🔑 Solution
    plt.boxplot(data, notch=True, whis=1.5, showmeans=True)
    plt.show()
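
The whis value controls how far the whiskers may reach beyond the box, measured in multiples of the interquartile range; points beyond the whiskers are drawn as outliers ("fliers"). The sketch below is an illustration under assumptions that are not part of the lab: it uses normally distributed data (which has longer tails than the uniform data above), a fixed seed, and a comparison value of whis=1.0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumptions for illustration: normal data, seed 0, and whis=1.0.
rng = np.random.default_rng(0)
data = rng.normal(size=500)

fig, (ax_default, ax_short) = plt.subplots(1, 2, sharey=True)
default_parts = ax_default.boxplot(data, notch=True, whis=1.5, showmeans=True)
short_parts = ax_short.boxplot(data, notch=True, whis=1.0, showmeans=True)
ax_default.set_title("whis=1.5")
ax_short.set_title("whis=1.0")

# Shorter whiskers leave more points outside them, so more fliers are drawn.
n_default = len(default_parts["fliers"][0].get_ydata())
n_short = len(short_parts["fliers"][0].get_ydata())
print(n_default, n_short)
plt.show()
```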
    
  4. Challenge

    Creating Violin Plots and Subplot Mosaics

    To review the concepts covered in this step, please refer to the Using Different Chart Types for Distributions module of the Statistical Analysis with Matplotlib course.

    Creating violin plots and subplot mosaics is important because they provide more detailed visualizations of the data. Violin plots combine the benefits of histograms and boxplots, while subplot mosaics allow us to compare multiple plots side by side.

    In this step, we will practice creating violin plots and subplot mosaics using Matplotlib. The goal is to understand how these visualizations can provide more insights into the data. We will use the plt.violinplot function to create violin plots and the subplot_mosaic method of a figure to create subplots that are addressed by keys. We'll display two different datasets side by side, with data produced by a generator created by the numpy.random.default_rng function.


    Task 4.1: Importing Necessary Libraries

    Import the necessary libraries for creating violin plots and subplot mosaics.

    🔍 Hint

    You will need to import matplotlib.pyplot as plt and numpy as np.

    🔑 Solution
    import matplotlib.pyplot as plt
    import numpy as np
    

    Task 4.2: Generating Random Data

    Generate two sets of normally distributed random data using the numpy.random.default_rng object, each with 500 samples. Choose a different mean and standard deviation for each dataset.

    🔍 Hint

    Use the numpy.random.default_rng object to generate two sets of random data. You can use the normal method of the default_rng object to generate normally distributed random data.

    🔑 Solution
    rng = np.random.default_rng()
    data1 = rng.normal(loc=1, scale=0.5, size=500)
    data2 = rng.normal(loc=2, scale=2, size=500)
    

    Task 4.3: Creating Violin Plots

    Create violin plots for the two sets of data using the plt.violinplot function.

    🔍 Hint

    Use the plt.violinplot function to create the violin plots. Pass the two datasets to it as a list.

    🔑 Solution
    plt.violinplot([data1, data2])
    plt.show()
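
By default a violin plot shows only the density shape and the extreme values. As a sketch of one possible refinement (the seed and the tick labels "data1" and "data2" are illustrative assumptions), the showmeans and showmedians parameters add marker lines, and the violins are placed at x positions 1 and 2, which can then be labeled:

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data1 = rng.normal(loc=1, scale=0.5, size=500)
data2 = rng.normal(loc=2, scale=2, size=500)

fig, ax = plt.subplots()
parts = ax.violinplot([data1, data2], showmeans=True, showmedians=True)

# Violins sit at positions 1..N by default; label them for readability.
ax.set_xticks([1, 2])
ax.set_xticklabels(["data1", "data2"])

print(len(parts["bodies"]))  # one filled body per dataset
plt.show()
```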
    

    Task 4.4: Constructing Subplot Mosaics for Violin Plots and Histograms

    Create a subplot mosaic that arranges violin plots and histograms for two separate datasets in a specific grid pattern. The grid should be organized as follows:

    • The top row displays the first dataset: a violin plot followed by a histogram.
    • The bottom row shows the second dataset in a similar fashion: a violin plot and then a histogram.

    The desired layout is:

    [[d1violin, d1hist],
     [d2violin, d2hist]]
    
    🔍 Hint

    Begin by initializing a figure with plt.figure. Utilize the subplot_mosaic method on this figure, passing in a nested list that represents your grid layout, with each inner list denoting a row.

    🔑 Solution
    fig = plt.figure()
    axes = fig.subplot_mosaic([
        ['d1violin', 'd1hist'],
        ['d2violin', 'd2hist']
    ])
    axes['d1violin'].violinplot(data1)
    axes['d1hist'].hist(data1)
    axes['d2violin'].violinplot(data2)
    axes['d2hist'].hist(data2)
    plt.show()
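
A mosaic key may appear in more than one cell, in which case that subplot spans all of those cells. The following variation is a sketch rather than part of the lab solution: the key name "violins", the figure size, the bin count, and the seed are all illustrative assumptions. It places both violins in one tall panel on the left and stacks the two histograms on the right, using the pyplot-level plt.subplot_mosaic function, which creates the figure and the axes dictionary in one call.

```python
import numpy as np
import matplotlib.pyplot as plt

# Seed 0 is an illustrative assumption, used only for repeatability.
rng = np.random.default_rng(0)
data1 = rng.normal(loc=1, scale=0.5, size=500)
data2 = rng.normal(loc=2, scale=2, size=500)

# "violins" appears in both rows, so that subplot spans the left column.
fig, axes = plt.subplot_mosaic(
    [
        ["violins", "d1hist"],
        ["violins", "d2hist"],
    ],
    figsize=(8, 5),
)

axes["violins"].violinplot([data1, data2])
axes["d1hist"].hist(data1, bins=20)
axes["d2hist"].hist(data2, bins=20)

# Title each panel with its mosaic key to show which axes is which.
for name, ax in axes.items():
    ax.set_title(name)

fig.tight_layout()
plt.show()
```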
    
