Normalize Data to Make It Appropriate for an Analysis with Pandas Hands-on Practice

In this lab, you'll master data normalization with Pandas and Sklearn in Python. You'll practice standard scaling, Min-Max scaling, and l1, l2, and max normalizations. By creating datasets, applying various techniques, and visualizing the outcomes, you'll gain a deep understanding of data preprocessing methods and their effects on data distribution.

Lab Info

Level: Beginner
Last updated: Jan 05, 2026
Duration: 32m

Introduction to Normalization
Jupyter Guide

To get started, open the file on the right entitled "Step 1...". Complete each task for Step 1 in that Jupyter Notebook file. Remember to run the cells (Ctrl/Cmd(⌘) + Enter) for each task before moving on to the next task in the notebook. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.

Introduction to Normalization

To review the concepts covered in this step, please refer to the Some Simple Normalization Techniques module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.

Data normalization matters because it puts features on a comparable scale, so that a feature with a large numeric range does not dominate a model simply because of its units. Here, we will practice normalizing features in a dataset using Pandas and the standard scaling technique.

Standard scaling is a straightforward way to bring all attributes to the same scale: subtract each column's mean and divide by its standard deviation. You will use sklearn and pandas to normalize a randomly sampled dataset. We'll use the StandardScaler transform from sklearn so that each column of the normalized data has a mean of 0 and a standard deviation of 1. Observe the effects by plotting the distribution before and after normalization.
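
As a minimal sketch of the arithmetic behind standard scaling, the snippet below compares a hand-computed result with the output of StandardScaler on a tiny column. The toy values and the variable names (toy, manual, scaled) are illustrative assumptions and are not part of the lab. One detail worth knowing: StandardScaler divides by the population standard deviation (ddof=0), while pandas' std() defaults to the sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical values chosen so the arithmetic is easy to follow)
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Manual standard scaling: subtract the mean, divide by the population std
manual = (toy - toy.mean()) / toy.std(ddof=0)

# Same computation via sklearn; fit_transform returns a NumPy array
scaled = StandardScaler().fit_transform(toy)

print(np.allclose(manual.to_numpy(), scaled))  # True
print(scaled.mean(), scaled.std())             # approximately 0 and 1
```

If ddof=0 is left out of the manual version, the two results differ slightly, which is a common source of confusion when checking the scaler by hand.
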

Task 1.1: Importing Necessary Libraries

Before we can start working with data, we need to import the necessary libraries. Import pandas, numpy, matplotlib, and the StandardScaler from sklearn.preprocessing.

🔍 Hint

Use the import keyword to import a library. For example, import pandas as pd imports the pandas library and assigns it to the alias pd.
🔑 Solution

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
Task 1.2: Creating a Random Dataset

Create a pandas DataFrame with 100 rows and 5 columns. The columns should be filled with random numbers from a normal distribution centered at mu = 2 and with standard deviation sigma = 5. Display the first few rows of the dataframe.

🔍 Hint

Use np.random.normal(mu, sigma, size=(rows, cols)) to generate the array. Use pd.DataFrame(array) to convert the array to a DataFrame. Use df.head() to display the first few rows of the data.
🔑 Solution

array = np.random.normal(2, 5, size=(100, 5))
df = pd.DataFrame(array)
df.head()
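
An optional variation, not required by the lab: because the dataset is random, every run produces different numbers. The sketch below assumes you want repeatable results and uses NumPy's seeded Generator. The seed value 42 is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd

# A seeded Generator makes the "random" dataset identical on every run.
# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(seed=42)

mu, sigma = 2, 5
array = rng.normal(loc=mu, scale=sigma, size=(100, 5))
df = pd.DataFrame(array)

print(df.shape)            # (100, 5)
print(df.mean().round(2))  # column means should be near mu
print(df.std().round(2))   # column standard deviations should be near sigma
```
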
Task 1.3: Plotting the Distribution Before Normalization

Plot the distribution of the values in each column of the DataFrame before normalization. Use DataFrame.plot(kind='density') to create the plot.

🔍 Hint

Using DataFrame.plot(kind='density') will plot the distribution as a smoothed line plot.
🔑 Solution

df.plot(kind='density')
Task 1.4: Normalizing the Data

Normalize the data in the DataFrame using the StandardScaler. Display the head of the transformed data.

🔍 Hint

Use StandardScaler() to create a scaler. Use scaler.fit(df) to fit the scaler to the data. Use scaler.transform(df) to transform the data.
🔑 Solution

scaler = StandardScaler()
scaler.fit(df)
df_normalized = pd.DataFrame(scaler.transform(df))
df_normalized.head()
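
A side note on the solution above: transform returns a plain NumPy array, so wrapping it in pd.DataFrame produces default labels. The sketch below shows one way to keep the original row and column labels and then check the result. The column names a to e and the seed are hypothetical and only make the effect visible.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical labelled dataset standing in for the lab's random DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(2, 5, size=(100, 5)), columns=list("abcde"))

scaler = StandardScaler()
# fit_transform is shorthand for fit followed by transform on the same data
df_normalized = pd.DataFrame(
    scaler.fit_transform(df),
    columns=df.columns,  # keep the original column labels
    index=df.index,      # keep the original row labels
)

# Each column should now have mean ~0 and population std ~1
print(df_normalized.mean().round(6))
print(df_normalized.std(ddof=0).round(6))
```
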
Task 1.5: Plotting the Distribution After Normalization

Plot the normalized data together with the original data on the same axes, and compare the two distributions.
🔍 Hint

Use DataFrame.plot(kind="density") to plot the distributions. To overlay the original data, call plot on both the original and the normalized DataFrames, passing the same Axes object through the ax keyword argument. For example:

fig, ax = plt.subplots()
df.plot(ax=ax, kind="density")
🔑 Solution

fig, ax = plt.subplots()
df.plot(ax=ax, kind='density')
df_normalized.plot(ax=ax, kind='density')
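
When both DataFrames are overlaid, the legend shows the same column labels twice, which makes the curves hard to tell apart. As a rough sketch, assuming you only need one column to see the effect, the example below plots column 0 from each DataFrame with explicit labels. The label text and the seed are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.normal(2, 5, size=(100, 5)))
df_normalized = pd.DataFrame(StandardScaler().fit_transform(df))

fig, ax = plt.subplots()
# Plot a single column from each DataFrame so the legend stays readable
df[0].plot(ax=ax, kind="density", label="original (column 0)")
df_normalized[0].plot(ax=ax, kind="density", label="standard scaled (column 0)")
ax.set_xlabel("value")
ax.legend()
```

The original curve should be wide and centered near 2, while the scaled curve should be narrower and centered near 0.
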

Applying Simple Scaling and Min-Max Scaling Techniques
Applying Min-max Scaling Technique

To review the concepts covered in this step, please refer to the Some Simple Normalization Techniques module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.

Understanding different normalization techniques is crucial because it allows you to choose the correct method based on the characteristics of your data. In this step, you will apply min-max scaling.

Let's take your data normalization skills to the next level by practicing another scaling technique: Min-Max scaling. Min-Max scaling linearly rescales each column so that its minimum maps to 0 and its maximum maps to 1. Use sklearn to try out this technique on a randomly sampled DataFrame, and observe its effect on the data distribution by plotting the sample dataset before and after normalization.
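
To make the transformation concrete, here is a small sketch that applies the usual min-max formula, (x - min) / (max - min), by hand and compares it with MinMaxScaler. The toy values and variable names are assumptions chosen for easy arithmetic and are not part of the lab.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy column (hypothetical values) spanning the range 8 to 10
toy = pd.DataFrame({"x": [8.0, 8.5, 9.0, 10.0]})

# Manual min-max scaling: (x - min) / (max - min), computed per column
manual = (toy - toy.min()) / (toy.max() - toy.min())

scaled = MinMaxScaler().fit_transform(toy)

print(manual["x"].tolist())                    # [0.0, 0.25, 0.5, 1.0]
print(np.allclose(manual.to_numpy(), scaled))  # True
```
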

Task 2.1: Import Required Libraries

Before we can start working with data, we need to import the necessary libraries. Import pandas, numpy, matplotlib, and the MinMaxScaler from sklearn.preprocessing.

🔍 Hint

Use the import keyword to import a library. For example, import pandas as pd imports the pandas library and assigns it to the alias pd.
🔑 Solution

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
Task 2.2: Creating a Random Dataset

Create a pandas DataFrame with 100 rows and 5 columns. The columns should be filled with random numbers from a uniform distribution in the range [8, 10]. Display the first few rows of the DataFrame.

🔍 Hint

Use np.random.uniform(low, high, size=(rows, cols)) to generate the array. Use pd.DataFrame(array) to convert the array to a DataFrame. Use df.head() to display the first few rows of the data.
🔑 Solution

array = np.random.uniform(8, 10, size=(100, 5))
df = pd.DataFrame(array)
df.head()
Task 2.3: Plotting the Distribution Before Normalization

Plot the distribution of the values in each column of the DataFrame before normalization. Use DataFrame.plot(kind='density') to create the plot.

🔍 Hint

Using DataFrame.plot(kind='density') will plot the distribution as a smoothed line plot.
🔑 Solution

df.plot(kind='density')
Task 2.4: Apply Min-Max Scaling

Normalize the data in the DataFrame using the MinMaxScaler. Display the head of the transformed data.

🔍 Hint

Use MinMaxScaler() to create a scaler. Use scaler.fit(df) to fit the scaler to the data. Use scaler.transform(df) to transform the data.
🔑 Solution

scaler = MinMaxScaler()
scaler.fit(df)
df_normalized = pd.DataFrame(scaler.transform(df))
df_normalized.head()
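
An optional check, sketched under the assumption that you want to confirm the scaler did what was expected: after fitting, every column should span 0 to 1, and because the fitted scaler stores each column's minimum and maximum, inverse_transform can map the scaled values back to the originals. The seed and the name restored are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.uniform(8, 10, size=(100, 5)))

scaler = MinMaxScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df))

# After fitting, every column's minimum maps to 0 and its maximum to 1
print(df_normalized.min().tolist())
print(df_normalized.max().tolist())

# The fitted scaler remembers each column's min and max, so the
# transformation can be undone with inverse_transform
restored = scaler.inverse_transform(df_normalized)
print(np.allclose(restored, df.to_numpy()))  # True
```
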
Task 2.5: Plotting the Distribution After Normalization

Plot the normalized data together with the original data on the same axes, and compare the two distributions.
🔍 Hint

Use DataFrame.plot(kind="density") to plot the distributions. To overlay the original data, call plot on both the original and the normalized DataFrames, passing the same Axes object through the ax keyword argument. For example:

fig, ax = plt.subplots()
df.plot(ax=ax, kind="density")
🔑 Solution

fig, ax = plt.subplots()
df.plot(ax=ax, kind='density')
df_normalized.plot(ax=ax, kind='density')

Experiment with Gaussian Normalization
Experiment with Different Normalizations

To review the concepts covered in this step, please refer to the Different Types of Normalization module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.

Exploring different normalization techniques is essential to understand how each method affects your data. In this step, you will experiment with l1, l2, and max normalizations. Unlike the scalers in the previous steps, which rescale each column, sklearn's Normalizer rescales each row (sample) independently so that its l1, l2, or max norm equals 1. We'll use sklearn to apply these normalization techniques to a randomly generated dataset and observe the differences in data distribution before and after normalization.
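
To make the three norms concrete, here is a minimal sketch on a single toy row. The values 3 and 4 are a hypothetical example picked so that each norm is easy to compute by hand.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# One toy sample (row) with two features; Normalizer works row by row
X = np.array([[3.0, 4.0]])

# l1:  divide by the sum of absolute values  -> 3 + 4 = 7
# l2:  divide by the Euclidean length        -> sqrt(9 + 16) = 5
# max: divide by the largest absolute value  -> 4
for norm in ("l1", "l2", "max"):
    print(norm, Normalizer(norm=norm).fit_transform(X))

# Manual l2 normalization for comparison
manual_l2 = X / np.sqrt((X ** 2).sum(axis=1, keepdims=True))
print(np.allclose(manual_l2, Normalizer(norm="l2").fit_transform(X)))  # True
```
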

Task 3.1: Import Required Libraries

Similar to the previous tasks, import the necessary libraries. This time, include the Normalizer from sklearn.preprocessing.

🔍 Hint

Remember to import pandas, numpy, matplotlib, and now the Normalizer.
🔑 Solution

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import Normalizer
Task 3.2: Creating a Random Dataset

Create another DataFrame with 100 rows and 5 columns, filled with random numbers from a uniform distribution in the range [5, 15]. Display the first few rows of this new dataframe.

🔍 Hint

You can generate the data using np.random.uniform() and convert it into a DataFrame using pd.DataFrame().
🔑 Solution

array = np.random.uniform(5, 15, size=(100, 5))
df = pd.DataFrame(array)
df.head()
Task 3.3: Plotting the Distribution Before Normalization

Plot the distribution of the values in the DataFrame before applying any normalization.

🔍 Hint

Use DataFrame.plot(kind='density') for a density plot of the DataFrame.
🔑 Solution

df.plot(kind='density')
Task 3.4: Apply l1 and l2 Normalization

Normalize the data using l1 and l2 normalizations. For each, display the head of the transformed data.

🔍 Hint

Create two normalizers, one with the norm='l1' parameter and the other with norm='l2'. Use fit_transform to apply the normalization.
🔑 Solution

normalizer_l1 = Normalizer(norm='l1')
df_normalized_l1 = pd.DataFrame(normalizer_l1.fit_transform(df))
normalizer_l2 = Normalizer(norm='l2')
df_normalized_l2 = pd.DataFrame(normalizer_l2.fit_transform(df))
print(df_normalized_l1.head())
print(df_normalized_l2.head())
Task 3.5: Apply Max Normalization

Now, apply max normalization to the data and display the first few rows of the transformed data.

🔍 Hint

Max normalization uses the norm='max' parameter in the Normalizer.
🔑 Solution

normalizer_max = Normalizer(norm='max')
df_normalized_max = pd.DataFrame(normalizer_max.fit_transform(df))
df_normalized_max.head()
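
An optional sanity check, sketched on a stand-in dataset: each normalized row should have a norm of 1 under the norm that was used. The short names l1, l2, and mx and the seed are illustrative assumptions, not names used by the lab.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.uniform(5, 15, size=(100, 5)))

l1 = pd.DataFrame(Normalizer(norm="l1").fit_transform(df))
l2 = pd.DataFrame(Normalizer(norm="l2").fit_transform(df))
mx = pd.DataFrame(Normalizer(norm="max").fit_transform(df))

# axis=1 aggregates across the columns of each row
print(np.allclose(l1.abs().sum(axis=1), 1.0))            # True: absolute values sum to 1
print(np.allclose(np.sqrt((l2 ** 2).sum(axis=1)), 1.0))  # True: Euclidean length is 1
print(np.allclose(mx.abs().max(axis=1), 1.0))            # True: largest absolute value is 1
```
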
Task 3.6: Plotting the Distributions After Normalization

Plot the distributions of the normalized data (l1, l2, and max) to compare how each normalization technique has transformed the data.
🔍 Hint

Overlay the plots of the normalized DataFrames. Pass the same Axes object through the ax keyword argument of DataFrame.plot to draw all of the normalized distributions on the same axes. For example:

fig, ax = plt.subplots()
df.plot(ax=ax, kind='density')
🔑 Solution

fig, ax = plt.subplots()
df_normalized_l1.plot(ax=ax, kind='density')
df_normalized_l2.plot(ax=ax, kind='density')
df_normalized_max.plot(ax=ax, kind='density')
