Normalize Data to Make It Appropriate for an Analysis with Pandas Hands-on Practice
In this lab, you'll master data normalization with Pandas and Sklearn in Python. You'll practice standard scaling, Min-Max scaling, and l1, l2, and max normalizations. By creating datasets, applying various techniques, and visualizing the outcomes, you'll gain a deep understanding of data preprocessing methods and their effects on data distribution.

Introduction to Normalization
Jupyter Guide
To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (Ctrl/Cmd + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.
Introduction to Normalization
To review the concepts covered in this step, please refer to the Some Simple Normalization Techniques module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.
Data normalization puts features on a comparable scale so that no single feature dominates a model simply because of its units or range, which makes the data better suited to analysis. Here, we will practice normalizing features in a dataset using Pandas and the standard scaling technique.
This technique is a straightforward way to bring all attributes to the same scale by subtracting the mean and dividing by the standard deviation. You will use sklearn and the pandas library to normalize a randomly sampled dataset. We'll use the StandardScaler transform from sklearn to normalize the data so that each column has a mean of 0 and a standard deviation of 1. Observe the effects by plotting the distribution before and after normalization.
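As a rough sketch of what standard scaling computes, the snippet below applies z = (x - mean) / std by hand and compares the result with StandardScaler. The seed, the variable names, and the sample itself are assumptions made for this illustration and are not part of the lab. Note that StandardScaler divides by the population standard deviation, which is why the manual version passes ddof=0.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample: 100 rows, 5 columns, drawn from N(mean=2, std=5)
rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.normal(2, 5, size=(100, 5)))

# Manual z-score per column; ddof=0 matches StandardScaler's population std
manual = (sample - sample.mean()) / sample.std(ddof=0)

# Same computation through sklearn
scaled = StandardScaler().fit_transform(sample)

# Both routes should agree up to floating-point error
print(np.allclose(manual.to_numpy(), scaled))
```

If pandas' default `std()` (which uses ddof=1) were used instead, the two results would differ slightly, by a factor of sqrt(99/100) for 100 rows.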
Task 1.1: Importing Necessary Libraries
Before we can start working with data, we need to import the necessary libraries. Import pandas, numpy, matplotlib, and the StandardScaler from sklearn.preprocessing.
Hint
Use the import keyword to import a library. For example, the statement import pandas as pd imports the pandas library and assigns it to the alias pd.
Solution
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import StandardScaler
Task 1.2: Creating a Random Dataset
Create a pandas DataFrame with 100 rows and 5 columns. The columns should be filled with random numbers from a normal distribution centered at mu = 2 and with standard deviation sigma = 5. Display the first few rows of the DataFrame.
Hint
Use np.random.normal(mu, sigma, size=(rows, cols)) to generate the array. Use pd.DataFrame(array) to convert the array to a DataFrame. Use df.head() to display the first few rows of the data.
Solution
    array = np.random.normal(2, 5, size=(100, 5))
    df = pd.DataFrame(array)
    df.head()
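Because the dataset is random, every run of the notebook produces different numbers. If repeatable output is useful, one option is a seeded generator, sketched below. The seed value, the feature_i column names, and the df_seeded variable are hypothetical choices for this example; the lab itself uses np.random.normal with default integer column labels.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same draws on every run
rng = np.random.default_rng(seed=42)

# Same shape and distribution as the task: 100 x 5, mean 2, std 5
array = rng.normal(loc=2, scale=5, size=(100, 5))

# Named columns are optional; they only make later output easier to read
df_seeded = pd.DataFrame(array, columns=[f"feature_{i}" for i in range(5)])

# Sample mean and std should land near 2 and 5, though not exactly on them
print(df_seeded.describe().loc[["mean", "std"]])
```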
Task 1.3: Plotting the Distribution Before Normalization
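Plot the distribution of the values in the first column of the DataFrame before normalization. Use the plot method with kind='density' to create the plot.
Hint
Using DataFrame.plot(kind='density') will plot the distribution of every column as a smoothed line plot. Selecting a single column first, such as df[0], plots only that column.
Solution
    # df[0] is the first column; df.plot(kind='density') would draw all five
    df[0].plot(kind='density')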
Task 1.4: Normalizing the Data
Normalize the data in the DataFrame using the StandardScaler. Display the head of the transformed data.
Hint
Use StandardScaler() to create a scaler. Use scaler.fit(df) to fit the scaler to the data. Use scaler.transform(df) to transform the data.
Solution
    scaler = StandardScaler()
    scaler.fit(df)
    df_normalized = pd.DataFrame(scaler.transform(df))
    df_normalized.head()
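One way to confirm that the scaling worked is to inspect the column statistics afterwards. The sketch below is an illustration under assumed names (df_demo, df_scaled, and the a to e column labels are not from the lab). It uses fit_transform, which combines the fit and transform calls, and passes the original columns and index back to pd.DataFrame because the scaler returns a plain NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset with labeled columns
rng = np.random.default_rng(1)
df_demo = pd.DataFrame(rng.normal(2, 5, size=(100, 5)), columns=list("abcde"))

scaler = StandardScaler()
# fit_transform returns an ndarray, so restore the labels explicitly
df_scaled = pd.DataFrame(
    scaler.fit_transform(df_demo),
    columns=df_demo.columns,
    index=df_demo.index,
)

# Column means should be ~0 and population standard deviations ~1
print(df_scaled.mean().round(6))
print(df_scaled.std(ddof=0).round(6))

# The fitted scaler keeps the per-column statistics it learned
print(scaler.mean_)
print(scaler.scale_)
```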
Task 1.5: Plotting the Distribution After Normalization
Plot the normalized data against the original data and compare the normalized and original distributions.
Hint
Use DataFrame.plot(kind="density") to plot the distributions. To overlay the original data, call plot on both the original and normalized DataFrames with the ax keyword argument set. For example:
    fig, ax = plt.subplots()
    DataFrame.plot(ax=ax, kind="density")
Solution
    fig, ax = plt.subplots()
    df.plot(ax=ax, kind='density')
    df_normalized.plot(ax=ax, kind='density')
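With ten curves on one set of axes, the overlay can be hard to read, so a side-by-side layout for a single column is sometimes easier to compare. The following is only a sketch of that alternative, not the lab's solution: the data, the choice of column 0, and the names ax_before and ax_after are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data in the same shape as the task
rng = np.random.default_rng(2)
df_demo = pd.DataFrame(rng.normal(2, 5, size=(100, 5)))
df_scaled = pd.DataFrame(StandardScaler().fit_transform(df_demo))

# Two panels: each keeps its own x-axis range
fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(10, 4))
df_demo[0].plot(ax=ax_before, kind="density", title="Column 0 before scaling")
df_scaled[0].plot(ax=ax_after, kind="density", title="Column 0 after scaling")
fig.tight_layout()
```

The two curves should have the same shape; only the position and spread along the x-axis change, which is what standard scaling is meant to do.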
Applying Simple Scaling and Min-Max Scaling Techniques
Applying the Min-Max Scaling Technique
To review the concepts covered in this step, please refer to the Some Simple Normalization Techniques module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.
Understanding different normalization techniques is crucial because it allows you to choose the correct method based on the characteristics of your data. In this step, you will apply min-max scaling.
Let's take your data normalization skills to the next level by practicing another scaling technique: Min-Max scaling. Min-Max scaling transforms each column to fit within the range 0 to 1. Use sklearn to try out this technique on a randomly sampled DataFrame, and observe its effect on the data distribution by plotting the sample dataset before and after normalization.
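For reference, Min-Max scaling maps each column with (x - min) / (max - min). The sketch below applies that formula by hand and compares it with MinMaxScaler at its default settings. The seed and the variable names are assumptions for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample: 100 rows, 5 columns, uniform on [8, 10]
rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.uniform(8, 10, size=(100, 5)))

# Manual Min-Max scaling, column by column
manual = (sample - sample.min()) / (sample.max() - sample.min())

# Same computation through sklearn (default feature_range is (0, 1))
scaled = MinMaxScaler().fit_transform(sample)

# Both routes should agree up to floating-point error
print(np.allclose(manual.to_numpy(), scaled))
```

After scaling, the smallest value in each column becomes 0 and the largest becomes 1, so a single extreme value in a column compresses everything else toward one end of the range.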
Task 2.1: Import Required Libraries
Before we can start working with data, we need to import the necessary libraries. Import pandas, numpy, matplotlib, and the MinMaxScaler from sklearn.preprocessing.
Hint
Use the import keyword to import a library. For example, the statement import pandas as pd imports the pandas library and assigns it to the alias pd.
Solution
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import MinMaxScaler
Task 2.2: Creating a Random Dataset
Create a pandas DataFrame with 100 rows and 5 columns. The columns should be filled with random numbers from a uniform distribution in the range [8, 10]. Display the first few rows of the DataFrame.
Hint
Use np.random.uniform(low, high, size=(rows, cols)) to generate the array. Use pd.DataFrame(array) to convert the array to a DataFrame. Use df.head() to display the first few rows of the data.
Solution
    array = np.random.uniform(8, 10, size=(100, 5))
    df = pd.DataFrame(array)
    df.head()
Task 2.3: Plotting the Distribution Before Normalization
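Plot the distribution of the values in the first column of the DataFrame before normalization. Use the plot method with kind='density' to create the plot.
Hint
Using DataFrame.plot(kind='density') will plot the distribution of every column as a smoothed line plot. Selecting a single column first, such as df[0], plots only that column.
Solution
    # df[0] is the first column; df.plot(kind='density') would draw all five
    df[0].plot(kind='density')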
Task 2.4: Apply Min-Max Scaling
Normalize the data in the DataFrame using the MinMaxScaler. Display the head of the transformed data.
Hint
Use MinMaxScaler() to create a scaler. Use scaler.fit(df) to fit the scaler to the data. Use scaler.transform(df) to transform the data.
Solution
    scaler = MinMaxScaler()
    scaler.fit(df)
    df_normalized = pd.DataFrame(scaler.transform(df))
    df_normalized.head()
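MinMaxScaler is not limited to the 0 to 1 range, and a fitted scaler can map values back to the original scale. The sketch below illustrates both with the feature_range parameter and the inverse_transform method. The target range of (-1, 1) and the names df_demo, scaled, and restored are hypothetical choices for this example and are not required by the lab.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical dataset matching the task's shape and range
rng = np.random.default_rng(4)
df_demo = pd.DataFrame(rng.uniform(8, 10, size=(100, 5)))

# Scale each column to [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(df_demo)
print(scaled.min(axis=0), scaled.max(axis=0))

# inverse_transform undoes the scaling using the fitted min and max
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, df_demo.to_numpy()))
```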
Task 2.5: Plotting the Distribution After Normalization
Plot the normalized data against the original data and compare the normalized and original distributions.
Hint
Use DataFrame.plot(kind="density") to plot the distributions. To overlay the original data, call plot on both the original and normalized DataFrames with the ax keyword argument set. For example:
    fig, ax = plt.subplots()
    DataFrame.plot(ax=ax, kind="density")
Solution
    fig, ax = plt.subplots()
    df.plot(ax=ax, kind='density')
    df_normalized.plot(ax=ax, kind='density')
Experiment with Gaussian Normalization
Experiment with Different Normalizations
To review the concepts covered in this step, please refer to the Different Types of Normalization module of the Normalize Data to Make It Appropriate for an Analysis with Pandas course.
Exploring different normalization techniques is essential to understand how each method affects your data. In this step, you will experiment with l1, l2, and max normalizations. These techniques are helpful in various data preprocessing scenarios. We'll use Sklearn to apply these normalization techniques to a randomly generated dataset and observe the differences in data distribution before and after normalization.
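Unlike the scalers in the earlier steps, which work column by column, sklearn's Normalizer rescales each row (sample) so that the chosen norm of that row equals 1. The small sketch below shows the three norms on a single hand-picked row. The row [3, 4] and the results dictionary are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# One sample with two features; Normalizer expects a 2D array
row = np.array([[3.0, 4.0]])

# l1 divides by |3| + |4| = 7, l2 by sqrt(9 + 16) = 5, max by max(3, 4) = 4
results = {
    norm: Normalizer(norm=norm).fit_transform(row)
    for norm in ("l1", "l2", "max")
}

for norm, values in results.items():
    print(norm, values)
```

The l1 result is [3/7, 4/7], the l2 result is [0.6, 0.8], and the max result is [0.75, 1.0].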
Task 3.1: Import Required Libraries
Similar to the previous tasks, import the necessary libraries. This time, include the Normalizer from sklearn.preprocessing.
Hint
Remember to import pandas, numpy, matplotlib, and now the Normalizer.
Solution
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import Normalizer
Task 3.2: Creating a Random Dataset
Create another DataFrame with 100 rows and 5 columns, filled with random numbers from a uniform distribution in the range [5, 15]. Display the first few rows of this new DataFrame.
Hint
You can generate the data using np.random.uniform() and convert it into a DataFrame using pd.DataFrame().
Solution
    array = np.random.uniform(5, 15, size=(100, 5))
    df = pd.DataFrame(array)
    df.head()
Task 3.3: Plotting the Distribution Before Normalization
Plot the distribution of the values in the DataFrame before applying any normalization.
Hint
Use DataFrame.plot(kind='density') for a density plot of the DataFrame.
Solution
    df.plot(kind='density')
Task 3.4: Apply l1 and l2 Normalization
Normalize the data using l1 and l2 normalizations. For each, display the head of the transformed data.
Hint
Create two normalizers, one with the norm='l1' parameter and the other with norm='l2'. Use fit_transform to apply the normalization.
Solution
    normalizer_l1 = Normalizer(norm='l1')
    df_normalized_l1 = pd.DataFrame(normalizer_l1.fit_transform(df))
    normalizer_l2 = Normalizer(norm='l2')
    df_normalized_l2 = pd.DataFrame(normalizer_l2.fit_transform(df))
    print(df_normalized_l1.head())
    print(df_normalized_l2.head())
Task 3.5: Apply Max Normalization
Now, apply max normalization to the data and display the first few rows of the transformed data.
Hint
Max normalization uses the norm='max' parameter in the Normalizer.
Solution
    normalizer_max = Normalizer(norm='max')
    df_normalized_max = pd.DataFrame(normalizer_max.fit_transform(df))
    df_normalized_max.head()
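A quick sanity check on the three transformed datasets is to compute the matching norm along each row and confirm that it equals 1, while the column totals do not. This is a sketch with assumed names (df_demo, l1, l2, mx) and a seeded dataset, not part of the lab's solution.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

# Hypothetical dataset matching the task: 100 x 5, uniform on [5, 15]
rng = np.random.default_rng(3)
df_demo = pd.DataFrame(rng.uniform(5, 15, size=(100, 5)))

l1 = Normalizer(norm="l1").fit_transform(df_demo)
l2 = Normalizer(norm="l2").fit_transform(df_demo)
mx = Normalizer(norm="max").fit_transform(df_demo)

# axis=1 reduces across the columns of each row
print(np.allclose(np.abs(l1).sum(axis=1), 1.0))         # l1: absolute values sum to 1
print(np.allclose(np.sqrt((l2 ** 2).sum(axis=1)), 1.0))  # l2: Euclidean length is 1
print(np.allclose(np.abs(mx).max(axis=1), 1.0))          # max: largest absolute value is 1

# Columns are not normalized: their l1 totals are far from 1
print(np.abs(l1).sum(axis=0))
```

Because each row is divided by its own norm, the density plots of the columns shift toward small values rather than being stretched to a fixed range as with Min-Max scaling.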
Task 3.6: Plotting the Distributions After Normalization
Plot the distributions of the normalized data (l1, l2, and max) to compare how each normalization technique has transformed the data.
Hint
Overlay the plots of the normalized DataFrames. Use the ax keyword argument of the DataFrame.plot method to plot all of the normalized distributions on the same plot. For example:
    fig, ax = plt.subplots()
    df.plot(ax=ax, kind='density')
Solution
    fig, ax = plt.subplots()
    df_normalized_l1.plot(ax=ax, kind='density')
    df_normalized_l2.plot(ax=ax, kind='density')
    df_normalized_max.plot(ax=ax, kind='density')