
Index Objects with Pandas Hands-on Practice
In this lab, you'll master data manipulation and retrieval using DataFrame operations, various indexing methods including datetime and multi-indexing, and advanced categorization techniques.

Exploring DataFrames and Indexing in Pandas
Jupyter Guide
To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (Ctrl/Cmd + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. Then, when you are ready to move on to the next step, come back and click on the file for the next step until you have completed all tasks in all steps of the lab.
Exploring DataFrames and Indexing in Pandas
To review the concepts covered in this step, please refer to the Introduction to Indexing Objects in Pandas module of the Index Objects with Pandas course.
Understanding the structure of DataFrames and the concept of indexing in Pandas is important because it forms the foundation for data extraction, manipulation, and modification. This step will allow you to practice the basics of indexing and explore a dataset using position-based indexing.
Let's dive into the world of data analysis with Pandas! In this step, you'll get hands-on experience with the basics of DataFrames and indexing in Pandas. You'll be using the Learning_Management.csv dataset to practice extracting data using numerical indexing, and selecting subsets of data using both row and column labels. The goal here is to familiarize yourself with the structure of DataFrames and understand the importance of indexing in data extraction.
Task 1.1: Importing the Pandas Library
Before you can start working with DataFrames, you need to import the pandas library. Import the pandas library as pd.
Hint
Use the import keyword followed by the library name and the as keyword to give it a short alias. For example, import pandas as pd.
Solution
import pandas as pd
Task 1.2: Loading the Dataset
Load the Learning_Management.csv file into a DataFrame using pandas. Name the DataFrame df.
Hint
Use the pd.read_csv() function to read the CSV file. Pass the file path as a string to the function. For example, df = pd.read_csv('file_path').
Solution
df = pd.read_csv('Learning_Management.csv')
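The lab's CSV file is only available inside the lab environment, so the following is a minimal sketch of the same call using an in-memory stand-in. The column names come from later tasks in this lab; the rows themselves are hypothetical, and the real file may contain other columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of Learning_Management.csv.
# The real file's rows (and possibly its columns) will differ.
csv_text = """employee_id,employee_name,course_id,course_name,completion_date
E001,Ada,C001,Python Basics,2022-04-28
E002,Grace,C002,Data Analysis,2022-05-03
E003,Linus,C002,Data Analysis,2022-05-17
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column labels taken from the header row
```

In the lab itself you would pass the file path string instead of the StringIO object.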
Task 1.3: Inspecting the DataFrame
Inspect the first 5 rows of the DataFrame using the head() method.
Hint
Call the head() method on the DataFrame to view the first 5 rows. For example, df.head().
Solution
df.head()
Task 1.4: Selecting a Single Column
Select the employee_name column from the DataFrame.
Hint
Use the column label as an index to select a single column. For example, df['column_name'].
Solution
df['employee_name']
Task 1.5: Selecting Multiple Columns
Select the employee_name and course_name columns from the DataFrame.
Hint
Use a list of column labels as an index to select multiple columns. For example, df[['column1', 'column2']].
Solution
df[['employee_name', 'course_name']]
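One detail worth seeing in code is what each bracket form returns. The sketch below uses a small hypothetical DataFrame (not the lab dataset) to show that a single label yields a Series, while a list of labels yields a DataFrame even when the list has one entry.

```python
import pandas as pd

# Hypothetical sample rows standing in for the lab dataset
df = pd.DataFrame({
    "employee_name": ["Ada", "Grace"],
    "course_name": ["Python Basics", "Data Analysis"],
})

single = df["employee_name"]                 # one label  -> Series
as_frame = df[["employee_name"]]             # list of one label -> DataFrame
both = df[["employee_name", "course_name"]]  # list of labels -> DataFrame

print(type(single).__name__)
print(type(as_frame).__name__)
print(both.shape)
```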
Task 1.6: Selecting Rows Using Index
Select the first 10 rows of the DataFrame using numerical indexing.
Hint
Use the iloc property with a slice to select rows. For example, df.iloc[start:end]. If you want to start at the beginning, leave start empty and include only end. For example, df.iloc[:end].
Solution
df.iloc[:10]
Task 1.7: Selecting Subsets of Data
Select the employee_name and course_name columns for the first 10 rows of the DataFrame.
Hint
Use the iloc property with a slice to select the rows, then index the result with a list of column labels. For example, df.iloc[:10][column_list].
Solution
df.iloc[:10][['employee_name', 'course_name']]
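The chained form in the solution does two selections in a row. As a rough comparison, the sketch below shows the same subset taken in a single step, once by position with iloc and once by label with loc. The data and the score column are hypothetical, and the label-based version assumes the default integer row index.

```python
import pandas as pd

# Hypothetical sample rows; "score" is an invented column
df = pd.DataFrame({
    "employee_name": ["Ada", "Grace", "Linus", "Margaret"],
    "course_name": ["Python Basics", "Data Analysis", "Data Analysis", "SQL"],
    "score": [88, 92, 75, 81],
})
wanted = ["employee_name", "course_name"]

# Chained: slice rows by position, then pick columns by label
chained = df.iloc[:2][wanted]

# Single step, position-based: translate labels to integer positions first
col_positions = df.columns.get_indexer(wanted)
by_position = df.iloc[:2, col_positions]

# Single step, label-based: unlike iloc, a loc slice includes its end label
by_label = df.loc[:1, wanted]

print(chained.equals(by_position) and chained.equals(by_label))
```

Note the off-by-one difference: iloc[:2] stops before position 2, while loc[:1] includes the row labeled 1.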
Working with Time Series Data in Pandas
To review the concepts covered in this step, please refer to the Pandas Index Objects for Time Series Data module of the Index Objects with Pandas course.
Understanding how to use datetime and timedelta indexing in Pandas is important because it allows for efficient handling and manipulation of time-series data. This step will provide you with the opportunity to practice these concepts using a real-world dataset.
Time to tackle time-series data! In this step, you'll explore how to use datetime and timedelta indexing in Pandas to manipulate and extract data. Using the completion_date column from the Learning_Management.csv dataset, you'll practice creating a datetime index, extracting data for specific time periods, and performing basic operations on pandas built-in datetime objects. The goal is to get comfortable with handling time-series data in Pandas.
Task 2.1: Load the Dataset
Start by loading the Learning_Management.csv dataset into a pandas DataFrame. Name the DataFrame df.
After loading the data, display the head of the DataFrame to view the first few rows.
Hint
Use the pd.read_csv() function to load the dataset. The file path is 'Learning_Management.csv'.
Solution
import pandas as pd
df = pd.read_csv('Learning_Management.csv')
df.head()
Task 2.2: Convert the 'completion_date' Column to Datetime
Convert the 'completion_date' column in the DataFrame to a datetime object. This will allow you to perform time-series operations on the data.
After converting, print the data type of the column again to see the change.
Hint
Use the pd.to_datetime() function to convert the 'completion_date' column to datetime. Make sure to assign the result back to the 'completion_date' column in the DataFrame.
Solution
# Provided code
print(df['completion_date'].dtype)
# Convert the 'completion_date' column to datetime and print the column dtype
df['completion_date'] = pd.to_datetime(df['completion_date'])
print(df['completion_date'].dtype)
Task 2.3: Set the 'completion_date' Column as the DataFrame Index
Set the 'completion_date' column as the index of the DataFrame. This will allow you to use datetime indexing to select data based on the completion date.
After setting the index, display the head of the DataFrame to see the changes.
Hint
Use the df.set_index() method to set the 'completion_date' column as the index. Make sure to assign the result back to df.
Solution
df = df.set_index('completion_date')
df.head()
Task 2.4: Select Data for a Specific Time Period
Select all rows in the DataFrame where the completion date is in May 2022.
After selecting, display the selected data to verify the result.
Hint
Use the df.loc[] indexer to select data for May 2022. The syntax for selecting a specific month is 'YYYY-MM'.
Solution
may_2022_data = df.loc['2022-05']
may_2022_data
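Tasks 2.2 through 2.4 can be sketched end to end on a few hypothetical rows (the names and dates below are invented, not taken from the lab dataset). The sketch also sorts the index, which the lab does not do, so that slicing across a range of months behaves predictably.

```python
import pandas as pd

# Hypothetical sample rows; the dates start out as plain strings
df = pd.DataFrame({
    "employee_name": ["Ada", "Grace", "Linus", "Margaret"],
    "completion_date": ["2022-04-28", "2022-05-03", "2022-05-17", "2022-06-01"],
})

# Convert the strings to datetimes, then use them as a sorted index
df["completion_date"] = pd.to_datetime(df["completion_date"])
df = df.set_index("completion_date").sort_index()

# Partial-string indexing: 'YYYY-MM' matches every timestamp in that month
may_2022 = df.loc["2022-05"]

# A slice of partial strings covers everything from May through the end of June
may_to_june = df.loc["2022-05":"2022-06"]

print(may_2022)
print(len(may_to_june))
```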
Task 2.5: Perform a Basic Operation on a Datetime Object
Calculate the number of days between the earliest and latest completion dates in the DataFrame.
Display the result to see the number of days.
Hint
Use the df.index.min() and df.index.max() methods to get the earliest and latest completion dates, respectively. Subtract the earliest date from the latest date to get the time between them. The result is a Timedelta, which displays the number of days; its days attribute holds that number as an integer.
Solution
num_days = df.index.max() - df.index.min()
num_days
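As a small illustration of the arithmetic, the sketch below uses three hypothetical dates rather than the lab dataset. Subtracting two timestamps gives a Timedelta, and a Timedelta can be added back to a timestamp.

```python
import pandas as pd

# Hypothetical completion dates
idx = pd.to_datetime(["2022-04-28", "2022-05-17", "2022-06-01"])

# Timestamp - Timestamp gives a Timedelta
span = idx.max() - idx.min()
print(type(span).__name__)
print(span.days)  # whole days as a plain integer

# Timestamp + Timedelta gives a new Timestamp
one_week_later = idx.min() + pd.Timedelta(days=7)
print(one_week_later)
```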
Interval, Categorical, and Period Indexing in Pandas
To review the concepts covered in this step, please refer to the Interval, Categorical, and Period Indexing in Pandas module of the Index Objects with Pandas course.
Understanding how to create and use interval, categorical, and period indices in Pandas is important because these indexing techniques enable advanced data extraction from a DataFrame. This step will allow you to practice creating these indices and using them to extract data from a DataFrame.
Ready to level up your indexing skills? In this step, you'll delve into interval, categorical, and period indexing in Pandas. You'll practice creating these indices from sample data and using them to extract data efficiently.
Task 3.1: Create an Interval Index
Create an interval index based on a range of values. Use the
pd.cut()
function to divide the range0
to100
into5
equal intervals. This method helps in binning or bucketing the data.π Hint
Use the
pd.cut()
function with a range of values (e.g.,range(0, 101)
) as the first argument and5
as the second argument to create the interval index. This function will return an IntervalIndex which can be used as an index in creating a DataFrame.π Solution
# Provided Code import pandas as pd import numpy as np interval_index = pd.cut(range(0, 101), 5)
Task 3.2: Create a DataFrame Using Interval Index
Use the interval index created in Task 3.1 to create a DataFrame with random data. The DataFrame should have 101 rows and 2 columns. Use the provided code to create the random data.
Hint
Use pd.DataFrame() with np.random.randn(101, 2) to create random data. Use the interval index created in Task 3.1 as the index of the DataFrame.
Solution
# Provided code
random_data = np.random.randn(101, 2)
df_interval = pd.DataFrame(random_data, index=interval_index, columns=['A', 'B'])
Task 3.3: Index into the Interval Indexed DataFrame
Select the rows from the DataFrame created in Task 3.2 where the interval index includes the value 42.
Hint
To index into the DataFrame, use the indexer df_interval.loc[] with the specific value (e.g., 42) you want to find within the intervals.
Solution
df_interval.loc[42]
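To make the object types concrete, the sketch below first inspects what pd.cut() returns for the lab's input, then shows a value lookup on a true IntervalIndex. The second part uses pd.interval_range(), which is an alternative the lab does not use, and the bin_number column is invented for illustration.

```python
import pandas as pd

# What the lab's call produces: a Categorical with interval categories
binned = pd.cut(range(0, 101), 5)
print(type(binned).__name__)             # Categorical
print(type(binned.categories).__name__)  # IntervalIndex
print(len(binned.categories))            # one category per bin

# Alternative (not from the lab): build an IntervalIndex directly.
# Intervals are closed on the right by default: (0, 20], (20, 40], ...
intervals = pd.interval_range(start=0, end=100, freq=20)
df = pd.DataFrame({"bin_number": [1, 2, 3, 4, 5]}, index=intervals)

# Looking up a scalar finds the interval that contains it
row = df.loc[42]
print(row["bin_number"])
```

With one row per interval, the lookup returns a single row; in the lab's DataFrame many rows share each interval, so the same lookup returns all of them.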
Task 3.4: Create a Categorical Index
Create a categorical index using a list of 4 categories. Categorical data is a Pandas data type corresponding to categorical variables in statistics.
Hint
Create a list of categories and use the pd.Categorical() function to create the categorical index. The pd.Categorical() function is used for creating array-like objects representing categorical variables.
Solution
categories = ['Category1', 'Category2', 'Category3', 'Category4']
categorical_index = pd.Categorical(categories)
Task 3.5: Create a DataFrame Using Categorical Index
Use the categorical index created in Task 3.4 to create a DataFrame with random data. The DataFrame should have 4 rows and 2 columns. Use the provided code to create the random data.
Hint
Use pd.DataFrame() with np.random.randn(4, 2) to create random data. Use the categorical index created in Task 3.4 as the index of the DataFrame.
Solution
# Provided code
random_data = np.random.randn(4, 2)
df_categorical = pd.DataFrame(random_data, index=categorical_index, columns=['A', 'B'])
Task 3.6: Index into the Categorical Indexed DataFrame
Select the row from the DataFrame created in Task 3.5 that corresponds to your second category.
Hint
To index into the DataFrame, use the indexer df_categorical.loc[] with the specific category (e.g., 'Category2') you want to access.
Solution
df_categorical.loc['Category2']
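The following sketch repeats the categorical steps with fixed numbers in place of random data, so the lookup result is predictable. It also prints the integer codes that a Categorical stores behind its labels.

```python
import pandas as pd

cat = pd.Categorical(["Category1", "Category2", "Category3", "Category4"])

# Each value is stored as an integer code pointing into the categories
print(cat.categories.tolist())
print(cat.codes.tolist())

# Fixed values instead of the lab's random data
df = pd.DataFrame({"A": [10, 20, 30, 40]}, index=cat)
print(type(df.index).__name__)  # CategoricalIndex

# Label lookup works the same way as with a plain string index
value = df.loc["Category2", "A"]
print(value)
```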
Task 3.7: Create a Period Index
Create a period index representing each month in 2023. Period indices are useful for time series data that needs to be aggregated or indexed by a particular time period.
Hint
Use the pd.period_range() function to create a period index that represents each month in a year. The first argument should be in the format 'YYYY-MM', followed by keyword arguments periods=12 and freq='M'. This function returns a PeriodIndex which can be used to index data in a DataFrame.
Solution
period_index = pd.period_range('2023-01', periods=12, freq='M')
Task 3.8: Create a DataFrame Using Period Index
Use the period index created in Task 3.7 to create a DataFrame with random data. The DataFrame should have 12 rows and 2 columns.
Hint
Use pd.DataFrame() with np.random.randn(12, 2) to create random data. Use the period index created in Task 3.7 as the index of the DataFrame.
Solution
# Provided code
random_data = np.random.randn(12, 2)
df_period = pd.DataFrame(random_data, index=period_index, columns=['A', 'B'])
Task 3.9: Index into the Period Indexed DataFrame
Select the row from the DataFrame created in Task 3.8 that corresponds to the month '2023-05'.
Hint
To index into the DataFrame, use the indexer df_period.loc[] with the specific period (e.g., '2023-05') you want to access.
Solution
df_period.loc['2023-05']
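Here is a minimal sketch of the period steps using month numbers in place of random data, so the selected values are easy to check. The slice at the end is an extra not covered by the task; it shows that a range of period strings can be selected the same way.

```python
import pandas as pd

# Twelve monthly periods starting in January 2023
period_index = pd.period_range("2023-01", periods=12, freq="M")

# Month numbers as data, so each row is easy to recognize
df = pd.DataFrame({"A": range(1, 13)}, index=period_index)

# A 'YYYY-MM' string selects the matching monthly period
may_value = df.loc["2023-05", "A"]
print(may_value)

# A slice of period strings includes both endpoints
second_quarter = df.loc["2023-04":"2023-06"]
print(second_quarter["A"].tolist())
```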
Multi-indexing in Pandas
To review the concepts covered in this step, please refer to the Multi-indexing in Pandas module of the Index Objects with Pandas course.
Understanding how to create and use a MultiIndex in Pandas is important because it allows for efficient organization and retrieval of hierarchical data. This step will provide you with the opportunity to practice creating a MultiIndex and using it to retrieve data at different hierarchy levels.
Let's dive into the world of multi-indexing! In this step, you'll learn how to create a MultiIndex for hierarchical data organization in a DataFrame. Using the Learning_Management.csv dataset, you'll practice creating a MultiIndex and using it to retrieve data at different hierarchy levels. The goal is to understand the benefits of using MultiIndexing in pandas for hierarchical data organization and efficient data retrieval.
Task 4.1: Importing Pandas
Before we start working with the data, we need to import pandas. In this task, import the pandas library which will be used throughout this step.
Hint
Use the import keyword to import pandas. It's common to import pandas as pd.
Solution
import pandas as pd
Task 4.2: Loading the Dataset
Now that we have imported pandas, let's load the dataset. The dataset is stored in a CSV file named 'Learning_Management.csv'.
Hint
Use the read_csv function from pandas to load the dataset. The file path is 'Learning_Management.csv'.
Solution
df = pd.read_csv('Learning_Management.csv')
Task 4.3: Creating a MultiIndex
Now that we have loaded the dataset, let's create a MultiIndex. We will use the 'employee_id' and 'course_id' columns as our index. This will allow us to organize our data hierarchically.
After creating the MultiIndex, display the head of the DataFrame to visualize the change.
Hint
Use the set_index method on the DataFrame and pass in a list of column names, ['employee_id', 'course_id'], to create a MultiIndex. Then use df.head() to display the first few rows of the DataFrame.
Solution
df.set_index(['employee_id', 'course_id'], inplace=True)
df.head()
# or
# df = df.set_index(['employee_id', 'course_id'])
# df.head()
Task 4.4: Retrieving Data Using MultiIndex
With the MultiIndex created, your next task is to retrieve data for a specific course. Find all employees who completed the course whose course_id is 'C002'.
Hint
Use the xs (cross-section) method on the DataFrame to retrieve data for a specific 'course_id'. You'll need to specify the course ID (e.g., 'C002') and the level ('course_id') at which to perform the cross-section.
Solution
df.xs('C002', level='course_id')
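The step mentions retrieving data at different hierarchy levels. The sketch below, built on a few hypothetical rows rather than the lab dataset, shows three such lookups: by the inner level with xs, by the outer level with loc, and by a full (employee_id, course_id) pair.

```python
import pandas as pd

# Hypothetical sample rows indexed by (employee_id, course_id)
df = pd.DataFrame({
    "employee_id": ["E001", "E001", "E002", "E003"],
    "course_id": ["C001", "C002", "C002", "C003"],
    "employee_name": ["Ada", "Ada", "Grace", "Linus"],
}).set_index(["employee_id", "course_id"])

# Inner level: everyone who completed course C002
by_course = df.xs("C002", level="course_id")

# Outer level: every course completed by employee E001
by_employee = df.loc["E001"]

# Both levels: one specific (employee, course) row
one_row = df.loc[("E002", "C002")]

print(by_course)
print(by_employee)
print(one_row["employee_name"])
```

In each result, the level used for the lookup is dropped from the index, leaving only the remaining level.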