
Pandas Arrays and Data Structures Hands-on Practice
In this lab, you'll master Pandas arrays and data structures, including creating various data types, handling time data, and manipulating categorical and sparse data. You'll learn essential skills like sorting, filtering, and efficient data structuring, culminating in string data manipulation techniques, equipping you with comprehensive Pandas proficiency.

Exploring Pandas Arrays and Data Types
Jupyter Guide
To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (Ctrl/Cmd + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. Then, when you are ready to move on to the next step, come back and click on the file for the next step. Repeat until you have completed all tasks in all steps of the lab.
Exploring Pandas Arrays and Data Types
To review the concepts covered in this step, please refer to the Pandas Arrays and Data Types module of the Pandas Arrays and Data Structures course.
Understanding Pandas Arrays and their data types is important because they form the core of data manipulation and analysis in Pandas. The ability to create and manipulate arrays, and understand their characteristics, is key to efficient data work.
In this step, you will dive deep into the world of Pandas Arrays and their data types. Your goal is to create Pandas arrays with various data types, check their types, and compare the results. You will use the dtype attribute and parameter to create arrays as specific types and understand how Pandas infers the best data type when dtype is not specified.
Task 1.1: Creating a Pandas Array with Integer Data Type
Create a Pandas array with integer data type using the dtype parameter. After creating the array, print it out to display its contents and data type.
Hint
Use the pd.array() function to create a Pandas array. Pass a list of integers as the first argument and 'int' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.
Solution
    import pandas as pd

    # Create a pandas array with integer data type
    array_int = pd.array([1, 2, 3, 4, 5], dtype='int')

    # Print the array and its data type
    print("Array:", array_int)
    print("Data Type:", array_int.dtype)
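One detail worth seeing alongside this task: dtype='int' maps to a NumPy integer type, which cannot hold missing values. The sketch below, assuming a reasonably recent pandas version, contrasts it with the nullable 'Int64' extension dtype. The names plain_ints and nullable_ints are illustrative and not part of the lab.

```python
import pandas as pd

# NumPy-backed integers: every element must be a real integer.
plain_ints = pd.array([1, 2, 3], dtype="int")

# Nullable integers: None is stored as pd.NA and the dtype stays integer.
nullable_ints = pd.array([1, None, 3], dtype="Int64")

print("NumPy-backed dtype:", plain_ints.dtype)
print("Nullable dtype:", nullable_ints.dtype)
print("Nullable array:", nullable_ints)

# isna() returns a boolean mask marking the missing position.
missing_mask = nullable_ints.isna()
print("Missing mask:", missing_mask)
```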
Task 1.2: Creating a Pandas Array with String Data Type
Create a Pandas array with string data type using the dtype parameter. After creating the array, print it out to display its contents and data type.
Hint
Use the pd.array() function to create a Pandas array. Pass a list of strings as the first argument and 'str' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.
Solution
    # Create a pandas array with string data type
    array_str = pd.array(['a', 'b', 'c', 'd', 'e'], dtype='str')

    # Print the array and its data type
    print("Array:", array_str)
    print("Data Type:", array_str.dtype)
Task 1.3: Creating a Pandas Array with Mixed Data Type
Create a Pandas array with mixed data type and let Pandas infer the best data type. After creating the array, print it out to display its contents and data type.
Hint
Use the pd.array() function to create a Pandas array. Pass a list with mixed data types (integers and strings) as the first argument. Use the print() function to display the array and its data type using array.dtype.
Solution
    # Create a pandas array with mixed data type
    array_mixed = pd.array([1, 'b', 3, 'd', 5])

    # Print the array and its data type
    print("Array:", array_mixed)
    print("Data Type:", array_mixed.dtype)
Task 1.4: Creating a Mixed Pandas Array with Explicit Object dtype
Create a Pandas array with mixed data type (integers and strings) and explicitly set the data type to 'object'. After creating the array, print it out to display its contents and data type.
Hint
Use the pd.array() function to create a Pandas array. Pass a list with mixed data types (integers and strings) as the first argument and 'object' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.
Solution
    # Create a pandas array with mixed data type and dtype='object'
    array_mixed_object = pd.array([1, 'b', 3, 'd', 5], dtype='object')

    # Print the array and its data type
    print("Array:", array_mixed_object)
    print("Data Type:", array_mixed_object.dtype)
Working with Time in Pandas
To review the concepts covered in this step, please refer to the Pandas Arrays and Data Types module of the Pandas Arrays and Data Structures course.
Understanding how to handle time-related data in Pandas is important because time is a common dimension in many datasets. You will often need to manipulate and compare timestamps, timedeltas, and intervals.
In this step, you will learn how to handle date and time operations in Pandas. You will create Timestamp objects and convert them to different time zones. You will also learn how to create and manipulate Time Delta and Interval objects from Timestamps: checking whether a value is inside an interval, checking whether two intervals overlap, and shifting and extending intervals.
Task 2.1: Creating Timestamps
Create a Timestamp object for the current date and time using the now() method. Save the result as the variable now, and then print it out to see the result.
Hint
Use the pd.Timestamp.now() function without any arguments to get the current date and time, and then use the print() function to display the result.
Solution
    import pandas as pd

    # Create a Timestamp object for the current date and time
    now = pd.Timestamp.now()
    print("Current Timestamp:", now)
Task 2.2: Converting Timezones
Convert the now Timestamp object to the 'Asia/Kolkata' timezone, and then print the converted Timestamp.
Hint
Use the tz_localize method of the Timestamp object and pass the timezone string 'Asia/Kolkata' as an argument. Because now is timezone-naive, tz_localize attaches the timezone to the existing clock time instead of shifting it; tz_convert is the method that moves an already timezone-aware Timestamp to another zone. Then use the print() function to display the converted Timestamp.
Solution
    # Attach the 'Asia/Kolkata' timezone to the naive Timestamp object
    now_kolkata = now.tz_localize('Asia/Kolkata')
    print("Timestamp in Asia/Kolkata timezone:", now_kolkata)
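To make the difference between localizing and converting concrete, here is a small sketch using a fixed timestamp so the result is reproducible. The date and the variable names are illustrative and not part of the lab.

```python
import pandas as pd

# A timezone-naive timestamp: no zone information attached.
naive = pd.Timestamp("2024-01-01 12:00")

# tz_localize keeps the clock time and labels it as Kolkata time.
localized = naive.tz_localize("Asia/Kolkata")

# To express a UTC instant in Kolkata time, localize to UTC first,
# then convert. The clock time shifts by the UTC offset (+05:30).
utc_noon = naive.tz_localize("UTC")
converted = utc_noon.tz_convert("Asia/Kolkata")

print("Localized:", localized)
print("Converted:", converted)
print("Same instant as UTC noon?", converted == utc_noon)
```

The localized value still reads 12:00, while the converted value reads 17:30, because only tz_convert changes the displayed clock time.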
Task 2.3: Creating Time Delta
Create a Time Delta object representing a duration of 1 day. Save the value in a variable named delta, and then print it out.
Hint
Use the pd.Timedelta function and pass the string '1 day' as an argument. Then use the print() function to display the Time Delta object.
Solution
    # Create a Time Delta object representing a duration of 1 day
    delta = pd.Timedelta('1 day')
    print("Time Delta (1 day):", delta)
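The string form is only one way to build a Time Delta. As a minimal sketch, the snippet below shows two other constructor forms that describe the same duration, plus basic arithmetic on the result. The variable names are illustrative.

```python
import pandas as pd

# Three ways to describe the same one-day duration.
one_day_str = pd.Timedelta("1 day")
one_day_kw = pd.Timedelta(days=1)
one_day_hours = pd.Timedelta(24, unit="h")

print("All equal?", one_day_str == one_day_kw == one_day_hours)
print("Seconds in the duration:", one_day_str.total_seconds())

# Time Deltas can be added together; components breaks the result down.
day_and_a_half = one_day_str + pd.Timedelta(hours=12)
print("Components:", day_and_a_half.components)
```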
Task 2.4: Manipulating Timestamps with Time Delta
Add the Time Delta object to the Timestamp object to get a new Timestamp object representing the next day. Save the value to tomorrow, and then print it.
Hint
Use the + operator to add the Time Delta object to the Timestamp object. Then use the print() function to display the new Timestamp.
Solution
    # Add the Time Delta object to the Timestamp object
    tomorrow = now + delta
    print("Timestamp for the next day:", tomorrow)
Task 2.5: Creating Interval
Create an Interval object representing the interval between the original Timestamp and the new Timestamp, and then print it. You can explore the attributes of the interval, like .length or .mid for the midpoint.
Hint
Use the pd.Interval function and pass the original Timestamp and the new Timestamp as arguments. Then use the print() function to display the Interval.
Solution
    # Create an Interval object
    interval = pd.Interval(now, tomorrow)
    print("Interval:", interval)
    print("Length:", interval.length)
    print("Midpoint:", interval.mid)
Task 2.6: Checking if a Value is Inside an Interval
Check if the current date and time is inside the Interval object, and then print the result. Since the interval excludes the left endpoint and includes the right endpoint, now should not be in the interval.
Hint
Use the in keyword to check if now is in the interval. Then use the print() function to display the result.
Solution
    # Check if the current date and time is inside the Interval object
    print("Is current time inside the interval?", now in interval)
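Which endpoints count as inside is controlled by the closed parameter of pd.Interval, which defaults to 'right'. The sketch below uses fixed timestamps (chosen only for illustration) to show how membership changes with that setting.

```python
import pandas as pd

start = pd.Timestamp("2024-01-01")
end = start + pd.Timedelta("1 day")

# Default: left endpoint excluded, right endpoint included.
right_closed = pd.Interval(start, end)
# Both endpoints included.
both_closed = pd.Interval(start, end, closed="both")
# Left endpoint included, right endpoint excluded.
left_closed = pd.Interval(start, end, closed="left")

print("start in right-closed:", start in right_closed)
print("end in right-closed:", end in right_closed)
print("start in both-closed:", start in both_closed)
print("end in left-closed:", end in left_closed)
```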
Task 2.7: Checking if Two Intervals Overlap
Create another Interval object for the next two days (now to now + 2 * delta) and check if it overlaps with the original Interval object, then print the result.
Hint
Use the pd.Interval function to create another Interval object for the next two days. Then use the overlaps method of the original Interval object and pass the new Interval object as an argument. Finally, use the print() function to display the result.
Solution
    # Create another Interval object covering the next two days
    next_day = pd.Interval(now, now + 2 * delta)

    # Check if it overlaps with the original Interval object
    overlap = interval.overlaps(next_day)
    print("Do the intervals overlap?", overlap)
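Overlap also depends on whether shared endpoints are closed. As a hedged sketch with fixed, illustrative timestamps: two intervals that only touch at a point overlap when that point is closed on both sides, and not otherwise.

```python
import pandas as pd

day = pd.Timedelta("1 day")
t0 = pd.Timestamp("2024-01-01")

first = pd.Interval(t0, t0 + day)              # (t0, t0 + 1 day]
second = pd.Interval(t0 + day, t0 + 2 * day)   # (t0 + 1 day, t0 + 2 days]
touching_closed = pd.Interval(t0 + day, t0 + 2 * day, closed="both")
later = pd.Interval(t0 + 3 * day, t0 + 4 * day)

# The shared point is open on the left side of `second`: no overlap.
print("first vs second:", first.overlaps(second))
# The shared point is closed in both intervals: overlap.
print("first vs touching_closed:", first.overlaps(touching_closed))
# Entirely separate intervals never overlap.
print("first vs later:", first.overlaps(later))
```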
Task 2.8: Shifting Intervals
Shift the original Interval object by the Time Delta object, then print the resulting Interval.
Hint
Use + to add the Time Delta object, delta, to the interval. Finally, use the print() function to display the result.
Solution
    # Shift the original Interval object
    shifted = interval + delta
    print("Shifted Interval:", shifted)
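Shifting moves both endpoints by the same amount. Extending an interval, by contrast, means moving only one endpoint. One possible approach, sketched below with a fixed starting timestamp chosen for illustration, is to build a new Interval from the existing endpoints.

```python
import pandas as pd

start = pd.Timestamp("2024-01-01 09:00")
delta = pd.Timedelta("1 day")
interval = pd.Interval(start, start + delta)

# Shifting: both endpoints move, the length stays the same.
shifted = interval + delta

# Extending: keep the left endpoint, push the right endpoint out.
extended = pd.Interval(interval.left, interval.right + delta,
                       closed=interval.closed)

print("Original:", interval, "length:", interval.length)
print("Shifted: ", shifted, "length:", shifted.length)
print("Extended:", extended, "length:", extended.length)
```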
Manipulating Categorical and Sparse Data
To review the concepts covered in this step, please refer to the Manipulating Data with Pandas module of the Pandas Arrays and Data Structures course.
Understanding how to manipulate categorical and sparse data is important because these types of data are common in real-world datasets. Categorical data can be used for filtering and sorting operations, while sparse data can make your code more efficient when there are lots of empty values.
In this step, you will learn how to manipulate categorical and sparse data in Pandas. You will create an ordered Categorical data series from a pandas Series, and use it for sorting and filtering operations. You will also learn how to work with sparse data, creating a sparse array, converting an existing DataFrame to use a sparse array, and understanding the structure of a sparse array.
Task 3.1: Creating an Ordered Categorical Data Series
Create an ordered Categorical data series from a pandas Series. The series contains the following data: ['low', 'high', 'medium', 'high', 'low', 'medium', 'medium']. The categories should be ordered as follows: ['low', 'medium', 'high']. After creating the series, print it to see the ordered categories.
Hint
Use the pd.Categorical function to convert the series to a Categorical data series. Pass in the series and the categories you want in ascending order. Set the ordered parameter to True. Then use print() to display the series.
Solution
    import pandas as pd

    # Provided pandas Series
    series = pd.Series(['low', 'high', 'medium', 'high', 'low', 'medium', 'medium'])

    # Convert the series to an ordered Categorical data series
    ordered_series = pd.Categorical(series, categories=['low', 'medium', 'high'], ordered=True)

    # Print the ordered series
    print("Ordered Categorical Series:", ordered_series)
Task 3.2: Sorting and Filtering Operations
Perform sorting and filtering operations on the ordered Categorical data series. First, sort the series in ascending order, and then print it. Next, filter the series to only include 'medium' and 'high' categories, and then print the filtered series.
Hint
Use the sort_values method to sort the series. Use boolean indexing to filter the series. Use print() to display the results of each operation.
Solution
    # Sort the series in ascending order
    sorted_series = ordered_series.sort_values()
    print("Sorted Series:", sorted_series)

    # Filter the series to only include 'medium' and 'high' categories
    filtered_series = sorted_series[(sorted_series == 'medium') | (sorted_series == 'high')]
    print("Filtered Series:", filtered_series)
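Because the categories are ordered, the same filter can also be written as a comparison instead of two equality checks. The sketch below is an alternative under that assumption, using a Series with an ordered CategoricalDtype. The names levels and ratings are illustrative and not part of the lab.

```python
import pandas as pd

# An ordered categorical dtype: low < medium < high.
levels = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
ratings = pd.Series(
    ["low", "high", "medium", "high", "low", "medium", "medium"],
    dtype=levels,
)

# Ordered categories support comparison operators against a category label.
at_least_medium = ratings[ratings >= "medium"]
print("At least medium:", at_least_medium.tolist())

# min() and max() follow the category order, not alphabetical order.
print("Lowest:", ratings.min(), "Highest:", ratings.max())
```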
Task 3.3: Working with Sparse Data
Create a sparse array with the pd.arrays.SparseArray function. The array should contain the following data: [0, 0, 0, 3, 0, 0, 5, 0, 0, 0]. After creating the sparse array, print it to understand its structure.
Hint
Use the pd.arrays.SparseArray function to create a sparse array. Pass in the data as a list. Then use print() to display the sparse array.
Solution
    # Create a sparse array
    sparse_array = pd.arrays.SparseArray([0, 0, 0, 3, 0, 0, 5, 0, 0, 0])

    # Print the sparse array
    print("Sparse Array:", sparse_array)
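To look more closely at the structure of the sparse array, you can inspect a few of its attributes. The sketch below rebuilds the same array so it runs on its own.

```python
import pandas as pd

sparse_array = pd.arrays.SparseArray([0, 0, 0, 3, 0, 0, 5, 0, 0, 0])

# The value that is treated as "empty" and not stored explicitly.
print("Fill value:", sparse_array.fill_value)

# The values that are actually stored.
print("Stored values:", sparse_array.sp_values)

# How many values are stored, and what fraction of the array that is.
print("Stored points:", sparse_array.npoints)
print("Density:", sparse_array.density)
```

Only the two non-zero entries are stored here; the eight zeros are represented by the fill value.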
Task 3.4: Converting a DataFrame to Use a Sparse Array
Create a DataFrame with the provided data:
    {
        'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0],
        'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0],
    }
Then, convert the DataFrame to use a sparse array, and print the resulting DataFrame and the DataFrame.dtypes attribute for its column dtypes. It should look like a normal DataFrame, but the columns should be type Sparse[float64, nan].
Hint
Use the df.astype method to convert the DataFrame to use a sparse array. Pass in pd.SparseDtype() as the argument. Then use print() to display the sparse DataFrame.
Solution
    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame({'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0],
                       'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0]})

    # Convert the DataFrame to use a sparse array
    sparse_df = df.astype(pd.SparseDtype())

    # Print the sparse DataFrame and its column structure
    print("Sparse DataFrame:\n", sparse_df)
    print("\nSparse DataFrame Structure:\n", sparse_df.dtypes)
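The default pd.SparseDtype() treats NaN as the empty value, so the zeros in this DataFrame are still stored. As a sketch of an alternative that goes beyond what the task asks for, passing an explicit subtype and fill value tells pandas to treat 0 as empty. The sparse.density accessor shows the difference.

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0],
                   'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0]})

# Default: float64 values with NaN as the fill value.
default_sparse = df.astype(pd.SparseDtype())

# Alternative: int64 values with 0 as the fill value.
zero_fill_sparse = df.astype(pd.SparseDtype("int64", 0))

print(default_sparse.dtypes)
print(zero_fill_sparse.dtypes)

# Density is the fraction of values that are actually stored.
print("Default density:", default_sparse.sparse.density)
print("Zero-fill density:", zero_fill_sparse.sparse.density)
```

With NaN as the fill value every entry is stored, while with 0 as the fill value only the five non-zero entries are.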
Working with String Data and Finalizing Course Project
To review the concepts covered in this step, please refer to the Manipulating Data with Pandas module of the Pandas Arrays and Data Structures course.
Understanding how to manipulate string data is important because strings are a common data type in many datasets. You will often need to replace, split, strip, and concatenate strings.
In this final step, you will learn how to manipulate string data in Pandas. You will use the replace, split, strip, and cat methods to manipulate strings, and the expand and na_rep parameters to convert a list to a DataFrame and replace NaN values, respectively.
Task 4.1: Replacing String Data
Replace all occurrences of 'apple' with 'pineapple' in the provided pandas Series. Display the resulting series.
Hint
Use the Series.str.replace() method of the pandas Series. The first argument should be the string you want to replace ('apple'), and the second argument should be the string you want to replace it with ('pineapple').
Solution
    import pandas as pd

    # Provided pandas Series
    series = pd.Series(['apple', 'banana-pie', 'cherry', 'apple-crumble', 'cherry', 'apple-candy'])

    # Replace all occurrences of 'apple' with 'pineapple'
    series = series.str.replace('apple', 'pineapple')
    print("Pineapple instead of Apple:", series)
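A plain substring replacement also changes 'apple' wherever it appears inside a longer word. If that is not wanted, one option is a regular expression with word boundaries. The sketch below uses a small illustrative Series (not the lab's data) that already contains 'pineapple' to show the difference.

```python
import pandas as pd

fruits = pd.Series(["apple", "pineapple", "apple-crumble", "banana"])

# Literal replacement: every occurrence of the substring is replaced,
# including the one inside 'pineapple'.
literal = fruits.str.replace("apple", "pineapple", regex=False)

# Regex replacement with word boundaries: only standalone 'apple' matches.
whole_word = fruits.str.replace(r"\bapple\b", "pineapple", regex=True)

print("Literal:   ", literal.tolist())
print("Whole word:", whole_word.tolist())
```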
Task 4.2: Splitting String Data
Split all strings in the pandas Series where there is a '-' character. Split those entries into lists of multiple strings. Display the results.
Hint
Use the Series.str.split method of the pandas Series. The argument should be the character you want to split the string on ('-').
Solution
    series = series.str.split('-')
    print("Split Strings:", series)
Task 4.3: Stripping String Data
Remove leading and trailing whitespace from the provided whitespace_series. Display the result.
Hint
Use the Series.str.strip method of the pandas Series. When called with no arguments, it removes leading and trailing whitespace.
Solution
    # Provided pandas Series
    whitespace_series = pd.Series([' apple ', ' banana ', ' cherry '])
    whitespace_series = whitespace_series.str.strip()
    print("Whitespace-stripped series:", whitespace_series)
Task 4.4: Concatenating String Data
Concatenate each fruit in the provided series with ' pie'. Display the results.
Hint
Use the Series.str.cat method of the pandas Series. The argument should be a list of strings you want to concatenate with each string in the Series. In this case, [' pie']*3. Display the results.
Solution
    # Provided pandas Series
    fruit_series = pd.Series(['apple', 'banana', 'cherry'])
    fruit_series = fruit_series.str.cat([' pie']*3)
    print("Pie options:", fruit_series)
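The na_rep parameter mentioned in the introduction to this step controls what happens to missing values during concatenation. The following sketch uses an illustrative Series containing a missing entry; the placeholder strings 'unknown' and '?' are arbitrary choices.

```python
import pandas as pd

fruits = pd.Series(["apple", None, "cherry"])

# Without na_rep, a missing value makes the whole result missing.
without_na_rep = fruits.str.cat([" pie"] * 3)

# With na_rep, the missing value is replaced before concatenating.
with_na_rep = fruits.str.cat([" pie"] * 3, na_rep="unknown")

# Called without another list, cat joins the Series into one string.
joined = fruits.str.cat(sep=", ", na_rep="?")

print("Without na_rep:", without_na_rep.tolist())
print("With na_rep:   ", with_na_rep.tolist())
print("Joined:        ", joined)
```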
Task 4.5: Expanding a List to a DataFrame
After splitting strings, you are left with a list of strings. Expanding lists into their own columns in a dataframe may be useful when there is an expected number of splits in a string.
Split the provided flavor_desert series on the '-' character and expand the result into a dataframe. Display the resulting dataframe.
Hint
Use the Series.str.split method of the pandas Series with the expand parameter set to True.
Solution
    # Provided pandas Series
    flavor_desert = pd.Series(['apple-pie', 'kiwi', 'banana-split', 'cherry-candy', "pineapple-chunks", "dragonfruit"])
    df = flavor_desert.str.split('-', expand=True)
    print("Dataframe of Split Series")
    df
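Entries without a '-' produce fewer pieces than the others, and the expanded DataFrame pads those rows with missing values. As a short sketch, the snippet below repeats the split, gives the columns illustrative names ('flavor' and 'dessert' are assumptions, not part of the lab), and counts the padded cells.

```python
import pandas as pd

flavor_desert = pd.Series(['apple-pie', 'kiwi', 'banana-split',
                           'cherry-candy', 'pineapple-chunks', 'dragonfruit'])

# expand=True returns a DataFrame with one column per piece.
split_df = flavor_desert.str.split('-', expand=True)

# Hypothetical column names, chosen only for readability.
split_df.columns = ['flavor', 'dessert']
print(split_df)

# Rows with no '-' have a missing value in the second column.
missing_desserts = split_df['dessert'].isna().sum()
print("Rows without a second part:", missing_desserts)
```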