
Up and Running with Pandas Hands-on Practice

In this lab, you'll dive into using pandas for data manipulation and analysis. The lab starts with creating and manipulating DataFrames, where you'll learn to create DataFrames from dictionaries, combine them, reset indexes, and fill missing values. Next, you'll work with JSON and CSV files, focusing on loading data from these formats into DataFrames, examining DataFrame metadata, and converting DataFrames back to JSON and CSV. Finally, the lab explores data evaluation, including displaying data, getting statistical properties and correlation scores, filtering data based on conditions, and determining the size of the DataFrame.

Path Info

Level: Beginner
Duration: 34m
Published: Dec 06, 2023


Table of Contents

  1. Challenge

    Creating and Manipulating DataFrames

    To review the concepts covered in this step, please refer to the Understanding Data Fundamentals with Pandas module of the Up and Running with Pandas course.

    Understanding DataFrames is important because they are a fundamental part of pandas and are used to store and manipulate tabular data.

    In this step, you will create a DataFrame from scratch and learn how to combine multiple DataFrames together. You will also practice resetting the row index of a DataFrame and filling in missing values with NaN values. Use the pd.DataFrame() function to create a DataFrame and the pd.concat() function to combine DataFrames.

    After completing each task, run the Jupyter Notebook cell with Shift + Enter to apply your changes.


    Task 1.1: Creating a DataFrame

    Import pandas and then create two DataFrames from the provided lists of dictionaries. Name the first DataFrame df1 and the second df2.

    Data for df1:

    [
        {'name': 'Alice', 'age': None, 'city': 'New York'},
        {'name': 'Bob', 'age': 26, 'city': 'Los Angeles'},
        {'name': 'Oliver', 'age': None, 'city': 'Salt Lake'}
    ]
    

    Data for df2:

    [
        {'name': 'Charlie', 'age': 35, 'city': 'Chicago'},
        {'name': 'Diana', 'age': 82, 'city': 'Miami'}
    ]
    

    Use the pd.DataFrame() function to create each DataFrame. Print both DataFrames.

    🔍 Hint

    To create a DataFrame, call the pd.DataFrame() function and pass the list of dictionaries as an argument.


    **Context:** In pandas, a DataFrame is a fundamental structure for storing and manipulating tabular data. It can be created from various data formats. Using lists of dictionaries is a straightforward method, where each dictionary represents a row in the DataFrame, and the keys correspond to column names.
    🔑 Solution
    import pandas as pd
    
    # Create DataFrames
    df1 = pd.DataFrame([
        {'name': 'Alice', 'age': None, 'city': 'New York'},
        {'name': 'Bob', 'age': 26, 'city': 'Los Angeles'},
        {'name': 'Oliver', 'age': None, 'city': 'Salt Lake'}
    ])
    
    df2 = pd.DataFrame([
        {'name': 'Charlie', 'age': 35, 'city': 'Chicago'},
        {'name': 'Diana', 'age': 82, 'city': 'Miami'}
    ])
    
    # Display DataFrames
    df1, df2
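
    As the context above notes, a DataFrame can be built from several formats. As an optional aside (not required for the task), here is a minimal sketch using a dictionary of lists instead, where each key becomes a column and each list holds that column's values:

    # Hypothetical alternative construction: a dictionary of column lists
    df_from_columns = pd.DataFrame({
        'name': ['Alice', 'Bob'],
        'age': [None, 26],
        'city': ['New York', 'Los Angeles']
    })
    df_from_columns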
    

    Task 1.2: Combine DataFrames

    Now, combine df1 and df2 into a single DataFrame named combined_df. Display the result.

    🔍 Hint

    Use the pd.concat() function and pass a list containing df1 and df2.


    **Context:** Concatenating in pandas merges two or more DataFrames either vertically, adding rows, or horizontally, adding columns. The `pd.concat()` function facilitates this operation, aligning data by index labels to ensure consistency. This process is essential for aggregating and comparing data from different sources.
    🔑 Solution
    # Combine DataFrames
    combined_df = pd.concat([df1, df2])
    
    # Display the combined DataFrame
    combined_df
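
    The context above mentions that pd.concat() can combine DataFrames vertically or horizontally. As an optional illustration (not part of the task), the ignore_index and axis parameters control this behavior:

    # Vertical concatenation with a fresh 0..n-1 row index
    pd.concat([df1, df2], ignore_index=True)

    # Horizontal concatenation (axis=1) places the frames side by side,
    # aligning rows by index label; df1 is simply repeated here for illustration
    pd.concat([df1, df1], axis=1)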
    

    Task 1.3: Reset the Index

    Reset the row index of combined_df to ensure it's continuous and sequential.

    🔍 Hint

    Use the reset_index() method with the drop=True parameter to reset the index without keeping the old index. Optionally, use inplace=True to modify the existing DataFrame in place instead of creating a new one.


    **Context:** Resetting the row index of a DataFrame renumbers the rows from zero and can turn the old index into a column. This is often done after data manipulation operations like sorting or filtering, which may leave gaps in the original index sequence. The `reset_index()` function in pandas makes this adjustment, helping to maintain a continuous, sequential index.
    🔑 Solution
    # Reset the index
    combined_df = combined_df.reset_index(drop=True)
    # or
    # combined_df.reset_index(drop=True, inplace=True)

    # Display the new DataFrame
    combined_df
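
    For comparison with the solution above, calling reset_index() without drop=True keeps the old row labels as a new 'index' column rather than discarding them. This sketch is only illustrative:

    # The old index (0, 1, 2, 0, 1) is preserved as an 'index' column
    pd.concat([df1, df2]).reset_index()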
    

    Task 1.4: Fill in Missing Values

    The 'age' column in your concatenated DataFrame has some missing values. Fill the missing values in the 'age' column with the median age.

    🔍 Hint

    First, calculate the median of the 'age' column using the median() method. Then, call fillna() on the 'age' column, passing in the median age, and assign the result back to combined_df['age'].


    **Context:** In datasets, missing numerical values are often filled with a measure of central tendency, like the mean or median. The median is robust to outliers and is a better choice when the data distribution is skewed. Using the median ensures that the filled values are typical for the dataset without being affected by extreme values.
    🔑 Solution
    # Calculate the median of the 'age' column
    median_age = combined_df['age'].median()
    
    # Fill in missing values in the 'age' column with the median age
    combined_df['age'] = combined_df['age'].fillna(median_age)
    
    # Display the DataFrame
    combined_df
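
    To see why the median is preferred when values are skewed, as described in the context above, here is a small illustrative example with made-up ages. One extreme value pulls the mean up sharply while the median barely moves:

    # Illustration only: the outlier (90) distorts the mean but not the median
    ages = pd.Series([25, 26, 27, 28, 90])
    print(ages.mean())    # 39.2 -- skewed upward by the outlier
    print(ages.median())  # 27.0 -- still a typical value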
    
  2. Challenge

    Working with JSON and CSV Files

    To review the concepts covered in this step, please refer to the Programmatically Representing Data with Pandas module of the Up and Running with Pandas course.

    Knowing how to work with JSON and CSV files is important because these are common data formats that you will encounter in data analysis.

    In this step, you will practice converting JSON and CSV files into DataFrames and vice versa. You will also learn how to examine DataFrame metadata and make minimal transformations to the data. Use the pd.read_json() and pd.read_csv() functions to read JSON and CSV files, respectively, and the to_json() and to_csv() methods to convert DataFrames back to these formats.


    Task 2.1: Load the CSV file into a DataFrame

    Import pandas and use the pd.read_csv() function to load the CSV file 'Student_Scores.csv' into a DataFrame. Name the DataFrame student_scores.

    🔍 Hint

    Use the pd.read_csv() function and pass the name of the CSV file as a string.

    🔑 Solution
    import pandas as pd
    
    # Load the CSV file into a DataFrame
    student_scores = pd.read_csv('Student_Scores.csv')
    

    Task 2.2: Examine the DataFrame

    Use the head() function to display the first 5 rows of the DataFrame. Then, display the metadata of the DataFrame using the info() method.

    🔍 Hint

    Call the head() and info() methods on the student_scores DataFrame.


    **Context:** The `head()` function allows analysts and data scientists to swiftly inspect the first few rows of a DataFrame, providing an immediate snapshot of its structure and contents. Typically, the `head()` function displays the initial five rows, but this can be customized as needed. In contrast, the `info()` function furnishes vital metadata about the DataFrame, including data types, non-null counts, and memory utilization.
    🔑 Solution
    # Display the first 5 rows of the DataFrame
    student_scores.head()
    

    # Display the metadata of the DataFrame
    student_scores.info()
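
    As an optional extra, head() accepts a row count, and the related tail() method shows the last rows instead; neither is required for this task:

    # Show the first 10 rows instead of the default 5
    student_scores.head(10)

    # Show the last 5 rows
    student_scores.tail()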
    

    Task 2.3: Convert the DataFrame to a JSON file

    Convert the DataFrame created in Task 2.1, student_scores, to a JSON file.

    🔍 Hint

    Convert the DataFrame to a JSON file using the to_json() method.

    🔑 Solution
    # Convert the DataFrame to a JSON file
    student_scores.to_json('student_scores.json')
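
    If you want to experiment further (this is not required for the task), to_csv() works the same way; index=False is a common option that omits the row index from the output file. The output filename below is just an example:

    # Write the DataFrame back out as CSV without the row index column
    student_scores.to_csv('student_scores_copy.csv', index=False)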
    

    Task 2.4: Load the JSON file into a DataFrame

    Load the JSON file from the previous Task 2.3, student_scores.json, into a DataFrame called student_scores_from_json.

    🔍 Hint

    Use the pd.read_json() function and pass the name of the JSON file as a string.

    🔑 Solution
    # Load the JSON file into a DataFrame
    student_scores_from_json = pd.read_json('student_scores.json')
    

    Task 2.5: Compare the two DataFrames

    Compare the DataFrame created in Task 2.1, student_scores, with the DataFrame created in Task 2.4, student_scores_from_json, and check whether they are identical.

    🔍 Hint

    Call the equals() method on the student_scores DataFrame and pass student_scores_from_json as the argument.


    **Context:** The equals() function in pandas is employed to determine if two DataFrames are identical by comparing their shape and values. It returns a Boolean result (True or False) based on whether the DataFrames match or not. This function is indispensable for quality control and data validation, ensuring data consistency and accuracy when comparing or verifying the equality of two datasets in a concise and straightforward manner.
    🔑 Solution
    # Check if the two DataFrames are identical
    student_scores.equals(student_scores_from_json)
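
    A small illustrative aside on the difference between equals() and the element-wise == operator: equals() returns a single boolean and treats NaN values in the same position as equal, while == does not. The tiny frames below are made up for the example:

    # equals() compares shape and values, counting aligned NaNs as equal
    a = pd.DataFrame({'x': [1.0, None]})
    b = pd.DataFrame({'x': [1.0, None]})
    print(a.equals(b))           # True
    print((a == b).all().all())  # False, because NaN == NaN is False element-wise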
    
  3. Challenge

    Exploring and Evaluating Data

    To review the concepts covered in this step, please refer to the Exploring and Evaluating Data with Pandas module of the Up and Running with Pandas course.

    Exploring and evaluating data is important because it allows you to understand the characteristics and relationships within your data, which is crucial for any data analysis task.

    In this step, you will practice deriving different properties from a DataFrame, filtering data according to column names, and determining the size of a DataFrame. You will also learn how to interpret correlation scores and get statistical properties of numerical columns. Use the describe() and corr() methods to get statistical properties and correlation scores, respectively.


    Task 3.1: Load the Data

    Import pandas and load the 'Student_Scores.csv' data into a pandas DataFrame called 'student_scores'.

    🔍 Hint

    Use the pd.read_csv() function to load the data. The file is 'Student_Scores.csv'.

    🔑 Solution
    import pandas as pd
    
    # Load the data
    student_scores = pd.read_csv('Student_Scores.csv')
    

    Task 3.2: Explore the Data

    Use the head() function to display the first 5 rows of the DataFrame 'student_scores'.

    🔍 Hint

    Use the head() function on the DataFrame student_scores.


    > **Tip:** To display more (or fewer) than the default 5 rows, pass the desired number as an argument, e.g. `df.head(10)`.
    🔑 Solution
    # Display the first 5 rows of the DataFrame
    student_scores.head()
    

    Task 3.3: Get Statistical Properties

    Get statistical properties of the numerical columns in the DataFrame 'student_scores'.

    🔍 Hint

    Use the describe() function on the DataFrame student_scores to view the summary.


    **Context:** The `describe()` function in pandas generates summary statistics for numeric columns in a DataFrame, including measures like mean, standard deviation, and quartiles. It offers a quick overview of the central tendencies and distribution characteristics of the data, aiding in initial data exploration and analysis.
    🔑 Solution
    # Get statistical properties
    student_scores.describe()
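
    Optionally, describe() also accepts parameters such as include and percentiles; for example, include='all' summarizes non-numeric columns as well (counts, unique values, most frequent value). This is not required for the task:

    # Summarize every column, not just the numeric ones
    student_scores.describe(include='all')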
    

    Task 3.4: Get Correlation Scores

    Find the correlation scores between numerical columns in the DataFrame 'student_scores'.

    🔍 Hint

    Use the corr() function on the DataFrame student_scores.


    **Context:** The `corr()` function in pandas calculates the pairwise correlation coefficients between numeric columns in a DataFrame, measuring the strength and direction of linear relationships. This function is valuable for uncovering relationships among variables in data analysis and decision-making processes.
    🔑 Solution
    # Get correlation scores
    student_scores.corr()
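
    One caveat worth noting: on pandas 2.0 and later, corr() raises an error if the DataFrame contains non-numeric columns (for example, a student name column, if the CSV includes one). If you hit that in your environment, restrict the calculation explicitly:

    # Only needed on newer pandas versions when non-numeric columns are present
    student_scores.corr(numeric_only=True)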
    

    Task 3.5: Filter Data

    Filter the DataFrame to only include rows where the data in the 'math_score' column is greater than 90 and save the data into a new DataFrame called 'high_math_scores'. Print the number of students with a math score greater than 90.

    🔍 Hint

    To create 'high_math_scores', find rows where 'math_score' is greater than 90 in your DataFrame. Use a condition like ['math_score'] > 90 to create a filter, and then apply this filter to your DataFrame to extract the desired rows into 'high_math_scores'. To get the number of students in the high_math_scores DataFrame, print its length with the len() function.

    🔑 Solution
    # Filter data
    high_math_scores = student_scores[student_scores['math_score'] > 90]
    print(len(high_math_scores), "students have score > 90")
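
    Boolean filters like the one above can also be combined. As an optional sketch (not required for the task), the & operator intersects two conditions; each condition must be wrapped in parentheses:

    # Students whose math score is above 80 but not above 90
    mid_math_scores = student_scores[(student_scores['math_score'] > 80) & (student_scores['math_score'] <= 90)]
    print(len(mid_math_scores), "students have 80 < score <= 90")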
    

    Task 3.6: Determine DataFrame Size

    Determine the size of the student scores DataFrame using the shape attribute.

    🔍 Hint

    Use the shape attribute on the DataFrame student_scores.


    **Context:** The `shape` attribute in pandas returns a tuple representing the dimensions of a DataFrame. It provides two values: the number of rows and the number of columns in the DataFrame. This attribute is a quick and convenient way to ascertain the size and structure of your dataset, helping you understand its extent and organization at a glance.
    🔑 Solution
    # Determine DataFrame size
    student_scores.shape
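
    Since shape is a (rows, columns) tuple, it can be unpacked directly; this optional snippet prints the two dimensions separately:

    # Unpack the dimensions into separate variables
    n_rows, n_cols = student_scores.shape
    print(f"{n_rows} rows x {n_cols} columns")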
    

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.