
Operations on Arrays with NumPy Hands-on Practice
In this lab, you will learn to manipulate data loaded from a .csv file using slicing, indexing, and Boolean masks. You will then explore broadcasting, handling missing values, performing arithmetic operations, and searching and sorting, all of which are essential for proficiency in data analysis.

Challenge
Slicing and Indexing NumPy Arrays
To review the concepts covered in this step, please refer to the Slicing and Indexing NumPy Arrays module of the Operations on Arrays with NumPy course.
Understanding how to slice and index NumPy arrays is important because it allows us to access and manipulate specific parts of the data stored in the arrays. This is a fundamental operation in data analysis and manipulation.
Let's put our slicing and indexing skills to the test! In this step, we will load the `Consumer_Data.csv` file into a NumPy array and practice accessing specific elements, rows, and columns. We will also explore how to use Boolean masks to filter the data. Remember, slicing and indexing are powerful tools that allow us to access and manipulate data in NumPy arrays. The goal is to practice these operations and understand how they work. We will be using the `numpy` library to perform these operations.
Task 1.1: Load the Data into a NumPy Array
First, we need to load the `Consumer_Data.csv` file into a NumPy array using the provided code in the first cell. After running the first cell, write code in the second cell to display the array and get a feel for the data.
Hint
To display a value in Jupyter, either:
- Write the value on the last line of code in the cell. For example:
```python
# Other code above
value_to_display
```
- Use the `print()` function with the value as the argument.
Solution
Cell 1
```python
# Provided code
import numpy as np

# Load the data into a NumPy array
consumer_data = np.genfromtxt(
    'Consumer_Data.csv',
    delimiter=',',
    skip_header=1,
)
```
Cell 2
```python
# Display the array
print(consumer_data)
```
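To see what `np.genfromtxt` produces without the lab's file, here is a minimal sketch that parses a small, hypothetical CSV string (the column names and values are invented for illustration and may not match `Consumer_Data.csv`). With the default float dtype, empty fields come back as `nan`, which is why missing values need handling later in the lab.

```python
import io

import numpy as np

# Hypothetical CSV text standing in for Consumer_Data.csv;
# the real file's columns and values may differ.
csv_text = """customer_id,age,income,credit_score
1,34,52000,710
2,,48000,650
3,45,61000,
"""

# Same arguments as the lab: comma-delimited, skip the header row.
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(sample.shape)            # (3, 4): 3 data rows, 4 columns
print(np.isnan(sample).sum())  # number of empty fields turned into nan
```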
Task 1.2: Access Specific Elements
Now that we have our data loaded into a NumPy array, let's practice accessing specific elements. Access the element at the 4th row and 3rd column of the array. Display the results.
Hint
Remember that in Python, indexing starts from 0. To access the element in the 4th row and 3rd column, you'll need to use indices 3 and 2, respectively.
Solution
```python
# Access the element at the 4th row and 3rd column
consumer_data[3, 2]
```
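As a quick check of the 0-based rule, the sketch below uses a small hypothetical array in place of `consumer_data`. It also shows that `data[3][2]` reaches the same element in two steps, and that negative indices count from the end.

```python
import numpy as np

# Hypothetical 5x4 array standing in for consumer_data.
data = np.arange(20).reshape(5, 4)

element = data[3, 2]              # 4th row, 3rd column (indices 3 and 2)
chained = data[3][2]              # same value, via an intermediate row
last_row_first_col = data[-1, 0]  # negative indices count from the end

print(element, chained, last_row_first_col)  # 14 14 16
```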
Task 1.3: Slice Rows and Columns
Next, let's practice slicing rows and columns. Slice the first 10 rows and the first 3 columns of the array. Display the results.
Hint
To slice the first 10 rows and the first 3 columns, use the syntax `array[row_slice, column_slice]`. Remember that NumPy indexing starts at 0 and that the endpoint of a slice is excluded, so `array[:10]` selects the rows at indices 0 through 9 (the first 10 rows) and stops before index 10.
Solution
```python
# Slice the first 10 rows and the first 3 columns
consumer_data[:10, :3]
```
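Basic slices like `consumer_data[:10, :3]` return views rather than copies, which matters later when the lab modifies columns in place. A minimal sketch on a hypothetical array:

```python
import numpy as np

data = np.arange(20).reshape(5, 4)  # hypothetical stand-in for consumer_data

top_left = data[:2, :3]                  # basic slicing returns a view
print(np.shares_memory(top_left, data))  # True

top_left[0, 0] = 99  # writing to the view changes the original array
print(data[0, 0])    # 99

independent = data[:2, :3].copy()  # .copy() gives a separate array
independent[0, 0] = -1
print(data[0, 0])    # still 99
```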
Task 1.4: Use Boolean Masks to Filter Data
Finally, let's experiment with Boolean masks! Create a Boolean mask that selects rows where the age column (2nd column) is greater than 30, and apply this mask to the array. Display the results.
Hint
To create a Boolean mask for rows where the age column is greater than 30, compare the age column with 30 using the greater-than (`>`) operator. Remember that the age column is the second column, which has an index of 1.
Solution
```python
# Create a Boolean mask
mask = consumer_data[:, 1] > 30

# Apply the mask to the array
print(consumer_data[mask])
```
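Masks can also be combined. The sketch below uses a hypothetical four-row array whose columns (customer id, age, income, credit score) are invented for illustration; it shows `&` and `|` with the parentheses each comparison needs.

```python
import numpy as np

# Hypothetical rows: [customer_id, age, income, credit_score]
data = np.array([
    [1, 25, 40000, 680],
    [2, 42, 65000, 720],
    [3, 35, 52000, 590],
    [4, 51, 80000, 750],
])

over_30 = data[:, 1] > 30
good_credit = data[:, 3] >= 700

# Combine masks with & (and), | (or), ~ (not). Inline comparisons need
# parentheses because & and | bind tighter than > and >=.
both = data[over_30 & good_credit]
either = data[(data[:, 1] > 30) | (data[:, 3] >= 700)]

print(both[:, 0])    # customer ids 2 and 4
print(either[:, 0])  # customer ids 2, 3 and 4
```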
Challenge
Broadcasting in NumPy
To review the concepts covered in this step, please refer to the Slicing and Indexing NumPy Arrays module of the Operations on Arrays with NumPy course.
Broadcasting in NumPy is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations. It enables you to perform element-wise operations on arrays of different sizes and dimensions without explicitly replicating data, which makes it memory efficient and faster. Understanding broadcasting is key to utilizing NumPy's full capabilities.
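One way to see that broadcasting avoids copying is `np.broadcast_to`, which returns a read-only view of the stretched operand. In this sketch (arrays chosen for illustration), the strides of the stretched scalar are zero, meaning every position reads the same memory.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
b = np.array(5)

# Materialize the conceptual "stretched" version of b as a view.
stretched = np.broadcast_to(b, A.shape)

print(stretched.shape)    # (3, 3)
print(stretched.strides)  # (0, 0): no data is duplicated
print(np.array_equal(A + b, A + stretched))  # True
```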
In this step, we will delve into broadcasting, exploring how it allows for operations between arrays of different shapes and sizes. We will cover several examples to demonstrate the principles of broadcasting and its applications. For this, we will use the `numpy` library.
Task 2.1: Understanding Broadcasting Basics
Let's start with a basic example to understand how broadcasting works in NumPy. Consider an array `A` of shape (3,3) and an integer `b`. Perform an addition operation between `A` and `b`, and observe how NumPy handles this operation.
Define the array `A` and integer `b` as follows:
```python
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = 5
```
Hint
Remember that in broadcasting, the smaller operand (in this case, the integer `b`) is "stretched" to match the shape of the larger array `A`. This stretching is not actual duplication in memory but a conceptual extension to align the dimensions.
Solution
```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = 5

result = A + b
print(result)
```
Task 2.2: Broadcasting with One-Dimensional Arrays
Now, let's move to a slightly more complex example. Create a one-dimensional array `v` of length 3 and a two-dimensional array `M` of shape (3,3). Use broadcasting to add `v` to each row of `M`.
Define the arrays `v` and `M` as follows:
```python
v = np.array([1, 0, -1])
M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
```
Hint
When adding `v` (shape `(3,)`) to `M` (shape `(3,3)`), NumPy will treat `v` as if it were a `(1,3)` array and then stretch it along the first dimension (rows) to match the shape of `M`.
Solution
```python
v = np.array([1, 0, -1])
M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

result = M + v
print(result)
```
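For comparison, the sketch below builds the same result by physically repeating `v` with `np.tile`, and also shows how reshaping `v` to `(3, 1)` would add it to each column instead of each row.

```python
import numpy as np

v = np.array([1, 0, -1])
M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

broadcast_result = M + v                  # v treated as shape (1, 3)
explicit_result = M + np.tile(v, (3, 1))  # v copied into 3 rows in memory

# Reshaping v to (3, 1) adds one value of v to each row instead,
# i.e. v is applied down the columns.
column_result = M + v[:, np.newaxis]

print(broadcast_result)
print(np.array_equal(broadcast_result, explicit_result))  # True
print(column_result)
```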
Task 2.3: Broadcasting to Create a Grid
Broadcasting can be used to efficiently create grids. Create a grid of coordinates from two one-dimensional arrays `x` and `y` using broadcasting. The `x` array represents the x-coordinates, while the `y` array represents the y-coordinates.
Define `x` and `y` as follows:
```python
x = np.array([0, 1, 2, 3])  # x-coordinates
y = np.array([0, 1, 2])     # y-coordinates
```
Reshape `x` to `(4,1)` and use broadcasting with addition to create a grid of shape `(4,3)`.
Hint
You need to reshape `x` to `(4,1)` so that when it is added to `y`, broadcasting can occur. The resulting grid will have shape `(4,3)`, with the x-coordinates varying along the rows and the y-coordinates varying along the columns.
Solution
```python
x = np.array([0, 1, 2, 3]).reshape(4, 1)
y = np.array([0, 1, 2])

grid = x + y
print(grid)
```
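NumPy's `np.meshgrid` builds the same kind of grid by materializing full coordinate arrays. This sketch checks that, with `indexing="ij"`, the broadcast result matches `X + Y`.

```python
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 2])

grid = x.reshape(4, 1) + y  # broadcasting: (4, 1) + (3,) -> (4, 3)

# meshgrid returns both coordinate arrays at full size; indexing="ij"
# keeps x along the rows so the layout matches the (4, 3) grid above.
X, Y = np.meshgrid(x, y, indexing="ij")

print(X.shape, Y.shape)             # (4, 3) (4, 3)
print(np.array_equal(grid, X + Y))  # True
```

The broadcast version never stores the repeated coordinates, while `meshgrid` does; which one to prefer depends on whether you need the full coordinate arrays afterwards.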
Task 2.4: Understanding Broadcasting Rules and Dimension Alignment
In this final task, let's explore how NumPy's broadcasting rules apply to dimension alignment. When broadcasting two arrays, NumPy compares their shapes element-wise, starting from the trailing (rightmost) dimensions and working its way forward. If the arrays have different numbers of dimensions, NumPy prepends 1s to the shape of the array with fewer dimensions until both shapes have the same length. For two arrays to be compatible for broadcasting, each pair of aligned dimensions must either be equal or one of them must be 1.
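To make the rules concrete, here is a small pure-Python sketch; `broadcast_shape` is a hypothetical helper written for illustration, not a NumPy function. NumPy itself exposes this check as `np.broadcast_shapes` (available in NumPy 1.20 and later, an assumption about your installed version).

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper applying the broadcasting rules to two shapes."""
    # Prepend 1s to the shorter shape so both have the same length.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)


print(broadcast_shape((3, 3), (3,)))      # (3, 3)
print(broadcast_shape((4, 1), (3,)))      # (4, 3)
print(np.broadcast_shapes((4, 1), (3,)))  # (4, 3)

try:
    broadcast_shape((3, 2), (3,))
except ValueError as e:
    print("Error:", e)
```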
In this example, try to add two arrays of shapes (3,2) and (3,) respectively, and observe how NumPy handles this operation.
Define the arrays `A` and `v` as follows:
```python
A = np.array([[1, 2], [3, 4], [5, 6]])
v = np.array([1, 2, 3])
```
You should expect a `ValueError` as a result of this task, because the shapes do not align for broadcasting.
Hint
When comparing the shapes `(3,2)` and `(3,)`, NumPy prepends a 1 to the shape of `v`, making it `(1,3)`. The shapes are now `(3,2)` and `(1,3)`. The last dimensions (2 and 3) are not equal, and neither of them is 1, so the arrays are not compatible for broadcasting.
Solution
```python
A = np.array([[1, 2], [3, 4], [5, 6]])
v = np.array([1, 2, 3])

try:
    result = A + v
except ValueError as e:
    # NumPy's message names the incompatible shapes
    print(f"Error: {e}")
```
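If the goal was to add one value of `v` to each row of `A`, one option (a sketch, assuming that intent) is to give `v` an explicit second axis so its shape becomes `(3, 1)`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
v = np.array([1, 2, 3])                 # shape (3,)

# v[:, np.newaxis] has shape (3, 1); (3, 2) and (3, 1) are compatible,
# so each value of v is stretched across both columns of its row.
result = A + v[:, np.newaxis]
print(result)
```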
Challenge
Performing Arithmetic Operations with NumPy Arrays
To review the concepts covered in this step, please refer to the Operating with NumPy Arrays module of the Operations on Arrays with NumPy course.
Understanding how to perform arithmetic operations on NumPy arrays is important because it allows us to manipulate and analyze the data stored in the arrays. This is a fundamental operation in data analysis and manipulation.
In this step, we will practice performing arithmetic operations on our `Consumer_Data.csv` NumPy array. We will explore how to perform basic operations like addition, subtraction, multiplication, and division, as well as more complex operations like calculating the mean and standard deviation. The goal is to understand how these operations work and how they can be used to manipulate and analyze data. We will be using the `numpy` library for this step.
Task 3.1: Load the Data into a NumPy Array
First, we need to load the `Consumer_Data.csv` file into a NumPy array using the provided code in the first cell. After running the first cell, write code in the second cell to display the first 5 rows of the array and get a feel for the data.
Hint
To display a value in Jupyter, either:
- Write the value on the last line of code in the cell. For example:
```python
# Other code above
value_to_display
```
- Use the `print()` function with the value as the argument.
Solution
Cell 1
```python
# Provided code
import numpy as np

# Load the data into a NumPy array
consumer_data = np.genfromtxt(
    'Consumer_Data.csv',
    delimiter=',',
    skip_header=1,
)
```
Cell 2
```python
# Display the first 5 rows of the array
print(consumer_data[:5])
```
Task 3.2: Handling Missing Data
It's common to encounter missing values (NaNs) in datasets. Before performing arithmetic operations, it's important to handle these missing values. Replace all NaN values with `0` using Boolean indexing. Display the first 5 rows of the resulting array.
Hint
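You can use the `numpy.isnan` function to identify NaNs and then replace them with a value of your choice. This task uses `0`; the mean or median of the column is a common alternative. For example:
```python
mask = np.isnan(array)
```
Solution
```python
# Boolean mask that is True wherever a value is missing
mask = np.isnan(consumer_data)

# Replace the missing values in place
consumer_data[mask] = 0
print(consumer_data[:5])
```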
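The hint mentions using a column mean instead of 0. A minimal sketch of that alternative, on a hypothetical array with two missing cells, uses `np.nanmean` to compute per-column means while ignoring NaNs:

```python
import numpy as np

# Hypothetical array with missing values, standing in for consumer_data.
data = np.array([
    [1.0, 30.0, 700.0],
    [2.0, np.nan, 650.0],
    [3.0, 40.0, np.nan],
])

# Mean of each column, skipping NaNs.
col_means = np.nanmean(data, axis=0)

# Row and column indices of every NaN.
rows, cols = np.where(np.isnan(data))

# Fill each NaN with the mean of its own column (on a copy).
filled = data.copy()
filled[rows, cols] = col_means[cols]
print(filled)
```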
Task 3.3: Performing Basic Arithmetic Operations
Now that we have our data loaded into a NumPy array and cleaned, we can start performing arithmetic operations on it.
Let's start with some basic operations like addition and subtraction. For this task, add `5` to every element in the `age` column (2nd column) and subtract `2` from every element in the `customer_id` column (1st column). Save the results back to the original array (in place). Display the first 5 rows of the modified array.
Hint
To modify a slice in place, first index into the array to get the desired slice. For simple operations such as addition and subtraction, the `+=` and `-=` augmented assignment operators modify the slice, and therefore the original array, in place.
Solution
```python
# Add 5 to the age column (index 1)
consumer_data[:, 1] += 5

# Subtract 2 from the customer_id column (index 0)
consumer_data[:, 0] -= 2

print(consumer_data[:5])
```
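The in-place requirement is easy to miss: binding a column slice to a name and then writing `age = age + 5` creates a new array and leaves the original untouched, while `+=` on the indexed slice writes back. A sketch on a hypothetical two-row array:

```python
import numpy as np

# Hypothetical rows: [customer_id, age]
data = np.array([[10.0, 30.0], [11.0, 45.0]])

age = data[:, 1]   # a view into column 1
age = age + 5      # builds a NEW array; data is unchanged
print(data[:, 1])  # [30. 45.]

data[:, 1] += 5    # augmented assignment writes into data itself
print(data[:, 1])  # [35. 50.]
```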
Task 3.4: Performing Complex Arithmetic Operations
In addition to basic operations, NumPy also allows us to perform more complex operations like calculating the mean and standard deviation. For this task, calculate the mean and standard deviation of the `age` column (2nd column). Print the results.
Hint
You can use the `numpy.mean()` and `numpy.std()` functions to calculate the mean and standard deviation. First index into the desired column, then apply the function to the slice.
Solution
```python
mean_age = np.mean(consumer_data[:, 1])
std_dev_age = np.std(consumer_data[:, 1])

print("mean:", mean_age)
print("std:", std_dev_age)
```
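Two details worth knowing, shown on hypothetical values: `np.std` defaults to the population standard deviation (`ddof=0`), and `np.mean` returns `nan` if any NaN remains in the column, whereas `np.nanmean` skips it. Note also that if the age column had missing values, filling them with 0 in Task 3.2 pulls the mean down.

```python
import numpy as np

ages = np.array([25.0, 35.0, 45.0, 55.0])  # hypothetical age column

print(np.mean(ages))         # 40.0
print(np.std(ages))          # population std (ddof=0, the default)
print(np.std(ages, ddof=1))  # sample std, divides by n - 1

with_gap = np.array([25.0, np.nan, 45.0])
print(np.mean(with_gap))     # nan
print(np.nanmean(with_gap))  # 35.0
```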
Challenge
Searching and Sorting in NumPy Arrays
To review the concepts covered in this step, please refer to the Searching in NumPy Arrays module of the Operations on Arrays with NumPy course.
Searching and sorting in NumPy arrays is important because it allows us to organize and find specific elements in the data. This is a key operation in data analysis and manipulation.
In this final step, we will practice searching and sorting our `Consumer_Data.csv` NumPy array. We will explore how to use the `np.sort` and `np.where` functions to sort and search for elements in the array. To provide tailored financial services, a bank is undertaking a project to suggest credit cards to its most reliable customers. This decision is based on their credit scores, a key indicator of financial reliability. Your goal is to identify these customers.
Task 4.1: Load the Data into a NumPy Array
Our first task is to load the customer data into a NumPy array. This process will transform the raw data from `Consumer_Data.csv` into a structured format that's easy to manipulate with Python. Once the data is loaded with the provided code, preview the first rows to understand its structure and content.
Hint
To display a value in Jupyter, either:
- Write the value on the last line of code in the cell. For example:
```python
# Other code above
value_to_display
```
- Use the `print()` function with the value as the argument.
Solution
Cell 1
```python
# Provided code
import numpy as np

# Load the data into a NumPy array
consumer_data = np.genfromtxt(
    'Consumer_Data.csv',
    delimiter=',',
    skip_header=1,
)
```
Cell 2
```python
# Display the first 5 rows of the array
print(consumer_data[:5])
```
Task 4.2: Sorting the Array
With the data loaded, our next task is to sort the values in the `Credit_Score` column (4th column). This gives us a quick view of how the credit scores are distributed and how many of them are high. Display the sorted credit scores.
Hint
Use the `np.sort()` function to sort the values. The sliced column (`consumer_data[:, 3]`) should be the argument to this function. Remember that Python uses 0-based indexing.
Solution
```python
# Sort the values in the Credit_Score column (4th column, index 3)
sorted_data = np.sort(consumer_data[:, 3])
print(sorted_data)
```
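`np.sort` on a single column returns only the sorted scores, so it no longer tells you which customer each score belongs to. To reorder whole rows by credit score, `np.argsort` returns the row order instead. A sketch on a hypothetical array (columns invented for illustration):

```python
import numpy as np

# Hypothetical rows: [customer_id, age, income, credit_score]
data = np.array([
    [1, 25, 40000, 680],
    [2, 42, 65000, 720],
    [3, 35, 52000, 590],
    [4, 51, 80000, 750],
])

order = np.argsort(data[:, 3])          # row indices sorting credit_score ascending
rows_by_score = data[order]             # whole rows reordered together
rows_by_score_desc = data[order[::-1]]  # highest scores first

print(rows_by_score[:, 0])       # customer ids: [3 1 2 4]
print(rows_by_score_desc[:, 0])  # customer ids: [4 2 1 3]
```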
Task 4.3: Searching the Array
Next, let's search for bank customers with a credit score above 700. Identifying these customers will help us target the right individuals for our credit card offers. Once you find which customers have a credit score above 700, print each of these customers' entire row of data. There should be only 10 customers with a credit score above 700.
Hint
Use the `np.where()` function to search for elements in the array. The condition should be passed as an argument to this function. Remember that Python uses 0-based indexing. Use the result to index the original data to be printed.
Solution
```python
# Find the row indices of customers with a credit score above 700
high_credit_customers = np.where(consumer_data[:, 3] > 700)

# Print those customers' full rows
print(consumer_data[high_credit_customers])
```
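`np.where` with a single condition returns a tuple containing one array of row indices, which is why indexing with it selects whole rows. A Boolean mask selects the same rows, and `len(idx[0])` gives a quick count to compare against the expected number of customers. A sketch on hypothetical data:

```python
import numpy as np

# Hypothetical rows: [customer_id, age, income, credit_score]
data = np.array([
    [1, 25, 40000, 680],
    [2, 42, 65000, 720],
    [3, 35, 52000, 590],
    [4, 51, 80000, 750],
])

idx = np.where(data[:, 3] > 700)   # tuple holding one array of row indices
by_where = data[idx]               # full rows of matching customers
by_mask = data[data[:, 3] > 700]   # a Boolean mask selects the same rows

print(by_where)
print(np.array_equal(by_where, by_mask))  # True
print(len(idx[0]))                        # number of matching customers: 2
```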