Operations on Arrays with NumPy Hands-on Practice

In this lab, you will learn to manipulate a .CSV file using slicing, indexing, and Boolean masks. You will then explore broadcasting, handling missing values, arithmetic operations, and searching and sorting techniques, all of which are essential for data analysis proficiency.

Path Info

Level: Beginner
Duration: 39m
Published: Dec 12, 2023

Table of Contents

  1. Challenge

    Slicing and Indexing NumPy Arrays

    To review the concepts covered in this step, please refer to the Slicing and Indexing NumPy Arrays module of the Operations on Arrays with NumPy course.

    Understanding how to slice and index NumPy arrays is important because it allows us to access and manipulate specific parts of the data stored in the arrays. This is a fundamental operation in data analysis and manipulation.

    Let's put our slicing and indexing skills to the test! In this step, we will load the Consumer_Data.csv file into a NumPy array and practice accessing specific elements, rows, and columns. We will also explore how to use Boolean masks to filter the data. The goal is to practice these operations with the numpy library and understand how they work.


    Task 1.1: Load the Data into a NumPy Array

    First, we need to load the Consumer_Data.csv file into a NumPy array using the provided code in the first cell. After running the first cell, write code in the second cell to display the array and get a feel for the data.

    🔍 Hint

    To display a value in Jupyter, either:

    1. Write the value on the last line of code in the cell. For example:
      	# Other code above
      	value_to_display
      
    2. Use the print() function with the value as the argument.
    🔑 Solution

    Cell 1

    # Provided Code
    import numpy as np

    # Load the data into a NumPy array
    consumer_data = np.genfromtxt(
        'Consumer_Data.csv',
        delimiter=',',
        skip_header=1,
    )

    Cell 2

    # Display the array
    print(consumer_data)
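
To see what the loading call does without the lab's file, here is a minimal sketch that parses a small in-memory CSV. The column names and values are made up for illustration and are not taken from Consumer_Data.csv.

```python
import io
import numpy as np

# Hypothetical CSV text standing in for the lab's file
csv_text = "customer_id,age,income\n1,25,50000\n2,,62000\n3,41,\n"

sample = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    skip_header=1,  # skip the header row of column names
)

print(sample.shape)  # (3, 3)
print(sample)        # empty fields are read as nan
```

Because the result is a float array, empty fields show up as nan, which is why a later step deals with missing values.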
    

    Task 1.2: Access Specific Elements

    Now that we have our data loaded into a NumPy array, let's practice accessing specific elements. Access the element at the 4th row and 3rd column of the array. Display the results.

    🔍 Hint

    Remember that in Python, indexing starts from 0. To access the element in the 4th row and 3rd column, you'll need to use indices 3 and 2, respectively.

    🔑 Solution
    # Access the element at the 4th row and 3rd column
    consumer_data[3, 2]
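
The zero-based rule is easy to check on a tiny array. The sketch below uses a hypothetical 4x3 array rather than the lab's data.

```python
import numpy as np

# Hypothetical 4x3 array standing in for the consumer data
data = np.arange(12).reshape(4, 3)

element = data[3, 2]   # 4th row, 3rd column
same = data[3][2]      # equivalent result, but indexes in two steps
last = data[-1, -1]    # negative indices count from the end

print(element, same, last)  # 11 11 11
```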
    

    Task 1.3: Slice Rows and Columns

    Next, let's practice slicing rows and columns. Slice the first 10 rows and the first 3 columns of the array. Display the results.

    🔍 Hint

    To slice the first 10 rows and the first 3 columns, use the syntax array[row slice, column slice]. Remember that NumPy starts its indexing from 0 and that the endpoint of a slice is excluded, so array[:10] includes the rows at indices 0-9 (the first 10 rows) and excludes the row at index 10.

    🔑 Solution
    # Slice the first 10 rows and the first 3 columns
    consumer_data[:10, :3]
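
One detail worth knowing when slicing: basic slices return views, not copies. The sketch below illustrates this on a hypothetical 12x5 array; the variable names are illustrative only.

```python
import numpy as np

data = np.arange(60).reshape(12, 5)  # hypothetical 12x5 array

block = data[:10, :3]   # rows 0-9, columns 0-2
print(block.shape)      # (10, 3)

# Basic slices are views: writing to the slice changes the original
block[0, 0] = -1
print(data[0, 0])       # -1

# Use .copy() when an independent array is needed
safe = data[:10, :3].copy()
safe[0, 0] = 99
print(data[0, 0])       # still -1
```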
    

    Task 1.4: Use Boolean Masks to Filter Data

    Finally, let's experiment with Boolean masks! Create a Boolean mask that selects rows where the age column (2nd column) is greater than 30, and apply this mask to the array. Display the results.

    🔍 Hint

    To create a Boolean mask for rows where the age column is greater than 30, you would compare the age column with 30 using a greater than (>) operator. Remember that the age column is the second column, which has an index of 1.

    🔑 Solution
    # Create a Boolean mask
    mask = consumer_data[:, 1] > 30
    
    # Apply the mask to the array
    print(consumer_data[mask])
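
As a small sketch of how a mask behaves, the example below builds one on made-up (customer_id, age) rows and then combines two conditions. The data is hypothetical.

```python
import numpy as np

# Hypothetical rows of (customer_id, age)
data = np.array([[1, 25], [2, 41], [3, 30], [4, 58]], dtype=float)

mask = data[:, 1] > 30
print(mask)         # [False  True False  True]
print(data[mask])   # rows for customers 2 and 4

# Combine conditions with & and |, wrapping each comparison in parentheses
between = (data[:, 1] > 30) & (data[:, 1] < 50)
print(data[between])  # only the row for customer 2
```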
    
  2. Challenge

    Broadcasting in NumPy

    To review the concepts covered in this step, please refer to the Slicing and Indexing NumPy Arrays module of the Operations on Arrays with NumPy course.

    Broadcasting in NumPy is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations. It enables you to perform element-wise operations on arrays of different sizes and dimensions without explicitly replicating data, which makes it memory efficient and faster. Understanding broadcasting is key to utilizing NumPy's full capabilities.

    In this step, we will delve into broadcasting, exploring how it allows for operations between arrays of different shapes and sizes. We will cover several examples to demonstrate the principles of broadcasting and its applications. For this, we will use the numpy library.


    Task 2.1: Understanding Broadcasting Basics

    Let's start with a basic example to understand how broadcasting works in NumPy. Consider an array A of shape (3,3) and an integer b. Perform an addition operation between A and b, and observe how NumPy handles this operation.

    Define the array A and integer b as follows:

    A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = 5
    
    🔍 Hint

    Remember that in broadcasting, a smaller array (in this case, the integer b) is "stretched" to match the shape of the larger array A. This stretching is not actual memory duplication but a conceptual extension to align the dimensions.

    🔑 Solution
    import numpy as np
    
    A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = 5
    
    result = A + b
    print(result)
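
The claim that stretching does not duplicate memory can be observed with np.broadcast_to. This is only an illustrative sketch; the name stretched is not part of the lab.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = 5

# broadcast_to returns a read-only view shaped like A without copying b
stretched = np.broadcast_to(b, A.shape)
print(stretched.shape)    # (3, 3)
print(stretched.strides)  # (0, 0): every position reads the same value

print(np.array_equal(A + b, A + stretched))  # True
```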
    

    Task 2.2: Broadcasting with One-Dimensional Arrays

    Now, let's move to a slightly more complex example. Create a one-dimensional array v of length 3 and a two-dimensional array M of shape (3,3). Use broadcasting to add v to each row of M.

    Define the arrays v and M as follows:

    v = np.array([1, 0, -1])
    M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    
    🔍 Hint

    When adding v (shape (3,)) to M (shape (3,3)), NumPy will treat v as if it were a (1,3) array and then stretch it along the first dimension (rows) to match the shape of M.

    🔑 Solution
    v = np.array([1, 0, -1])
    M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    
    result = M + v
    print(result)
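
For contrast, the sketch below adds the same v along the other axis by first turning it into a (3, 1) column. The names row_wise and col_wise are illustrative.

```python
import numpy as np

v = np.array([1, 0, -1])
M = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

row_wise = M + v                 # v is added to every row
col_wise = M + v[:, np.newaxis]  # v as a (3, 1) column is added to every column

print(row_wise)  # [[1 1 1] [4 4 4] [7 7 7]]
print(col_wise)  # [[1 2 3] [3 4 5] [5 6 7]]
```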
    

    Task 2.3: Broadcasting to Create a Grid

    Broadcasting can be used to efficiently create grids. Create a grid of coordinates from two one-dimensional arrays x and y using broadcasting. The x array represents the x-coordinates, while the y array represents the y-coordinates.

    Define x and y as follows:

    x = np.array([0, 1, 2, 3])  # x-coordinates
    y = np.array([0, 1, 2])     # y-coordinates
    

    Reshape x to (4,1) and use addition broadcasting to create a grid of shape (4,3).

    🔍 Hint

    You need to reshape x to (4,1) so that when it is added to y, broadcasting can occur. The resulting grid will have shape (4,3), combining the x-coordinates along the rows and y-coordinates along the columns.

    🔑 Solution
    x = np.array([0, 1, 2, 3]).reshape(4, 1)
    y = np.array([0, 1, 2])
    
    grid = x + y
    print(grid)
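
As a cross-check, the same grid can be built from explicit coordinate arrays. The sketch below assumes np.meshgrid with indexing='ij' as the comparison and is not part of the lab's solution.

```python
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([0, 1, 2])

grid = x[:, np.newaxis] + y               # same as reshaping x to (4, 1)
X, Y = np.meshgrid(x, y, indexing='ij')   # explicit (4, 3) coordinate arrays

print(grid.shape)                   # (4, 3)
print(np.array_equal(grid, X + Y))  # True
```

Each entry grid[i, j] equals x[i] + y[j], which broadcasting computes without materializing X and Y.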
    

    Task 2.4: Understanding Broadcasting Rules and Dimension Alignment

    In this final task, let's explore how NumPy's broadcasting rules apply to dimension alignment. When broadcasting two arrays, NumPy compares their shapes dimension by dimension, starting from the trailing (rightmost) dimension and working its way forward. If the arrays have different numbers of dimensions, NumPy prepends 1s to the shape of the array with fewer dimensions. For two arrays to be compatible for broadcasting, each pair of dimensions must either be equal or one of them must be 1.

    In this example, try to add two arrays of shapes (3,2) and (3,) respectively, and observe how NumPy handles this operation.

    Define the arrays A and v as follows:

    A = np.array([[1, 2], [3, 4], [5, 6]])  
    v = np.array([1, 2, 3])                
    

    You should expect a ValueError as a result of this task, because the shapes do not align for broadcasting.

    🔍 Hint

    When comparing the shapes (3,2) and (3,), NumPy prepends a 1 to the shape of v, making it (1,3). Now, the shapes are (3,2) and (1,3). The last dimensions (2 and 3) are neither equal nor is one of them 1, so the arrays are not compatible for broadcasting.

    🔑 Solution
    A = np.array([[1, 2], [3, 4], [5, 6]])  
    v = np.array([1, 2, 3])                
    
    try:
        result = A + v
    except ValueError as e:
        print(f"Error: {e}")
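
Shapes can also be checked before attempting the operation. The sketch below assumes NumPy 1.20 or newer, where np.broadcast_shapes is available; the helper compatible is hypothetical and only wraps that call.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
v = np.array([1, 2, 3])                  # shape (3,)

def compatible(shape_a, shape_b):
    """Return True if the two shapes can be broadcast together."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False

print(compatible(A.shape, v.shape))                 # False
print(compatible(A.shape, v[:, np.newaxis].shape))  # True

# Turning v into a (3, 1) column lets it broadcast across A's columns
fixed = A + v[:, np.newaxis]
print(fixed)  # [[2 3] [5 6] [8 9]]
```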
    
  3. Challenge

    Performing Arithmetic Operations with NumPy Arrays

    To review the concepts covered in this step, please refer to the Operating with NumPy Arrays module of the Operations on Arrays with NumPy course.

    Understanding how to perform arithmetic operations on NumPy arrays is important because it allows us to manipulate and analyze the data stored in the arrays. This is a fundamental operation in data analysis and manipulation.

    In this step, we will practice performing arithmetic operations on our Consumer_Data.csv NumPy array. We will explore how to perform basic operations like addition, subtraction, multiplication, and division, as well as more complex operations like calculating the mean and standard deviation. The goal is to understand how these operations work and how they can be used to manipulate and analyze data. We will be using the numpy library for this step.


    Task 3.1: Load the Data into a NumPy Array

    First, we need to load the Consumer_Data.csv file into a NumPy array using the provided code in the first cell. After running the first cell, write code in the second cell to display the first 5 rows of the array and get a feel for the data.

    🔍 Hint

    To display a value in Jupyter, either:

    1. Write the value on the last line of code in the cell. For example:
      	# Other code above
      	value_to_display
      
    2. Use the print() function with the value as the argument.
    🔑 Solution

    Cell 1

    # Provided Code
    import numpy as np

    # Load the data into a NumPy array
    consumer_data = np.genfromtxt(
        'Consumer_Data.csv',
        delimiter=',',
        skip_header=1,
    )

    Cell 2

    # Display the first 5 rows of the array
    print(consumer_data[:5])
    

    Task 3.2: Handling Missing Data

    It's common to encounter missing values (NaNs) in datasets. Before performing arithmetic operations, it's important to handle these missing values. Replace all NaN values with 0 using boolean indexing. Display the first 5 rows of the resulting array.

    🔍 Hint

    You can use the numpy.isnan function to identify NaNs and then replace them with a value of your choice, such as the mean or median of the column. For example:

    mask = np.isnan(array)
    
    🔑 Solution
    mask = np.isnan(consumer_data)
    consumer_data[mask] = 0
    
    print(consumer_data[:5])
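
The hint mentions the column mean as an alternative fill value. A minimal sketch of that variant, on made-up data, could look like this; it is not the lab's required solution.

```python
import numpy as np

# Hypothetical data with missing entries
data = np.array([[1.0, 20.0], [2.0, np.nan], [3.0, 40.0]])

col_means = np.nanmean(data, axis=0)                # per-column mean, ignoring NaN
filled = np.where(np.isnan(data), col_means, data)  # col_means broadcasts across rows

print(col_means)  # [ 2. 30.]
print(filled)
```

Filling with 0 can pull statistics such as the mean toward zero, whereas filling with the column mean leaves the column mean unchanged.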
    

    Task 3.3: Performing Basic Arithmetic Operations

    Now that we have our data loaded into a NumPy array and cleaned, we can start performing arithmetic operations on it.

    Let's start with some basic operations like addition and subtraction. For this task, add 5 to every element in the age column (2nd column) and subtract 2 from every element in the customer_id column (1st column). Save the results back to the original array (in place). Display the first 5 rows of the modified array.

    🔍 Hint

    To modify a slice in place, first index into the array to get the desired slice. For simple operations such as addition and subtraction, the += and -= assignment operators modify the slice in place.

    🔑 Solution
    consumer_data[:, 1] += 5
    consumer_data[:, 0] -= 2
    	
    print(consumer_data[:5])
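
The difference between modifying in place and creating a new array can be seen in a short sketch on hypothetical data:

```python
import numpy as np

data = np.array([[10.0, 30.0], [11.0, 45.0]])  # hypothetical (customer_id, age)

ages = data[:, 1]   # a view into the second column
ages += 5           # in place: data is updated too
print(data[:, 1])   # [35. 50.]

ages = ages + 5     # creates a new array; data is unchanged
print(data[:, 1])   # still [35. 50.]
```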
    

    Task 3.4: Performing Complex Arithmetic Operations

    In addition to basic operations, NumPy also allows us to perform more complex operations like calculating the mean and standard deviation. For this task, calculate the mean and standard deviation of the age column (2nd column). Print the results.

    🔍 Hint

    You can use the numpy.mean() and numpy.std() functions to calculate mean and standard deviation. First index into the desired column, then apply the complex operation on the slice.

    🔑 Solution
    mean_age = np.mean(consumer_data[:, 1])
    std_dev_age = np.std(consumer_data[:, 1])
    	
    print("mean:", mean_age)
    print("std:", std_dev_age)
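
Two details of these functions are worth a quick look: np.std computes the population standard deviation by default, and np.mean returns nan if any NaN remains. The sketch below uses made-up ages.

```python
import numpy as np

ages = np.array([20.0, 30.0, 40.0, 50.0])  # hypothetical ages

mean_age = np.mean(ages)
pop_std = np.std(ages)             # population standard deviation (ddof=0)
sample_std = np.std(ages, ddof=1)  # sample standard deviation

print(mean_age)             # 35.0
print(pop_std, sample_std)

with_gap = np.array([20.0, np.nan, 40.0])
print(np.mean(with_gap))     # nan
print(np.nanmean(with_gap))  # 30.0
```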
    
  4. Challenge

    Searching and Sorting in NumPy Arrays

    To review the concepts covered in this step, please refer to the Searching in NumPy Arrays module of the Operations on Arrays with NumPy course.

    Searching and sorting in NumPy arrays is important because it allows us to organize and find specific elements in the data. This is a key operation in data analysis and manipulation.

    In this final step, we will practice searching and sorting our Consumer_Data.csv NumPy array. We will explore how to use the np.sort and np.where functions to sort and search for elements in the array. In an effort to provide tailored financial services, a bank is undertaking a project to suggest credit cards to their most reliable customers. This decision is based on their credit scores, a key indicator of financial reliability. Your goal is to identify these customers.


    Task 4.1: Load the Data into a NumPy Array

    Our first task is to load the customer data into a NumPy array. This process will transform the raw data from Consumer_Data.csv into a structured format that's easy to manipulate with Python. Once loaded with the provided code, preview the first rows of the data to understand its structure and content.

    🔍 Hint

    To display a value in Jupyter either:

    1. Write the value on the last line of code in the cell. For example:
      	# Other code above
      	value_to_display
      
    2. Use the print() function with the value as the argument.
    🔑 Solution

    Cell 1

    # Provided Code
    import numpy as np
    
    # Load the data into a NumPy array
    consumer_data = np.genfromtxt(
        'Consumer_Data.csv', 
        delimiter=',', 
        skip_header=1,
    )
    

    Cell 2

    # Display the first 5 rows of the array
    print(consumer_data[:5])
    

    Task 4.2: Sorting the Array

    With the data loaded, our next task is to sort it based on the Credit_Score column (4th column). This step is crucial as it will allow us to easily identify customers with high credit scores. Display the sorted credit scores.

    🔍 Hint

    Use the np.sort() function to sort the array. The sliced array (consumer_data[:, 3]) should be the argument to this function. Remember that Python uses 0-based indexing.

    🔑 Solution
    # Sorting the data by the Credit_Score column
    sorted_data = np.sort(consumer_data[:, 3])  # 4th column for Credit_Score
    print(sorted_data)
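
Note that np.sort on a single column returns only the sorted values. If whole rows should be reordered by credit score, one possible approach is np.argsort, sketched here on hypothetical (customer_id, credit_score) rows.

```python
import numpy as np

# Hypothetical rows of (customer_id, credit_score)
data = np.array([[1, 640], [2, 720], [3, 580]], dtype=float)

order = np.argsort(data[:, 1])     # row indices in ascending score order
by_score = data[order]             # whole rows, lowest score first
by_score_desc = data[order[::-1]]  # highest score first

print(by_score)
print(by_score_desc)
```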
    

    Task 4.3: Searching the Array

    Now that we have sorted our data, let's search for bank customers with a credit score above 700. Identifying these customers will help us target the right individuals for our credit card offers. Once you find which customers have a credit score above 700, print out these customers' entire row of data. There should be only 10 customers with a credit score over 700.

    🔍 Hint

    Use the np.where() function to search for elements in the array. The condition should be passed as an argument to this function. Remember that Python uses 0-based indexing. Use the result to index the original data to be printed.

    🔑 Solution
    # Searching for customers with a credit score above 700
    high_credit_customers = np.where(consumer_data[:, 3] > 700)
    print(consumer_data[high_credit_customers])
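
With a single condition, np.where returns a tuple of index arrays, and indexing with it selects the same rows as a Boolean mask. The sketch below shows this on made-up (customer_id, credit_score) rows.

```python
import numpy as np

# Hypothetical rows of (customer_id, credit_score)
data = np.array([[1, 640], [2, 720], [3, 580], [4, 705]], dtype=float)

rows = np.where(data[:, 1] > 700)  # one-element tuple holding the row indices
print(rows[0])                     # [1 3]

via_where = data[rows]
via_mask = data[data[:, 1] > 700]
print(np.array_equal(via_where, via_mask))  # True
print(len(via_where))                       # number of matching customers: 2
```

Counting the selected rows this way is a simple check against the expected number of customers.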
    
