Pandas Arrays and Data Structures Hands-on Practice

In this lab, you'll master Pandas arrays and data structures, including creating various data types, handling time data, and manipulating categorical and sparse data. You'll learn essential skills like sorting, filtering, and efficient data structuring, culminating in string data manipulation techniques, equipping you with comprehensive Pandas proficiency.

Level: Beginner
Duration: 1h 1m
Published: Dec 07, 2023

Table of Contents

  1. Challenge

    Exploring Pandas Arrays and Data Types

    Jupyter Guide

    To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (ctrl/cmd(⌘) + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. Then, when you are ready to move on to the next step, come back and click on the file for the next step until you have completed all tasks in all steps of the lab.


    Exploring Pandas Arrays and Data Types

    To review the concepts covered in this step, please refer to the Pandas Arrays and Data Types module of the Pandas Arrays and Data Structures course.

    Understanding Pandas Arrays and their data types is important because they form the core of data manipulation and analysis in Pandas. The ability to create and manipulate arrays, and understand their characteristics, is key to efficient data work.

    In this step, you will dive deep into the world of Pandas Arrays and their data types. Your goal is to create Pandas arrays with various data types, check their types, and compare the results. You will use the dtype attribute and parameter to create arrays as specific types and understand how Pandas infers the best data type when dtype is not specified.


    Task 1.1: Creating a Pandas Array with Integer Data Type

    Create a Pandas array with integer data type using the dtype parameter. After creating the array, print it out to display its contents and data type.

    πŸ” Hint

    Use the pd.array() function to create a Pandas array. Pass a list of integers as the first argument and 'int' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.

    πŸ”‘ Solution
    import pandas as pd
    
    # Create a pandas array with integer data type
    array_int = pd.array([1, 2, 3, 4, 5], dtype='int')
    
    # Print the array and its data type
    print("Array:", array_int)
    print("Data Type:", array_int.dtype)
    

    Task 1.2: Creating a Pandas Array with String Data Type

    Create a Pandas array with string data type using the dtype parameter. After creating the array, print it out to display its contents and data type.

    πŸ” Hint

    Use the pd.array() function to create a Pandas array. Pass a list of strings as the first argument and 'str' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.

    πŸ”‘ Solution
    # Create a pandas array with string data type
    array_str = pd.array(['a', 'b', 'c', 'd', 'e'], dtype='str')
    
    # Print the array and its data type
    print("Array:", array_str)
    print("Data Type:", array_str.dtype)
    

    Task 1.3: Creating a Pandas Array with Mixed Data Type

    Create a Pandas array with mixed data type and let Pandas infer the best data type. After creating the array, print it out to display its contents and data type.

    πŸ” Hint

    Use the pd.array() function to create a Pandas array. Pass a list with mixed data types (integers and strings) as the first argument. Use the print() function to display the array and its data type using array.dtype.

    πŸ”‘ Solution
    # Create a pandas array with mixed data type
    array_mixed = pd.array([1, 'b', 3, 'd', 5])
    
    # Print the array and its data type
    print("Array:", array_mixed)
    print("Data Type:", array_mixed.dtype)
    

    Task 1.4: Creating a Mixed Pandas Array with Explicit Object dtype

    Create a Pandas array with mixed data type (integers and strings) and explicitly set the data type to 'object'. After creating the array, print it out to display its contents and data type.

    πŸ” Hint

    Use the pd.array() function to create a Pandas array. Pass a list with mixed data types (integers and strings) as the first argument and 'object' as the dtype parameter. Use the print() function to display the array and its data type using array.dtype.

    πŸ”‘ Solution
    # Create a pandas array with mixed data type and dtype='object'
    array_mixed_object = pd.array([1, 'b', 3, 'd', 5], dtype='object')
    
    # Print the array and its data type
    print("Array:", array_mixed_object)
    print("Data Type:", array_mixed_object.dtype)
    
  2. Challenge

    Working with Time in Pandas

    To review the concepts covered in this step, please refer to the Pandas Arrays and Data Types module of the Pandas Arrays and Data Structures course.

    Understanding how to handle time-related data in Pandas is important because time is a common dimension in many datasets. You will often need to manipulate and compare timestamps, timedeltas, and intervals.

    In this step, you will learn how to handle date and time operations in Pandas. You will create Timestamp objects and convert them to different time zones. You will also learn how to create and manipulate Time Delta and Interval objects from Timestamps: checking whether a value is inside an interval, checking whether two intervals overlap, and shifting and extending intervals.


    Task 2.1: Creating Timestamps

    Create a Timestamp object for the current date and time using the now() method. Save the result as the variable now, and then print it out to see the result.

    πŸ” Hint

    Use the pd.Timestamp.now() function without any arguments to get the current date and time, and then use the print() function to display the result.

    πŸ”‘ Solution
    import pandas as pd
    
    # Create a Timestamp object for the current date and time
    now = pd.Timestamp.now()
    
    print("Current Timestamp:", now)
    

    Task 2.2: Converting Timezones

    Convert the now Timestamp object to the 'Asia/Kolkata' timezone, and then print the converted Timestamp.

    πŸ” Hint

    Use the tz_localize method of the Timestamp object and pass the timezone string 'Asia/Kolkata' as an argument. Then use the print() function to display the converted Timestamp.

    πŸ”‘ Solution
    # Convert the Timestamp object to the 'Asia/Kolkata' timezone
    now_kolkata = now.tz_localize('Asia/Kolkata')
    
    print("Timestamp in Asia/Kolkata timezone:", now_kolkata)
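
    To make the difference between attaching and converting a timezone concrete, here is a minimal sketch. It uses a fixed, made-up date instead of now() so the printed values are reproducible, and the variable names are illustrative only.

```python
import pandas as pd

# A timezone-naive Timestamp with a fixed wall-clock time
naive = pd.Timestamp("2023-12-07 12:00:00")

# tz_localize attaches a timezone; the wall-clock time stays 12:00
kolkata = naive.tz_localize("Asia/Kolkata")

# tz_convert keeps the same instant but shows it in another timezone
utc = kolkata.tz_convert("UTC")

print(kolkata)         # 2023-12-07 12:00:00+05:30
print(utc)             # 2023-12-07 06:30:00+00:00
print(kolkata == utc)  # True, both describe the same instant
```

    Calling tz_convert directly on the naive Timestamp raises a TypeError, which is why the solution above starts with tz_localize.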
    

    Task 2.3: Creating Time Delta

    Create a Time Delta object representing a duration of 1 day. Save the value in a variable named delta, and then print it out.

    πŸ” Hint

    Use the pd.Timedelta function and pass the string '1 day' as an argument. Then use the print() function to display the Time Delta object.

    πŸ”‘ Solution
    # Create a Time Delta object representing a duration of 1 day
    delta = pd.Timedelta('1 day')
    
    print("Time Delta (1 day):", delta)
    

    Task 2.4: Manipulating Timestamps with Time Delta

    Add the Time Delta object to the Timestamp object to get a new Timestamp object representing the next day. Save the value to tomorrow, and then print it.

    πŸ” Hint

    Use the + operator to add the Time Delta object to the Timestamp object. Then use the print() function to display the new Timestamp.

    πŸ”‘ Solution
    # Add the Time Delta object to the Timestamp object
    tomorrow = now + delta
    
    print("Timestamp for the next day:", tomorrow)
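
    The arithmetic also works in the other direction: subtracting one Timestamp from another yields a Time Delta. The following sketch uses two made-up, fixed timestamps so the result is predictable.

```python
import pandas as pd

start = pd.Timestamp("2023-12-07 09:00")
end = pd.Timestamp("2023-12-08 21:00")

# Timestamp - Timestamp gives a Timedelta
elapsed = end - start
print(elapsed)                   # 1 days 12:00:00
print(elapsed.days)              # 1
print(elapsed.total_seconds())   # 129600.0

# Adding the same duration back reproduces the end Timestamp
print(start + pd.Timedelta(hours=36) == end)   # True
```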
    

    Task 2.5: Creating Interval

    Create an Interval object representing the interval between the original Timestamp and the new Timestamp, and then print it. You can explore the attributes of the interval, like .length or .mid for the midpoint.

    πŸ” Hint

    Use the pd.Interval function and pass the original Timestamp and the new Timestamp as arguments. Then use the print() function to display the Interval.

    πŸ”‘ Solution
    # Create an Interval object
    interval = pd.Interval(now, tomorrow)
    
    print("Interval:", interval)
    print("Length:", interval.length)
    print("Midpoint:", interval.mid)
    

    Task 2.6: Checking if a Value is Inside an Interval

    Check if the current date and time is inside the Interval object, and then print the result. Since the interval excludes the left endpoint and includes the right endpoint, now should not be in the interval.

    πŸ” Hint

    Use the in keyword to check if now is in the interval. Then use the print() function to display the result.

    πŸ”‘ Solution
    # Check if the current date and time is inside the Interval object
    print("Is current time inside the interval?", now in interval)
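
    Which endpoints belong to an interval is controlled by the closed parameter of pd.Interval, which defaults to 'right'. A small sketch with fixed, made-up timestamps shows how that choice changes the membership test.

```python
import pandas as pd

start = pd.Timestamp("2023-12-07")
end = start + pd.Timedelta("1 day")

right_closed = pd.Interval(start, end)                # default: closed='right'
both_closed = pd.Interval(start, end, closed="both")  # includes both endpoints

print(start in right_closed)   # False, the left endpoint is excluded
print(end in right_closed)     # True, the right endpoint is included
print(start in both_closed)    # True
```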
    

    Task 2.7: Checking if Two Intervals Overlap

    Create another Interval object for the next two days (now to now + 2 * delta) and check if it overlaps with the original Interval object, then print the result.

    πŸ” Hint

    Use the pd.Interval function to create another Interval object for the next day. Then use the overlaps method of the original Interval object and pass the new Interval object as an argument. Finally, use the print() function to display the result.

    πŸ”‘ Solution
    # Create another Interval object
    next_day = pd.Interval(now, now + 2 * delta)
    
    # Check if it overlaps with the original Interval object
    overlap = interval.overlaps(next_day)
    
    print("Do the intervals overlap?", overlap)
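
    One edge case worth trying is two intervals that only touch at a shared endpoint. The sketch below, using made-up dates, suggests that they count as overlapping only when both intervals include that endpoint.

```python
import pandas as pd

t0 = pd.Timestamp("2023-12-07")
t1 = t0 + pd.Timedelta("1 day")
t2 = t0 + pd.Timedelta("2 days")

# Default closed='right': (t0, t1] and (t1, t2] share no point
first = pd.Interval(t0, t1)
second = pd.Interval(t1, t2)
print(first.overlaps(second))                 # False

# With closed='both', [t0, t1] and [t1, t2] share the point t1
first_both = pd.Interval(t0, t1, closed="both")
second_both = pd.Interval(t1, t2, closed="both")
print(first_both.overlaps(second_both))       # True
```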
    

    Task 2.8: Shifting Intervals

    Shift the original Interval object by the Time Delta object, then print the resulting Interval.

    πŸ” Hint

    Use + to add the Time Delta object, delta to the interval. Finally, use the print() function to display the result.

    πŸ”‘ Solution
    # Shift the original Interval object
    shifted = interval + delta
    
    print("Shifted Interval:", shifted)
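
    Shifting moves both endpoints by the same amount, so the length stays the same. Extending an interval is not a built-in operation in this lab's solution; one possible approach, sketched here with made-up dates, is to build a new Interval from the old endpoints.

```python
import pandas as pd

start = pd.Timestamp("2023-12-07")
delta = pd.Timedelta("1 day")
interval = pd.Interval(start, start + delta)

# Shift: both endpoints move forward by delta
shifted = interval + delta

# Extend (one possible approach): keep the left endpoint, push the right one out
extended = pd.Interval(interval.left, interval.right + delta, closed=interval.closed)

print("Shifted: ", shifted, "length", shifted.length)
print("Extended:", extended, "length", extended.length)
```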
    
  3. Challenge

    Manipulating Categorical and Sparse Data

    To review the concepts covered in this step, please refer to the Manipulating Data with Pandas module of the Pandas Arrays and Data Structures course.

    Understanding how to manipulate categorical and sparse data is important because these types of data are common in real-world datasets. Categorical data can be used for filtering and sorting operations, while sparse data can make your code more efficient when there are lots of empty values.

    In this step, you will learn how to manipulate categorical and sparse data in Pandas. You will create an ordered Categorical data series from a pandas Series, and use it for sorting and filtering operations. You will also learn how to work with sparse data, creating a sparse array, converting an existing DataFrame to use a sparse array, and understanding the structure of a sparse array.


    Task 3.1: Creating an Ordered Categorical Data Series

    Create an ordered Categorical data series from a pandas Series. The series contains the following data:

    ['low', 'high', 'medium', 'high', 'low', 'medium', 'medium']
    

    The categories should be ordered as follows: ['low', 'medium', 'high']. After creating the series, print it to see the ordered categories.

    πŸ” Hint

    Use the pd.Categorical function to convert the series to a Categorical data series. Pass in the series and the categories you want in ascending order. Set the ordered parameter to True. Then use print() to display the series.

    πŸ”‘ Solution
    import pandas as pd
    
    # Provided pandas Series
    series = pd.Series(['low', 'high', 'medium', 'high', 'low', 'medium', 'medium'])
    
    # Convert the series to an ordered Categorical data series
    ordered_series = pd.Categorical(series, categories=['low', 'medium', 'high'], ordered=True)
    
    # Print the ordered series
    print("Ordered Categorical Series:", ordered_series)
    

    Task 3.2: Sorting and Filtering Operations

    Perform sorting and filtering operations on the ordered Categorical data series. First, sort the series in ascending order, and then print it. Next, filter the series to only include 'medium' and 'high' categories, and then print the filtered series.

    πŸ” Hint

    Use the sort_values method to sort the series. Use boolean indexing to filter the series. Use print() to display the results of each operation.

    πŸ”‘ Solution
    # Sort the series in ascending order
    sorted_series = ordered_series.sort_values()
    print("Sorted Series:", sorted_series)
    
    # Filter the series to only include 'medium' and 'high' categories
    filtered_series = sorted_series[(sorted_series == 'medium') | (sorted_series == 'high')]
    print("Filtered Series:", filtered_series)
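
    Because the categories are ordered, comparison operators such as >= can also be used for filtering. The sketch below is an alternative to the solution above, not a replacement for it: it keeps the data in a pandas Series by casting with pd.CategoricalDtype, and the variable names are made up.

```python
import pandas as pd

levels = pd.Series(['low', 'high', 'medium', 'high', 'low', 'medium', 'medium'])

# An ordered categorical dtype: low < medium < high
level_type = pd.CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
ordered = levels.astype(level_type)

# Sorting follows the category order, not alphabetical order
print(ordered.sort_values().tolist())

# Ordered categories support comparisons against a category label
at_least_medium = ordered[ordered >= 'medium']
print(at_least_medium.tolist())

# Each value is stored as an integer code pointing into the categories
print(ordered.cat.codes.tolist())
```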
    

    Task 3.3: Working with Sparse Data

    Create a sparse array with the pd.arrays.SparseArray function. The array should contain the following data: [0, 0, 0, 3, 0, 0, 5, 0, 0, 0]. After creating the sparse array, print it to understand its structure.

    πŸ” Hint

    Use the pd.arrays.SparseArray function to create a sparse array. Pass in the data as a list. Then use print() to display the sparse array.

    πŸ”‘ Solution
    # Create a sparse array
    sparse_array = pd.arrays.SparseArray([0, 0, 0, 3, 0, 0, 5, 0, 0, 0])
    
    # Print the sparse array
    print("Sparse Array:", sparse_array)
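
    To look at the structure more directly than through print(), a few SparseArray attributes can be inspected. This is a small sketch on the same data; only the values that differ from the fill value are stored.

```python
import pandas as pd

sparse_array = pd.arrays.SparseArray([0, 0, 0, 3, 0, 0, 5, 0, 0, 0])

print("Fill value:   ", sparse_array.fill_value)   # the value that is not stored
print("Stored values:", sparse_array.sp_values)    # only the non-fill entries
print("Stored count: ", sparse_array.npoints)
print("Density:      ", sparse_array.density)      # stored count / total length
```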
    

    Task 3.4: Converting a DataFrame to Use a Sparse Array

    Create a DataFrame with the provided data:

    {
        'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0], 
        'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0],
    }
    

    Then, convert the DataFrame to use a sparse array, and print the resulting DataFrame and the DataFrame.dtypes attribute for its column dtypes. It should look like a normal DataFrame, but the columns should be type Sparse[float64, nan].

    πŸ” Hint

    Use the df.astype method to convert the DataFrame to use a sparse array. Pass in pd.SparseDtype() as the argument. Then use print() to display the sparse DataFrame.

    πŸ”‘ Solution
    import pandas as pd
    
    # Create a DataFrame
    df = pd.DataFrame({'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0], 'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0]})
    
    # Convert the DataFrame to use a sparse array
    sparse_df = df.astype(pd.SparseDtype())
    
    # Print the sparse DataFrame and its column structure
    print("Sparse DataFrame:\n", sparse_df)
    print("\nSparse DataFrame Structure:\n", sparse_df.dtypes)
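
    Note that pd.SparseDtype() with no arguments uses NaN as the fill value, so the zeros in this DataFrame are still stored. As a hedged comparison, the sketch below also converts with an integer dtype and a fill value of 0 and checks the density through the DataFrame.sparse accessor; the variable names are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [0, 0, 0, 3, 0, 0, 5, 0, 0, 0],
    'B': [1, 0, 0, 0, 2, 0, 0, 0, 3, 0],
})

# Default SparseDtype: float64 with NaN as the fill value
sparse_nan = df.astype(pd.SparseDtype())

# Integer SparseDtype with 0 as the fill value
sparse_zero = df.astype(pd.SparseDtype("int64", 0))

print(sparse_nan.dtypes)
print(sparse_zero.dtypes)

# Density is the fraction of values that are actually stored
print("Density with NaN fill:", sparse_nan.sparse.density)
print("Density with 0 fill:  ", sparse_zero.sparse.density)
```

    With NaN as the fill value every entry is stored, while a fill value of 0 stores only the non-zero entries.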
    
  4. Challenge

    Working with String Data and Finalizing Course Project

    To review the concepts covered in this step, please refer to the Manipulating Data with Pandas module of the Pandas Arrays and Data Structures course.

    Understanding how to manipulate string data is important because strings are a common data type in many datasets. You will often need to replace, split, strip, and concatenate strings.

    In this final step, you will learn how to manipulate string data in Pandas. You will use the replace, split, strip, and cat methods to manipulate strings, and the expand and na_rep parameters to convert a list to a DataFrame and replace NaN values, respectively.


    Task 4.1: Replacing String Data

    Replace all occurrences of 'apple' with 'pineapple' in the provided pandas Series. Display the resulting series.

    πŸ” Hint

    Use the Series.str.replace() method of the pandas Series. The first argument should be the string you want to replace ('apple'), and the second argument should be the string you want to replace it with ('pineapple').

    πŸ”‘ Solution
    import pandas as pd
    
    # Provided pandas Series
    series = pd.Series(['apple', 'banana-pie', 'cherry', 'apple-crumble', 'cherry', 'apple-candy'])
    
    # Replace all occurrences of 'apple' with 'pineapple'
    series = series.str.replace('apple', 'pineapple')
    print("Pineapple instead of Apple:", series)
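
    Series.str.replace also accepts a regex parameter. Passing it explicitly makes the intent clear, since a literal replacement touches every occurrence of the text while a pattern can be anchored. The sample data and the replacement word 'pear' below are made up for illustration.

```python
import pandas as pd

s = pd.Series(['apple', 'apple-crumble', 'pineapple'])

# Literal replacement: every occurrence of 'apple' is replaced
literal = s.str.replace('apple', 'pear', regex=False)
print(literal.tolist())    # ['pear', 'pear-crumble', 'pinepear']

# Regex replacement: only 'apple' at the start of a string is replaced
anchored = s.str.replace(r'^apple', 'pear', regex=True)
print(anchored.tolist())   # ['pear', 'pear-crumble', 'pineapple']
```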
    

    Task 4.2: Splitting String Data

    Split all strings in the pandas Series where there is a '-' character. Split those entries into lists of multiple strings. Display the results.

    πŸ” Hint

    Use the Series.str.split method of the pandas Series. The argument should be the character you want to split the string on ('-').

    πŸ”‘ Solution
    series = series.str.split('-')
    print("Split Strings:", series)
    

    Task 4.3: Stripping String Data

    Remove leading and trailing whitespace from the provided whitespace_series. Display the result.

    πŸ” Hint

    Use the Series.str.strip method of the pandas Series. This method does not take any arguments.

    πŸ”‘ Solution
    # Provided pandas Series
    whitespace_series = pd.Series([' apple ', ' banana ', ' cherry '])
    
    whitespace_series = whitespace_series.str.strip()
    print("Whitespace-stripped series:", whitespace_series)
    

    Task 4.4: Concatenating String Data

    Concatenate each fruit in the provided series with ' pie'. Display the results.

    πŸ” Hint

    Use the Series.str.cat method of the pandas Series. The argument should be a list of strings you want to concatenate with each string in the Series. In this case, [' pie']*3. Display the results.

    πŸ”‘ Solution
    # Provided pandas Series
    fruit_series = pd.Series(['apple', 'banana', 'cherry'])
    
    fruit_series = fruit_series.str.cat([' pie']*3)
    print("Pie options:", fruit_series)
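
    The na_rep parameter mentioned in the introduction to this step matters when either side of the concatenation has missing values. The following sketch uses two made-up Series and '?' as an arbitrary placeholder.

```python
import pandas as pd

fruits = pd.Series(['apple', None, 'cherry'])
desserts = pd.Series(['pie', 'split', None])

# Without na_rep, a missing value on either side makes the result missing
joined = fruits.str.cat(desserts, sep=' ')
print(joined.tolist())

# With na_rep, missing values are replaced by the placeholder first
filled = fruits.str.cat(desserts, sep=' ', na_rep='?')
print(filled.tolist())     # ['apple pie', '? split', 'cherry ?']

# With no second argument, cat joins the whole Series into one string
print(fruits.str.cat(sep=', ', na_rep='?'))   # apple, ?, cherry
```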
    

    Task 4.5: Expanding a List to a DataFrame

    After splitting strings, you are left with a list of strings. Expanding lists into their own columns in a dataframe may be useful when there is an expected number of splits in a string.

    Split the provided flavor_desert series on the '-' character and expand the result into a dataframe. Display the resulting dataframe.

    πŸ” Hint

    Use the Series.str.split method of the pandas Series with the expand parameter set to True.

    πŸ”‘ Solution
    # Provided pandas Series
    flavor_desert = pd.Series(['apple-pie', 'kiwi', 'banana-split', 'cherry-candy', "pineapple-chunks", "dragonfruit"])
    
    df = flavor_desert.str.split('-', expand=True)
    print("Dataframe of Split Series")
    df
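
    When some strings contain more separators than others, the n parameter of Series.str.split caps the number of splits so the resulting DataFrame has a predictable number of columns. This is a sketch on made-up data; the column names 'flavor' and 'dessert' are chosen here for illustration.

```python
import pandas as pd

items = pd.Series(['apple-pie', 'kiwi', 'cherry-candy-bar'])

# Split at most once per string and expand the pieces into columns
parts = items.str.split('-', n=1, expand=True)
parts.columns = ['flavor', 'dessert']

# Rows with no separator get a missing value in the second column
print(parts)
```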
    
