
Query and Filter Data in R

Learn by doing in this practical, beginner-friendly lab! Find out how you can use filter(), select(), arrange(), and chaining to extract, sort, and streamline data like a professional data analyst. By completing the lab, learners will gain hands-on experience in table manipulation in R by using the dplyr package.

Path Info

Level: Beginner
Duration: 30m
Published: Apr 24, 2025

Table of Contents

  1. Challenge

    Introduction

    Query and Filter Data in R

    This Code Lab introduces querying and filtering data in R using the dplyr package. In this lab, you will analyze employee absence data using dplyr. You will learn how to select rows and columns, sort data, and perform complex queries by chaining commands.

    The dataset provided for this lab is a single CSV file called Employee Absence.csv. The Employee Absence dataset contains columns such as employee_number, employee_name, gender, city, job_title, department, store_location, business_unit, division, age, length_of_service, and hours_absent.

    First, ensure you load the dplyr package into your R session. You'll also load the Employee Absence dataset and inspect its structure to understand the available columns.

    Task 1.1: Load dplyr and Inspect the Dataset

    In the RStudio Console, run the following code:

    # Load dplyr
    library(dplyr)
    
    # Load the Employee Absence dataset
    data <- read.csv("Employee Absence.csv")
    
    # Inspect the dataset
    glimpse(data)
    
    You should see the following results in the **Console**:
    Rows: 50
    Columns: 12
    $ employee_number   <int> 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 11…
    $ employee_name     <chr> "John Smith", "Sarah Johnson", "Michael Davis", "Emily Wilson", "Robert Anderson",…
    $ gender            <chr> "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "F…
    $ city              <chr> "New York", "Los Angeles", "Chicago", "San Francisco", "Houston", "Seattle", "Miam…
    $ job_title         <chr> "Accountant", "Marketing Specialist", "Sales Representative", "Software Engineer",…
    $ department        <chr> "Finance", "Marketing", "Sales", "Technology", "Human Resources", "Operations", "C…
    $ store_location    <chr> "Store A", "Store B", "Store C", "Store D", "Store E", "Store F", "Store G", "Stor…
    $ business_unit     <chr> "Finance Division", "Marketing Division", "Sales Division", "Technology Division",…
    $ division          <chr> "Accounting", "Marketing", "Sales", "Engineering", "Human Resources", "Operations"…
    $ age               <int> 32, 28, 35, 29, 37, 31, 26, 33, 30, 27, 34, 29, 36, 30, 38, 32, 27, 34, 31, 28, 33…
    $ length_of_service <int> 5, 3, 7, 4, 8, 6, 2, 5, 4, 3, 6, 4, 7, 5, 9, 7, 3, 6, 5, 4, 6, 4, 8, 5, 9, 7, 3, 6…
    $ hours_absent      <int> 8, 16, 4, 0, 24, 8, 0, 0, 16, 0, 8, 16, 0, 0, 32, 16, 0, 0, 24, 0, 8, 16, 0, 0, 40…
    

    The glimpse() function provides a concise overview of the dataset, showing column names, data types, and a preview of the values.
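    To see what glimpse() reports without the lab file, here is a minimal sketch on a small hypothetical data frame named toy (its values are invented, not taken from Employee Absence.csv). It assumes dplyr is installed and also shows the base R calls that report the same dimensions and column types.

```r
library(dplyr)

# Hypothetical stand-in for Employee Absence.csv (invented values)
toy <- data.frame(
  employee_number = c(101L, 102L, 103L),
  employee_name   = c("Ana Ruiz", "Ben Cole", "Cai Wong"),
  hours_absent    = c(8L, 16L, 4L)
)

# glimpse() prints one line per column (name, type, first values)
# and returns its input invisibly, so it can also sit inside a pipeline
same <- glimpse(toy)

# Base R calls that report the same dimensions and column types
dim(toy)             # 3 rows, 3 columns
sapply(toy, class)   # integer, character, integer
```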

    Now that you have the dplyr package and Employee Absence dataset loaded, you can proceed to selecting rows.

  2. Challenge

    Use `filter()` to Select Rows Based on Conditions

    Use filter() with a Simple Condition

    The filter() function from dplyr allows you to select rows that meet specific conditions. For example, you can select employees who work in the Finance department as follows:

    # Filter employees in the Finance department
    finance_employees <- filter(data, department == "Finance")
    
    # View the result
    print(finance_employees)
    

    This code returns a new dataframe containing only the rows where the department column equals "Finance".
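    One detail worth knowing: filter() keeps only rows where the condition evaluates to TRUE, so rows where it evaluates to NA are dropped as well. The sketch below uses a hypothetical toy data frame (invented names and one missing department) to compare this with base R's [ ] subsetting, which returns a placeholder row of NAs in that case.

```r
library(dplyr)

# Hypothetical data (invented values); Cai's department is missing
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai", "Dee"),
  department    = c("Finance", "Sales", NA, "Finance")
)

# For Cai, department == "Finance" is NA, so filter() drops that row
finance <- filter(toy, department == "Finance")
finance$employee_name                       # "Ana" "Dee"

# Base R [ ] subsetting keeps a placeholder row of NAs instead
nrow(toy[toy$department == "Finance", ])    # 3
```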


    Task 2.1: Use filter() with One Simple Condition

    In the RStudio Console, write a filter() command to select employees who are in the Sales department.

    Solution
    filter(data, department == "Sales")
    
    You should see the following results in the **Console**:
      employee_number employee_name gender    city
    1             103 Michael Davis   Male Chicago
    2             113   Aiden Brown   Male Chicago
    3             123   Liam Turner   Male Chicago
    4             133  Lucas Turner   Male Chicago
    5             143   Liam Turner   Male Chicago
                 job_title department store_location
    1 Sales Representative      Sales        Store C
    2 Sales Representative      Sales        Store C
    3 Sales Representative      Sales        Store C
    4 Sales Representative      Sales        Store C
    5 Sales Representative      Sales        Store C
       business_unit division age length_of_service
    1 Sales Division    Sales  35                 7
    2 Sales Division    Sales  36                 7
    3 Sales Division    Sales  37                 8
    4 Sales Division    Sales  38                 8
    5 Sales Division    Sales  39                 8
      hours_absent
    1            4
    2            0
    3            0
    4            0
    5            0
    

    Operators and Functions for Filtering

    The filter() function supports various operators and functions to define conditions:

    • Comparison operators: == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal)
    • Logical operators: & (and), | (or), ! (not)
    • Functions:
      • is.na() to check for missing values
      • between(x, left, right) to check if values fall within a range
      • grepl(pattern, x) to match text patterns
      • %in% to check if values are in a specified set

    For example, you could filter employees with more than 5 years of service using length_of_service > 5 or check for employees in specific cities using city %in% c("New York", "Chicago").
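    The operators and functions above can be tried side by side on a few rows. This minimal sketch assumes dplyr is installed and uses a hypothetical toy data frame with invented values; the comments list the employees each call keeps.

```r
library(dplyr)

# Hypothetical data reusing some of the lab's column names (invented values)
toy <- data.frame(
  employee_name     = c("Ana", "Ben", "Cai", "Dee", "Eli"),
  city              = c("New York", "Chicago", "Seattle", "Boston", "Miami"),
  length_of_service = c(5, 3, 7, 4, 8),
  hours_absent      = c(8, 16, NA, 0, 24)
)

# Comparison operator: more than 5 years of service
filter(toy, length_of_service > 5)                # Cai, Eli

# between() is inclusive on both ends: 4 to 7 years
filter(toy, between(length_of_service, 4, 7))     # Ana, Cai, Dee

# %in%: city is one of a set of values
filter(toy, city %in% c("New York", "Chicago"))   # Ana, Ben

# is.na() and its negation with !
filter(toy, is.na(hours_absent))                  # Cai
filter(toy, !is.na(hours_absent))                 # Ana, Ben, Dee, Eli
```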

    Example Using a Function to Filter

    You can use functions within filter() to create more complex conditions. For instance, you can filter employees located in cities that contain the word "York" using grepl().

    # Filter employees in cities containing "York"
    york_employees <- filter(data, grepl("York", city))
    
    # View the result
    print(york_employees)
    

    This code returns employees whose city column contains "York" (e.g., New York).
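    grepl() matches text exactly as written, including capitalization. As a sketch on hypothetical data with deliberately inconsistent casing, its ignore.case and fixed arguments change how the pattern is matched.

```r
library(dplyr)

# Hypothetical data with deliberately inconsistent capitalization
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai"),
  city          = c("New York", "new york", "Chicago")
)

# Case-sensitive by default: only "New York" contains "York"
filter(toy, grepl("York", city))$employee_name                      # "Ana"

# ignore.case = TRUE matches regardless of capitalization
filter(toy, grepl("york", city, ignore.case = TRUE))$employee_name  # "Ana" "Ben"

# fixed = TRUE treats the pattern as literal text, not a regular expression
filter(toy, grepl("New York", city, fixed = TRUE))$employee_name    # "Ana"
```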


    Task 2.2: Use a Function to Filter

    In the RStudio Console, write a filter() command to select employees who are either in Seattle or Boston.

    Solution
    filter(data, city %in% c("Seattle", "Boston"))
    
    You should see the following results in the **Console**:
       employee_number   employee_name gender    city
    1              106 Olivia Thompson Female Seattle
    2              108      Sophia Lee Female  Boston
    3              116     Sofia Green Female Seattle
    4              118    Ava Gonzalez Female  Boston
    5              126     Sophia Bell Female Seattle
    6              128    Ava Anderson Female  Boston
    7              136      Ava Wilson Female Seattle
    8              138     Emily Moore Female  Boston
    9              146   Sophia Walker Female Seattle
    10             148      Emily Hill Female  Boston
                   job_title         department
    1  Operations Supervisor         Operations
    2        Product Manager Product Management
    3  Operations Supervisor         Operations
    4        Product Manager Product Management
    5  Operations Supervisor         Operations
    6        Product Manager Product Management
    7  Operations Supervisor         Operations
    8        Product Manager Product Management
    9  Operations Supervisor         Operations
    10       Product Manager Product Management
       store_location       business_unit
    1         Store F Operations Division
    2         Store H    Product Division
    3         Store F Operations Division
    4         Store H    Product Division
    5         Store F Operations Division
    6         Store H    Product Division
    7         Store F Operations Division
    8         Store H    Product Division
    9         Store F Operations Division
    10        Store H    Product Division
                 division age length_of_service
    1          Operations  31                 6
    2  Product Management  33                 5
    3          Operations  32                 7
    4  Product Management  34                 6
    5          Operations  33                 7
    6  Product Management  35                 6
    7          Operations  34                 7
    8  Product Management  36                 6
    9          Operations  35                 7
    10 Product Management  37                 6
       hours_absent
    1             8
    2             0
    3            16
    4             0
    5            16
    6             0
    7            16
    8             0
    9            16
    10            0
    

    Using Multiple Conditions

    You can combine multiple conditions in a single filter() call using logical operators like & (and) or | (or). Conditions are evaluated for each row, and only rows where all conditions are true (for &) or at least one condition is true (for |) are returned.

    Example with Multiple Conditions

    When using filter(), you're not limited to using one condition. For example, you can filter employees who are in the Human Resources department and have more than 30 hours absent.

    # Filter HR employees with more than 30 hours absent
    hr_high_absence <- filter(data, department == "Human Resources" & hours_absent > 30)
    
    # View the result
    print(hr_high_absence)
    

    This code returns employees who satisfy both conditions: they work in Human Resources and have been absent for more than 30 hours.
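    The difference between & and | is easiest to see on a few rows. This sketch uses a hypothetical toy data frame (invented values) and also checks that listing conditions separated by commas inside filter() gives the same result as joining them with &.

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai", "Dee"),
  department    = c("Human Resources", "Human Resources", "Sales", "Sales"),
  hours_absent  = c(40, 8, 32, 0)
)

# & : both conditions must hold
both <- filter(toy, department == "Human Resources" & hours_absent > 30)
both$employee_name      # "Ana"

# | : at least one condition must hold
either <- filter(toy, department == "Human Resources" | hours_absent > 30)
either$employee_name    # "Ana" "Ben" "Cai"

# Comma-separated conditions are combined like &
identical(both, filter(toy, department == "Human Resources", hours_absent > 30))  # TRUE
```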


    Task 2.3: Use Multiple Filter Conditions

    In the RStudio Console, modify the multiple conditions example to filter employees in the Operations department whose length of service is at least 5 years.

    Solution
    filter(data, department == "Operations" & length_of_service >= 5)
    
    You should see the following results in the **Console**:
      employee_number   employee_name gender    city
    1             106 Olivia Thompson Female Seattle
    2             116     Sofia Green Female Seattle
    3             126     Sophia Bell Female Seattle
    4             136      Ava Wilson Female Seattle
    5             146   Sophia Walker Female Seattle
                  job_title department store_location
    1 Operations Supervisor Operations        Store F
    2 Operations Supervisor Operations        Store F
    3 Operations Supervisor Operations        Store F
    4 Operations Supervisor Operations        Store F
    5 Operations Supervisor Operations        Store F
            business_unit   division age
    1 Operations Division Operations  31
    2 Operations Division Operations  32
    3 Operations Division Operations  33
    4 Operations Division Operations  34
    5 Operations Division Operations  35
      length_of_service hours_absent
    1                 6            8
    2                 7           16
    3                 7           16
    4                 7           16
    5                 7           16
    

    In the next step, you'll learn how to use select() to subset columns by name or position.

  3. Challenge

    Apply `select()` to Subset Columns by Name or Position

    In this step, you'll learn how to use the select() function from the dplyr package to subset columns from the Employee Absence dataset by name or position. This allows you to focus on specific columns relevant to your analysis.

    Select One Column from a Dataset

    The select() function allows you to choose specific columns from your dataset. To select a single column, such as employee_name, use the column name directly.

    # Select the employee_name column
    names_only <- select(data, employee_name)
    
    # View the result
    print(names_only)
    

    This code returns a new dataframe containing only the employee_name column.
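    select() returns a data frame even when you ask for a single column. If you need the values as a plain vector instead, dplyr's pull() does that. A minimal sketch on hypothetical data (invented values):

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben"),
  hours_absent  = c(8, 16)
)

# select() returns a one-column data frame
names_df <- select(toy, employee_name)
is.data.frame(names_df)   # TRUE

# pull() returns the column's values as a plain vector
names_vec <- pull(toy, employee_name)
names_vec                 # "Ana" "Ben"
```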


    Task 3.1: Select a Single Column

    In the RStudio Console, write a select() command to choose the job_title column from the dataset.

    Solution
    select(data, job_title)
    
    You should see the following results in the **Console**:
                             job_title
    1                       Accountant
    2             Marketing Specialist
    3             Sales Representative
    4                Software Engineer
    5          Human Resources Manager
    ...
    50        Administrative Assistant
    

    Select Multiple Columns

    You can select multiple columns by listing their names in the select() function. This is useful when you need to analyze a subset of columns together.

    Examples of Selecting Multiple Columns

    Here are different ways to select multiple columns:

    1. By name: Select employee_name and department explicitly.
    2. By position: Use numeric indices to select columns by their position (the columns do not need to be adjacent).
    3. By range: Select a range of adjacent columns using a colon (:).

    # Select by name
    name_dept <- select(data, employee_name, department)
    
    # Select by position (columns 2 and 5)
    name_job <- select(data, 2, 5)
    
    # Select a range of adjacent columns (from employee_name to department)
    name_to_dept <- select(data, employee_name:department)
    
    # View one of the results
    print(name_dept)
    

    These commands create dataframes with the specified columns. For example, name_dept includes only employee_name and department.
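    As a sketch on a hypothetical data frame whose first five columns follow the same order as the lab dataset (values invented), the name-based and position-based forms select the same columns, a range covers everything in between, and a minus sign drops columns instead of keeping them.

```r
library(dplyr)

# Hypothetical data; column order mirrors the first five lab columns
toy <- data.frame(
  employee_number = 101:102,
  employee_name   = c("Ana", "Ben"),
  gender          = c("Female", "Male"),
  city            = c("Chicago", "Boston"),
  job_title       = c("Accountant", "Product Manager")
)

by_name     <- select(toy, employee_name, job_title)
by_position <- select(toy, 2, 5)   # column 2 and column 5
identical(by_name, by_position)    # TRUE

# A range includes every column between the two names, inclusive
names(select(toy, employee_name:city))          # "employee_name" "gender" "city"

# A minus sign drops columns instead of keeping them
names(select(toy, -employee_number, -gender))   # "employee_name" "city" "job_title"
```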


    Task 3.2: Select Multiple Columns by Name

    In the RStudio Console, write a select() command to choose the employee_name, city, and hours_absent columns.

    Solution
    select(data, employee_name, city, hours_absent)
    
    You should see the following results in the **Console**:
            employee_name          city hours_absent
    1          John Smith      New York            8
    2       Sarah Johnson   Los Angeles           16
    3       Michael Davis       Chicago            4
    4        Emily Wilson San Francisco            0
    5     Robert Anderson       Houston           24
    ...
    50       Ava Gonzalez       Atlanta            0
    

    Selection Helpers

    The select() function supports helper functions to make column selection easier, especially for large datasets. Common helpers include:

    • starts_with("prefix"): Select columns starting with a specific prefix.
    • ends_with("suffix"): Select columns ending with a specific suffix.
    • contains("text"): Select columns containing specific text.
    • matches("pattern"): Select columns matching a regular expression.
    • everything(): Select all columns (useful for reordering).

    Example Using Selection Helpers

    Suppose you want the employee identifier columns (those starting with "employ") together with any column containing "hour".

    # Select columns that start with "employ" or contain "hour"
    employee_hours <- select(data, starts_with("employ"), contains("hour"))
    
    # View the result
    print(employee_hours)
    

    This code returns a dataframe with employee_number, employee_name, and hours_absent.
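    A few more helpers from the list above, sketched on a hypothetical data frame that reuses some of the lab's column names (values invented). Note that everything() picks up only the columns not already selected, which is why it is handy for moving a column to the front.

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_number   = 101:102,
  employee_name     = c("Ana", "Ben"),
  length_of_service = c(5, 3),
  hours_absent      = c(8, 16)
)

names(select(toy, ends_with("_name")))            # "employee_name"
names(select(toy, matches("^(length|hours)_")))   # "length_of_service" "hours_absent"

# everything() adds the remaining columns, moving hours_absent to the front
names(select(toy, hours_absent, everything()))
# "hours_absent" "employee_number" "employee_name" "length_of_service"
```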


    Task 3.3: Use a Selection Helper

    In the RStudio Console, write a select() command that uses a selection helper to choose all columns whose names don't contain an underscore (_).

    Hint: The logical NOT operator in R is `!`.

    Solution
    select(data, !contains("_"))
    
    You should see the following results in the **Console**:
       gender          city         department           division age
    1    Male      New York            Finance         Accounting  32
    2  Female   Los Angeles          Marketing          Marketing  28
    3    Male       Chicago              Sales              Sales  35
    4  Female San Francisco         Technology        Engineering  29
    5    Male       Houston    Human Resources    Human Resources  37
    ...
    50 Female       Atlanta     Administration     Administration  31
    

    Rename Columns if Needed

    You can rename columns while selecting them using the select() function by specifying new names with the syntax new_name = old_name. This is helpful for making column names more descriptive or consistent.

    # Select and rename employee_name to name and hours_absent to absence_hours
    renamed_cols <- select(data, name = employee_name, absence_hours = hours_absent)
    
    # View the result
    print(renamed_cols)
    

    This code creates a dataframe with two columns, name and absence_hours, instead of the original names.
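    Renaming inside select() keeps only the columns you list, while the rename() function (used later in this lab) changes names but keeps every column. A minimal comparison on hypothetical data (invented values):

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben"),
  department    = c("Sales", "Finance"),
  hours_absent  = c(8, 16)
)

# Renaming inside select() keeps only the listed columns
names(select(toy, name = employee_name, absence_hours = hours_absent))
# "name" "absence_hours"

# rename() changes the same names but keeps every column in its original place
names(rename(toy, name = employee_name, absence_hours = hours_absent))
# "name" "department" "absence_hours"
```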


    Task 3.4: Select and Rename Columns

    In the RStudio Console, write a select() command to choose the job_title and department columns, renaming them to role and dept, respectively.

    Solution
    select(data, role = job_title, dept = department)
    
    You should see the following results in the **Console**:
                                  role               dept
    1                       Accountant            Finance
    2             Marketing Specialist          Marketing
    3             Sales Representative              Sales
    4                Software Engineer         Technology
    5          Human Resources Manager    Human Resources
    ...
    50        Administrative Assistant     Administration
    

    In the next step, you'll learn how to use arrange() to sort data by one or more columns.

  4. Challenge

    Use `arrange()` to Sort Data by One or More Columns

    In this step, you'll learn how to use the arrange() function from the dplyr package to sort the Employee Absence dataset by one or more columns. Sorting helps you organize data for better analysis or presentation.

    Sort by One Column

    The arrange() function sorts rows based on the values in a specified column. By default, it sorts in ascending order. For example, you can sort employees by their hours_absent.

    # Sort by hours_absent in ascending order
    sorted_by_absence <- arrange(data, hours_absent)
    
    # View the result
    print(sorted_by_absence)
    

    This code returns a dataframe with rows sorted by hours_absent from lowest to highest.


    Task 4.1: Sort by a Single Column

    In the RStudio Console, write an arrange() command to sort the dataset by age in ascending order.

    Solution
    arrange(data, age)
    
    You should see the following results in the **Console**:
       employee_number   employee_name gender        city                       job_title       department
    1              107  David Martinez   Male       Miami Customer Service Representative Customer Service
    2              110     Emma Taylor Female     Atlanta        Administrative Assistant   Administration
    3              117      Noah Baker   Male       Miami Customer Service Representative Customer Service
    4              102   Sarah Johnson Female Los Angeles            Marketing Specialist        Marketing
    5              120  Charlotte Hill Female     Atlanta        Administrative Assistant   Administration
    ...
    50             145 Noah Richardson   Male     Houston         Human Resources Manager  Human Resources
       store_location             business_unit         division age length_of_service hours_absent
    1         Store G Customer Service Division Customer Service  26                 2            0
    2         Store J            Admin Division   Administration  27                 3            0
    3         Store G Customer Service Division Customer Service  27                 3            0
    4         Store B        Marketing Division        Marketing  28                 3           16
    5         Store J            Admin Division   Administration  28                 4            0
    ...
    50        Store E               HR Division  Human Resources  41                 9           56

    Sort in Descending Order

    To sort in descending order, use the desc() function within arrange(). This is useful when you want to see the highest values first, such as employees with the most absences.

    # Sort by hours_absent in descending order
    sorted_by_absence_desc <- arrange(data, desc(hours_absent))
    
    # View the result
    print(sorted_by_absence_desc)
    

    This code sorts the dataframe by hours_absent from highest to lowest.
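    A small sketch comparing ascending and descending order on hypothetical data (invented values). For numeric columns, negating the column gives the same order as desc(); desc() also works for text columns, where a minus sign would raise an error.

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai"),
  hours_absent  = c(8, 24, 0)
)

arrange(toy, hours_absent)$employee_name          # "Cai" "Ana" "Ben"
arrange(toy, desc(hours_absent))$employee_name    # "Ben" "Ana" "Cai"

# Numeric columns only: a minus sign gives the same descending order
arrange(toy, -hours_absent)$employee_name         # "Ben" "Ana" "Cai"

# desc() also reverses alphabetical order for text columns
arrange(toy, desc(employee_name))$employee_name   # "Cai" "Ben" "Ana"
```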


    Task 4.2: Sort in Descending Order

    In the RStudio Console, write an arrange() command to sort the dataset by length_of_service in descending order.

    Solution
    arrange(data, desc(length_of_service))
    
    You should see the following results in the **Console**:
       employee_number   employee_name gender    city                       job_title       department
    1              115      Liam Lewis   Male Houston         Human Resources Manager  Human Resources
    2              125     Noah Butler   Male Houston         Human Resources Manager  Human Resources
    3              135  Noah Hernandez   Male Houston         Human Resources Manager  Human Resources
    4              145 Noah Richardson   Male Houston         Human Resources Manager  Human Resources
    5              105 Robert Anderson   Male Houston         Human Resources Manager  Human Resources
    ...
    50             107  David Martinez   Male   Miami Customer Service Representative Customer Service
       store_location             business_unit         division age length_of_service hours_absent
    1         Store E               HR Division  Human Resources  38                 9           32
    2         Store E               HR Division  Human Resources  39                 9           40
    3         Store E               HR Division  Human Resources  40                 9           48
    4         Store E               HR Division  Human Resources  41                 9           56
    5         Store E               HR Division  Human Resources  37                 8           24
    ...
    50        Store G Customer Service Division Customer Service  26                 2            0

    Sort by Multiple Columns

    You can sort by multiple columns by listing them in arrange(). The dataframe is sorted by the first column, then by the second column within each value of the first, and so on. This is helpful for breaking ties or organizing data hierarchically.

    # Sort by department (ascending) and then hours_absent (descending)
    sorted_by_dept_absence <- arrange(data, department, desc(hours_absent))
    
    # View the result
    print(sorted_by_dept_absence)
    

    This code sorts employees first by department (alphabetically) and then, within each department, by hours_absent from highest to lowest.
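    To check how ties are broken, here is a sketch on four hypothetical employees (invented values) across two departments: rows are grouped alphabetically by department, then ordered by hours_absent from highest to lowest within each department.

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai", "Dee"),
  department    = c("Sales", "Finance", "Sales", "Finance"),
  hours_absent  = c(8, 16, 24, 0)
)

# Department A to Z first; within each department, most hours absent first
sorted <- arrange(toy, department, desc(hours_absent))
sorted$employee_name   # "Ben" "Dee" "Cai" "Ana"
```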


    Task 4.3: Sort by Multiple Columns

    In the RStudio Console, write an arrange() command to sort the dataset by city in ascending order and then by age in ascending order.

    Solution
    arrange(data, city, age)
    
    You should see the following results in the **Console**:
       employee_number     employee_name gender    city                job_title     department
    1              110       Emma Taylor Female Atlanta Administrative Assistant Administration
    2              120    Charlotte Hill Female Atlanta Administrative Assistant Administration
    3              130 Olivia Richardson Female Atlanta Administrative Assistant Administration
    4              140   Sophia Anderson Female Atlanta Administrative Assistant Administration
    5              150      Ava Gonzalez Female Atlanta Administrative Assistant Administration
    ...
    50             146     Sophia Walker Female Seattle    Operations Supervisor     Operations
       store_location       business_unit       division age length_of_service hours_absent
    1         Store J      Admin Division Administration  27                 3            0
    2         Store J      Admin Division Administration  28                 4            0
    3         Store J      Admin Division Administration  29                 4            0
    4         Store J      Admin Division Administration  30                 4            0
    5         Store J      Admin Division Administration  31                 4            0
    ...
    50        Store F Operations Division     Operations  35                 7           16

    Handle Missing Values in Sorting

    When sorting, arrange() places missing values (NA) at the end of the sorted data frame, whether you sort in ascending or descending order. The Employee Absence dataset has no missing values, but it’s good to know how arrange() handles them. To make that placement explicit, sort on is.na() first: is.na() returns FALSE for present values and TRUE for missing ones, and because FALSE sorts before TRUE, rows with NA end up last. Sorting on !is.na() instead reverses this and moves NA rows to the top.

    # Sort by hours_absent, with any NA rows placed last (this dataset has none)
    sorted_with_na <- arrange(data, is.na(hours_absent), hours_absent)
    
    # View the result
    print(sorted_with_na)
    

    This code ensures rows with NA in hours_absent (if any) appear last when sorting in ascending order.
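    Because the lab dataset has no missing values, this sketch adds one to a hypothetical toy data frame (invented values) to show where arrange() puts it in each direction, and how sorting on !is.na() first moves it to the top.

```r
library(dplyr)

# Hypothetical data; Ben's hours_absent is missing
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai"),
  hours_absent  = c(8, NA, 0)
)

# arrange() puts NA last in both directions
arrange(toy, hours_absent)$employee_name         # "Cai" "Ana" "Ben"
arrange(toy, desc(hours_absent))$employee_name   # "Ana" "Cai" "Ben"

# !is.na() is FALSE for Ben, and FALSE sorts before TRUE, so Ben moves to the top
arrange(toy, !is.na(hours_absent), hours_absent)$employee_name   # "Ben" "Cai" "Ana"
```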


    Task 4.4: Sort with Consideration for Missing Values

    In the RStudio Console, write an arrange() command to sort the dataset by employee_name in ascending order, ensuring any NA values (if present) appear last.

    Solution
    arrange(data, is.na(employee_name), employee_name)
    
    You should see the following results in the **Console**:
       employee_number    employee_name gender        city                job_title         department
    1              113      Aiden Brown   Male     Chicago     Sales Representative              Sales
    2              142     Aiden Harris   Male Los Angeles     Marketing Specialist          Marketing
    3              128     Ava Anderson Female      Boston          Product Manager Product Management
    4              118     Ava Gonzalez Female      Boston          Product Manager Product Management
    5              150     Ava Gonzalez Female     Atlanta Administrative Assistant     Administration
    ...
    50             139 William Phillips   Male      Dallas            IT Specialist                 IT
       store_location      business_unit           division age length_of_service hours_absent
    1         Store C     Sales Division              Sales  36                 7            0
    2         Store B Marketing Division          Marketing  32                 5           16
    3         Store H   Product Division Product Management  35                 6            0
    4         Store H   Product Division Product Management  34                 6            0
    5         Store J     Admin Division     Administration  31                 4            0
    ...
    50        Store I        IT Division                 IT  33                 5           24

    In the next step, you'll learn how to chain commands using dplyr to perform complex queries.

  5. Challenge

    Chain Commands to Perform Complex Queries with `dplyr`

    In this step, you'll learn how to chain multiple dplyr commands using the pipe operator (%>%) to perform complex queries on the Employee Absence dataset. Chaining allows you to combine filter(), select(), arrange(), and other operations in a single, readable workflow.

    Use the Pipe Operator to Chain Commands

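    The pipe operator (%>%), which dplyr re-exports from the magrittr package, takes the result on its left and passes it as the first argument to the function on its right. Because every dplyr verb takes a data frame as its first argument, you can build a sequence of operations this way. For example, you can filter employees, select specific columns, and sort the results in one command.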

    # Chain filter, select, and arrange
    result <- data %>%
      filter(department == "Finance") %>%
      select(employee_name, hours_absent) %>%
      arrange(hours_absent)
    
    # View the result
    print(result)
    

    This code filters employees in the Finance department, selects only the employee_name and hours_absent columns, and sorts by hours_absent in ascending order.
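    The pipe is a readability tool: the same steps can be written as nested calls that read from the inside out. A sketch on hypothetical data (invented values), assuming dplyr is installed, showing that both forms give identical results:

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name = c("Ana", "Ben", "Cai"),
  department    = c("Finance", "Sales", "Finance"),
  hours_absent  = c(16, 8, 4)
)

# Piped version: read top to bottom
piped <- toy %>%
  filter(department == "Finance") %>%
  select(employee_name, hours_absent) %>%
  arrange(hours_absent)

# Same steps as nested calls: read from the inside out
nested <- arrange(
  select(filter(toy, department == "Finance"), employee_name, hours_absent),
  hours_absent
)

identical(piped, nested)   # TRUE
piped$employee_name        # "Cai" "Ana"
```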


    Task 5.1: Chain Basic Commands

    In the RStudio Console, write a chained command to filter employees in the Marketing department, select the employee_name and city columns, and sort by city in ascending order.

    Solution
    data %>%
      filter(department == "Marketing") %>%
      select(employee_name, city) %>%
      arrange(city)
    
    You should see the following results in the **Console**:
         employee_name        city
    1    Sarah Johnson Los Angeles
    2   Isabella Moore Los Angeles
    3       Emma Adams Los Angeles
    4 Sophia Rodriguez Los Angeles
    5     Aiden Harris Los Angeles

    Combine Multiple Filters in a Chain

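    You can include multiple conditions in a filter() step within a chain to refine your query. Conditions separated by commas are combined as if joined with &, so a row must satisfy all of them. This is a powerful way to narrow the data down to specific criteria before selecting or sorting.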

    # Chain with multiple filter conditions
    result <- data %>%
      filter(department == "Human Resources", hours_absent > 30) %>%
      select(employee_name, hours_absent, length_of_service) %>%
      arrange(desc(hours_absent))
    
    # View the result
    print(result)
    

    This code filters for Human Resources employees with more than 30 hours absent, selects relevant columns, and sorts by hours_absent in descending order.
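    Conditions can also be split across consecutive filter() calls in a chain; each call narrows the remaining rows further. A sketch on hypothetical data (invented values) comparing the two styles:

```r
library(dplyr)

# Hypothetical data (invented values)
toy <- data.frame(
  employee_name     = c("Ana", "Ben", "Cai"),
  department        = c("Human Resources", "Human Resources", "Sales"),
  hours_absent      = c(40, 8, 56),
  length_of_service = c(9, 3, 8)
)

# Both conditions in one filter() call
one_step <- toy %>%
  filter(department == "Human Resources", hours_absent > 30)

# The same conditions split across two filter() calls
two_steps <- toy %>%
  filter(department == "Human Resources") %>%
  filter(hours_absent > 30)

identical(one_step$employee_name, two_steps$employee_name)   # TRUE
one_step$employee_name                                       # "Ana"
```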


    Task 5.2: Chain with Multiple Filter Conditions

    In the RStudio Console, write a chained command to filter employees in the Operations department with at least 7 years of service, select the employee_name, city, and hours_absent columns, and sort by hours_absent in descending order.

    Solution
    data %>%
      filter(department == "Operations", length_of_service >= 7) %>%
      select(employee_name, city, hours_absent) %>%
      arrange(desc(hours_absent))
    
    You should see the following results in the **Console**:
      employee_name    city hours_absent
    1   Sofia Green Seattle           16
    2   Sophia Bell Seattle           16
    3    Ava Wilson Seattle           16
    4 Sophia Walker Seattle           16

    Use Selection Helpers in a Chain

    You can incorporate selection helpers like starts_with() or contains() within a chain to dynamically select columns. This is useful for complex datasets with many columns.

    # Chain with selection helper
    result <- data %>%
      filter(city %in% c("New York", "Chicago")) %>%
      select(employee_name, starts_with("hours")) %>%
      arrange(hours_absent)
    
    # View the result
    print(result)
    

    This code filters employees in New York or Chicago, selects the employee_name and any columns starting with "hours" (i.e., hours_absent), and sorts by hours_absent in ascending order.


    Task 5.3: Chain with a Selection Helper

    In the RStudio Console, write a chained command to filter employees in the Technology department, select the employee_name and columns containing "age" using a selection helper, and sort by age in ascending order.

    Solution
    data %>%
      filter(department == "Technology") %>%
      select(employee_name, contains("age")) %>%
      arrange(age)
    
    You should see the following results in the **Console**:
      employee_name age
    1  Emily Wilson  29
    2     Mia Clark  30
    3 Olivia Harris  31
    4    Emma Davis  32
    5  Olivia Adams  33

    Rename Columns in a Chain

    You can rename columns using select() or the rename() function within a chain to make the output more readable. The rename() function is particularly useful when you want to keep all columns but change specific names.

    # Chain with renaming
    result <- data %>%
      filter(department == "Sales") %>%
      select(employee_name, hours_absent, length_of_service) %>%
      rename(name = employee_name, absence_hours = hours_absent) %>%
      arrange(desc(absence_hours))
    
    # View the result
    print(result)
    

    This code filters Sales employees, selects three columns, renames employee_name to name and hours_absent to absence_hours, and sorts by absence_hours in descending order.


    Task 5.4: Chain with Renaming

    In the RStudio Console, write a chained command to filter employees in the Product Management department, select the employee_name and hours_absent columns, rename hours_absent to absence, and sort by absence in ascending order.

    Solution
    data %>%
      filter(department == "Product Management") %>%
      select(employee_name, hours_absent) %>%
      rename(absence = hours_absent) %>%
      arrange(absence)
    
    You should see the following results in the **Console**:
      employee_name absence
    1    Sophia Lee       0
    2  Ava Gonzalez       0
    3  Ava Anderson       0
    4   Emily Moore       0
    5    Emily Hill       0

    You've now completed this Code Lab on querying and filtering data in R using dplyr! You can combine filter(), select(), arrange(), and the pipe operator (%>%) to create powerful and readable data queries.
