
Query and Filter Data in R

Learn by doing in this practical, beginner-friendly lab! Find out how you can use filter(), select(), arrange(), and chaining to extract, sort, and streamline data like a professional data analyst. By completing the lab, learners will gain hands-on experience in table manipulation in R by using the dplyr package.


Lab Info

Level: Beginner

Last updated: Apr 24, 2025

Duration: 30m

Challenge

Introduction


This Code Lab introduces querying and filtering data in R using the dplyr package. You will analyze employee absence data with dplyr, learning how to select rows and columns, sort data, and perform complex queries by chaining commands.

The dataset provided for this lab is a single CSV file called Employee Absence.csv. The Employee Absence dataset contains columns such as employee_number, employee_name, gender, city, job_title, department, store_location, business_unit, division, age, length_of_service, and hours_absent.

First, ensure you load the dplyr package into your R session. You'll also load the Employee Absence dataset and inspect its structure to understand the available columns.

Task 1.1: Load `dplyr` and Inspect the Dataset

In the RStudio Console, run the following code:

```
# Load dplyr
library(dplyr)

# Load the Employee Absence dataset
data <- read.csv("Employee Absence.csv")

# Inspect the dataset
glimpse(data)
```

Observed Completion

You should see the following results in the **Console**:

```
Rows: 50
Columns: 12
$ employee_number   <int> 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 11…
$ employee_name     <chr> "John Smith", "Sarah Johnson", "Michael Davis", "Emily Wilson", "Robert Anderson",…
$ gender            <chr> "Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male", "F…
$ city              <chr> "New York", "Los Angeles", "Chicago", "San Francisco", "Houston", "Seattle", "Miam…
$ job_title         <chr> "Accountant", "Marketing Specialist", "Sales Representative", "Software Engineer",…
$ department        <chr> "Finance", "Marketing", "Sales", "Technology", "Human Resources", "Operations", "C…
$ store_location    <chr> "Store A", "Store B", "Store C", "Store D", "Store E", "Store F", "Store G", "Stor…
$ business_unit     <chr> "Finance Division", "Marketing Division", "Sales Division", "Technology Division",…
$ division          <chr> "Accounting", "Marketing", "Sales", "Engineering", "Human Resources", "Operations"…
$ age               <int> 32, 28, 35, 29, 37, 31, 26, 33, 30, 27, 34, 29, 36, 30, 38, 32, 27, 34, 31, 28, 33…
$ length_of_service <int> 5, 3, 7, 4, 8, 6, 2, 5, 4, 3, 6, 4, 7, 5, 9, 7, 3, 6, 5, 4, 6, 4, 8, 5, 9, 7, 3, 6…
$ hours_absent      <int> 8, 16, 4, 0, 24, 8, 0, 0, 16, 0, 8, 16, 0, 0, 32, 16, 0, 0, 24, 0, 8, 16, 0, 0, 40…
```

The glimpse() function provides a concise overview of the dataset, showing column names, data types, and a preview of the values.
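If you want to experiment before the CSV is loaded, a tiny hand-built data frame behaves the same way with glimpse(). This is a practice sketch only: the name `toy` and every value in it are hypothetical and do not come from Employee Absence.csv.

```r
library(dplyr)

# Hypothetical practice data: invented values, not rows from Employee Absence.csv
toy <- data.frame(
  employee_name     = c("Ann", "Ben", "Cal", "Dee"),
  department        = c("Finance", "Sales", "Finance", "Operations"),
  city              = c("New York", "Chicago", "Boston", "Seattle"),
  length_of_service = c(5L, 7L, 2L, 6L),
  hours_absent      = c(8L, 0L, 40L, 16L)
)

# glimpse() prints one line per column: its name, type (<chr>, <int>) and leading values
glimpse(toy)
```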

Now that you have the dplyr package and Employee Absence dataset loaded, you can proceed to selecting rows.

Challenge

Use `filter()` to Select Rows Based on Conditions

Use `filter()` with a Simple Condition

The filter() function from dplyr allows you to select rows that meet specific conditions. For example, you can select employees who work in the Finance department as follows:

```
# Filter employees in the Finance department
finance_employees <- filter(data, department == "Finance")

# View the result
print(finance_employees)
```

This code returns a new dataframe containing only the rows where the department column equals "Finance".
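One detail worth knowing: filter() keeps only rows where the condition evaluates to TRUE, so a row whose condition is NA (for example, a missing department) is dropped rather than kept. The Employee Absence data has no missing values, so this sketch uses a hypothetical three-row data frame named `toy` to show the behavior.

```r
library(dplyr)

# Hypothetical data with one missing department
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  department    = c("Finance", NA, "Sales")
)

# NA == "Finance" evaluates to NA, so Ben's row is dropped along with Cal's
finance_only <- filter(toy, department == "Finance")

# To keep rows with a missing department too, ask for them explicitly with is.na()
finance_or_missing <- filter(toy, department == "Finance" | is.na(department))
```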

Task 2.1: Use `filter()` with One Simple Condition

In the RStudio Console, write a filter() command to select employees who are in the Sales department.

Solution

filter(data, department == "Sales")

Observed Completion

You should see the following results in the **Console**:

```
  employee_number employee_name gender    city
1             103 Michael Davis   Male Chicago
2             113   Aiden Brown   Male Chicago
3             123   Liam Turner   Male Chicago
4             133  Lucas Turner   Male Chicago
5             143   Liam Turner   Male Chicago
             job_title department store_location
1 Sales Representative      Sales        Store C
2 Sales Representative      Sales        Store C
3 Sales Representative      Sales        Store C
4 Sales Representative      Sales        Store C
5 Sales Representative      Sales        Store C
   business_unit division age length_of_service
1 Sales Division    Sales  35                 7
2 Sales Division    Sales  36                 7
3 Sales Division    Sales  37                 8
4 Sales Division    Sales  38                 8
5 Sales Division    Sales  39                 8
  hours_absent
1            4
2            0
3            0
4            0
5            0
```

Operators and Functions for Filtering

The filter() function supports various operators and functions to define conditions:

Comparison operators: == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal)
Logical operators: & (and), | (or), ! (not)
Functions:
- is.na() to check for missing values
- between(x, left, right) to check if values fall within a range
- grepl(pattern, x) to match text patterns
- %in% to check if values are in a specified set

For example, you could filter employees with more than 5 years of service using length_of_service > 5 or check for employees in specific cities using city %in% c("New York", "Chicago").
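The operators and functions listed above can be tried side by side on a small data frame. This is a minimal sketch: `toy` and its values are hypothetical, chosen so each condition keeps a different pair of rows.

```r
library(dplyr)

# Hypothetical data (not from Employee Absence.csv)
toy <- data.frame(
  employee_name     = c("Ann", "Ben", "Cal", "Dee"),
  city              = c("New York", "Chicago", "Boston", "Seattle"),
  length_of_service = c(5L, 7L, 2L, 6L)
)

# Comparison operator: strictly more than 5 years (Ben, Dee)
long_service <- filter(toy, length_of_service > 5)

# %in%: city is one of the listed values (Ann, Ben)
ny_or_chicago <- filter(toy, city %in% c("New York", "Chicago"))

# between(): both bounds are inclusive, so 3 <= x <= 6 (Ann, Dee)
mid_service <- filter(toy, between(length_of_service, 3, 6))

# !: negate a condition to exclude the listed cities (Cal, Dee)
other_cities <- filter(toy, !(city %in% c("New York", "Chicago")))
```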

Example Using a Function to Filter

You can use functions within filter() to create more complex conditions. For instance, you can filter employees located in cities that contain the word "York" using grepl().

```
# Filter employees in cities containing "York"
york_employees <- filter(data, grepl("York", city))

# View the result
print(york_employees)
```

This code returns employees whose city column contains "York" (e.g., New York).
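Because grepl() treats its pattern as a case-sensitive regular expression, the exact spelling matters. The sketch below assumes a hypothetical `toy` data frame (the city "Yorktown" is invented for illustration) and shows a partial match, a failed lowercase match, the `ignore.case` argument, and an anchored pattern.

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  city          = c("New York", "Chicago", "Yorktown")
)

# grepl() matches anywhere in the string, so "York" finds New York and Yorktown
has_york <- filter(toy, grepl("York", city))

# Matching is case-sensitive by default: lowercase "york" finds nothing here
lower_york <- filter(toy, grepl("york", city))

# ignore.case = TRUE makes the match case-insensitive
any_case <- filter(toy, grepl("york", city, ignore.case = TRUE))

# The pattern is a regular expression: "^New" anchors the match to the start
starts_new <- filter(toy, grepl("^New", city))
```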

Task 2.2: Use a Function to Filter

In the RStudio Console, write a filter() command to select employees who are either in Seattle or Boston.

Solution

filter(data, city %in% c("Seattle", "Boston"))

Observed Completion

You should see the following results in the **Console**:

```
   employee_number   employee_name gender    city
1              106 Olivia Thompson Female Seattle
2              108      Sophia Lee Female  Boston
3              116     Sofia Green Female Seattle
4              118    Ava Gonzalez Female  Boston
5              126     Sophia Bell Female Seattle
6              128    Ava Anderson Female  Boston
7              136      Ava Wilson Female Seattle
8              138     Emily Moore Female  Boston
9              146   Sophia Walker Female Seattle
10             148      Emily Hill Female  Boston
               job_title         department
1  Operations Supervisor         Operations
2        Product Manager Product Management
3  Operations Supervisor         Operations
4        Product Manager Product Management
5  Operations Supervisor         Operations
6        Product Manager Product Management
7  Operations Supervisor         Operations
8        Product Manager Product Management
9  Operations Supervisor         Operations
10       Product Manager Product Management
   store_location       business_unit
1         Store F Operations Division
2         Store H    Product Division
3         Store F Operations Division
4         Store H    Product Division
5         Store F Operations Division
6         Store H    Product Division
7         Store F Operations Division
8         Store H    Product Division
9         Store F Operations Division
10        Store H    Product Division
             division age length_of_service
1          Operations  31                 6
2  Product Management  33                 5
3          Operations  32                 7
4  Product Management  34                 6
5          Operations  33                 7
6  Product Management  35                 6
7          Operations  34                 7
8  Product Management  36                 6
9          Operations  35                 7
10 Product Management  37                 6
   hours_absent
1             8
2             0
3            16
4             0
5            16
6             0
7            16
8             0
9            16
10            0
```

Using Multiple Conditions

You can combine multiple conditions in a single filter() call using logical operators like & (and) or | (or). Conditions are evaluated for each row, and only rows where all conditions are true (for &) or at least one condition is true (for |) are returned.

Example with Multiple Conditions

When using filter(), you're not limited to using one condition. For example, you can filter employees who are in the Human Resources department and have more than 30 hours absent.

```
# Filter HR employees with more than 30 hours absent
hr_high_absence <- filter(data, department == "Human Resources" & hours_absent > 30)

# View the result
print(hr_high_absence)
```

This code returns employees who satisfy both conditions: they work in Human Resources and have been absent for more than 30 hours.
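To see how & and | differ, the same two conditions can be run against a small hypothetical data frame (`toy`, with invented values). The last line shows that parentheses decide how mixed & and | conditions are grouped.

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal", "Dee"),
  department    = c("Human Resources", "Human Resources", "Sales", "Sales"),
  hours_absent  = c(40L, 8L, 48L, 0L)
)

# &: both conditions must hold (Ann only)
both <- filter(toy, department == "Human Resources" & hours_absent > 30)

# |: at least one condition must hold (Ann, Ben, Cal)
either <- filter(toy, department == "Human Resources" | hours_absent > 30)

# Parentheses group conditions: (Sales or more than 30 hours) and some absence at all
grouped <- filter(toy, (department == "Sales" | hours_absent > 30) & hours_absent > 0)
```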

Task 2.3: Use Multiple Filter Conditions

In the RStudio Console, modify the multiple conditions example to filter employees in the Operations department whose length of service is at least 5.

Solution

filter(data, department == "Operations" & length_of_service >= 5)

Observed Completion

You should see the following results in the **Console**:

```
  employee_number   employee_name gender    city
1             106 Olivia Thompson Female Seattle
2             116     Sofia Green Female Seattle
3             126     Sophia Bell Female Seattle
4             136      Ava Wilson Female Seattle
5             146   Sophia Walker Female Seattle
              job_title department store_location
1 Operations Supervisor Operations        Store F
2 Operations Supervisor Operations        Store F
3 Operations Supervisor Operations        Store F
4 Operations Supervisor Operations        Store F
5 Operations Supervisor Operations        Store F
        business_unit   division age
1 Operations Division Operations  31
2 Operations Division Operations  32
3 Operations Division Operations  33
4 Operations Division Operations  34
5 Operations Division Operations  35
  length_of_service hours_absent
1                 6            8
2                 7           16
3                 7           16
4                 7           16
5                 7           16
```

In the next step, you'll learn how to use select() to subset columns by name or position.

Challenge

Apply `select()` to Subset Columns by Name or Position

In this step, you'll learn how to use the select() function from the dplyr package to subset columns from the Employee Absence dataset by name or position. This allows you to focus on specific columns relevant to your analysis.

Select One Column from a Dataset

The select() function allows you to choose specific columns from your dataset. To select a single column, such as employee_name, use the column name directly.
```
# Select the employee_name column
names_only <- select(data, employee_name)

# View the result
print(names_only)
```
This code returns a new dataframe containing only the employee_name column.
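Note that select() returns a data frame even when you keep a single column. If you need the values as a plain vector instead, dplyr's pull() does that; it is not otherwise covered in this lab. A minimal sketch on a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben"),
  hours_absent  = c(8L, 0L)
)

# select() always returns a data frame, even with a single column
names_df <- select(toy, employee_name)

# pull() extracts one column as a plain vector instead
names_vec <- pull(toy, employee_name)
```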

Task 3.1: Select a Single Column

In the RStudio Console, write a select() command to choose the job_title column from the dataset.
Solution

select(data, job_title)
Observed Completion
You should see the following results in the **Console**:
```
                  job_title
1                Accountant
2      Marketing Specialist
3      Sales Representative
4         Software Engineer
5   Human Resources Manager
...
50 Administrative Assistant
```

Select Multiple Columns

You can select multiple columns by listing their names in the select() function. This is useful when you need to analyze a subset of columns together.

Examples of Selecting Multiple Columns

Here are different ways to select multiple columns:
1. By name: Select employee_name and department explicitly.
2. By position: Use numeric indices to select columns by their position; the columns do not need to be adjacent.
3. By range: Select a range of columns using a colon (:).
```
# Select by name
name_dept <- select(data, employee_name, department)

# Select by position (columns 2 and 5)
name_job <- select(data, 2, 5)

# Select a range of adjacent columns (from employee_name to department)
name_to_dept <- select(data, employee_name:department)

# View one of the results
print(name_dept)
```
These commands create dataframes with the specified columns. For example, name_dept includes only employee_name and department.
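The three styles can describe the same selection. This sketch assumes a hypothetical `toy` data frame whose first six columns follow the dataset's column order, so positions 2 and 5 correspond to employee_name and job_title. It also shows a minus sign, which drops columns instead of keeping them.

```r
library(dplyr)

# Hypothetical rows whose first six columns follow the dataset's column order
toy <- data.frame(
  employee_number = c(101L, 102L),
  employee_name   = c("Ann", "Ben"),
  gender          = c("Female", "Male"),
  city            = c("Boston", "Miami"),
  job_title       = c("Accountant", "Analyst"),
  department      = c("Finance", "Sales")
)

# Positions 2 and 5 are employee_name and job_title, so these two results match
by_position <- select(toy, 2, 5)
by_name     <- select(toy, employee_name, job_title)

# A range includes both endpoints and every column between them
by_range <- select(toy, employee_name:department)

# A minus sign drops columns instead of keeping them
without_ids <- select(toy, -employee_number)
```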

Task 3.2: Select Multiple Columns by Name

In the RStudio Console, write a select() command to choose the employee_name, city, and hours_absent columns.
Solution

select(data, employee_name, city, hours_absent)
Observed Completion
You should see the following results in the **Console**:
```
     employee_name          city hours_absent
1       John Smith      New York            8
2    Sarah Johnson   Los Angeles           16
3    Michael Davis       Chicago            4
4     Emily Wilson San Francisco            0
5  Robert Anderson       Houston           24
...
50    Ava Gonzalez       Atlanta            0
```

Selection Helpers

The select() function supports helper functions to make column selection easier, especially for large datasets. Common helpers include:
- starts_with("prefix"): Select columns starting with a specific prefix.
- ends_with("suffix"): Select columns ending with a specific suffix.
- contains("text"): Select columns containing specific text.
- matches("pattern"): Select columns matching a regular expression.
- everything(): Select all columns (useful for reordering).
Example Using Selection Helpers

Suppose you want to select columns related to employee details, such as those starting with "employ" or containing "hour".
```
# Select columns starting with "employ", plus any column containing "hour"
employee_hours <- select(data, starts_with("employ"), contains("hour"))

# View the result
print(employee_hours)
```
This code returns a dataframe with employee_number, employee_name, and hours_absent.
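When several helpers appear in one select() call, a column is kept if it matches any of them. The sketch below, on a hypothetical one-row `toy` data frame, also shows matches() with a regular expression that picks the same columns, and everything() used to move a column to the front.

```r
library(dplyr)

# Hypothetical one-row data frame for illustration
toy <- data.frame(
  employee_number = 101L,
  employee_name   = "Ann",
  age             = 30L,
  hours_absent    = 8L
)

# Several helpers combine as a union: columns matching any of them are kept
picked <- select(toy, starts_with("employ"), contains("hour"))

# matches() takes a regular expression; this one selects the same three columns
by_regex <- select(toy, matches("^(employee|hours)_"))

# everything() after a named column moves that column to the front
reordered <- select(toy, hours_absent, everything())
```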

Task 3.3: Use a Selection Helper

In the RStudio Console, write a select() command to choose all columns that don't contain underscore (_) in the column name using a selection helper.

Hint
The logical NOT operator in R is `!`.
Solution

select(data, !contains("_"))
Observed Completion
You should see the following results in the **Console**:
```
   gender          city      department        division age
1    Male      New York         Finance      Accounting  32
2  Female   Los Angeles       Marketing       Marketing  28
3    Male       Chicago           Sales           Sales  35
4  Female San Francisco      Technology     Engineering  29
5    Male       Houston Human Resources Human Resources  37
...
50 Female       Atlanta  Administration  Administration  31
```

Rename Columns if Needed

You can rename columns while selecting them using the select() function by specifying new names with the syntax new_name = old_name. This is helpful for making column names more descriptive or consistent.
```
# Select and rename employee_name to name and hours_absent to absence_hours
renamed_cols <- select(data, name = employee_name, absence_hours = hours_absent)

# View the result
print(renamed_cols)
```
This code creates a dataframe with two columns, name and absence_hours, instead of the original names.
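Renaming inside select() also drops every column you did not list. dplyr's rename() changes names while keeping all columns. A minimal comparison on a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben"),
  hours_absent  = c(8L, 0L),
  age           = c(30L, 41L)
)

# Renaming inside select() keeps only the listed columns
selected <- select(toy, name = employee_name, absence_hours = hours_absent)

# rename() changes the same names but keeps every column
renamed <- rename(toy, name = employee_name, absence_hours = hours_absent)
```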

Task 3.4: Select and Rename Columns

In the RStudio Console, write a select() command to choose the job_title and department columns, renaming them to role and dept, respectively.
Solution

select(data, role = job_title, dept = department)
Observed Completion
You should see the following results in the **Console**:
```
                      role            dept
1               Accountant         Finance
2     Marketing Specialist       Marketing
3     Sales Representative           Sales
4        Software Engineer      Technology
5  Human Resources Manager Human Resources
...
50 Administrative Assistant  Administration
```

In the next step, you'll learn how to use arrange() to sort data by one or more columns.
Challenge

Use `arrange()` to Sort Data by One or More Columns

In this step, you'll learn how to use the arrange() function from the dplyr package to sort the Employee Absence dataset by one or more columns. Sorting helps you organize data for better analysis or presentation.

Sort by One Column

The arrange() function sorts rows based on the values in a specified column. By default, it sorts in ascending order. For example, you can sort employees by their hours_absent.
```
# Sort by hours_absent in ascending order
sorted_by_absence <- arrange(data, hours_absent)

# View the result
print(sorted_by_absence)
```
This code returns a dataframe with rows sorted by hours_absent from lowest to highest.
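Two behaviors are easy to confirm on a small example. Rows that tie on the sort column keep their original relative order, and arrange() returns a new data frame rather than changing its input. The `toy` data frame below is hypothetical.

```r
library(dplyr)

# Hypothetical data with a tie at 0 hours
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal", "Dee"),
  hours_absent  = c(16L, 0L, 8L, 0L)
)

# Ascending by default; Ben and Dee tie at 0 and keep their original relative order
by_absence <- arrange(toy, hours_absent)

# toy itself is unchanged unless you assign the sorted result back to it
print(toy$employee_name)
```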

Task 4.1: Sort by a Single Column

In the RStudio Console, write an arrange() command to sort the dataset by age in ascending order.
Solution

arrange(data, age)
Observed Completion
You should see the following results in the **Console**:

```
   employee_number   employee_name gender        city                       job_title       department
1              107  David Martinez   Male       Miami Customer Service Representative Customer Service
2              110     Emma Taylor Female     Atlanta        Administrative Assistant   Administration
3              117      Noah Baker   Male       Miami Customer Service Representative Customer Service
4              102   Sarah Johnson Female Los Angeles            Marketing Specialist        Marketing
5              120  Charlotte Hill Female     Atlanta        Administrative Assistant   Administration
...
50             145 Noah Richardson   Male     Houston         Human Resources Manager  Human Resources
   store_location             business_unit         division age length_of_service hours_absent
1         Store G Customer Service Division Customer Service  26                 2            0
2         Store J            Admin Division   Administration  27                 3            0
3         Store G Customer Service Division Customer Service  27                 3            0
4         Store B        Marketing Division        Marketing  28                 3           16
5         Store J            Admin Division   Administration  28                 4            0
...
50        Store E               HR Division  Human Resources  41                 9           56
```

Sort in Descending Order

To sort in descending order, use the desc() function within arrange(). This is useful when you want to see the highest values first, such as employees with the most absences.
```
# Sort by hours_absent in descending order
sorted_by_absence_desc <- arrange(data, desc(hours_absent))

# View the result
print(sorted_by_absence_desc)
```
This code sorts the dataframe by hours_absent from highest to lowest.
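For numeric columns, a leading minus sign gives the same order as desc(), but only desc() also works on character columns. A short sketch on a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  hours_absent  = c(16L, 0L, 40L)
)

# desc() reverses the sort order: highest absence first
most_absent <- arrange(toy, desc(hours_absent))

# For numeric columns, a minus sign gives the same order
most_absent_minus <- arrange(toy, -hours_absent)

# desc() also works on character columns, where a minus sign would raise an error
names_z_to_a <- arrange(toy, desc(employee_name))
```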

Task 4.2: Sort in Descending Order

In the RStudio Console, write an arrange() command to sort the dataset by length_of_service in descending order.
Solution

arrange(data, desc(length_of_service))
Observed Completion
You should see the following results in the **Console**:

```
   employee_number   employee_name gender    city                       job_title       department
1              115      Liam Lewis   Male Houston         Human Resources Manager  Human Resources
2              125     Noah Butler   Male Houston         Human Resources Manager  Human Resources
3              135  Noah Hernandez   Male Houston         Human Resources Manager  Human Resources
4              145 Noah Richardson   Male Houston         Human Resources Manager  Human Resources
5              105 Robert Anderson   Male Houston         Human Resources Manager  Human Resources
...
50             107  David Martinez   Male   Miami Customer Service Representative Customer Service
   store_location             business_unit         division age length_of_service hours_absent
1         Store E               HR Division  Human Resources  38                 9           32
2         Store E               HR Division  Human Resources  39                 9           40
3         Store E               HR Division  Human Resources  40                 9           48
4         Store E               HR Division  Human Resources  41                 9           56
5         Store E               HR Division  Human Resources  37                 8           24
...
50        Store G Customer Service Division Customer Service  26                 2            0
```

Sort by Multiple Columns

You can sort by multiple columns by listing them in arrange(). The dataframe is sorted by the first column, then by the second column within each value of the first, and so on. This is helpful for breaking ties or organizing data hierarchically.
```
# Sort by department (ascending) and then hours_absent (descending)
sorted_by_dept_absence <- arrange(data, department, desc(hours_absent))

# View the result
print(sorted_by_dept_absence)
```
This code sorts employees first by department (alphabetically) and then, within each department, by hours_absent from highest to lowest.
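The order of the arguments decides which column drives the sort. This sketch runs the same two columns in both orders on a hypothetical `toy` data frame.

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal", "Dee"),
  department    = c("Sales", "Finance", "Sales", "Finance"),
  hours_absent  = c(8L, 16L, 24L, 0L)
)

# Department first (Finance before Sales), then highest absence within each department
dept_then_absence <- arrange(toy, department, desc(hours_absent))

# Swapping the arguments changes the result: absence now drives the whole sort
absence_then_dept <- arrange(toy, desc(hours_absent), department)
```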

Task 4.3: Sort by Multiple Columns

In the RStudio Console, write an arrange() command to sort the dataset by city in ascending order and then by age in ascending order.
Solution

arrange(data, city, age)
Observed Completion
You should see the following results in the **Console**:

```
   employee_number     employee_name gender    city                job_title     department
1              110       Emma Taylor Female Atlanta Administrative Assistant Administration
2              120    Charlotte Hill Female Atlanta Administrative Assistant Administration
3              130 Olivia Richardson Female Atlanta Administrative Assistant Administration
4              140   Sophia Anderson Female Atlanta Administrative Assistant Administration
5              150      Ava Gonzalez Female Atlanta Administrative Assistant Administration
...
50             146     Sophia Walker Female Seattle    Operations Supervisor     Operations
   store_location       business_unit       division age length_of_service hours_absent
1         Store J      Admin Division Administration  27                 3            0
2         Store J      Admin Division Administration  28                 4            0
3         Store J      Admin Division Administration  29                 4            0
4         Store J      Admin Division Administration  30                 4            0
5         Store J      Admin Division Administration  31                 4            0
...
50        Store F Operations Division     Operations  35                 7           16
```

Handle Missing Values in Sorting

When sorting, missing values (NA) are placed at the end of the sorted dataframe, whether sorting in ascending or descending order. The Employee Absence dataset has no missing values, but it’s good to know how arrange() handles them. If you need to control the placement of NA values, you can use additional functions like is.na() in combination with arrange(), though this is less common.
```
# Example: sort by hours_absent with any NA values explicitly last (none in this dataset).
# is.na() is FALSE for present values, and FALSE sorts before TRUE, so NA rows go last.
sorted_with_na <- arrange(data, is.na(hours_absent), hours_absent)

# View the result
print(sorted_with_na)
```
This code ensures rows with NA in hours_absent (if any) appear last when sorting in ascending order.
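Since the Employee Absence data has no missing values, a hypothetical three-row `toy` data frame with one NA makes the placement rules visible. It confirms that NA stays last even inside desc(), and that sorting on !is.na() first moves NA rows to the top.

```r
library(dplyr)

# Hypothetical data with one missing value
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  hours_absent  = c(8L, NA, 0L)
)

# Default: NA goes last in ascending order...
ascending <- arrange(toy, hours_absent)

# ...and still goes last when the column is wrapped in desc()
descending <- arrange(toy, desc(hours_absent))

# !is.na() is FALSE for missing values, and FALSE sorts before TRUE,
# so sorting on it first moves the NA rows to the top instead
na_first <- arrange(toy, !is.na(hours_absent), hours_absent)
```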

Task 4.4: Sort with Consideration for Missing Values

In the RStudio Console, write an arrange() command to sort the dataset by employee_name in ascending order, ensuring any NA values (if present) appear last.
Solution

arrange(data, is.na(employee_name), employee_name)
Observed Completion
You should see the following results in the **Console**:

```
   employee_number    employee_name gender        city                job_title         department
1              113      Aiden Brown   Male     Chicago     Sales Representative              Sales
2              142     Aiden Harris   Male Los Angeles     Marketing Specialist          Marketing
3              128     Ava Anderson Female      Boston          Product Manager Product Management
4              118     Ava Gonzalez Female      Boston          Product Manager Product Management
5              150     Ava Gonzalez Female     Atlanta Administrative Assistant     Administration
...
50             139 William Phillips   Male      Dallas            IT Specialist                 IT
   store_location      business_unit           division age length_of_service hours_absent
1         Store C     Sales Division              Sales  36                 7            0
2         Store B Marketing Division          Marketing  32                 5           16
3         Store H   Product Division Product Management  35                 6            0
4         Store H   Product Division Product Management  34                 6            0
5         Store J     Admin Division     Administration  31                 4            0
...
50        Store I        IT Division                 IT  33                 5           24
```

In the next step, you'll learn how to chain commands using dplyr to perform complex queries.
Challenge

Chain Commands to Perform Complex Queries with `dplyr`

In this step, you'll learn how to chain multiple dplyr commands using the pipe operator (%>%) to perform complex queries on the Employee Absence dataset. Chaining allows you to combine filter(), select(), arrange(), and other operations in a single, readable workflow.

Use the Pipe Operator to Chain Commands

The pipe operator (%>%) passes the output of one dplyr function as the input to the next, enabling you to build a sequence of operations. For example, you can filter employees, select specific columns, and sort the results in one command.
```
# Chain filter, select, and arrange
result <- data %>%
  filter(department == "Finance") %>%
  select(employee_name, hours_absent) %>%
  arrange(hours_absent)

# View the result
print(result)
```
This code filters employees in the Finance department, selects only the employee_name and hours_absent columns, and sorts by hours_absent in ascending order.
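A pipe chain is another way of writing nested function calls. This sketch, on a hypothetical `toy` data frame, builds the same query three ways: with %>%, as nested calls, and with base R's native pipe |>, which assumes R 4.1.0 or later.

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  department    = c("Finance", "Sales", "Finance"),
  hours_absent  = c(16L, 8L, 0L)
)

# Piped version: read top to bottom
piped <- toy %>%
  filter(department == "Finance") %>%
  select(employee_name, hours_absent) %>%
  arrange(hours_absent)

# Equivalent nested version: the same calls, read inside out
nested <- arrange(
  select(filter(toy, department == "Finance"), employee_name, hours_absent),
  hours_absent
)

# Base R's native pipe (R 4.1.0 or later) works the same way for these calls
native <- toy |>
  filter(department == "Finance") |>
  select(employee_name, hours_absent) |>
  arrange(hours_absent)
```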

Task 5.1: Chain Basic Commands

In the RStudio Console, write a chained command to filter employees in the Marketing department, select the employee_name and city columns, and sort by city in ascending order.
Solution

data %>% filter(department == "Marketing") %>% select(employee_name, city) %>% arrange(city)
Observed Completion
You should see the following results in the **Console**:

```
     employee_name        city
1    Sarah Johnson Los Angeles
2   Isabella Moore Los Angeles
3       Emma Adams Los Angeles
4 Sophia Rodriguez Los Angeles
5     Aiden Harris Los Angeles
```

Combine Multiple Filters in a Chain

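You can include multiple conditions in a filter() step within a chain to refine your query. Separating conditions with commas, as in the example below, has the same effect as joining them with &. This is useful for narrowing down data to meet specific criteria before selecting or sorting.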
```
# Chain with multiple filter conditions
result <- data %>%
  filter(department == "Human Resources", hours_absent > 30) %>%
  select(employee_name, hours_absent, length_of_service) %>%
  arrange(desc(hours_absent))

# View the result
print(result)
```
This code filters for Human Resources employees with more than 30 hours absent, selects relevant columns, and sorts by hours_absent in descending order.
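Comma-separated conditions, conditions joined with &, and two consecutive filter() steps all keep the same rows. A minimal check on a hypothetical `toy` data frame:

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  department    = c("Human Resources", "Human Resources", "Sales"),
  hours_absent  = c(40L, 8L, 48L)
)

# Comma-separated conditions in one filter() call
with_comma <- toy %>% filter(department == "Human Resources", hours_absent > 30)

# The same conditions joined with &
with_and <- toy %>% filter(department == "Human Resources" & hours_absent > 30)

# Two filter() steps in a row narrow the rows in the same way
two_steps <- toy %>%
  filter(department == "Human Resources") %>%
  filter(hours_absent > 30)
```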

Task 5.2: Chain with Multiple Filter Conditions

In the RStudio Console, write a chained command to filter employees in the Operations department with at least 7 years of service, select the employee_name, city, and hours_absent columns, and sort by hours_absent in descending order.
Solution

data %>% filter(department == "Operations", length_of_service >= 7) %>% select(employee_name, city, hours_absent) %>% arrange(desc(hours_absent))
Observed Completion
You should see the following results in the **Console**:

```
  employee_name    city hours_absent
1   Sofia Green Seattle           16
2   Sophia Bell Seattle           16
3    Ava Wilson Seattle           16
4 Sophia Walker Seattle           16
```

Use Selection Helpers in a Chain

You can incorporate selection helpers like starts_with() or contains() within a chain to dynamically select columns. This is useful for complex datasets with many columns.
```
# Chain with selection helper
result <- data %>%
  filter(city %in% c("New York", "Chicago")) %>%
  select(employee_name, starts_with("hours")) %>%
  arrange(hours_absent)

# View the result
print(result)
```
This code filters employees in New York or Chicago, selects the employee_name and any columns starting with "hours" (i.e., hours_absent), and sorts by hours_absent in ascending order.
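One caution with helpers in a chain: a helper that matches no column is not an error, so select() quietly returns fewer columns and a typo only surfaces later if a step needs the missing column. This sketch uses a hypothetical `toy` data frame and a deliberately misspelled prefix, "hour_".

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben", "Cal"),
  city          = c("New York", "Chicago", "Boston"),
  hours_absent  = c(8L, 0L, 16L)
)

# Intended chain: starts_with("hours") picks up hours_absent
result <- toy %>%
  filter(city %in% c("New York", "Chicago")) %>%
  select(employee_name, starts_with("hours")) %>%
  arrange(hours_absent)

# Misspelled prefix: matches nothing, raises no error, and simply drops hours_absent
typo <- toy %>% select(employee_name, starts_with("hour_"))
```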

Task 5.3: Chain with a Selection Helper

In the RStudio Console, write a chained command to filter employees in the Technology department, select the employee_name and columns containing "age" using a selection helper, and sort by age in ascending order.
Solution

data %>% filter(department == "Technology") %>% select(employee_name, contains("age")) %>% arrange(age)
Observed Completion
You should see the following results in the **Console**:

```
  employee_name age
1  Emily Wilson  29
2     Mia Clark  30
3 Olivia Harris  31
4    Emma Davis  32
5  Olivia Adams  33
```

Rename Columns in a Chain

You can rename columns using select() or the rename() function within a chain to make the output more readable. The rename() function is particularly useful when you want to keep all columns but change specific names.
```
# Chain with renaming
result <- data %>%
  filter(department == "Sales") %>%
  select(employee_name, hours_absent, length_of_service) %>%
  rename(name = employee_name, absence_hours = hours_absent) %>%
  arrange(desc(absence_hours))

# View the result
print(result)
```
This code filters Sales employees, selects three columns, renames employee_name to name and hours_absent to absence_hours, and sorts by absence_hours in descending order.
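After rename(), every later step in the chain has to use the new name. This sketch, on a hypothetical `toy` data frame, shows the working chain and catches the error raised when a later step still refers to the old name.

```r
library(dplyr)

# Hypothetical data for illustration
toy <- data.frame(
  employee_name = c("Ann", "Ben"),
  hours_absent  = c(0L, 24L)
)

# After rename(), later steps must use the new names
result <- toy %>%
  rename(name = employee_name, absence_hours = hours_absent) %>%
  arrange(desc(absence_hours))

# Referring to the old name downstream fails: hours_absent is no longer a column
old_name_attempt <- tryCatch(
  toy %>%
    rename(absence_hours = hours_absent) %>%
    arrange(hours_absent),
  error = function(e) "failed"
)
```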

Task 5.4: Chain with Renaming

In the RStudio Console, write a chained command to filter employees in the Product Management department, select the employee_name and hours_absent columns, rename hours_absent to absence, and sort by absence in ascending order.
Solution

data %>% filter(department == "Product Management") %>% select(employee_name, hours_absent) %>% rename(absence = hours_absent) %>% arrange(absence)
Observed Completion
You should see the following results in the **Console**:

```
  employee_name absence
1    Sophia Lee       0
2  Ava Gonzalez       0
3  Ava Anderson       0
4   Emily Moore       0
5    Emily Hill       0
```

You've now completed this Code Lab on querying and filtering data in R using dplyr! You can combine filter(), select(), arrange(), and the pipe operator (%>%) to create powerful and readable data queries.
