
Querying and Converting Data Types in R Hands-on Practice
This lab teaches querying and converting data types in R, starting with the basics of dataset structures and moving through data manipulation techniques. It provides practical examples for effective data querying and filtering. The course concludes with essential resources for further learning in data analysis.

Exploring and Managing Data with RStudio
RStudio Guide
To get started, click on the 'workspace' folder in the bottom right pane of RStudio, then click on the file entitled "Step 1...". You may want to drag the console pane to be smaller so that you have more room to work. You'll complete each task for Step 1 in that R Markdown file. Remember to run the cells for a task, using the play button at the top right of each cell, before moving on to the next task in the R Markdown file. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.
Exploring and Managing Data with RStudio
To review the concepts covered in this step, please refer to the Understanding Dataset Structures and Formats module of the Querying and Converting Data Types in R course.
Understanding Dataset Structures and Formats is important because it lays the foundation for all data analysis tasks in R. This step focuses on practical skills like exploring datasets, using the RStudio interface effectively, and managing data types and packages, which are crucial for any aspiring R programmer.
Start by loading the dataset (`mtcars`) available in R. Use the `summary()` function to get an overview of the dataset. Explore the dataset by attaching it and using basic R commands to query and filter data. Convert a data frame to a tibble and a data table, observing the differences in output and performance. This exercise will help you become familiar with RStudio's interface and the basic data manipulation tasks in R.
Task 1.1: Loading and Summarizing the Dataset
Start by loading the `mtcars` dataset, which is built into R. Use the `summary()` function to get a quick overview of the dataset. This will help you understand the structure and the type of data it contains.
Hint
You can access the `mtcars` dataset directly since it's built into R. Use the `summary()` function by passing the dataset name as an argument.
Solution
    # Load the mtcars dataset
    mtcars
    # Use the summary function to get an overview of the dataset
    summary(mtcars)
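Beyond `summary()`, a few base R helpers give a quick view of a dataset's structure. The following is a minimal sketch that assumes only base R and the built-in `mtcars`; the variable names `dims` and `col_classes` are illustrative and not part of the lab.

```r
# Dimensions: number of rows and number of columns
dims <- dim(mtcars)
print(dims)

# Compact structure: one line per column with its type and first values
str(mtcars)

# Class of every column, returned as a named character vector
col_classes <- vapply(mtcars, class, character(1))
print(col_classes)
```

`str()` is often the fastest way to confirm column types before converting them.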
Task 1.2: Exploring the Dataset
Attach the `mtcars` dataset. Find the cars with an mpg (miles per gallon) greater than 20.
Hint
Use the `attach()` function to make the `mtcars` dataset's columns directly accessible. Then, use the dataset column `mpg` directly in a conditional statement to filter the data.
Solution
    # Attach the mtcars dataset
    attach(mtcars)
    # Find cars with mpg greater than 20
    mtcars[mpg > 20, ]
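The same filter can be written without `attach()`. The sketch below is an alternative, not the lab's required solution; it assumes only base R, and the names `high_mpg` and `high_mpg_with` are illustrative.

```r
# Option 1: qualify the column with the data frame name
high_mpg <- mtcars[mtcars$mpg > 20, ]

# Option 2: evaluate the condition inside the data frame with with()
high_mpg_with <- with(mtcars, mtcars[mpg > 20, ])

# Both approaches return the same rows
print(identical(high_mpg, high_mpg_with))

# If attach(mtcars) was called earlier, detach("mtcars") removes it from the search path
```

Qualifying columns explicitly avoids surprises when an attached column name matches another variable in the workspace.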
Task 1.3: Converting Data Frame to Tibble
Convert the `mtcars` data frame to a tibble named `mtcars_tibble`. Print `mtcars_tibble` and observe the differences in output. Tibbles are a modern take on data frames, but with some added conveniences.
Hint
Use the `as_tibble()` function from the `tibble` package to convert `mtcars` into a tibble. Make sure to load the `tibble` package using `library(tibble)` before converting.
Solution
    # Load the tibble package
    library(tibble)
    # Convert mtcars to a tibble
    mtcars_tibble <- as_tibble(mtcars)
    # Observe the output
    print(mtcars_tibble)
Task 1.4: Converting Data Frame to Data Table
Now, convert the `mtcars` data frame to a data table named `mtcars_data_table` and observe how the output differs from a regular data frame and a tibble.
Hint
Use the `as.data.table()` function from the `data.table` package to convert `mtcars` into a data table. Remember to load the `data.table` package using `library(data.table)` before converting.
Solution
    # Load the data.table package
    library(data.table)
    # Convert mtcars to a data table
    mtcars_data_table <- as.data.table(mtcars)
    # Observe the output
    print(mtcars_data_table)
Data Type Conversion and List Management
To review the concepts covered in this step, please refer to the Selecting and Converting Data Types module of the Querying and Converting Data Types in R course.
Selecting and converting data types is essential for data preprocessing and analysis in R. This step focuses on converting between numeric, integer, character, and factor variables, as well as managing lists, which are common tasks in data science projects.
This exercise will enhance your understanding of R's data types and how to manipulate them.
Task 2.1: Create a Mixed Data Vector
Create a vector named `mixed_data` containing a mix of numeric, integer, character, and logical values. Include the numeric value 42.2, the integer 3, the character string `R is fun`, and the boolean `TRUE`.
Hint
Use the `c()` function to combine values of different types. Remember to use quotes for character values and `TRUE` or `FALSE` for logical values.
Solution
    mixed_data <- c(42.2, 3L, 'R is fun', TRUE)
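It helps to check what `c()` actually produced before converting it. An atomic vector can hold only one type, so R coerces the mixed inputs to the most flexible one. A minimal sketch using the same values as the task:

```r
mixed_data <- c(42.2, 3L, 'R is fun', TRUE)

# Every element has been coerced to a character string
print(class(mixed_data))
print(mixed_data)
```

This explains the results in the next tasks: the conversions start from strings such as `"42.2"` and `"TRUE"`, not from the original numeric and logical values.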
Task 2.2: Convert Mixed Data to Numeric
Convert the `mixed_data` vector to numeric type and assign it to a new variable `numeric_data`. Print the new variable. Note that non-numeric values will be coerced to `NA`.
Hint
Use the `as.numeric()` function and pass `mixed_data` as the argument.
Solution
    numeric_data <- as.numeric(mixed_data)
    print(numeric_data)
Task 2.3: Convert Mixed Data to Integer
Convert the `mixed_data` vector to integer type and assign it to a new variable `integer_data`. Print the new variable. Non-numeric values will be coerced to `NA`, while float values will lose information after the decimal.
Hint
Use the `as.integer()` function and pass `mixed_data` as the argument.
Solution
    integer_data <- as.integer(mixed_data)
    print(integer_data)
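The two conversions can be compared side by side. The sketch below reuses the task's values and wraps the calls in `suppressWarnings()`, an addition made here only to keep the output free of the "NAs introduced by coercion" warning. Note that the string `"TRUE"` also becomes `NA`, because it is not a numeric string.

```r
mixed_data <- c(42.2, 3L, 'R is fun', TRUE)

# suppressWarnings() hides the coercion warning; the NA values are still produced
numeric_data <- suppressWarnings(as.numeric(mixed_data))
integer_data <- suppressWarnings(as.integer(mixed_data))

print(numeric_data)  # elements that are not numeric strings become NA
print(integer_data)  # the fractional part of 42.2 is dropped

# Count how many elements failed to convert
print(sum(is.na(numeric_data)))
```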
Task 2.4: Convert Mixed Data to Character
Convert the `mixed_data` vector to character type and assign it to a new variable `character_data`. Print the new variable.
Hint
Use the `as.character()` function and pass `mixed_data` as the argument.
Solution
    character_data <- as.character(mixed_data)
    print(character_data)
Task 2.5: Factor to Character Conversion
Create a factor variable named `factor_var` with values 'high', 'low', 'high', 'medium', and levels 'low', 'medium', and 'high'. Print `factor_var`. Then, convert this factor to a character variable named `char_var` and print it.
Hint
To create a factor variable, use the `factor()` function. The first argument should be the values, and the second argument should specify the levels. Use the `as.character()` function and pass `factor_var` as the argument to convert it to a character vector.
Solution
    factor_var <- factor(c('high', 'low', 'high', 'medium'), levels=c('low', 'medium', 'high'))
    print(factor_var)
    char_var <- as.character(factor_var)
    print(char_var)
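A factor stores integer codes that point into its levels, which matters when converting it. The sketch below contrasts the codes with the labels; the `num_factor` example is a hypothetical addition that illustrates a common pitfall with number-like labels.

```r
factor_var <- factor(c('high', 'low', 'high', 'medium'),
                     levels = c('low', 'medium', 'high'))

# as.integer() returns the position of each value in levels(factor_var)
level_codes <- as.integer(factor_var)
print(level_codes)

# as.character() returns the labels themselves
print(as.character(factor_var))

# For a factor whose labels are numbers, convert through character first
num_factor <- factor(c('10', '20', '10'))
print(as.numeric(as.character(num_factor)))
```

Calling `as.numeric()` directly on `num_factor` would return the level codes rather than 10 and 20.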
Task 2.6: Logical to Numeric Conversion
Create a logical vector named `logical_vec` with values `TRUE`, `FALSE`, and `TRUE`. Convert this logical vector to a numeric vector named `numeric_vec` and print it.
Hint
Use the `as.numeric()` function and pass `logical_vec` as the argument.
Solution
    logical_vec <- c(TRUE, FALSE, TRUE)
    numeric_vec <- as.numeric(logical_vec)
    print(numeric_vec)
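Because `TRUE` maps to 1 and `FALSE` to 0, logical vectors can be counted and averaged directly. A short sketch; `n_true` and `share_true` are illustrative names.

```r
logical_vec <- c(TRUE, FALSE, TRUE)

# Explicit conversion: TRUE becomes 1, FALSE becomes 0
numeric_vec <- as.numeric(logical_vec)
print(numeric_vec)

# Arithmetic functions coerce logicals implicitly
n_true <- sum(logical_vec)       # number of TRUE values
share_true <- mean(logical_vec)  # proportion of TRUE values
print(n_true)
print(share_true)
```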
Task 2.7: Convert Character to POSIXct and POSIXlt
Given the provided character vector `char_dates` with dates in the format 'YYYY-MM-DD HH:MM:SS', convert it to `POSIXct` and `POSIXlt` formats, assigning them to `posixct_dates` and `posixlt_dates` respectively.
Hint
Use the `as.POSIXct()` function for converting to `POSIXct` and `as.POSIXlt()` for `POSIXlt`. The format string should be `'%Y-%m-%d %H:%M:%S'`.
Solution
    char_dates <- c('2023-04-01 12:00:00', '2023-04-02 15:30:00')
    posixct_dates <- as.POSIXct(char_dates, format = '%Y-%m-%d %H:%M:%S')
    posixlt_dates <- as.POSIXlt(char_dates, format = '%Y-%m-%d %H:%M:%S')
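The two classes print alike but are stored differently. The sketch below shows the difference; `tz = 'UTC'` is an assumption added here so the result does not depend on the local time zone, and the names `ct` and `lt` are illustrative.

```r
char_dates <- c('2023-04-01 12:00:00', '2023-04-02 15:30:00')

# tz = 'UTC' is an assumption made for reproducibility; the lab solution uses the local zone
ct <- as.POSIXct(char_dates, format = '%Y-%m-%d %H:%M:%S', tz = 'UTC')
lt <- as.POSIXlt(char_dates, format = '%Y-%m-%d %H:%M:%S', tz = 'UTC')

# POSIXct is stored as the number of seconds since 1970-01-01 UTC
print(unclass(ct))

# POSIXlt is a list of components such as hour and min
print(lt$hour)
print(lt$min)

# Time difference between the two timestamps
print(difftime(ct[2], ct[1], units = 'hours'))
```

`POSIXct` is usually the more convenient choice for data frame columns, while `POSIXlt` is handy when individual components are needed.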
Task 2.8: Create and Explore a List
Create a list named `mixed_list` containing elements of different data types. Each element should be named. Include the numeric value 42.2, the integer 3, the character `R is fun`, the boolean `TRUE`, and the vector `posixct_dates` from the prior task. Then, explore the structure of this list using the `str()` function.
Hint
Use the `list()` function to combine elements of different types. To explore the list, use the `str()` function with `mixed_list` as the argument.
Solution
    mixed_list <- list(numeric_value=42.2, integer_value=3L, character_value='R is fun', boolean_value=TRUE, vector_value=posixct_dates)
    str(mixed_list)
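Once the list exists, its elements can be accessed in several ways. The following sketch leaves out the `POSIXct` element so that it runs on its own; `element_classes` is an illustrative name.

```r
# The POSIXct element from the task is omitted to keep this sketch self-contained
mixed_list <- list(numeric_value = 42.2, integer_value = 3L,
                   character_value = 'R is fun', boolean_value = TRUE)

# [[ ]] and $ return the element itself
print(mixed_list[['numeric_value']])
print(mixed_list$character_value)

# [ ] returns a shorter list, not the element
print(class(mixed_list['integer_value']))

# Unlike c(), list() keeps each element's own type
element_classes <- vapply(mixed_list, class, character(1))
print(element_classes)
```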
Advanced Data Querying and Filtering
To review the concepts covered in this step, please refer to the Querying and Filtering Data module of the Querying and Converting Data Types in R course.
Querying and Filtering Data is important because these are fundamental skills for extracting insights from data. This step will cover advanced querying and filtering techniques using data frames, data tables, and tibbles, which are essential for any data analysis project in R.
Practice querying a data frame using box brackets and logical tests. Explore the use of the `subset()` function to filter data frames based on specific criteria. Move on to querying and filtering a data table, using advanced techniques like the `%in%` operator. Experiment with the `dplyr` package to perform queries on a tibble, using functions like `filter()`, `arrange()`, `select()`, and `mutate()`. This step will help you master the art of data querying and filtering in R, using a variety of data structures.
Task 3.1: Querying a Data Frame with Logical Tests
Load the `mtcars` dataset, one of the default datasets that comes with R. Use logical tests within box brackets `[]` to query the data frame. Extract rows where the `mpg` column is greater than 25.
Hint
Use the `data()` function to load `mtcars`. Use the syntax `data_frame_name[condition, ]` to perform the query. The condition should be a logical test applied to one of the columns, like `data_frame$mpg > 25`.
Solution
    data(mtcars)
    mtcars[mtcars$mpg > 25, ]
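It can help to look at the logical vector on its own before using it as an index. The sketch below does that and then combines two tests; the `wt < 2` condition and the variable names are illustrative additions, not part of the task.

```r
# The logical test yields one TRUE/FALSE value per row
is_efficient <- mtcars$mpg > 25
print(sum(is_efficient))

# Use the logical vector as the row index; a blank column index keeps all columns
print(mtcars[is_efficient, ])

# Combine tests with & (and) or | (or), and select columns by name
light_and_efficient <- mtcars[mtcars$mpg > 25 & mtcars$wt < 2, c('mpg', 'wt')]
print(light_and_efficient)
```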
Task 3.2: Filtering Data Frames with the subset() Function
Use the `subset()` function to filter rows from a data frame based on a specific condition. Keep only the rows where `hp` (horsepower) is less than 100.
Hint
The `subset()` function syntax is `subset(x, subset)`, where `x` is the data frame and `subset` is the condition. For example, `subset(df, hp < 100)`.
Solution
    subset(mtcars, hp < 100)
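`subset()` and bracket indexing select the same rows. The sketch below compares them and shows the optional `select` argument, which goes beyond what the task asks; `low_hp` and `low_hp_brackets` are illustrative names.

```r
# subset() evaluates the condition inside the data frame, so no mtcars$ prefix is needed
low_hp <- subset(mtcars, hp < 100)

# The same rows via bracket indexing
low_hp_brackets <- mtcars[mtcars$hp < 100, ]
print(all.equal(low_hp, low_hp_brackets))

# The select argument limits the columns that are returned
print(subset(mtcars, hp < 100, select = c(mpg, hp)))
```

`subset()` is convenient for interactive work; bracket indexing is generally the safer choice inside functions, as the `subset()` help page itself notes.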
Task 3.3: Querying a Data Table with Advanced Operators
Load the `data.table` package. Convert `mtcars` to a `data.table` named `mtcars_dt`. Query the data table using the `%in%` operator and `order()` function. Extract rows where the `cyl` (number of cylinders) is either 4 or 6, and order them by `wt` (weight) in descending order.
Hint
To use the `%in%` operator, apply it within the `i` argument of the data table syntax `DT[i, j, by]`. For ordering, use the `order()` function within the `i` argument of a second, chained set of brackets, and pass it `-column_name` for descending order.
Solution
    library('data.table')
    mtcars_dt <- as.data.table(mtcars)
    mtcars_dt[cyl %in% c(4, 6), .SD, .SDcols = c('cyl', 'wt')][order(-wt)]
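For comparison, the same `%in%` and `order()` logic can be sketched in base R on the plain data frame. This is not the `data.table` solution, only a cross-check of what the query returns; `four_or_six` is an illustrative name.

```r
# Keep rows with 4 or 6 cylinders and only the cyl and wt columns
four_or_six <- mtcars[mtcars$cyl %in% c(4, 6), c('cyl', 'wt')]

# order() returns row positions; the minus sign sorts wt from heaviest to lightest
four_or_six <- four_or_six[order(-four_or_six$wt), ]
print(four_or_six)
```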
Task 3.4: Performing Queries on a Tibble with dplyr
Use the `dplyr` package to perform complex queries on a tibble. Convert the `mtcars` data frame to a tibble called `mtcars_tbl`. First, filter rows where `mpg` is greater than 25. Second, arrange them by `carb` (number of carburetors) in ascending order. Third, select only the `carb` and `mpg` columns. Finally, add a new column `mpg_plus_one` that is `mpg + 1`.
Hint
Chain the functions using the `%>%` operator. Start with `filter()` to apply the mpg condition, then use `arrange()` to sort by carb, followed by `select()` to pick columns, and finally `mutate()` to add the new column.
Solution
    library('dplyr')
    mtcars_tbl <- as_tibble(mtcars)
    mtcars_tbl %>%
      filter(mpg > 25) %>%
      arrange(carb) %>%
      select(carb, mpg) %>%
      mutate(mpg_plus_one = mpg + 1)
Leveraging the Tidyverse for Data Preprocessing
To review the concepts covered in this step, please refer to the Course Summary and Further Resources module of the Querying and Converting Data Types in R course.
Course Summary and Further Resources is important because it consolidates the learning and introduces powerful tools for data preprocessing. This step emphasizes the use of the Tidyverse suite of packages for efficient data manipulation, which is a critical skill in data science.
Load the `tidyverse` package, which comes pre-installed in this environment. Practice using `readr` to import data and `tidyr` for data cleaning tasks. Explore the use of `purrr` for applying functions across elements in a list or vector. Use `dplyr` to perform data manipulation tasks such as selecting, renaming, summarizing, and mutating data. This exercise will familiarize you with the Tidyverse ecosystem, enhancing your data preprocessing capabilities in R.
Task 4.1: Loading the tidyverse Package
Begin by loading the `tidyverse` package to access its suite of tools for data manipulation. This is the first step in utilizing the powerful features of the `tidyverse` for data preprocessing.
Hint
Use the `library()` function and specify the name of the package you want to load, which is `tidyverse`.
Solution
    library('tidyverse')
Task 4.2: Importing Data with readr
Use the `readr` package, part of the Tidyverse, to import the CSV file named `sample_data.csv` into R. Store the imported data in a variable named `data` for further manipulation.
Hint
You don't need to load `readr` explicitly, as it is already loaded by `tidyverse`. Assign the result of the `read_csv()` function to a variable. The function takes the file name as its argument, which in this case is `sample_data.csv`.
Solution
    data <- read_csv('sample_data.csv')
Task 4.3: Cleaning Data with tidyr
With the `tidyr` package, part of the Tidyverse, practice cleaning the imported data. Specifically, use the `drop_na()` function to remove any rows with missing values from the `data` dataframe.
Hint
Use the `drop_na()` function on the `data` variable to remove rows with NA values. Assign the result back to a variable.
Solution
    data <- drop_na(data)
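To see which rows such a cleaning step removes, the same idea can be sketched in base R with `complete.cases()`. The data frame below is hypothetical, since the contents of `sample_data.csv` are not shown in this lab text; only the column names `id`, `numbers`, and `value` are taken from the later tasks.

```r
# Hypothetical data; the real sample_data.csv may look different
sample_df <- data.frame(id = 1:4,
                        numbers = c(10, NA, 30, 40),
                        value = c(1.5, 2.5, NA, 4.5))

# complete.cases() flags rows that contain no NA in any column
keep <- complete.cases(sample_df)
print(keep)

# Keeping only those rows mirrors what drop_na() does when called without column arguments
cleaned <- sample_df[keep, ]
print(cleaned)
```

Checking `sum(!keep)` first shows how many rows would be lost before the data is overwritten.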
Task 4.4: Applying Functions with purrr
Utilize the `purrr` package, part of the Tidyverse, to apply a function that doubles the values of the `numbers` column. Store the result in a new variable named `doubled_numbers` and print that variable.
Hint
Use the `map_dbl()` function from `purrr`. The first argument is the vector `numbers`, and the second argument is a formula that specifies the function to apply, in this case, doubling the values.
Solution
    doubled_numbers <- map_dbl(data$numbers, ~ .x * 2)
    print(doubled_numbers)
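Element-wise mapping of this kind has a close base R counterpart in `vapply()`, which, like `map_dbl()`, checks that each result is a single double. The sketch below uses a hypothetical `numbers` vector in place of the CSV column and also shows the vectorized form.

```r
# Hypothetical values; the real numbers column comes from sample_data.csv
numbers <- c(1, 2.5, 4)

# vapply() calls the function once per element and checks each result is a single double
doubled_vapply <- vapply(numbers, function(x) x * 2, numeric(1))

# Arithmetic in R is vectorized, so the same result needs no explicit mapping
doubled_vectorized <- numbers * 2

print(doubled_vapply)
print(identical(doubled_vapply, doubled_vectorized))
```

For simple arithmetic the vectorized form is the idiomatic choice; mapping functions are more useful when the operation is not itself vectorized.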
Task 4.5: Data Manipulation with dplyr
Using the `dplyr` package, part of the Tidyverse, perform a series of data manipulation tasks on the `data` dataframe. First, select only the columns `id` and `value`. Then, rename the column `value` to `measurement`. Finally, add a new column `measurement_sq` that contains the square of `measurement`. Print the new version of `data`.
Hint
Chain the operations using the `%>%` operator. Use `select()` to choose columns, `rename()` to change column names, and `mutate()` to add new columns. Replace placeholders with the correct column names and operations.
Solution
    data <- data %>%
      select(id, value) %>%
      rename(measurement = value) %>%
      mutate(measurement_sq = measurement^2)
    print(data)