Creating and Debugging R Programs Hands-on Practice
In this lab, you'll begin by mastering the basics of R, including script execution and package management. You'll advance to outputting and debugging data, learning techniques for saving results and identifying code errors. By the end, you'll be equipped to automate scripts and tackle common debugging challenges, enhancing your R programming efficiency.

Exploring the R Toolbox
RStudio Guide
To get started, click on the 'workspace' folder in the bottom right pane of RStudio, then click on the file entitled "Step 1...". You may want to shrink the console pane so that you have more room to work. You'll complete each task for Step 1 in that R Markdown file. For each task, run its cells with the play button at the top right of each cell before moving on to the next task. When you have completed all tasks in this step, come back and open the file for the next step, and repeat until you have completed every step of the lab.
To review the concepts covered in this step, please refer to the Understanding the R Platform module of the Creating and Debugging R Programs course.
Understanding the R toolbox is important because it provides the basic tools and functionalities that we need to effectively use R. This includes running R scripts from the command line, using the R console, setting variables in the R console, and installing packages in RStudio.
Let's dive into the world of R and get our hands dirty by exploring its toolbox. In this step, we will learn how to use the R console, set variables in the R console, and load packages in RStudio. These are fundamental skills that you will use throughout your journey with R.
Task 1.1: Using the R console
Open the R console and perform a simple arithmetic operation. For example, calculate the sum of 5 and 3.
Hint
The R console is located in the bottom left portion of the screen. To perform an arithmetic operation, type it directly into the console. For example, to calculate the sum of 5 and 3, type 5 + 3.
Solution
5 + 3
Task 1.2: Setting variables in the R console
In the R console, create a variable named 'x' and assign it the value 10. Then, print the value of 'x'.
Hint
You can use the assignment operator (<-) to assign a value to a variable. For example, to assign the value 10 to a variable named 'x', you can type x <- 10. To print the value of 'x', you can simply type x.
Solution
x <- 10
x
Task 1.3: Loading packages in RStudio
Using the R console, load the ggplot2 package into the environment. This package comes pre-installed in this R environment, so you do not need to install it.
Hint
To load the 'ggplot2' package, you can use the library() function with 'ggplot2' as the argument.
Solution
library('ggplot2')
Working with R Data Types
To review the concepts covered in this step, please refer to the Data Types module of the Creating and Debugging R Programs course.
Understanding R data types is important because it forms the basis of how we manipulate and analyze data in R. This includes creating vectors, matrices, lists, and data frames, and understanding how to subset and coerce these data types.
Data is the lifeblood of any programming language, and R is no exception. In this step, we will learn about the different data types in R and how to work with them. We will create vectors, matrices, lists, and data frames, and learn how to subset and coerce these data types. We will use the c(), matrix(), list(), and data.frame() functions to create these data types, and the [] and $ operators to subset them. We will also explore how R coerces atomic collections.
Task 2.1: Creating a Vector
Create a vector named student_ages that contains the ages of five students: 16, 17, 18, 16, 17.
Hint
Use the c() function to combine the ages into a vector. Fill in the parentheses with the ages, separated by commas.
Solution
student_ages <- c(16, 17, 18, 16, 17)
Task 2.2: Creating a Matrix
Create a matrix named student_scores that contains the scores of three students in two subjects. Each student gets a row and each subject gets a column. The scores are as follows: Student 1: 85, 90; Student 2: 80, 88; Student 3: 92, 95.
Hint
Use the matrix() function to create a matrix. Pass the scores as a vector built with c(), with the values separated by commas. The nrow argument specifies the number of rows in the matrix, and the ncol argument specifies the number of columns.
By default, the matrix() function will fill in values column by column. To fill in the values row by row, set the byrow argument to TRUE.
Solution
student_scores <- matrix(c(85, 90, 80, 88, 92, 95), nrow = 3, ncol = 2, byrow = TRUE)
# optionally, set names
colnames(student_scores) <- c("Subject1", "Subject2")
rownames(student_scores) <- c("Student1", "Student2", "Student3")
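To see what byrow changes, the sketch below builds the same six scores both ways. The names scores, by_col, and by_row are hypothetical and used only for illustration.

```r
scores <- c(85, 90, 80, 88, 92, 95)

# Default: values fill the matrix column by column
by_col <- matrix(scores, nrow = 3, ncol = 2)

# byrow = TRUE: values fill the matrix row by row
by_row <- matrix(scores, nrow = 3, ncol = 2, byrow = TRUE)

by_col[1, ]  # first row under the default fill: 85 88
by_row[1, ]  # first row with byrow = TRUE: 85 90
```

With the default fill, Student 1 would be paired with another student's score, which is why the solution sets byrow = TRUE.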
Task 2.3: Creating a List
Create a list named student_info that contains the following information about a student: Name: 'John', Age: 16, Scores: 85, 90.
Hint
Use the list() function to create a list. Fill in the parentheses with the name, age, and scores, separated by commas.
Solution
student_info <- list('Name' = 'John', 'Age' = 16, 'Scores' = c(85, 90))
Task 2.4: Creating a Data Frame
Create a data frame named students that contains the following information about three students: Names: 'John', 'Jane', 'Jim'; Ages: 16, 17, 18; Scores: 85, 90, 95.
Hint
Use the data.frame() function to create a data frame. Fill in the parentheses with the names, ages, and scores, separated by commas. Each column of the data frame should be a vector.
Solution
students <- data.frame('Names' = c('John', 'Jane', 'Jim'), 'Ages' = c(16, 17, 18), 'Scores' = c(85, 90, 95))
Task 2.5: Subsetting a Vector
Subset the student_ages vector to get the ages of the second and third students.
Hint
Use the square brackets [] to subset a vector. A vector takes a single index argument, so combine the positions of the second and third students into one vector with c() and place it inside the brackets.
Solution
student_ages[c(2, 3)]
Task 2.6: Subsetting a Matrix
Subset the student_scores matrix to get the scores of the second student.
Hint
Use the square brackets [] to subset a matrix. First, enter the index of the second student, followed by a comma. Leave the area after the comma blank to get all columns.
Solution
student_scores[2, ]
Task 2.7: Subsetting a List
Subset the student_info list to get the name of the student.
Hint
Use the dollar sign $ to subset a list by name. After the dollar sign, type the name of the element you want to subset.
Solution
student_info$Name
Task 2.8: Subsetting a Data Frame
Subset the students data frame to get the names of the students.
Hint
Use the dollar sign $ to subset a data frame by column. After the dollar sign, type the name of the column you want to subset.
Solution
students$Names
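The $, [[ ]], and [ ] operators return different kinds of objects, which matters when later code expects a vector and not a data frame. The sketch below rebuilds the data from Task 2.4. The names names_vec, ages_vec, ages_df, and older_names are hypothetical.

```r
students <- data.frame(
  Names = c("John", "Jane", "Jim"),
  Ages = c(16, 17, 18),
  Scores = c(85, 90, 95),
  stringsAsFactors = FALSE
)

# $ and [[ ]] both return the column itself (a vector)
names_vec <- students$Names
ages_vec <- students[["Ages"]]

# Single brackets with only a column name return a one-column data frame
ages_df <- students["Ages"]

# Rows and columns together: rows where Ages > 16, Names column only
older_names <- students[students$Ages > 16, "Names"]

class(ages_vec)  # "numeric"
class(ages_df)   # "data.frame"
older_names      # "Jane" "Jim"
```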
Task 2.9: Coercing Data Types
Coerce the student_ages vector to a character vector.
Hint
Use the as.character() function to coerce a vector to a character vector. Fill in the parentheses with the name of the vector you want to coerce.
Solution
student_ages <- as.character(student_ages)
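Coercion also happens implicitly whenever c() receives mixed types, and explicit coercion can fail quietly. A short sketch with made-up values (mixed, flags, ages_chr, and ages_num are hypothetical names):

```r
# Mixing types in c() coerces every element to the most flexible type present
mixed <- c(16, "17", TRUE)
class(mixed)  # "character"

# Logical values become 1 and 0 when combined with numbers
flags <- c(TRUE, FALSE, 3)
flags  # 1 0 3

# Explicit coercion back to numeric: text that is not a number becomes NA (with a warning)
ages_chr <- c("16", "17", "unknown")
ages_num <- suppressWarnings(as.numeric(ages_chr))
ages_num  # 16 17 NA
```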
Processing Data with R
To review the concepts covered in this step, please refer to the Processing Data with R module of the Creating and Debugging R Programs course.
Processing data with R is important because it allows us to transform raw data into a format that is suitable for analysis. This includes ordering data, creating barplots, using the aggregate function, sourcing an R file, creating new variables, checking the type and class of data, and loading data into R from a CSV file.
Data processing is a crucial step in any data analysis workflow. In this step, we will learn how to process data with R. We will order data, create barplots, use the aggregate function, source an R file, create new variables, check the type and class of data, and load data into R from a CSV file. We will use the order(), barplot(), aggregate(), typeof(), class(), and read.csv() functions to accomplish these tasks.
Task 3.1: Loading Data into R from a CSV File
Load the provided CSV file ('Student Scores.csv') into R and assign it to a variable named student_scores.
Hint
Use the read.csv() function to load the CSV file. The file path is provided as a string argument to the function.
Solution
student_scores <- read.csv('Student Scores.csv')
Task 3.2: Checking the Type and Class of Data
Check the type and class of the student_scores data frame.
Hint
Use the typeof() function to check the internal type of the data frame. Use the class() function to check the user-facing class of the data frame.
Solution
typeof(student_scores)
class(student_scores)
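The two functions answer different questions, which is easiest to see on a small data frame. The data below is made up for illustration.

```r
scores <- data.frame(name = c("Ana", "Ben"), math_score = c(90L, 75L))

typeof(scores)  # "list": internally, a data frame is stored as a list of columns
class(scores)   # "data.frame": the class that functions dispatch on

# Individual columns have their own types
typeof(scores$math_score)  # "integer"
```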
Task 3.3: Creating New Variables
Create a new variable in the student_scores data frame named total_score that is the sum of the math_score, english_score, and science_score for each student.
Hint
Use the $ operator to access and create new columns in the data frame. Use the + operator to add the individual columns element by element, which gives one total per student. Note that sum() would collapse all of the values into a single number and not a per-student total.
Solution
student_scores$total_score <- student_scores$math_score + student_scores$english_score + student_scores$science_score
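The difference between per-row totals and a grand total can be checked on a two-row example. The data is made up, and rowSums() is shown as an alternative to + that the lab does not require.

```r
scores <- data.frame(
  math_score = c(80, 70),
  english_score = c(90, 60),
  science_score = c(85, 75)
)

# + adds the columns element by element: one total per row
scores$total_score <- scores$math_score + scores$english_score + scores$science_score

# rowSums() gives the same per-row totals
row_totals <- rowSums(scores[, c("math_score", "english_score", "science_score")])

# sum() collapses everything into a single number
grand_total <- sum(scores$math_score, scores$english_score, scores$science_score)

scores$total_score  # 255 205
grand_total         # 460
```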
Task 3.4: Using the Aggregate Function
Use the aggregate() function to calculate the mean total_score for each gender in the student_scores data frame. Save the result to an object named score_by_gender.
Hint
The aggregate() function takes a formula as its first argument, where the left side of the ~ is the column to aggregate and the right side is the grouping variable. The data argument is the data frame to use, and the FUN argument is the function to apply.
Solution
score_by_gender <- aggregate(total_score ~ gender, data = student_scores, FUN = mean)
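The formula interface is easier to trust once you have seen it on data small enough to check by hand. The four rows below are made up for illustration.

```r
scores <- data.frame(
  gender = c("F", "M", "F", "M"),
  total_score = c(250, 240, 270, 200)
)

# Left of ~ is the column to summarize; right of ~ is the grouping column
score_by_gender <- aggregate(total_score ~ gender, data = scores, FUN = mean)
score_by_gender
# One row per group: F has mean 260, M has mean 220
```

The result is itself a data frame with one column per side of the formula, which is why it can be passed to barplot() with the $ operator.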
Task 3.5: Ordering Data
Order the student_scores data frame by total_score in descending order.
Hint
Use the order() function to order the data frame. The - sign is used to order in descending order. The result of the order() function is used in conjunction with [] to subset the data frame.
Solution
student_scores <- student_scores[order(-student_scores$total_score), ]
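Note that order() returns row positions, not sorted values, which is why its result goes inside the brackets. The sketch below uses made-up data and also shows decreasing = TRUE, an alternative to the minus sign that works for text columns too.

```r
scores <- data.frame(
  name = c("Ana", "Ben", "Cal"),
  total_score = c(240, 270, 255),
  stringsAsFactors = FALSE
)

# Positions of the rows from highest to lowest score
order(-scores$total_score)  # 2 3 1

# decreasing = TRUE gives the same ordering without negating the values
by_score <- scores[order(scores$total_score, decreasing = TRUE), ]

# The minus sign cannot negate text, but decreasing = TRUE still works
by_name_desc <- scores[order(scores$name, decreasing = TRUE), ]

by_score$name      # "Ben" "Cal" "Ana"
by_name_desc$name  # "Cal" "Ben" "Ana"
```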
Task 3.6: Creating Barplots
Create a barplot of the mean total_score for each gender using the score_by_gender aggregate data frame you created in Task 3.4.
Hint
Use the barplot() function to create a barplot. The first argument is the heights of the bars, and the names.arg argument sets the labels for the bars.
Solution
barplot(score_by_gender$total_score, names.arg = score_by_gender$gender)
Outputting Data with R
To review the concepts covered in this step, please refer to the Outputting Data with R module of the Creating and Debugging R Programs course.
Outputting data with R is important because it allows us to save our results and share them with others. This includes saving plots in R, outputting data in R, and exporting data in R.
After processing and analyzing our data, we need to output our results. In this step, we will learn how to save plots in R, output data in R, and export data in R. We will use the pdf() and dev.off() functions to save plots, and the write.csv() and write.table() functions to export data.
Task 4.1: Load and Inspect the Data
Load the provided CSV file Student Scores.csv into R and assign it to a variable named student_scores.
Examine the structure of the data frame.
Hint
Use the read.csv() function to load the CSV file. The file path is provided as a string argument to the function.
Use the str() function to examine the structure of the data frame.
Solution
student_scores <- read.csv('Student Scores.csv')
str(student_scores)
Task 4.2: Saving a Plot as a PDF
Subset the data frame to the first 3 students only. Call this new data frame first_students.
Using this smaller dataset, save a PDF barplot with math_score as the height and name as the labels. Use the filename student_barplot.pdf.
Hint
Use the head() function to select the first three students. The first argument should be the data, and the second argument is the number of rows to select. Alternatively, you can use the [] operator to subset specific rows.
Use the pdf() function to start the PDF device. Then use the barplot() function to print the plot to the active device. Finally, make sure to use the dev.off() function to turn off the PDF device.
Solution
first_students <- head(student_scores, 3)
pdf('student_barplot.pdf')
barplot(first_students$math_score, names.arg = first_students$name)
dev.off()
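The open, draw, close sequence can be tried without touching the workspace by writing to a temporary file. The sketch below uses made-up heights and a hypothetical file name. The lab itself uses first_students and student_barplot.pdf.

```r
# Named vector: the names become the bar labels
heights <- c(Ana = 90, Ben = 75, Cal = 82)

plot_file <- file.path(tempdir(), "example_barplot.pdf")

pdf(plot_file)                          # open the PDF device
barplot(heights, main = "Math scores")  # drawn into the file, not the Plots pane
dev.off()                               # close the device so the file is finalized

file.exists(plot_file)  # TRUE
```

If dev.off() is skipped, later plots keep going to the same PDF and the file may be incomplete when opened.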
Task 4.3: Exporting Data as a CSV File
Export the first_students data frame as a CSV file named 'first_students.csv' using the write.csv() function. Do not include row names in the CSV.
Hint
Use the write.csv() function to export the data. Set row.names to FALSE to exclude row names in the output file.
Solution
write.csv(first_students, file = 'first_students.csv', row.names = FALSE)
Task 4.4: Exporting a Subset of Data as a Text File
Export rows 10 to 20 of student_scores as a text file named 'subset_data.txt' using the write.table() function. The table should be tab separated. Do not include row names in your file.
Hint
Use the write.table() function to export the data. Set sep to '\t' to separate the columns with tabs. Set row.names to FALSE to not include row names in the output file.
Solution
write.table(student_scores[10:20, ], file = 'subset_data.txt', sep = '\t', row.names = FALSE)
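One way to confirm an export is to read the file back. The sketch below writes made-up data to a temporary file (the file name is hypothetical) and reads it with read.delim(), which expects a header row and tab separators.

```r
scores <- data.frame(name = c("Ana", "Ben"), math_score = c(90, 75))

out_file <- file.path(tempdir(), "example_subset.txt")
write.table(scores, file = out_file, sep = "\t", row.names = FALSE)

# Read the tab-separated file back in
round_trip <- read.delim(out_file)

names(round_trip)      # "name" "math_score"
round_trip$math_score  # 90 75
```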
Debugging R Interactively
To review the concepts covered in this step, please refer to the Debugging R Interactively module of the Creating and Debugging R Programs course.
Debugging R interactively is important because it allows us to identify and fix errors in our code. This includes setting up a script for debugging, using R's browser to step through code, using conditional troubleshooting, and using RStudio breakpoints.
Debugging is an essential skill for any programmer. In this step, we will learn how to debug R code interactively. We will set up a script for debugging, use R's browser to step through code, use conditional troubleshooting, and use RStudio breakpoints. We will use the browser(), debug(), debugSource(), options(), and recover() functions to debug our code.
Task 5.1: Using R's browser to Step Through Code
The provided function calculate_average computes the average of a vector of numbers in three distinct steps.
Add a browser() call inside the calculate_average function to start the interactive debugging mode when the function is called. Then call the function on the provided vector, and use the console to step through the code line by line.
Hint
Place the browser() call at the beginning of the function body. Call the function with the vector name as the argument. If you need a reminder on how the browser() function works, enter ?browser in the console to bring up the help page.
Solution
# Provided function
calculate_average <- function(numbers) {
  browser()
  sum_numbers <- sum(numbers)
  length_numbers <- length(numbers)
  average <- sum_numbers / length_numbers
  return(average)
}
# Provided vector
my_numbers <- c(1:15)
calculate_average(my_numbers)
# Use the `n` or `s` command to step through the function
Task 5.2: Using Conditional Troubleshooting
Modify the calculate_average function to include a conditional browser() call that only starts the interactive debugging mode if the length of the numbers vector is less than 1. Call the function on my_numbers, and then call the function on an empty vector.
Hint
Use an if statement to check if the length of the numbers vector is less than 1. If it is, call browser().
Solution
calculate_average <- function(numbers) {
  if (length(numbers) < 1) {
    browser()
  }
  sum_numbers <- sum(numbers)
  length_numbers <- length(numbers)
  average <- sum_numbers / length_numbers
  return(average)
}
calculate_average(my_numbers)
calculate_average(c())
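As an alternative to the if statement, browser() accepts an expr argument and only pauses when that expression is TRUE. The sketch below is a shortened variant of the lab function, not the provided one, and it also shows why the empty case is worth pausing on.

```r
calculate_average <- function(numbers) {
  # Pauses only when the condition is TRUE; otherwise execution continues normally
  browser(expr = length(numbers) < 1)
  sum(numbers) / length(numbers)
}

calculate_average(1:15)  # 8; the browser is not triggered

# What the empty case computes: sum(c()) is 0 and length(c()) is 0
0 / 0  # NaN
```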
Exploring R Environments
To review the concepts covered in this step, please refer to the R Environments module of the Creating and Debugging R Programs course.
Understanding R environments is important because they provide a way to manage and organize variables and functions in R. This includes exploring environments, understanding how environments have a hierarchy, sharing values between environments, and using the recover function to debug.
In R, environments are key to managing and organizing variables and functions. In this step, we will explore R environments. We will explore different types of environments, understand how environments have a hierarchy, share values between environments, and use the recover function to debug.
Task 6.1: Creating a New Environment
Create a new environment called my_env.
Hint
Use the new.env() function to create a new environment.
Solution
my_env <- new.env()
Task 6.2: Assigning Values to a Custom Environment
Assign the value 5 to a variable x in the my_env environment.
Hint
Use the $ operator to reference a variable within a specific environment. Use the <- operator to assign the value.
Solution
my_env$x <- 5
Task 6.3: Assigning Values to Different Environments
For this task, you will write three functions, each of which assigns a different value to a variable named x.
The first function, named assign_local, should assign 1 to the variable x within the function's own environment.
The second function, named assign_my_env, should assign 2 to the variable x within my_env.
The third function, named assign_global_env, should assign 3 to the variable x within the global environment.
Run all three functions.
Hint
By default, R will use the function's environment for assignments within a function. To use a custom environment, specify the environment and use the $ operator to reference a variable. To assign in the global environment from a function defined at the top level, use the <<- operator: it searches the enclosing environments for x, starting with the function's parent (here, the global environment), and creates x in the global environment if it is not found.
Solution
assign_local <- function() {
  x <- 1
}
assign_my_env <- function() {
  my_env$x <- 2
}
assign_global_env <- function() {
  x <<- 3
}
assign_local()
assign_my_env()
assign_global_env()
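The <<- operator only reaches the global environment when nothing closer defines the variable. The sketch below uses a hypothetical make_counter() function to show <<- updating a variable in an enclosing function's environment instead.

```r
make_counter <- function() {
  count <- 0  # lives in the environment created by this call to make_counter()
  function() {
    # <<- finds count in the enclosing environment and updates it there
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```

Because count is found in the enclosing environment first, no variable named count is created in the global environment by these calls.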
Task 6.4: Accessing Values from Different Environments
Print the value of x from each of the three environments.
Hint
Function environments are discarded after the function is run. To print the value of x from the function environment, you must call print() within the function.
To print the value of x from your custom environment, use the $ operator.
To print the value of x from the global environment, you can simply print x.
Solution
# function environment
assign_local <- function() {
  x <- 1
  print(x)
}
assign_local()
# custom environment
print(my_env$x)
# global environment
print(x)
Task 6.5: Debugging with the Recover Function
Set the error handler to the recover function using the options() function. Then, run the provided parent_function() directly from the Console window (rather than within the File viewer). Step through recovery mode to understand why the error occurred.
Hint
Use the options() function with the argument error = recover to set the error handler to the recover function. Run ?recover to learn more about using the recover() function.
Solution
options(error = recover)
# from the Console
parent_function()
Running R Non-interactively
To review the concepts covered in this step, please refer to the Running R Non-interactively module of the Creating and Debugging R Programs course.
Running R non-interactively is important because it allows us to automate the execution of R scripts. This includes running a script non-interactively and scheduling a job on Windows and Linux.
Running R scripts non-interactively allows us to automate our data analysis workflows. In this step, we will learn how to run an R script non-interactively from a terminal. We will use the Rscript command in the RStudio Terminal window to run a script non-interactively.
Task 7.1: Create an R Script
Create an R script named 'myscript.R' that prints the phrase "Hello World!". Save the script in the workspace directory.
Hint
To create a new file, go to File -> New File -> R Script. Save the file with File -> Save As.
Solution
print('Hello World!')
Task 7.2: Run the R Script Non-Interactively
Run the 'myscript.R' script non-interactively using the Rscript command in the terminal.
Hint
The Terminal is next to the Console in the bottom left corner of the screen. If the terminal does not automatically open in the workspace directory, use the cd command to navigate to the directory containing 'myscript.R'.
Solution
Run in the Terminal:
Rscript myscript.R
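Automated scripts often need inputs. As a sketch that goes slightly beyond the task, commandArgs(trailingOnly = TRUE) returns whatever was typed after the script name. The file name greet.R and the argument Ada are hypothetical.

```r
# greet.R (hypothetical file name)
# Run in the Terminal with: Rscript greet.R Ada
args <- commandArgs(trailingOnly = TRUE)

# Fall back to a default when no argument is supplied
name <- if (length(args) >= 1) args[1] else "World"

cat("Hello ", name, "!\n", sep = "")
```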
Debugging R Non-interactively
To review the concepts covered in this step, please refer to the Debugging R Non-interactively module of the Creating and Debugging R Programs course.
Debugging R non-interactively is important because it allows us to identify and fix errors in our code when it is run in a non-interactive environment. This includes using proper logging, saving the script output, reproducing the execution state of a non-interactive script, and syncing the script's output.
Debugging R scripts that are run non-interactively presents its own set of challenges. In this step, we will learn how to debug R scripts non-interactively. We will use proper logging, save the script output, reproduce the execution state of a non-interactive script, and sync the script's output. We will use the cat(), sink(), options(), dump.frames(), and debugger() functions to accomplish these tasks.
Task 8.1: Create an R Script
Create an R script named 'myscript2.R' that prints the phrase "Hello World!". Save the script in the workspace directory, then execute the script from the Terminal.
Hint
To create a new file, go to File -> New File -> R Script. Save the file with File -> Save As.
Solution
print('Hello World!')
To execute from the Terminal:
Rscript myscript2.R
Task 8.2: Logging Messages with Formatting
Modify the script to instead log the following multiline message:
Hello...
World!
Execute the R script from the Terminal window.
Hint
You can use the cat() function to log a message with multiple lines. Inside a string, \n starts a new line.
Solution
cat('Hello...\nWorld!\n')
To execute from the Terminal:
Rscript myscript2.R
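For longer scripts, it can help to wrap cat() in a small helper that adds a timestamp and a level to each line. log_message() below is a hypothetical helper, not a base R function, and writing to stderr() is one possible design choice that keeps log lines apart from regular output.

```r
# log_message() is a hypothetical helper written for this sketch
log_message <- function(..., level = "INFO") {
  timestamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  # stderr() keeps log lines separate from the script's regular output
  cat("[", timestamp, "] ", level, ": ", ..., "\n", sep = "", file = stderr())
}

log_message("Starting script")
log_message("Rows read: ", 42, level = "DEBUG")
```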
Task 8.3: Saving Script Output
Use the sink() function to save the output of your script to a file named output.txt. This is useful for debugging purposes, as it allows you to review the output of your script after it has finished running.
Execute your new script and verify that the output was saved in the workspace directory.
Hint
You can use the sink() function to redirect the output to a file. For example, sink('output.txt') will redirect the output to 'output.txt'. Don't forget to turn off the sink with sink() when you're done.
Solution
sink('output.txt')
cat('Hello...\nWorld!\n')
sink()
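To check what a sink captured, read the file back. The sketch below writes to a temporary file with a hypothetical name instead of output.txt.

```r
log_file <- file.path(tempdir(), "example_output.txt")

sink(log_file)              # start redirecting console output to the file
cat("Hello...\nWorld!\n")
print(summary(c(1, 2, 3)))  # printed objects are captured too
sink()                      # stop redirecting; output returns to the console

# Read the file back to confirm what was captured
captured <- readLines(log_file)
captured[1:2]  # "Hello..." "World!"
```

By default, sink() redirects regular output only. Messages and warnings go to a separate stream and are not written to the file.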
Task 8.4: Reproducing Execution State
Create a custom on_error function that dumps the execution state to a file and includes the global environment. Use the options() function to make sure this custom function executes when an error is encountered.
Save a variable to the environment with the name error_message and the value 'Some error'. Then simulate an error by adding stop(error_message) to the end of your script. Execute your script, and verify that a dump file was created.
Hint
Within your on_error() function, use the dump.frames() function to dump the execution state. Set the to.file argument to TRUE to save to a file. Set the include.GlobalEnv argument to TRUE to include the global environment.
Solution
on_error <- function() {
  dump.frames(to.file = TRUE, include.GlobalEnv = TRUE)
}
options(error = on_error)
sink('output.txt')
cat('Hello...\nWorld!\n')
sink()
error_message <- 'Some error'
stop(error_message)
Task 8.5: Use the Debugger on the Saved Context
Use the debugger() function to walk through the last.dump.rda file you created in the previous step.
Hint
Load the last.dump.rda file into the current environment by clicking on it in the Files pane or by using the load() function. Then call the debugger() function from the Console pane to start the debugger. The argument to debugger() should be the loaded last.dump object.
Solution
load('last.dump.rda')
debugger(last.dump)
Troubleshooting and Avoiding Common Debugging Issues
To review the concepts covered in this step, please refer to the Troubleshooting and Avoiding Common Debugging Issues module of the Creating and Debugging R Programs course.
Troubleshooting and avoiding common debugging issues is important because it allows us to write more robust code with fewer errors. This includes understanding type API differences, recognizing function pitfalls, recapping environments, and sharing data easily with your peers.
In this final step, we will learn how to troubleshoot and avoid common debugging issues in R. We will examine type API differences and function pitfalls, recap environments, and learn how to share data easily with peers.
Task 9.1: Understanding Type API Differences
In R, different types of data structures have different APIs. The provided function df_col_sum() takes the sum of the first column of a data frame. Unfortunately, if you run it on a matrix, it will produce the wrong value.
Write a new function called matrix_col_sum() that takes the sum of the first column of a matrix and run it with the provided matrix mat.
Hint
If given a single index, the [] operator in the data frame API will subset a column. In contrast, if given a single index, the [] operator in the matrix API will return only the single element at that position.
To subset a column in the matrix API, use [] but provide both rows and columns, separated by a comma. To select all rows, leave the rows section blank, but include the comma.
Solution
matrix_col_sum <- function(input) {
  return(sum(input[, 1]))
}
matrix_col_sum(mat)
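The difference between the two APIs is easiest to see side by side. example_mat and example_df below are hypothetical stand-ins for the lab's provided objects.

```r
example_mat <- matrix(1:6, nrow = 3)  # columns: 1 2 3 and 4 5 6
example_df <- as.data.frame(example_mat)

example_df[1]     # data frame API: the first column, as a one-column data frame
example_mat[1]    # matrix API: the first element only
example_mat[, 1]  # matrix API: the whole first column

sum(example_df[1])     # 6
sum(example_mat[1])    # 1, not a column sum
sum(example_mat[, 1])  # 6
```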
Task 9.2: Function Pitfalls
Some functions in R have pitfalls that can lead to unexpected results. The provided function broken_mean is not filtering out NA values as expected. Analyze and fix the function.
Hint
In R, a misspelled named argument is silently ignored when the function accepts additional arguments through ... (as mean() does), so no error is raised. Try to identify if there is a misspelling. Use ?mean to view the available named arguments.
Solution
# na.rm was misspelled as na_rm
fixed_mean <- function(values) {
  return(mean(values, na.rm = TRUE))
}
fixed_mean(c(1, 2, 3, NA))
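The sketch below shows the pitfall directly, and contrasts it with a function that has no ... and therefore reports the mistake. strict_mean() is a hypothetical wrapper written for this illustration.

```r
values <- c(1, 2, 3, NA)

# mean() accepts ..., so the misspelled na_rm is absorbed and ignored
mean(values, na_rm = TRUE)  # NA

# The correctly spelled argument removes the NA before averaging
mean(values, na.rm = TRUE)  # 2

# A function without ... raises an error instead of hiding the typo
strict_mean <- function(x, na.rm = FALSE) mean(x, na.rm = na.rm)
result <- tryCatch(
  strict_mean(values, na_rm = TRUE),
  error = function(e) conditionMessage(e)
)
result  # the text of an "unused argument" error
```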
Task 9.3: Recapping Environments
In R, an environment is a collection of objects. Let's recap environments by creating a new environment, assigning a variable to it with the value 5, and retrieving the value of the variable.
Hint
Use the new.env() function to create a new environment. Use the $ operator to set or retrieve a variable on the specified environment.
Solution
new_env <- new.env()
new_env$x <- 5
new_env$x
Task 9.4: Easily Sharing Data with Your Peers
In R, you can easily share data with your peers by using the dput() function. Demonstrate this by using dput() to generate a reproducible expression that recreates the df data frame from the first task.
Hint
Use the dput() function to generate a reproducible expression.
Solution
# dput() prints the expression to the console; its return value is df itself (invisibly), not the expression
dput(df)
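To confirm that the printed expression really recreates the object, you can capture it as text and evaluate it again. example_df is a hypothetical stand-in for the lab's df, and capture.output() is one possible way to collect the text.

```r
example_df <- data.frame(id = 1:3, score = c(85, 90, 95))

# capture.output() collects what dput() prints, as a character vector
dput_text <- capture.output(dput(example_df))

# Anyone who pastes that text into their session gets the same object back;
# parsing and evaluating it here simulates that step
rebuilt <- eval(parse(text = dput_text))

all.equal(example_df, rebuilt)  # TRUE
```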