
Importing Formatted Text Files: R Playbook Hands-on Practice
In this lab, you'll practice the core techniques for importing and manipulating data in R. Starting with plain text files and moving on to CSV, tab-delimited, JSON, and XML formats, you'll use functions such as read.table, read.csv, and fromJSON. You'll customize imports with function arguments, handle missing values, and convert nested formats into structures that R can analyze. By the end, you'll be able to import and prepare data from each of these sources for analysis in your own projects.

Challenge
Importing and Manipulating Text Files
Jupyter Guide
To get started, open the file on the right entitled "Step 1...". You'll complete each task for Step 1 in that Jupyter Notebook file. Remember, you must run the cells (Ctrl/Cmd + Enter) for each task before moving on to the next task in the Jupyter Notebook. Continue until you have completed all tasks in this step. When you are ready to move on to the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.
To review the concepts covered in this step, please refer to the Importing Text Files in R module of the Importing Formatted Text Files: R Playbook course.
Understanding how to import and manipulate text files is crucial because it lays the foundation for data analysis in R. This step covers the basics of reading text files, including using various arguments in the read.table function to customize the import process.
Dive into the world of text files with R! Your goal is to practice importing text files into R and manipulating the imported data to fit your analysis needs. You'll use the `read.table` function, exploring its various arguments such as `header`, `sep`, `skip`, and `stringsAsFactors` to import a sample text file. After importing, you'll practice subsetting the data by reading specific lines using the `skip` and `nrows` arguments. This hands-on experience will solidify your understanding of handling text data in R.
Task 1.1: Importing a Text File
Start by importing the `employee_data.txt` file using the `read.table` function. The file is space separated. Make sure to include the `header` argument to specify whether the first line of the file should be treated as the column names.
Hint
Use the `read.table` function with the `file` parameter pointing to your text file's location. Set the separator character to a space. As the data has column names, don't forget to set the `header` argument to `TRUE`.
Solution
# Read the space-separated file, using the first line as column names
data <- read.table(file='employee_data.txt', header=TRUE, sep=' ')
data
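To see what the `header` argument changes, the following minimal sketch reads the same file twice. The sample rows and column names are hypothetical stand-ins written to a temporary file; the lab's actual `employee_data.txt` may look different.

```r
# Hypothetical sample data; the lab's real employee_data.txt may differ.
sample_lines <- c(
  "ID Name Department Salary",
  "1 Alice Sales 52000",
  "2 Bob IT 61000",
  "3 Carla HR 58000"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# header = TRUE: the first line supplies the column names.
with_header <- read.table(file = tmp, header = TRUE, sep = " ")

# header = FALSE: the first line is read as data, and the columns
# receive default names (V1, V2, ...).
without_header <- read.table(file = tmp, header = FALSE, sep = " ")

str(with_header)
str(without_header)
```

Without the header, the column-name row becomes a data row, so every column in that version is read as text rather than numbers.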
Task 1.2: Customizing the Separator
Now import the `employee_data_pipe_separated.txt` file. This data is separated with a pipe character `|`. Customize the import process by specifying a new column separator using the `sep` argument in the `read.table` function.
Hint
Use the `sep` argument to specify the separator character. In this case, set it to `sep='|'`.
Solution
# Read the pipe-separated file
data <- read.table(file='employee_data_pipe_separated.txt', header=TRUE, sep='|')
data
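A quick way to check that the separator is right is to count the columns after import. This sketch uses hypothetical pipe-separated rows in a temporary file (not the lab's file) and compares a mismatched separator with the correct one.

```r
# Hypothetical pipe-separated sample; not the lab's actual file.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID|Name|Department|Salary",
  "1|Alice|Sales|52000",
  "2|Bob|IT|61000"
), tmp)

# Wrong separator: no spaces in the file, so each line is one field.
wrong <- read.table(file = tmp, header = TRUE, sep = " ")

# Correct separator: each line splits into four fields.
right <- read.table(file = tmp, header = TRUE, sep = "|")

ncol(wrong)
ncol(right)
```

A single-column result after import is usually a sign that `sep` does not match the file.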
Task 1.3: Skipping Rows and Reading Specific Lines
With `employee_data.txt`, practice subsetting the data by reading specific lines. Use the `skip` argument to ignore the first line of the file (the header), and `nrows` to read two lines after skipping.
Hint
To skip the first line of the file and then read the next 2 lines, set `skip=1` and `nrows=2`. Since you've removed the header, set `header` to `FALSE`.
Solution
# Skip the header line, then read only the next two lines
data <- read.table(file='employee_data.txt', header=FALSE, sep=' ', skip=1, nrows=2)
data
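Skipping the header means the result gets default column names (`V1`, `V2`, ...). One possible way to keep the real names, sketched here on hypothetical data, is to read the header line separately with `scan` and pass it through `col.names`. This goes slightly beyond the task and is only one approach.

```r
# Hypothetical sample data; the lab's real file may differ.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID Name Department Salary",
  "1 Alice Sales 52000",
  "2 Bob IT 61000",
  "3 Carla HR 58000",
  "4 Dev IT 64000"
), tmp)

# Same call shape as the task: skip the header, read two data lines.
subset_rows <- read.table(file = tmp, header = FALSE, sep = " ",
                          skip = 1, nrows = 2)

# Read only the first line to recover the column names.
header_names <- scan(tmp, what = character(), nlines = 1,
                     sep = " ", quiet = TRUE)

# Skip the header plus the first two data lines, then read the next two,
# supplying the recovered names.
later_rows <- read.table(file = tmp, header = FALSE, sep = " ",
                         skip = 3, nrows = 2, col.names = header_names)

subset_rows
later_rows
```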
Task 1.4: Handling Strings as Factors
Explore how to handle strings in your imported data. Import `employee_data.txt` again, this time converting the string columns to factors.
Hint
Set `stringsAsFactors=FALSE` if you want to keep strings as character vectors. Otherwise, set it to `TRUE` to convert them into factors.
Solution
# Convert string columns to factors during import
data <- read.table(file='employee_data.txt', header=TRUE, sep=' ', stringsAsFactors=TRUE)
data
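The difference is easiest to see by inspecting a column's class after each import. The sketch below uses hypothetical data; note that since R 4.0.0 the default for `stringsAsFactors` is `FALSE`, so factors have to be requested explicitly.

```r
# Hypothetical sample data; the lab's real file may differ.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID Name Department Salary",
  "1 Alice Sales 52000",
  "2 Bob IT 61000",
  "3 Carla Sales 58000"
), tmp)

as_chr <- read.table(file = tmp, header = TRUE, sep = " ",
                     stringsAsFactors = FALSE)
as_fct <- read.table(file = tmp, header = TRUE, sep = " ",
                     stringsAsFactors = TRUE)

class(as_chr$Department)   # character vector
class(as_fct$Department)   # factor
levels(as_fct$Department)  # the distinct department values
```

Factors store each distinct value once as a level, which suits categorical columns such as a department; free-text columns such as names are usually better left as character vectors.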
Challenge
Mastering CSV File Imports
To review the concepts covered in this step, please refer to the Importing CSV Files in R module of the Importing Formatted Text Files: R Playbook course.
Mastering the import of CSV files is important because CSV is one of the most common data formats in data analysis. This step focuses on using the `read.csv` function and its arguments to effectively import and preprocess CSV data in R.
Embark on a journey to master CSV file imports in R! Your mission is to use the `read.csv` function to import a CSV file into an R data frame. Pay special attention to handling missing values using the `na.strings` argument, and to selecting specific columns and defining their data types with the `colClasses` argument. This practice will enhance your ability to work with one of the most prevalent data formats in data analysis.
Task 2.1: Importing a CSV File
Import a CSV file named `employee_data.csv` into an R data frame. Use the `read.csv` function to accomplish this task. Inspect the imported data by printing the first few rows of the data frame.
Hint
Use the `read.csv` function with the file name as its argument. To print the first few rows, use the `head` function.
Solution
# Import the CSV file
data <- read.csv('employee_data.csv')
# Print the first few rows of the data frame
head(data)
Task 2.2: Handling Missing Values
In `employee_data.csv`, some values are coded as `NaN`, which indicates a missing value. Modify the previous task to properly import these values in the CSV file. Then verify these values were properly coded as `NA` in R, rather than as a character string.
Hint
Add the `na.strings` argument to the `read.csv` function call, setting its value to 'NaN'. Examine the NA values with the `is.na` function.
Solution
# Import the CSV file with missing values handled
data <- read.csv('employee_data.csv', na.strings = 'NaN')
# Print the data
head(data)
# Verify NAs were handled properly
is.na(data)
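The effect of `na.strings` is most visible in text columns, where an unrecognized `NaN` would otherwise remain an ordinary string. This sketch uses hypothetical rows in a temporary CSV file and counts missing values per column with `colSums(is.na(...))`, which is easier to scan than the full logical matrix.

```r
# Hypothetical CSV content; the lab's employee_data.csv may differ.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,Name,Department,Salary",
  "1,Alice,Sales,52000",
  "2,Bob,NaN,NaN",
  "3,Carla,HR,58000"
), tmp)

# Default import: "NaN" in a text column stays a literal string.
raw <- read.csv(tmp)

# With na.strings, "NaN" is converted to NA in every column.
fixed <- read.csv(tmp, na.strings = "NaN")

raw$Department
fixed$Department

# Count missing values per column.
colSums(is.na(fixed))
```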
Task 2.3: Selecting Specific Columns and Defining Their Data Types
Now, import the same CSV file but only select the columns `ID` and `Salary`, and define their data types as character and double respectively.
Hint
Use the `colClasses` argument in the `read.csv` function. Provide a named vector to this argument specifying the columns and their desired data types. If you want to exclude a column, set its data type to `'NULL'`.
Solution
# Import the CSV file selecting specific columns
data <- read.csv('employee_data.csv', na.strings = 'NaN', colClasses = c(ID = 'character', Salary = 'double', Name='NULL', Department='NULL'))
# Print the first few rows of the data frame
head(data)
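One reason to force a column to character is to preserve formatting such as leading zeros in identifiers. The sketch below shows this on hypothetical data. It uses `'numeric'`, one of the atomic class names listed in the `read.table` documentation, for the salary column; that is a choice made for this example, not the lab's solution.

```r
# Hypothetical CSV content with zero-padded IDs.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ID,Name,Department,Salary",
  "001,Alice,Sales,52000",
  "002,Bob,IT,61000",
  "003,Carla,HR,58000"
), tmp)

# Default import: ID is parsed as a number, so the padding is lost.
default_types <- read.csv(tmp)

# Named colClasses: keep ID as text, read Salary as a number,
# and drop Name and Department with "NULL".
selected <- read.csv(tmp, colClasses = c(ID = "character",
                                         Salary = "numeric",
                                         Name = "NULL",
                                         Department = "NULL"))

default_types$ID
selected$ID
str(selected)
```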
Challenge
Delving into Delimited Files and Dataframe Searches
To review the concepts covered in this step, please refer to the Importing Delimited Files in R module of the Importing Formatted Text Files: R Playbook course.
Delving into delimited files and understanding dataframe searches is essential because it expands your data import capabilities and enhances your data manipulation skills in R. This step combines importing delimited files with searching dataframes using the `which.max` and `which.min` functions. This exercise will broaden your data handling skills and introduce you to more complex data analysis techniques in R.
Task 3.1: Importing a Tab-Delimited File
Import the tab-delimited file named `employee_data_tab_separated.txt` into R using the `read.delim` function. Store the imported data in a variable named `data_df`. After importing, display the first few rows of the dataframe to ensure it's loaded correctly.
Hint
Use the `read.delim` function with the file name `employee_data_tab_separated.txt` as its argument. Tab separation is the default in `read.delim`. To display the first few rows, use the `head` function on `data_df`.
Solution
# Import the tab-delimited file (tab is the default separator)
data_df <- read.delim('employee_data_tab_separated.txt')
head(data_df)
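`read.delim` can be thought of as `read.table` with defaults suited to tab-separated files, including `header = TRUE` and `sep = "\t"`. The sketch below, on hypothetical data, compares the two calls; it also shows that a tab separator allows values containing spaces.

```r
# Hypothetical tab-separated sample; the lab's file may differ.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tName\tSalary",
  "1\tAlice Smith\t52000",
  "2\tBob Jones\t61000"
), tmp)

# read.delim assumes a header line and tab separators.
via_delim <- read.delim(tmp)

# The same import spelled out with read.table.
via_table <- read.table(tmp, header = TRUE, sep = "\t")

# Compare the two results.
all.equal(via_delim, via_table)
via_delim
```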
Task 3.2: Finding the Maximum Value in a Column
Find the index of the maximum value in the `Salary` column of the `data_df` dataframe. Store the index in a variable named `max_index` and print it.
Hint
Use the `which.max` function on the `Salary` column of `data_df` to find the index. Access the `Salary` column using `data_df$Salary`.
Solution
# Row index of the highest salary
max_index <- which.max(data_df$Salary)
print(max_index)
Task 3.3: Finding the Minimum Value in a Column
Find the index of the minimum value in the `Salary` column of the `data_df` dataframe. Store the index in a variable named `min_index` and print it.
Hint
Use the `which.min` function on the `Salary` column of `data_df` to find the index. Access the `Salary` column using `data_df$Salary`.
Solution
# Row index of the lowest salary
min_index <- which.min(data_df$Salary)
print(min_index)
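The index returned by `which.max` or `which.min` is mainly useful for pulling out the matching row. The sketch below builds a small hypothetical data frame directly (no file needed) and also illustrates two documented behaviors: missing values are ignored, and ties return the first matching position.

```r
# Hypothetical data frame standing in for the imported data.
data_df <- data.frame(
  Name   = c("Alice", "Bob", "Carla", "Dev"),
  Salary = c(52000, 61000, NA, 61000)
)

# NA is ignored; Bob and Dev tie, so the first position is returned.
max_index <- which.max(data_df$Salary)
min_index <- which.min(data_df$Salary)

# Use the indices to retrieve the full rows.
data_df[max_index, ]
data_df[min_index, ]
```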
Challenge
Converting JSON Data for R Analysis
To review the concepts covered in this step, please refer to the Importing JSON Files in R module of the Importing Formatted Text Files: R Playbook course.
Learning to convert JSON data into a format that R can analyze is crucial because JSON is a widely used data format in web applications. This step focuses on importing JSON data and converting it into an R list or data frame for analysis.
Step into the world of JSON data with R! Your task is to practice importing JSON data using the `fromJSON` function from the `rjson` package. After importing, you'll convert the JSON data into an R list and then into a data frame. This exercise will equip you with the skills to handle JSON data, a common format in web-based data sources.
Task 4.1: Import JSON Data from a Local File
Your initial task is to import JSON data from a local file. You'll import the `employee_data.json` file, which is located in the current working directory. Store the imported data in a variable named `json_data` and print it to verify that the data was imported.
Hint
Load the `rjson` package. Employ the `fromJSON` function, providing the path to `employee_data.json` as a string argument.
Solution
# Load the rjson R package
library(rjson)
# Import the data
json_data <- fromJSON(file = 'employee_data.json')
print(json_data)
Task 4.2: Convert JSON Data to a Data Frame
The `fromJSON` function imports the data as a list of lists. Transform the JSON data into a matrix, and then convert the matrix into an R data frame. Store the resulting data frame in a variable named `df_data` and print it.
Hint
First, convert the list into a matrix format by using `do.call` and `rbind`. Then, convert the matrix to a data frame using `as.data.frame`.
Solution
# Convert to a matrix (one row per record)
matrix_data <- do.call(rbind, json_data)
# Convert the matrix to a data frame
df_data <- as.data.frame(matrix_data)
print(df_data)
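One detail to be aware of: binding a list of lists with `rbind` produces a matrix whose cells are list elements, so the resulting data frame has list columns rather than plain numeric or character columns. The sketch below uses a hand-built list that imitates the shape of an imported JSON array (an assumption; it does not call `rjson`) and shows one possible alternative that yields atomic columns.

```r
# Hand-built stand-in for imported JSON: a list of records.
json_data <- list(
  list(ID = 1, Name = "Alice", Salary = 52000),
  list(ID = 2, Name = "Bob",   Salary = 61000),
  list(ID = 3, Name = "Carla", Salary = 58000)
)

# Approach from the task: list -> matrix -> data frame.
matrix_data  <- do.call(rbind, json_data)
df_list_cols <- as.data.frame(matrix_data)

# Alternative: turn each record into a one-row data frame, then bind.
df_atomic <- do.call(rbind, lapply(json_data, as.data.frame))

str(df_list_cols)  # columns are lists
str(df_atomic)     # columns are ordinary vectors
```

With atomic columns, calls such as `mean(df_atomic$Salary)` work directly; list columns would need to be unlisted first.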
Challenge
Importing and Analyzing XML Data
To review the concepts covered in this step, please refer to the Importing XML Files in R module of the Importing Formatted Text Files: R Playbook course.
Importing and analyzing XML data is important because XML is frequently used in data exchange and storage. This step covers importing XML data into R and converting it into a usable format for analysis. This hands-on experience will prepare you to work with XML data, enhancing your data import and analysis capabilities in R.
Task 5.1: Load the XML Package
Load the `XML` package, which comes pre-installed in this R environment. This package provides functions necessary for importing and processing XML files.
Hint
Use the `library()` function to load a package, with the package name as the argument.
Solution
# Load the XML package into R
library(XML)
Task 5.2: Import XML Data from a Local File
Import the file `employee_data.xml` into R as a data frame. Print the data frame to verify it imported correctly.
Hint
Use `xmlToDataFrame(file_path)` to import the XML file.
Solution
# Import the data
xml_data <- xmlToDataFrame('employee_data.xml')
xml_data