Querying and Converting Data Types in R Hands-on Practice

This lab teaches querying and converting data types in R, starting with the basics of dataset structures and moving through data manipulation techniques. It provides practical examples for effective data querying and filtering. The course concludes with essential resources for further learning in data analysis.

Level: Intermediate
Duration: 1h 4m
Published: Mar 26, 2024

Table of Contents

  1. Challenge

    Exploring and Managing Data with RStudio

    RStudio Guide

    To get started, click the 'workspace' folder in the bottom-right pane of RStudio, then open the file entitled "Step 1...". You may want to shrink the console pane so that you have more room to work. Complete each Step 1 task in that R Markdown file. For every task, run its cells with the play button at the top right of each cell before moving on to the next task. When you have completed all tasks in a step, come back and open the file for the next step, and repeat until you have completed all tasks in all steps of the lab.


    To review the concepts covered in this step, please refer to the Understanding Dataset Structures and Formats module of the Querying and Converting Data Types in R course.

    Understanding Dataset Structures and Formats is important because it lays the foundation for all data analysis tasks in R. This step focuses on practical skills like exploring datasets, using the RStudio interface effectively, and managing data types and packages, which are crucial for any aspiring R programmer.

    Start by loading the dataset (mtcars) available in R. Use the summary() function to get an overview of the dataset. Explore the dataset by attaching it and using basic R commands to query and filter data. Convert a data frame to a tibble and a data table, observing the differences in output and performance. This exercise will help you become familiar with RStudio's interface and the basic data manipulation tasks in R.


    Task 1.1: Loading and Summarizing the Dataset

    Start by loading the mtcars dataset, which is built into R. Use the summary() function to get a quick overview of the dataset. This will help you understand the structure and the type of data it contains.

    🔍 Hint

    You can access the mtcars dataset directly since it's built into R. Use the summary() function by passing the dataset name as an argument.

    🔑 Solution
    # Load the mtcars dataset
    mtcars
    
    # Use the summary function to get an overview of the dataset
    summary(mtcars)
    

    Task 1.2: Exploring the Dataset

    Attach the mtcars dataset. Find the cars with an mpg (miles per gallon) greater than 20.

    🔍 Hint

    Use the attach() function to make the mtcars dataset's columns directly accessible. Then, use the dataset column mpg directly in a conditional statement to filter the data.

    🔑 Solution
    # Attach the mtcars dataset
    attach(mtcars)
    
    # Find cars with mpg greater than 20
    mtcars[mpg > 20, ]
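
    attach() places the columns on R's search path, where they can mask, or be masked by, other objects with the same name. As a hedged alternative, the same filter can be written without attaching anything, using an explicit mtcars$mpg reference or base R's with(). This is a sketch of an equivalent query, not part of the lab's solution; the names by_dollar and by_with are illustrative.

    ```r
    # Same query as above, without attaching the data frame.
    by_dollar <- mtcars[mtcars$mpg > 20, ]

    # with() evaluates the expression using mtcars' columns,
    # without changing the search path.
    by_with <- mtcars[with(mtcars, mpg > 20), ]

    # Both approaches select the same rows.
    identical(by_dollar, by_with)

    # If mtcars was attached earlier, detach it when done so that its
    # columns stop being visible as free-standing variables.
    if ("mtcars" %in% search()) detach(mtcars)
    ```

    Detaching at the end keeps later steps from silently picking up a stale mpg or hp column.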
    

    Task 1.3: Converting Data Frame to Tibble

    Convert the mtcars data frame to a tibble named mtcars_tibble. Print mtcars_tibble and observe the differences in output. Tibbles are a modern take on data frames with some added conveniences; for example, printing a tibble shows only the first ten rows along with each column's type.

    🔍 Hint

    Use the as_tibble() function from the tibble package to convert mtcars into a tibble. Make sure to load the tibble package using library(tibble) before converting.

    🔑 Solution
    # Load the tibble package
    library(tibble)
    
    # Convert mtcars to a tibble
    mtcars_tibble <- as_tibble(mtcars)
    
    # Observe the output
    print(mtcars_tibble)
    

    Task 1.4: Converting Data Frame to Data Table

    Now, convert the mtcars data frame to a data table named mtcars_data_table and observe how the output differs from a regular data frame and a tibble.

    🔍 Hint

    Use the as.data.table() function from the data.table package to convert mtcars into a data table. Remember to load the data.table package using library(data.table) before converting.

    🔑 Solution
    # Load the data.table package
    library(data.table)
    
    # Convert mtcars to a data table
    mtcars_data_table <- as.data.table(mtcars)
    
    # Observe the output
    print(mtcars_data_table)
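
    One difference that is easy to miss in the printed output is that both conversions drop the row names, which in mtcars hold the car models. The sketch below compares the classes of the three objects and shows one way to keep the models as a column. It assumes the tibble and data.table packages are installed; the column name "model" and the object names cars_tbl and cars_dt are illustrative choices, not names used by the lab.

    ```r
    library(tibble)
    library(data.table)

    # All three objects are still data frames underneath,
    # with extra classes layered on top.
    class(mtcars)                  # "data.frame"
    class(as_tibble(mtcars))       # "tbl_df" "tbl" "data.frame"
    class(as.data.table(mtcars))   # "data.table" "data.frame"

    # Each conversion function has an argument that stores the
    # row names in a regular column instead of discarding them.
    cars_tbl <- as_tibble(mtcars, rownames = "model")
    cars_dt  <- as.data.table(mtcars, keep.rownames = "model")

    head(cars_tbl$model, 3)
    head(cars_dt$model, 3)
    ```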
    
  2. Challenge

    Data Type Conversion and List Management

    To review the concepts covered in this step, please refer to the Selecting and Converting Data Types module of the Querying and Converting Data Types in R course.

    Selecting and converting data types is important because the ability to manipulate and convert data types is essential for data preprocessing and analysis in R. This step will focus on converting between numeric, integer, character, and factor variables, as well as managing lists, which are common tasks in data science projects.

    This exercise will enhance your understanding of R's data types and how to manipulate them.


    Task 2.1: Create a Mixed Data Vector

    Create a vector named mixed_data containing a mix of numeric, integer, character, and logical values. Include the numeric value 42.2, the integer 3, the character string R is fun, and the boolean TRUE.

    🔍 Hint

    Use the c() function to combine values of different types. Remember to use quotes for character values and TRUE or FALSE for logical values.

    🔑 Solution
    mixed_data <- c(42.2, 3L, 'R is fun', TRUE)
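
    An atomic vector can hold only one type, so c() silently coerces its arguments to the most flexible type present. The short sketch below, using the same values as the solution, shows what mixed_data actually contains after that coercion.

    ```r
    mixed_data <- c(42.2, 3L, 'R is fun', TRUE)

    # The presence of one string turns every element into a string.
    class(mixed_data)   # "character"
    print(mixed_data)   # "42.2" "3" "R is fun" "TRUE"

    # Without the string, the remaining values are coerced to numeric,
    # and TRUE becomes 1.
    c(42.2, 3L, TRUE)   # 42.2  3.0  1.0
    ```

    This matters for the conversion tasks that follow, because they start from character strings rather than from the original typed values.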
    
    

    Task 2.2: Convert Mixed Data to Numeric

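    Convert the mixed_data vector to numeric type and assign it to a new variable numeric_data. Print the new variable. Because c() already coerced every element of mixed_data to character, only the strings that parse as numbers ('42.2' and '3') convert; 'R is fun' and 'TRUE' become NA, and R issues a coercion warning.

    🔍 Hint

    Use the as.numeric() function and pass mixed_data as the argument.

    🔑 Solution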
    numeric_data <- as.numeric(mixed_data)
    print(numeric_data)
    
    

    Task 2.3: Convert Mixed Data to Integer

    Convert the mixed_data vector to integer type and assign it to a new variable integer_data. Print the new variable. Values that do not parse as numbers will be coerced to NA, while decimal values such as 42.2 are truncated and lose everything after the decimal point.

    🔍 Hint

    Use the as.integer() function and pass mixed_data as the argument.

    🔑 Solution
    integer_data <- as.integer(mixed_data)
    print(integer_data)
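
    The two conversions can be compared side by side. The sketch below recreates mixed_data and wraps each call in suppressWarnings(), which is an addition to the lab's solution that only hides the "NAs introduced by coercion" warning.

    ```r
    mixed_data <- c(42.2, 3L, 'R is fun', TRUE)

    # suppressWarnings() hides the coercion warning so the
    # results are easier to read.
    numeric_data <- suppressWarnings(as.numeric(mixed_data))
    integer_data <- suppressWarnings(as.integer(mixed_data))

    print(numeric_data)   # 42.2  3.0   NA   NA
    print(integer_data)   # 42  3 NA NA

    # is.na() flags the elements that could not be converted.
    mixed_data[is.na(numeric_data)]   # "R is fun" "TRUE"
    ```

    Note that 'TRUE' becomes NA here: it is a string by this point, not a logical value, so it does not convert to 1.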
    

    Task 2.4: Convert Mixed Data to Character

    Convert the mixed_data vector to character type and assign it to a new variable character_data. Print the new variable.

    🔍 Hint

    Use the as.character() function and pass mixed_data as the argument.

    🔑 Solution
    character_data <- as.character(mixed_data)
    print(character_data)
    

    Task 2.5: Factor to Character Conversion

    Create a factor variable named factor_var with values 'high', 'low', 'high', 'medium', and levels 'low', 'medium', and 'high'. Print factor_var. Then, convert this factor to a character variable named char_var and print it.

    🔍 Hint

    To create a factor variable, use the factor() function. The first argument should be the values, and the second argument should specify the levels. Use the as.character() function and pass factor_var as the argument to convert it to a character vector.

    🔑 Solution
    factor_var <- factor(c('high', 'low', 'high', 'medium'), levels=c('low', 'medium', 'high'))
    print(factor_var)
    char_var <- as.character(factor_var)
    print(char_var)
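
    A factor stores integer codes plus a set of labels, which is why the choice of conversion function matters. The sketch below contrasts as.character() with as.integer() on the same factor; numeric_factor is a hypothetical extra example, not part of the lab.

    ```r
    factor_var <- factor(c('high', 'low', 'high', 'medium'),
                         levels = c('low', 'medium', 'high'))

    # as.character() returns the labels.
    as.character(factor_var)   # "high" "low" "high" "medium"

    # as.integer() returns the position of each value in levels(),
    # not anything derived from the label text.
    as.integer(factor_var)     # 3 1 3 2

    # For a factor whose labels look like numbers, convert through
    # character first to recover the values rather than the codes.
    numeric_factor <- factor(c('10', '20', '10'))
    as.numeric(as.character(numeric_factor))   # 10 20 10
    ```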
    

    Task 2.6: Logical to Numeric Conversion

    Create a logical vector named logical_vec with values TRUE, FALSE, and TRUE. Convert this logical vector to a numeric vector named numeric_vec and print it.

    🔍 Hint

    Use the as.numeric() function and pass logical_vec as the argument.

    🔑 Solution
    logical_vec <- c(TRUE, FALSE, TRUE)
    numeric_vec <- as.numeric(logical_vec)
    print(numeric_vec)
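
    The same TRUE-to-1 and FALSE-to-0 mapping is applied implicitly whenever a logical vector is used in arithmetic. A minimal sketch of why that is useful in practice:

    ```r
    logical_vec <- c(TRUE, FALSE, TRUE)

    # TRUE counts as 1 and FALSE as 0 in arithmetic.
    sum(logical_vec)    # 2, the number of TRUE values
    mean(logical_vec)   # the proportion of TRUE values

    # A common use: counting rows that satisfy a condition.
    sum(mtcars$mpg > 20)   # 14 cars have mpg above 20
    ```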
    

    Task 2.7: Convert Character to POSIXct and POSIXlt

    Given the provided character vector char_dates with dates in the format 'YYYY-MM-DD HH:MM:SS', convert it to POSIXct and POSIXlt formats, assigning the results to posixct_dates and posixlt_dates respectively.

    🔍 Hint

    Use the as.POSIXct() function for converting to POSIXct and as.POSIXlt() for POSIXlt. The format string should be '%Y-%m-%d %H:%M:%S'.

    🔑 Solution
    char_dates <- c('2023-04-01 12:00:00', '2023-04-02 15:30:00')
    posixct_dates <- as.POSIXct(char_dates, format = '%Y-%m-%d %H:%M:%S')
    posixlt_dates <- as.POSIXlt(char_dates, format = '%Y-%m-%d %H:%M:%S')
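
    The two classes print the same way but are stored differently. The sketch below repeats the conversion with tz = 'UTC' set explicitly, which is an assumption added here so that the result does not depend on the machine's local time zone, and then inspects each representation.

    ```r
    char_dates <- c('2023-04-01 12:00:00', '2023-04-02 15:30:00')

    posixct_dates <- as.POSIXct(char_dates, format = '%Y-%m-%d %H:%M:%S', tz = 'UTC')
    posixlt_dates <- as.POSIXlt(char_dates, format = '%Y-%m-%d %H:%M:%S', tz = 'UTC')

    # POSIXct is stored as seconds since 1970-01-01 00:00:00 UTC.
    as.numeric(posixct_dates)

    # POSIXlt is a list-like structure with named components.
    posixlt_dates$hour   # 12 15
    posixlt_dates$min    # 0 30

    # Date arithmetic works on either class.
    difftime(posixct_dates[2], posixct_dates[1], units = 'hours')   # 27.5 hours
    ```

    POSIXct is usually the more convenient choice for data frame columns, while POSIXlt is handy when individual components such as the hour are needed.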
    
    

    Task 2.8: Create and Explore a List

    Create a list named mixed_list containing elements of different data types. Each element should be named. Include the numeric value 42.2, the integer 3, the character R is fun, the boolean TRUE, and the vector posixct_dates from the prior task. Then, explore the structure of this list using the str() function.

    🔍 Hint

    Use the list() function to combine elements of different types. To explore the list, use the str() function with mixed_list as the argument.

    🔑 Solution
    mixed_list <- list(numeric_value=42.2, integer_value=3L, character_value='R is fun', boolean_value=TRUE, vector_value=posixct_dates)
    str(mixed_list)
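
    Once a list exists, its elements can be retrieved in several ways that return different things. The sketch below uses a shorter list with the same element names as the solution (the date vector is left out to keep the example self-contained).

    ```r
    mixed_list <- list(numeric_value = 42.2,
                       integer_value = 3L,
                       character_value = 'R is fun',
                       boolean_value = TRUE)

    # $ and [[ ]] return the element itself.
    mixed_list$numeric_value          # 42.2
    mixed_list[['character_value']]   # "R is fun"

    # [ ] returns a smaller list that still wraps the element.
    class(mixed_list['integer_value'])     # "list"
    class(mixed_list[['integer_value']])   # "integer"

    # Unlike c(), list() keeps each element's own type.
    vapply(mixed_list, class, character(1))
    ```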
    
    
  3. Challenge

    Advanced Data Querying and Filtering

    To review the concepts covered in this step, please refer to the Querying and Filtering Data module of the Querying and Converting Data Types in R course.

    Querying and Filtering Data is important because these are fundamental skills for extracting insights from data. This step will cover advanced querying and filtering techniques using data frames, data tables, and tibbles, which are essential for any data analysis project in R.

    Practice querying a data frame using box brackets and logical tests. Explore the use of the subset() function to filter data frames based on specific criteria. Move on to querying and filtering a data table, using advanced techniques like the %in% operator. Experiment with the dplyr package to perform queries on a tibble, using functions like filter(), arrange(), select(), and mutate(). This step will help you master the art of data querying and filtering in R, using a variety of data structures.


    Task 3.1: Querying a Data Frame with Logical Tests

    Load the mtcars dataset, one of the default datasets that come with R. Use logical tests within box brackets [] to query the data frame. Extract rows where the mpg column is greater than 25.

    🔍 Hint

    Use the data() function to load mtcars. Use the syntax data_frame_name[condition, ] to perform the query. The condition should be a logical test applied to one of the columns, like data_frame$mpg > 25.

    🔑 Solution
    data(mtcars)
    mtcars[mtcars$mpg > 25, ]
    

    Task 3.2: Filtering Data Frames with the subset() Function

    Utilize the subset() function to filter rows from a data frame based on a specific condition. Keep only the rows where hp (horsepower) is less than 100.

    🔍 Hint

    The subset() function syntax is subset(x, subset), where x is the data frame and subset is the condition. For example, subset(df, hp < 100).

    🔑 Solution
    subset(mtcars, hp < 100)
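
    Box brackets and subset() can express the same query. The sketch below combines two logical tests with & and uses the select argument of subset() to limit the columns returned; the object names by_brackets and light_cars are illustrative.

    ```r
    # Two-condition query with box brackets; & combines logical tests.
    by_brackets <- mtcars[mtcars$mpg > 25 & mtcars$hp < 100, c('mpg', 'hp')]

    # subset() looks up column names inside the data frame, so the
    # mtcars$ prefix is not needed; select picks the columns to return.
    light_cars <- subset(mtcars, mpg > 25 & hp < 100, select = c(mpg, hp))

    print(light_cars)
    ```

    One behavioral difference to keep in mind: subset() drops rows where the condition evaluates to NA, whereas bracket indexing returns NA-filled rows for them. mtcars has no missing values, so both forms agree here.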
    

    Task 3.3: Querying a Data Table with Advanced Operators

    Load the data.table package. Convert mtcars to a data.table named mtcars_dt. Query the data table using the %in% operator and order() function. Extract rows where the cyl (number of cylinders) is either 4 or 6, and order them by wt (weight) in descending order.

    🔍 Hint

    To use the %in% operator, apply it within the i argument of the data table syntax DT[i, j, by]. For ordering, call order() in the i argument as well, typically in a second, chained pair of brackets, and pass -column_name for descending order.

    🔑 Solution
    library('data.table')
    
    mtcars_dt <- as.data.table(mtcars)
    mtcars_dt[cyl %in% c(4, 6), .SD, .SDcols = c('cyl', 'wt')][order(-wt)]
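
    The solution above returns only the cyl and wt columns. As a hedged variation, the sketch below keeps every column, preserves the car models in a column named "model" (an illustrative name), and adds a small grouped summary to show where j and by fit in DT[i, j, by]. It assumes the data.table package is installed.

    ```r
    library(data.table)

    # keep.rownames stores the car models in a regular column.
    mtcars_dt <- as.data.table(mtcars, keep.rownames = 'model')

    # i filters rows with %in%; the chained [ ] sorts with order() in i.
    four_or_six <- mtcars_dt[cyl %in% c(4, 6)][order(-wt)]
    head(four_or_six[, .(model, cyl, wt)], 3)

    # j computes a summary and by groups it, all within one call.
    mtcars_dt[cyl %in% c(4, 6), .(mean_wt = mean(wt)), by = cyl]
    ```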
    

    Task 3.4: Performing Queries on a Tibble with dplyr

    Use the dplyr package to perform complex queries on a tibble. Convert the mtcars data frame to a tibble called mtcars_tbl. First, filter rows where mpg is greater than 25. Second, arrange them by carb (number of carburetors) in ascending order. Third, select only the carb and mpg columns. Finally, add a new column mpg_plus_one that is mpg + 1.

    🔍 Hint

    Chain the functions using the %>% operator. Start with filter() to apply the mpg condition, then use arrange() to sort by carb, followed by select() to pick columns, and finally mutate() to add the new column.

    🔑 Solution
    library('dplyr')
    
    mtcars_tbl <- as_tibble(mtcars)
    mtcars_tbl %>% 
      filter(mpg > 25) %>% 
      arrange(carb) %>% 
      select(carb, mpg) %>% 
      mutate(mpg_plus_one = mpg + 1)
    
  4. Challenge

    Leveraging the Tidyverse for Data Preprocessing

    To review the concepts covered in this step, please refer to the Course Summary and Further Resources module of the Querying and Converting Data Types in R course.

    Course Summary and Further Resources is important because it consolidates the learning and introduces powerful tools for data preprocessing. This step emphasizes the use of the Tidyverse suite of packages for efficient data manipulation, which is a critical skill in data science.

    Load the tidyverse package, which comes pre-installed in this environment. Practice using readr to import data and tidyr for data cleaning tasks. Explore the use of purrr for applying functions across elements in a list or vector. Use dplyr to perform data manipulation tasks such as selecting, renaming, summarizing, and mutating data. This exercise will familiarize you with the Tidyverse ecosystem, enhancing your data preprocessing capabilities in R.


    Task 4.1: Loading the tidyverse Package

    Begin by loading the tidyverse package to access its suite of tools for data manipulation. This is the first step in utilizing the powerful features of the tidyverse for data preprocessing.

    🔍 Hint

    Use the library() function and specify the name of the package you want to load, which is tidyverse.

    🔑 Solution
    library('tidyverse')
    

    Task 4.2: Importing Data with readr

    Use the readr package, part of the Tidyverse, to import the CSV file named sample_data.csv into R. Store the imported data in a variable named data for further manipulation.

    🔍 Hint

    You don't need to load readr explicitly, as it is already loaded by tidyverse. Assign the result of read_csv() function to a variable. The function takes the file name as its argument, which in this case is sample_data.csv.

    🔑 Solution
    data <- read_csv('sample_data.csv')
    

    Task 4.3: Cleaning Data with tidyr

    With the tidyr package, part of the Tidyverse, practice cleaning the imported data. Specifically, use the drop_na() function to remove any rows with missing values from the data dataframe.

    🔍 Hint

    Use the drop_na() function on the data variable to remove rows with NA values. Assign the result back to a variable.

    🔑 Solution
    data <- drop_na(data)
    

    Task 4.4: Applying Functions with purrr

    Utilize the purrr package, part of the Tidyverse, to apply a function that doubles the values of the numbers column. Store the result in a new variable named doubled_numbers and print that variable.

    🔍 Hint

    Use the map_dbl() function from purrr. The first argument is the vector numbers, and the second argument is a formula that specifies the function to apply, in this case, doubling the values.

    🔑 Solution
    doubled_numbers <- map_dbl(data$numbers, ~ .x * 2)
    print(doubled_numbers)
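
    For simple arithmetic, map_dbl() produces the same result as R's built-in vectorized operators; it becomes more useful when each element is itself a vector or a more complex object. The sketch below illustrates both cases, assuming purrr is installed. The vector numbers is a hypothetical stand-in for data$numbers.

    ```r
    library(purrr)

    numbers <- c(1.5, 2, 3)

    # map_dbl() calls the function once per element and returns a double vector.
    doubled_map <- map_dbl(numbers, ~ .x * 2)

    # Arithmetic in R is already vectorized, so this gives the same values.
    doubled_vec <- numbers * 2

    identical(doubled_map, doubled_vec)   # TRUE

    # Applying a function to each element of a list of vectors.
    map_dbl(list(a = 1:3, b = c(10, 20)), mean)
    ```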
    

    Task 4.5: Data Manipulation with dplyr

    Using the dplyr package, part of the Tidyverse, perform a series of data manipulation tasks on the data dataframe. First, select only the columns id and value. Then, rename the column value to measurement. Finally, add a new column measurement_sq that contains the square of measurement. Print the new version of data.

    🔍 Hint

    Chain the operations using the %>% operator. Use select() to choose columns, rename() to change column names, and mutate() to add new columns. Replace placeholders with the correct column names and operations.

    🔑 Solution
    data <- data %>% 
      select(id, value) %>% 
      rename(measurement = value) %>% 
      mutate(measurement_sq = measurement^2)
    print(data)
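
    The file sample_data.csv exists only inside the lab environment. To try the Step 4 workflow elsewhere, the sketch below builds a small stand-in from literal CSV text. The column names id, value, and numbers come from the tasks above, but the values are made up for illustration, and reading literal text through I() assumes readr 2.0 or later.

    ```r
    library(tidyverse)

    # Hypothetical stand-in for sample_data.csv. I() tells read_csv() to
    # treat the string as CSV content rather than as a file path.
    csv_text <- "id,value,numbers\n1,2.5,10\n2,NA,20\n3,4.0,30\n"

    data <- read_csv(I(csv_text), show_col_types = FALSE)

    data <- data %>%
      drop_na() %>%                       # removes the row with the missing value
      select(id, value) %>%               # keeps only id and value
      rename(measurement = value) %>%     # new_name = old_name
      mutate(measurement_sq = measurement^2)

    print(data)
    ```

    The pipeline mirrors Tasks 4.2, 4.3, and 4.5 in a single chain; in the lab itself each task is run as a separate cell.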
    
