
Understanding Statistical Models and Mathematical Models Hands-on Practice

In this lab, you will learn to apply mathematical and statistical models using R, starting with basic concepts and progressing to real-world applications. You will then model phenomena like population growth and financial risk, solve complex problems such as the 8-queens puzzle, and perform hypothesis testing using T-tests and Z-tests.


Duration: 1h 0m
Published: Apr 05, 2024


Table of Contents

  1. Challenge

    Exploring Data and Metadata in R

    RStudio Guide

    To get started, click the 'workspace' folder in the bottom-right pane of RStudio, then click the file named "Step 1...". You may want to shrink the console pane so that you have more room to work. You'll complete each task for Step 1 in that R Markdown file. Remember to run the cells for a task, using the play button at the top right of each cell, before moving on to the next task in the file. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and open the file for that step, and repeat until you have completed all tasks in all steps of the lab.


    Exploring Data and Metadata in R

    To review the concepts covered in this step, please refer to the Understanding Statistical and Mathematical Models module of the Understanding Statistical Models and Mathematical Models course.

    Understanding the structure and context of your data is crucial for effective data analysis and model selection. This step will help you practice interpreting data with the context provided by metadata, a foundational skill in both statistical and mathematical modeling.

    Dive into the world of data analysis by exploring the HRIS.csv dataset available in the lab environment. Your goal is to understand the structure of the dataset and the context provided by its metadata. Use the str() function to examine the structure of the dataset, highlighting the types of data and any potential challenges they may present. Then, leverage the comment() function to add and query metadata for the dataset, providing context that could influence your analysis and model selection. This hands-on experience will solidify your understanding of how metadata can impact data interpretation and model applicability.


    Task 1.1: Loading and Exploring the Dataset

    Begin your journey into data analysis by loading the HRIS.csv dataset into R. Use the read.csv() function to load the dataset and assign it to a variable. Then, use the str() function to examine the structure of the dataset, focusing on the types of data it contains and any potential challenges these data types may present for analysis.

    🔍 Hint

    Use read.csv('HRIS.csv') to load the dataset into a variable named hris_data. Then, call str(hris_data) to display its structure.

    🔑 Solution
    # Load the HRIS.csv dataset
    hris_data <- read.csv('HRIS.csv')
    
    # Examine the structure of the dataset
    str(hris_data)
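HRIS.csv is only available inside the lab environment, so the sketch below uses a small invented stand-in (the column names employee_id, gender, salary, and hire_date, and every value, are assumptions) to show the kind of problem str() can reveal: a missing-value placeholder such as "N/A" makes read.csv() store a numeric column as text.

```r
# Invented stand-in for HRIS.csv; column names and values are assumptions
hris_text <- "employee_id,gender,salary,hire_date
1,Male,52000,2019-03-01
2,Female,61000,2020-07-15
3,Female,N/A,2021-01-10"

# By default only the string "NA" counts as missing, so "N/A" turns salary into text
toy <- read.csv(text = hris_text, stringsAsFactors = FALSE)
str(toy)

# One class per column makes type problems easy to spot programmatically
col_types <- vapply(toy, function(col) class(col)[1], character(1))
print(col_types)

# Declaring the placeholder as missing restores a numeric salary column;
# hire_date is still text and would need as.Date() before any date arithmetic
toy_fixed <- read.csv(text = hris_text, stringsAsFactors = FALSE,
                      na.strings = c("NA", "N/A"))
str(toy_fixed)
```

The same vapply() call can be run on the real hris_data to list its column classes in one line.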
    

    Task 1.2: Adding Metadata to the Dataset

    Now that you have a basic understanding of the dataset's structure, it's time to add context to it by adding metadata. Use the comment() function to add a brief description to the hris_data dataset. This description should provide context about what the dataset represents and any relevant information that could influence analysis or model selection.

    🔍 Hint

    Use comment(hris_data) <- 'your description here' to add a description to the dataset.

    🔑 Solution
    comment(hris_data) <- 'This dataset contains HR information including employee details, positions, and salaries.'
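comment() stores its value in a "comment" attribute on the object, and print() does not display it, so the note travels with the data without cluttering the output. The sketch below uses a small invented data frame (its columns and notes are assumptions) to show that the value can be a character vector holding several notes.

```r
# Invented data frame; the notes below are illustrative assumptions
toy <- data.frame(employee_id = 1:3, salary = c(52000, 61000, 58000))

# comment() accepts a character vector, so several notes can be stored at once
comment(toy) <- c("Source: hypothetical HR extract",
                  "Salary assumed to be annual, in USD")

# The notes live in the "comment" attribute and are not shown by print()
stopifnot(identical(comment(toy), attr(toy, "comment")))
print(toy)
```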
    

    Task 1.3: Querying Metadata

    With metadata added to your dataset, explore how you can query this information to better understand the context of your data. Use the comment() function to retrieve the description you added to the hris_data dataset.

    🔍 Hint

    To retrieve the description you added to the dataset, simply use comment(hris_data).

    🔑 Solution
    my_comment <- comment(hris_data)
    print(my_comment)
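Metadata can also be attached to individual columns, which is useful for recording units or coding schemes. The sketch below (an invented data frame; the unit note is an assumption) adds a column-level comment, then collects every column's comment into one named vector, using NA where a column has none.

```r
toy <- data.frame(gender = c("Male", "Female"), salary = c(52000, 61000))
comment(toy) <- "Invented two-row HR extract"

# Attach a note to a single column; the gender column stays uncommented
comment(toy$salary) <- "Annual base salary in USD (assumed unit)"

# Query every column: comment() returns NULL when no note exists
col_notes <- vapply(toy, function(col) {
  note <- comment(col)
  if (is.null(note)) NA_character_ else paste(note, collapse = "; ")
}, character(1))
print(col_notes)
```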
    
  2. Challenge

    Modeling Population Growth with ODEs


    To review the concepts covered in this step, please refer to the Case Studies on Statistical and Mathematical Models module of the Understanding Statistical Models and Mathematical Models course.

    Ordinary Differential Equations (ODEs) are a powerful tool for modeling deterministic systems, such as population growth. This step will enhance your ability to apply mathematical models to real-world problems, focusing on the deterministic nature of such models.

    In this step, you'll apply your knowledge of Ordinary Differential Equations (ODEs) to model population growth. Using the deSolve package in R, set up and solve an ODE based on Verhulst's decreasing growth model (the logistic model), in which the growth rate slows as the population approaches the carrying capacity. You'll need to define the initial population size, the growth rate, and the carrying capacity of the environment. Visualize the solution using R's plotting functions to see how the population evolves over time. This exercise will deepen your understanding of how mathematical models can be used to predict the behavior of deterministic systems.
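For reference, the decreasing growth model in this step is the logistic equation below, written with the same symbols as the hint in Task 2.2 (y for population size, r for growth rate, K for carrying capacity); t_0 and y_0 denote the first time point and the initial population. Its closed-form solution is a handy check on the numerical output of ode().

```latex
\frac{dy}{dt} = r\,y\left(1 - \frac{y}{K}\right),
\qquad
y(t) = \frac{K}{1 + \dfrac{K - y_0}{y_0}\, e^{-r\,(t - t_0)}}
```

Growth is fastest where y = K/2, which the closed form reaches at t = t_0 + ln((K - y_0)/y_0)/r. With the values used in Task 2.3 (y_0 = 100, r = 0.1, K = 1000, t_0 = 1), that is t = 1 + 10 ln 9 ≈ 22.97.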


    Task 2.1: Loading the deSolve Package

    Before we can start modeling population growth with ODEs, we need to load the deSolve package, which provides functions to solve initial value problems for differential equations. This package is essential for our task.

    🔍 Hint

    Use the library function to load the deSolve package.

    🔑 Solution
    library('deSolve')
    

    Task 2.2: Defining the Population Growth Model

    Define a function pop_growth that implements Verhulst's decreasing growth (logistic) model. It takes three arguments: the time t, the state y, and the parameters parms. Here, parms is a list holding the growth rate r and the carrying capacity K. The function should return the rate of change of the population size, wrapped in a list as deSolve's ode() expects.

    🔍 Hint

    The rate of change of the population size can be calculated using the formula dY = r * y * (1 - y / K), where r is the rate of growth, y is the current population size, and K is the carrying capacity. When retrieving items from the parms list, make sure to use the double bracket syntax [[]].

    🔑 Solution
    pop_growth <- function(t, y, parms) {
      r <- parms[['r']]
      K <- parms[['K']]
      dY <- r * y * (1 - y / K)
      list(c(dY))
    }
    

    Task 2.3: Setting Initial Conditions and Parameters

    Create variables representing the initial conditions and parameters for the population growth model. Start with an initial population size of 100, a growth rate of 0.1, and a carrying capacity of 1000. Define a sequence of times from 1 to 100.

    🔍 Hint

    Assign the specified numbers to variables with appropriate names corresponding to the formula. Use the seq() function to create a sequence of numbers.

    🔑 Solution
    # Initial population size
    y0 <- 100
    
    # Rate of growth
    r <- 0.1
    
    # Carrying capacity
    K <- 1000
    
    # Sequence of times
    times <- seq(1, 100, by = 1)
    

    Task 2.4: Solving the ODE

    Use the ode function from the deSolve package to solve the ODE for the population growth model. Specify the initial state, time sequence, and parameters. Store the result in a variable named solution.

    🔍 Hint

    The ode function requires the initial state (y), the time sequence (times), the model function (func), and the parameters (parms), which should include the r and K parameters.

    🔑 Solution
    solution <- ode(y = y0, times = times, func = pop_growth, parms = list(r = r, K = K))
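As a sanity check, the numerical solution can be compared against the closed-form logistic curve. The sketch below does this in base R only: it swaps deSolve's ode() (whose default method is lsoda) for a hand-rolled classical fourth-order Runge-Kutta stepper, named rk4 here purely for illustration, so it runs without extra packages. The helper names logistic_rhs and logistic_exact are also illustrative; the parameter values match Task 2.3.

```r
# Right-hand side of the logistic ODE (same formula as pop_growth)
logistic_rhs <- function(t, y, r, K) r * y * (1 - y / K)

# Classical RK4 on a fixed grid: a stand-in for deSolve::ode(), not its algorithm
rk4 <- function(f, y0, times, ...) {
  y <- numeric(length(times))
  y[1] <- y0
  for (i in seq_len(length(times) - 1)) {
    h <- times[i + 1] - times[i]
    t <- times[i]
    k1 <- f(t, y[i], ...)
    k2 <- f(t + h / 2, y[i] + h * k1 / 2, ...)
    k3 <- f(t + h / 2, y[i] + h * k2 / 2, ...)
    k4 <- f(t + h, y[i] + h * k3, ...)
    y[i + 1] <- y[i] + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  cbind(time = times, population = y)
}

# Closed-form logistic solution starting from y0 at time t0
logistic_exact <- function(t, y0, r, K, t0) {
  K / (1 + (K - y0) / y0 * exp(-r * (t - t0)))
}

times <- seq(1, 100, by = 1)
num <- rk4(logistic_rhs, y0 = 100, times = times, r = 0.1, K = 1000)
exact <- logistic_exact(times, y0 = 100, r = 0.1, K = 1000, t0 = times[1])
max_abs_err <- max(abs(num[, "population"] - exact))
print(max_abs_err)
```

In the lab itself, the equivalent check on the deSolve output is max(abs(solution[, 2] - logistic_exact(times, y0, r, K, times[1]))), after defining logistic_exact as above.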
    

    Task 2.5: Visualizing the Population Growth

    Visualize the solution of the population growth model using R's plotting functions. Plot the population size over time to understand how the population evolves.

    🔍 Hint

    Use the plot function with type = 'l' to create a line plot. The x-axis should represent time, and the y-axis should represent the population size.

    🔑 Solution
    plot(solution[,1], solution[,2], type = 'l', xlab='Time', ylab='Population')
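The plot should show an S-shaped curve that levels off at K. The sketch below (base R only; logistic_exact and t_inflect are illustrative names, not part of the lab) computes where the curve is steepest and, in an interactive session, draws reference lines for K/2, K, and that time.

```r
# Closed-form logistic curve with the parameter values from Task 2.3
logistic_exact <- function(t, y0, r, K, t0) {
  K / (1 + (K - y0) / y0 * exp(-r * (t - t0)))
}
y0 <- 100
r <- 0.1
K <- 1000
times <- seq(1, 100, by = 1)

# Growth is fastest at y = K / 2; solving the closed form for that y gives
# t* = t0 + log((K - y0) / y0) / r
t_inflect <- times[1] + log((K - y0) / y0) / r
y_at_inflect <- logistic_exact(t_inflect, y0, r, K, times[1])

if (interactive()) {
  plot(times, logistic_exact(times, y0, r, K, times[1]), type = "l",
       xlab = "Time", ylab = "Population")
  abline(h = c(K / 2, K), lty = c(2, 3))  # half capacity and carrying capacity
  abline(v = t_inflect, lty = 2)          # time of fastest growth
}
```

The same abline() calls can be added after the plot() call in the solution to annotate the deSolve curve.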
    
  3. Challenge

    Solving the 8 Queens Problem with Local Search Optimization


    To review the concepts covered in this step, please refer to the Applying Mathematical Models in R module of the Understanding Statistical Models and Mathematical Models course.

    The 8 Queens problem is a classic combinatorial optimization problem that can be solved using local search techniques. The goal is to place 8 queens on an 8x8 chessboard so that no two queens can attack one another. This step will give you hands-on experience with optimization techniques, strengthening your problem-solving skills in mathematical modeling.

    Tackle the 8 Queens problem using local search optimization techniques in R. Start by setting up an initial state of the chessboard with all 8 queens placed in the first column. Then apply the simulated annealing, stochastic local search, and threshold accepting algorithms from the NMOF package to find a solution where no two queens attack each other. Use the provided helper functions to generate candidate solutions and to print the final board. This practical exercise will help you understand how local search techniques are applied to complex optimization problems.


    Task 3.1: Load Packages and Convenience Functions

    Load the NMOF R package, which comes pre-installed in this environment. Then, load the provided convenience functions into the environment.

    🔍 Hint

    Use the library() function to load a package. Then, simply run the provided code to store the convenience functions in the environment.

    🔑 Solution
    # Load the package
    library(NMOF)
    
    # Load the provided convenience functions
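    print_board <- function(position, q.char = "1", sep = " ") {
        # Print one row per queen: "*" for empty squares, q.char where the queen sits
        n <- length(position)
        row <- rep("*", n)
        for (i in seq_len(n)) {
            row_i <- row
            row_i[position[i]] <- q.char
            cat(paste(row_i, collapse = sep))
            cat("\n")
        }
    }
    neighbor <- function(position, board_size = 8) {
        # Move one randomly chosen queen 1 or 2 columns left or right;
        # a queen pushed off the board reappears in the first or last column
        step <- 2
        i <- sample.int(board_size, 1)
        position[i] <- position[i] + sample(c(1:step, -(1:step)), 1)
        if (position[i] > board_size)
            position[i] <- 1
        else if (position[i] < 1)
            position[i] <- board_size
        return(position)
    }
    n_attacks <- function(position) {
        # Count queens sharing a column, diagonal, or anti-diagonal with an
        # earlier queen; the result is zero exactly when no two queens attack
        sum(duplicated(position)) +
        sum(duplicated(position - seq_along(position))) +
        sum(duplicated(position + seq_along(position)))
    }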
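Before running the optimizers, it helps to confirm what the helpers return. In this encoding, position[i] is the column of the queen in row i, so every row holds exactly one queen by construction. The sketch below copies n_attacks() and neighbor() so it runs on its own and checks them against c(1, 5, 8, 6, 3, 7, 2, 4), a known 8-queens solution. Note that n_attacks() counts duplicated columns and diagonals rather than attacking pairs: the stacked starting board has 28 attacking pairs but scores 7. It is still zero exactly when the board is valid, which is all the optimizers need.

```r
# Copies of the lab's helpers, so this check is self-contained
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}
neighbor <- function(position, board_size = 8) {
  step <- 2
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:step, -(1:step)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  position
}

valid <- c(1, 5, 8, 6, 3, 7, 2, 4)  # a known 8-queens solution
stacked <- rep(1, 8)                # the starting board pos0 used in Task 3.2

n_attacks(valid)    # 0
n_attacks(stacked)  # 7

set.seed(1)
moved <- neighbor(valid)
sum(moved != valid) # 1: exactly one queen moves
```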
    
    

    Task 3.2: Solve the N-Queens Problem using Simulated Annealing

    The provided code initializes the position of the queens along the first column. Use simulated annealing to solve the N-Queens problem on an 8x8 board.

    🔍 Hint

    Use the SAopt() function from the NMOF package for simulated annealing. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.

    🔑 Solution
    # Provided code to initialize a board
    pos0 <- rep(1, 8)
    
    # Solve the N-Queens problem
    solution1 <- SAopt(n_attacks, list(x0 = pos0,
                                      neighbour = neighbor,
                                      printBar = TRUE,
                                      nS = 1000))
    print_board(solution1$xbest)
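To see what SAopt() is doing, here is a stripped-down simulated annealing loop in base R. It is a sketch of the technique, not NMOF's implementation: the starting temperature T0, the geometric cooling factor, and the step count are arbitrary illustrative choices, and the helpers are copied from Task 3.1 so the block runs on its own. Improvements are always accepted; a worse neighbour is accepted with probability exp(-delta / temp), which shrinks as the temperature cools.

```r
# Helpers copied from Task 3.1 so this sketch is self-contained
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}
neighbor <- function(position, board_size = 8) {
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:2, -(1:2)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  position
}

# Illustrative simulated annealing loop (T0, cooling, n_steps are arbitrary)
sa_sketch <- function(x0, n_steps = 5000, T0 = 2, cooling = 0.999) {
  x <- x0
  fx <- n_attacks(x)
  xbest <- x
  fbest <- fx
  temp <- T0
  for (step in seq_len(n_steps)) {
    xn <- neighbor(x)
    fn <- n_attacks(xn)
    delta <- fn - fx
    # Always take improvements; take a worse move with probability exp(-delta / temp)
    if (delta <= 0 || runif(1) < exp(-delta / temp)) {
      x <- xn
      fx <- fn
    }
    if (fx < fbest) {
      xbest <- x
      fbest <- fx
    }
    if (fbest == 0) break
    temp <- temp * cooling  # geometric cooling schedule
  }
  list(xbest = xbest, OFvalue = fbest)  # best board and its score
}

set.seed(42)
res_sa <- sa_sketch(rep(1, 8))
print(res_sa$OFvalue)
```

Because the search is random, results differ between runs; calling set.seed() before SAopt() in the lab makes a run reproducible.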
    

    Task 3.3: Solve the N-Queens Problem using Stochastic Local Search

    Use stochastic local search to solve the N-Queens problem on an 8x8 board.

    🔍 Hint

    Use the LSopt() function from the NMOF package for local search. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.

    🔑 Solution
    solution2 <- LSopt(n_attacks, list(x0 = pos0,
                                      neighbour = neighbor,
                                      printBar = TRUE,
                                      nS = 1000))
    print_board(solution2$xbest)
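Plain local search only moves to neighbours that are no worse than the current board, so it can stall on a board that still has attacks when no single move improves it. The base-R sketch below (not NMOF's implementation; the step count and the seeds are arbitrary) runs a few independent searches from the stacked starting board so their final scores can be compared, which is one simple way to cope with such local minima.

```r
# Helpers copied from Task 3.1 so this sketch is self-contained
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}
neighbor <- function(position, board_size = 8) {
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:2, -(1:2)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  position
}

# Illustrative stochastic local search: random neighbour, keep it if no worse
ls_sketch <- function(x0, n_steps = 2000) {
  x <- x0
  fx <- n_attacks(x)
  for (step in seq_len(n_steps)) {
    xn <- neighbor(x)
    fn <- n_attacks(xn)
    if (fn <= fx) {     # accept only moves that are no worse
      x <- xn
      fx <- fn
    }
    if (fx == 0) break  # a valid board cannot be improved further
  }
  list(xbest = x, OFvalue = fx)
}

# Independent restarts from the same start; each seed gives a different random walk
final_scores <- vapply(1:5, function(seed) {
  set.seed(seed)
  ls_sketch(rep(1, 8))$OFvalue
}, numeric(1))
print(final_scores)
```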
    

    Task 3.4: Solve the N-Queens Problem using the Threshold Accepting Method

    Use Threshold Accepting to solve the N-Queens problem on an 8x8 board.

    🔍 Hint

    Use the TAopt() function from the NMOF package for threshold accepting. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.

    🔑 Solution
    solution3 <- TAopt(n_attacks, list(x0 = pos0,
                                      neighbour = neighbor,
                                      printBar = TRUE,
                                      nS = 1000))
    print_board(solution3$xbest)
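Threshold accepting replaces simulated annealing's random acceptance with a deterministic rule: a neighbour is accepted whenever it worsens the score by no more than the current threshold, and the thresholds shrink toward zero over the run. The base-R sketch below illustrates that rule; the threshold sequence c(2, 1, 1, 0) and the steps per threshold are arbitrary illustrative values rather than NMOF defaults, and the helpers are copied from Task 3.1.

```r
# Helpers copied from Task 3.1 so this sketch is self-contained
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}
neighbor <- function(position, board_size = 8) {
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:2, -(1:2)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  position
}

# Illustrative threshold accepting loop with a hand-picked threshold sequence
ta_sketch <- function(x0, thresholds = c(2, 1, 1, 0), steps_per_threshold = 1000) {
  x <- x0
  fx <- n_attacks(x)
  xbest <- x
  fbest <- fx
  for (tau in thresholds) {
    for (step in seq_len(steps_per_threshold)) {
      xn <- neighbor(x)
      fn <- n_attacks(xn)
      # Deterministic rule: accept unless the move worsens the score by more than tau
      if (fn - fx <= tau) {
        x <- xn
        fx <- fn
      }
      if (fx < fbest) {
        xbest <- x
        fbest <- fx
      }
      if (fbest == 0) return(list(xbest = xbest, OFvalue = fbest))
    }
  }
  list(xbest = xbest, OFvalue = fbest)
}

set.seed(7)
res_ta <- ta_sketch(rep(1, 8))
print(res_ta$OFvalue)
```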
    
  4. Challenge

    Performing Hypothesis Testing in R


    To review the concepts covered in this step, please refer to the Applying Statistical Models in R module of the Understanding Statistical Models and Mathematical Models course.

    Hypothesis testing is a fundamental concept in statistical modeling, enabling you to make inferences about populations based on sample data. This step will provide practice in setting up and executing hypothesis tests, a critical skill in statistical analysis.

    Practice hypothesis testing using the HRIS.csv dataset. Use the t.test() function in R to perform a two-sample t-test to compare the average salaries of male and female employees. Interpret the results, focusing on the p-value and the test statistic, to determine if there is a significant difference in salaries. This exercise will reinforce your understanding of hypothesis testing and its application in analyzing real-world data.
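When t.test() is given two vectors and left at its default var.equal = FALSE, it runs Welch's two-sample t-test, which does not assume equal variances in the two groups. With sample means x̄_1 and x̄_2, sample variances s_1^2 and s_2^2, and group sizes n_1 and n_2, the statistic and its approximate degrees of freedom are:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

The two-sided p-value is the probability that a t distribution with ν degrees of freedom lands further from zero than |t| in either tail.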


    Task 4.1: Load the HRIS Dataset

    Begin by loading the HRIS.csv dataset into R. Read the file into R as a data frame named hris_data. Display the first few rows of the dataset.

    🔍 Hint

    Use the read.csv() function with the file path as its argument to load the dataset. Then, call head() on your dataset variable to display its first six rows.

    🔑 Solution
    # Load the HRIS dataset
    hris_data <- read.csv('HRIS.csv')
    
    # Display the first few rows of the dataset
    head(hris_data)
    

    Task 4.2: Perform a Two-Sample t-Test

    Perform a two-sample t-test comparing the average salaries of male and female employees. Store the result in a variable named salary_test. Finally, print the result to interpret the p-value and the test statistic.

    🔍 Hint

    Use the t.test() function. Subset the data by gender and select the salary column to create the x and y vectors to pass to t.test().

    🔑 Solution
    # Perform a two-sample t-test comparing the average salaries of male and female employees
    male_salary <- hris_data[hris_data$gender == 'Male', "salary"]
    female_salary <- hris_data[hris_data$gender == 'Female', "salary"]
    # Welch two-sample t-test (var.equal = FALSE by default); a data argument is not needed here
    salary_test <- t.test(male_salary, female_salary)
    
    # Print the result
    print(salary_test)
    
    # ___ have a higher average salary, and the difference is ___
    # Answer: Males, not statistically significant
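To connect the printed output to the formula, the sketch below recomputes Welch's statistic, degrees of freedom, and p-value by hand and checks them against t.test(). The salary vectors are invented toy values, not HRIS.csv data. It also shows the formula interface, t.test(salary ~ gender, data = ...), which works directly on a long-format data frame; there the groups follow factor level order (Female before Male), so the sign of t flips.

```r
# Invented toy salaries (not from HRIS.csv)
male   <- c(52000, 61000, 58000, 70000, 66000)
female <- c(50000, 63000, 55000, 59000)

welch <- t.test(male, female)  # var.equal = FALSE by default

# Manual Welch statistic and Welch-Satterthwaite degrees of freedom
v1 <- var(male) / length(male)
v2 <- var(female) / length(female)
t_manual  <- (mean(male) - mean(female)) / sqrt(v1 + v2)
df_manual <- (v1 + v2)^2 / (v1^2 / (length(male) - 1) + v2^2 / (length(female) - 1))
p_manual  <- 2 * pt(-abs(t_manual), df_manual)  # two-sided p-value

# Formula interface on a long-format data frame
toy <- data.frame(salary = c(male, female),
                  gender = rep(c("Male", "Female"), c(length(male), length(female))))
welch_formula <- t.test(salary ~ gender, data = toy)

print(c(t = t_manual, df = df_manual, p = p_manual))
```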
    
