
Understanding Statistical Models and Mathematical Models Hands-on Practice

In this lab, you will learn to apply mathematical and statistical models using R, starting with basic concepts and progressing to real-world applications. You will then model phenomena like population growth and financial risk, solve complex problems such as the 8-queens puzzle, and perform hypothesis testing using T-tests and Z-tests.


Lab Info

Last updated: Nov 22, 2024

Duration: 1h 0m

Exploring Data and Metadata in R
RStudio Guide

To get started, click on the 'workspace' folder in the bottom right pane of RStudio, then click on the file entitled "Step 1...". You may want to drag the console pane smaller so that you have more room to work. You'll complete each task for Step 1 in that R Markdown file. For each task, run its cells with the play button at the top right of each cell before moving on to the next task. Continue until you have completed all tasks in this step. When you are ready for the next step, come back and click on the file for that step, and repeat until you have completed all tasks in all steps of the lab.

Exploring Data and Metadata in R

To review the concepts covered in this step, please refer to the Understanding Statistical and Mathematical Models module of the Understanding Statistical Models and Mathematical Models course.

Understanding the structure and context of your data is crucial for effective data analysis and model selection. This step will help you practice interpreting data with the context provided by metadata, a foundational skill in both statistical and mathematical modeling.

Dive into the world of data analysis by exploring the HRIS.csv dataset available in the lab environment. Your goal is to understand the structure of the dataset and the context provided by its metadata. Use the str() function to examine the structure of the dataset, highlighting the types of data and any potential challenges they may present. Then, leverage the comment() function to add and query metadata for the dataset, providing context that could influence your analysis and model selection. This hands-on experience will solidify your understanding of how metadata can impact data interpretation and model applicability.

Task 1.1: Loading and Exploring the Dataset

Begin your journey into data analysis by loading the HRIS.csv dataset into R. Use the read.csv() function to load the dataset and assign it to a variable. Then, use the str() function to examine the structure of the dataset, focusing on the types of data it contains and any potential challenges these data types may present for analysis.

🔍 Hint

Use read.csv('HRIS.csv') to load the dataset into a variable named hris_data. Then, call str(hris_data) to display its structure.
🔑 Solution

# Load the HRIS.csv dataset
hris_data <- read.csv('HRIS.csv')

# Examine the structure of the dataset
str(hris_data)
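As a rough illustration of the kind of challenges str() can surface, the sketch below reads a tiny in-memory CSV instead of HRIS.csv. The column names (employee_id, gender, hire_date, salary) and all values are invented for this example and are not taken from the lab's dataset.

```r
# Toy data standing in for HRIS.csv; all columns and values are made up
csv_text <- "employee_id,gender,hire_date,salary
1,Female,2019-03-01,72000
2,Male,2020-07-15,68000
3,Female,2021-11-30,NA"

toy_hris <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# str() prints one line per column: its type and the first few values
str(toy_hris)

# Two typical issues that str() makes visible:
# dates arrive as character strings, and categories are not yet factors
toy_hris$hire_date <- as.Date(toy_hris$hire_date)
toy_hris$gender <- factor(toy_hris$gender)

# Missing values show up as NA in the str() output as well
str(toy_hris)
```

Comparing the two str() outputs shows how the column types change after conversion, which matters for models that treat dates, categories, and numbers differently.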
Task 1.2: Adding Metadata to the Dataset

Now that you have a basic understanding of the dataset's structure, it's time to add context to it by adding metadata. Use the comment() function to add a brief description to the hris_data dataset. This description should provide context about what the dataset represents and any relevant information that could influence analysis or model selection.

🔍 Hint

Use comment(hris_data) <- 'your description here' to add a description to the dataset.
🔑 Solution

comment(hris_data) <- 'This dataset contains HR information including employee details, positions, and salaries.'
Task 1.3: Querying Metadata

With metadata added to your dataset, explore how you can query this information to better understand the context of your data. Use the comment() function to retrieve the description you added to the hris_data dataset.

🔍 Hint

To retrieve the description you added to the dataset, simply use comment(hris_data).
🔑 Solution

my_comment <- comment(hris_data)
print(my_comment)
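The following sketch shows, on a made-up data frame, how comment() metadata behaves in practice. The column names, the units, and the wording of the descriptions are assumptions for illustration only.

```r
# Toy data frame; the column names are invented for this example
toy_hris <- data.frame(
  gender = c("Female", "Male"),
  salary = c(72000, 68000)
)

# Attach a description to the whole data frame and to a single column
comment(toy_hris) <- "Toy HR extract; salary is annual pay in USD (assumed unit)."
comment(toy_hris$salary) <- "Annual base salary, before tax (assumed definition)."

# print() does not display the comment, so it has to be queried explicitly
print(toy_hris)
comment(toy_hris)
comment(toy_hris$salary)

# The comment is stored as an ordinary attribute named "comment"
attr(toy_hris, "comment")
```

Because the comment is not printed with the data, it is easy to overlook; querying it before modeling is a cheap way to recover context such as units or collection notes.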
Modeling Population Growth with ODEs

To review the concepts covered in this step, please refer to the Case Studies on Statistical and Mathematical Models module of the Understanding Statistical Models and Mathematical Models course.

Ordinary Differential Equations (ODEs) are a powerful tool for modeling deterministic systems, such as population growth. This step will enhance your ability to apply mathematical models to real-world problems, focusing on the deterministic nature of such models.

In this step, you'll apply your knowledge of Ordinary Differential Equations (ODEs) to model population growth. Using the deSolve package in R, set up and solve an ODE that models population growth based on Verhulst's Decreasing Growth Model (the logistic growth model). You'll need to define the initial population size, the rate of growth, and the carrying capacity of the environment. Visualize the solution using R's plotting functions to understand how the population evolves over time. This exercise will deepen your understanding of how mathematical models can be used to predict deterministic systems.

Task 2.1: Loading the deSolve Package

Before we can start modeling population growth with ODEs, we need to load the deSolve package which provides functions to solve initial value problems for differential equations. This package is essential for our task.

🔍 Hint

Use the library function to load the deSolve package.
🔑 Solution

library('deSolve')
Task 2.2: Defining the Population Growth Model

Define a function pop_growth that represents Verhulst's Decreasing Growth Model. This function takes three parameters: time t, state y, and parameters parms. Here, parms is a list specifying the rate of growth r and the carrying capacity K. The function should return the rate of change of the population size, wrapped in a list as deSolve requires.

🔍 Hint

The rate of change of the population size can be calculated using the formula dY = r * y * (1 - y / K), where r is the rate of growth, y is the current population size, and K is the carrying capacity. When retrieving items from the parms list, make sure to use the double bracket syntax [[]].
🔑 Solution

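pop_growth <- function(t, y, parms) {
  # Unpack the growth rate and carrying capacity
  r <- parms[['r']]
  K <- parms[['K']]
  # Logistic rate of change of the population
  dY <- r * y * (1 - y / K)
  # deSolve expects a list whose first element holds the derivatives
  list(c(dY))
}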
Task 2.3: Setting Initial Conditions and Parameters

Create variables representing the initial conditions and parameters for the population growth model. Start with an initial population size of 100, a growth rate of 0.1, and a carrying capacity of 1000. Define a sequence of times from 1 to 100.

🔍 Hint

Assign the specified numbers to variables with appropriate names corresponding to the formula. Use the seq() function to create a sequence of numbers.
🔑 Solution

# Initial population size
y0 <- 100
# Rate of growth
r <- 0.1
# Carrying capacity
K <- 1000
# Sequence of times
times <- seq(1, 100, by = 1)
Task 2.4: Solving the ODE

Use the ode function from the deSolve package to solve the ODE for the population growth model. Specify the initial state, time sequence, and parameters. Store the result in a variable named solution.

🔍 Hint

The ode function requires the initial state (y), the time sequence (times), the model function (func), and the parameters (parms), which should include the r and K parameters.
🔑 Solution

solution <- ode(y = y0, times = times, func = pop_growth, parms = list(r = r, K = K))
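One way to sanity-check the numerical result is to compare it with the closed-form solution of the logistic equation, which exists for this particular model. The sketch below uses base R only; the function name logistic_closed_form and its t0 argument are introduced here for illustration and are not part of the lab or of deSolve.

```r
# Closed-form solution of dY/dt = r * Y * (1 - Y / K) with Y(t0) = y0
logistic_closed_form <- function(t, y0, r, K, t0 = 1) {
  K / (1 + ((K - y0) / y0) * exp(-r * (t - t0)))
}

# Same values as in Task 2.3
y0 <- 100
r <- 0.1
K <- 1000
times <- seq(1, 100, by = 1)

exact <- logistic_closed_form(times, y0 = y0, r = r, K = K, t0 = times[1])

# The curve starts at y0 and levels off just below K
head(exact)
tail(exact)

# If `solution` from Task 2.4 is available, the largest deviation
# between the numerical and the exact curve can be inspected with:
# max(abs(solution[, 2] - exact))
```

A small deviation suggests the solver settings are adequate for this problem; a large one would point to a mistake in the model function or the parameters.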
Task 2.5: Visualizing the Population Growth

Visualize the solution of the population growth model using R's plotting functions. Plot the population size over time to understand how the population evolves.

🔍 Hint

Use the plot function with type = 'l' to create a line plot. The x-axis should represent time, and the y-axis should represent the population size.
🔑 Solution

plot(solution[,1], solution[,2], type = 'l', xlab='Time', ylab='Population')
Solving the 8 Queens Problem with Local Search Optimization

To review the concepts covered in this step, please refer to the Applying Mathematical Models in R module of the Understanding Statistical Models and Mathematical Models course.

The 8 Queens problem is a classic example of a combinatorial optimization problem that can be solved using local search techniques. The idea is to position 8 queens on an 8x8 chess board in such a way that they cannot attack one another. This step will give you hands-on experience with optimization techniques, enhancing your problem-solving skills in mathematical modeling.

Tackle the 8 Queens problem using local search optimization techniques in R. Start by setting up an initial state of the chessboard with all 8 queens placed in the first column. Implement and apply simulated annealing, stochastic local search, and threshold accepting algorithms to find a solution where no two queens attack each other. Use helper functions to generate candidate solutions and visualize the board's state during the optimization process. This practical exercise will help you understand the application of local search techniques in solving complex optimization problems.

Task 3.1: Load Packages and Convenience Functions

Load the NMOF R package, which comes pre-installed in this environment. Then, load the provided convenience functions into the environment.

🔍 Hint

Use the library() function to load a package. Then, simply run the provided code to store the convenience functions in the environment.
🔑 Solution

# Load the package
library(NMOF)

# Load the provided convenience functions

# Print the board, one line per row; q.char marks the queen's column
print_board <- function(position, q.char = "1", sep = " ") {
  n <- length(position)
  row <- rep("*", n)
  for (i in seq_len(n)) {
    row_i <- row
    row_i[position[i]] <- q.char
    cat(paste(row_i, collapse = sep))
    cat("\n")
  }
}

# Move one randomly chosen queen by up to `step` columns,
# jumping to the opposite edge if it leaves the board
neighbor <- function(position, board_size = 8) {
  step <- 2
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:step, -(1:step)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  return(position)
}

# Count queens that share a column or a diagonal with an earlier queen
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}
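To make the objective function more concrete, the sketch below evaluates n_attacks on three hand-picked boards. The function is copied from the convenience code so the example runs on its own; the variable names for the boards are made up for this illustration.

```r
# Copy of the lab's objective function so the example is self-contained
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}

# All queens in column 1: seven queens repeat the first queen's column
all_first_column <- rep(1, 8)
n_attacks(all_first_column)   # 7

# A well-known valid placement: no shared column or diagonal
valid <- c(1, 5, 8, 6, 3, 7, 2, 4)
n_attacks(valid)              # 0

# Queens on the main diagonal: position - index is 0 for every row
main_diagonal <- 1:8
n_attacks(main_diagonal)      # 7
```

Note that the score counts repeated columns and diagonals rather than attacking pairs, so it is best read as a penalty that reaches 0 exactly when no two queens attack each other. Row conflicts cannot occur because each vector element represents one row.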
Task 3.2: Solve the N-Queens Problem using Simulated Annealing

The provided code initializes the position of the queens along the first column. Use simulated annealing to solve the N-Queens problem on an 8x8 board.

🔍 Hint

Use the SAopt() function from the NMOF package for simulated annealing. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.
🔑 Solution

# Provided code to initialize a board
pos0 <- rep(1, 8)

# Solve the N-Queens problem
solution1 <- SAopt(n_attacks, list(x0 = pos0, neighbour = neighbor, printBar = TRUE, nS = 1000))
print_board(solution1$xbest)
Task 3.3: Solve the N-Queens Problem using Stochastic Local Search

Use Stochastic Local Search to solve the N-Queens problem on an 8x8 board.

🔍 Hint

Use the LSopt() function from the NMOF package for stochastic local search. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.
🔑 Solution

solution2 <- LSopt(n_attacks, list(x0 = pos0, neighbour = neighbor, printBar = TRUE, nS = 1000))
print_board(solution2$xbest)
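To show the idea behind a stochastic local search, here is a bare-bones loop in base R. It is a simplified sketch, not NMOF's implementation: simple_local_search is a hypothetical helper written for this example, and the two lab functions are copied so the code runs on its own.

```r
# Copies of the lab's helpers so the sketch runs without NMOF
n_attacks <- function(position) {
  sum(duplicated(position)) +
    sum(duplicated(position - seq_along(position))) +
    sum(duplicated(position + seq_along(position)))
}

neighbor <- function(position, board_size = 8) {
  step <- 2
  i <- sample.int(board_size, 1)
  position[i] <- position[i] + sample(c(1:step, -(1:step)), 1)
  if (position[i] > board_size) {
    position[i] <- 1
  } else if (position[i] < 1) {
    position[i] <- board_size
  }
  position
}

# Hypothetical helper (not part of NMOF): a minimal local search loop
simple_local_search <- function(objective, x0, neighbour, n_steps = 1000) {
  current <- x0
  current_f <- objective(current)
  for (s in seq_len(n_steps)) {
    candidate <- neighbour(current)
    candidate_f <- objective(candidate)
    # Accept the candidate only if it is no worse than the current solution
    if (candidate_f <= current_f) {
      current <- candidate
      current_f <- candidate_f
    }
  }
  list(xbest = current, OFvalue = current_f)
}

set.seed(42)
pos0 <- rep(1, 8)
result <- simple_local_search(n_attacks, pos0, neighbor)
result$xbest
result$OFvalue
```

Because worse candidates are never accepted, the objective value can only stay the same or decrease, which also means the search can stall in a local minimum above 0. Simulated annealing and threshold accepting relax the acceptance rule to reduce that risk.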
Task 3.4: Solve the N-Queens Problem using the Threshold Accepting Method

Use Threshold Accepting to solve the N-Queens problem on an 8x8 board.

🔍 Hint

Use the TAopt() function from the NMOF package for threshold accepting. Seek to minimize n_attacks and use the neighbor function as the neighbourhood function to return a changed solution at each step.
🔑 Solution

solution3 <- TAopt(n_attacks, list(x0 = pos0, neighbour = neighbor, printBar = TRUE, nS = 1000))
print_board(solution3$xbest)
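The three methods used in this step differ mainly in when they accept a candidate that is worse than the current solution. The sketch below writes the textbook acceptance rules as small base R functions. The names accept_ls, accept_ta, and accept_sa, as well as the tau and temp values, are assumptions for illustration; NMOF's internals may differ in detail.

```r
# delta = objective(candidate) - objective(current); negative means improvement

# Local search: accept only if the candidate is no worse
accept_ls <- function(delta) delta <= 0

# Threshold accepting: also accept a deterioration of up to tau
accept_ta <- function(delta, tau) delta <= tau

# Simulated annealing: accept a deterioration with probability exp(-delta / temp)
accept_sa <- function(delta, temp) {
  delta <= 0 || runif(1) < exp(-delta / temp)
}

accept_ls(1)            # FALSE
accept_ta(1, tau = 2)   # TRUE
accept_ta(3, tau = 2)   # FALSE

# Share of accepted moves for a deterioration of 1 at two temperatures
set.seed(1)
share_hot <- mean(replicate(10000, accept_sa(1, temp = 5)))     # close to exp(-1/5)
share_cold <- mean(replicate(10000, accept_sa(1, temp = 0.2)))  # close to exp(-5)
c(hot = share_hot, cold = share_cold)
```

In both threshold accepting and simulated annealing, the tolerance for worse moves is typically lowered as the search proceeds, so the algorithm explores early on and behaves more like a plain local search towards the end.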
Performing Hypothesis Testing in R

To review the concepts covered in this step, please refer to the Applying Statistical Models in R module of the Understanding Statistical Models and Mathematical Models course.

Hypothesis testing is a fundamental concept in statistical modeling, enabling you to make inferences about populations based on sample data. This step will provide practice in setting up and executing hypothesis tests, a critical skill in statistical analysis.

Practice hypothesis testing using the HRIS.csv dataset. Use the t.test() function in R to perform a two-sample t-test to compare the average salaries of male and female employees. Interpret the results, focusing on the p-value and the test statistic, to determine if there is a significant difference in salaries. This exercise will reinforce your understanding of hypothesis testing and its application in analyzing real-world data.

Task 4.1: Load the HRIS Dataset

Begin by loading the HRIS.csv dataset into R. Read the file into R as a data frame named hris_data. Display the first few rows of the dataset.

🔍 Hint

Use the read.csv() function with the file path as its argument to load the dataset. Then, use the head() function on your dataset variable to display its contents.
🔑 Solution

# Load the HRIS dataset
hris_data <- read.csv('HRIS.csv')

# Display the first few rows of the dataset
head(hris_data)
Task 4.2: Perform a Two-Sample t-Test

Perform a two-sample t-test comparing the average salaries of male and female employees. Store the result in a variable named salary_test. Finally, print the result to interpret the p-value and the test statistic.

🔍 Hint

Use the t.test() function. Subset by gender and select the salary column to create an x and y variable to pass to t.test().
🔑 Solution

# Perform a two-sample t-test comparing the average salaries of male and female employees
male_salary <- hris_data[hris_data$gender == 'Male', "salary"]
female_salary <- hris_data[hris_data$gender == 'Female', "salary"]
salary_test <- t.test(male_salary, female_salary)

# Print the result
print(salary_test)

# ___ have a higher average salary, and the difference is ___
# Answer: Males, not statistically significant
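For readers who want to see how the pieces of a t.test() result can be pulled apart, the sketch below runs the test on simulated data. The data frame, its column names, and the salary distributions are all invented for this example and say nothing about the real HRIS.csv values.

```r
set.seed(123)

# Simulated stand-in for HRIS.csv; column names and values are invented
toy_hris <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  salary = c(rnorm(50, mean = 70000, sd = 8000),
             rnorm(50, mean = 69000, sd = 8000))
)

# Two-vector interface: pass the two samples directly
male_salary <- toy_hris[toy_hris$gender == "Male", "salary"]
female_salary <- toy_hris[toy_hris$gender == "Female", "salary"]
test_vectors <- t.test(male_salary, female_salary)

# Formula interface: this is the form that takes a `data` argument
test_formula <- t.test(salary ~ gender, data = toy_hris)

# Both calls run Welch's t-test by default (var.equal = FALSE)
test_vectors$method

# Pieces usually needed for interpretation
test_vectors$statistic   # t value
test_vectors$p.value     # compare with the chosen significance level, e.g. 0.05
test_vectors$estimate    # the two sample means
```

The two interfaces give the same p-value; the sign of the t statistic can differ because the formula interface orders the groups by factor level. A p-value above the chosen significance level means the observed difference in means is not statistically significant at that level.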
