
Building Statistical Summaries with R Hands-on Practice

In this lab, "Exploring Advanced Statistical Techniques in R," you will begin with a deep dive into hypothesis testing, setting up and interpreting a one-sample t-test and an A/B test. Then you will use ANOVA to compare means across groups defined by a categorical variable and identify significant differences between them. Next, you will build and evaluate both linear and logistic regression models to understand relationships between variables. Finally, you will apply Bayesian A/B testing, using probabilistic models to make informed decisions about which version of a webpage to use. Together, these steps build advanced statistical analysis skills that support data-driven decision-making in R.


Lab Info

Last updated

Jun 26, 2025

Duration

57m

Challenge

Exploring Hypothesis Testing with R
RStudio Guide

To get started, click the 'workspace' folder in the bottom-right pane of RStudio, then open the file entitled "Step 1...". You may want to drag the console pane to make it smaller so that you have more room to work. Complete each task for Step 1 in that R Markdown file. Before moving on to the next task, run the code chunks for the current task with the play button at the top right of each chunk. Continue until you have completed all tasks in the step. When you are ready to move on, return to the workspace folder and open the file for the next step, and repeat until you have completed every task in every step of the lab.

Exploring Hypothesis Testing with R

To review the concepts covered in this step, please refer to the Understanding Statistical Summaries module of the Building Statistical Summaries with R course.

Hypothesis Testing is important because it allows us to make inferences about populations based on sample data. It's a foundational concept in statistics that supports decision-making.

Dive into the world of statistics by practicing hypothesis testing in R. You'll start by setting up null and alternative hypotheses for a given scenario. Use the provided dataset to run a one-sample t-test, interpreting the results to determine whether you can reject the null hypothesis. Then conduct a two-sample t-test on the A/B testing data. This step uses the t.test() function and concepts such as p-values and significance levels.

Task 1.1: Load the Dataset

Load the stats R package, which is already installed in this environment (a standard R session attaches it by default, so the call is harmless). Import A-B Testing.csv into R with read.csv(), assigning the result to ab_data.

🔍 Hint

Use library() to load a package. Invoke read.csv() with the file path as a string.
🔑 Solution

library(stats)
ab_data <- read.csv('A-B Testing.csv')
Task 1.2: Calculate a One-Sample Test Statistic and P-value

A previous study suggested that the conversion rate for the control group (control.treatment = 0) should be 0.6. Use a one-sample t-test to check whether the control group's observed conversion rate differs significantly from that expectation.

🔍 Hint

Filter the data to include only the control group. To perform a one-sample t-test, use the t.test function, setting y = NULL and setting mu to the theorized value.
🔑 Solution

control_group <- ab_data[ab_data$control.treatment == 0, ]
t.test(x = control_group$conversion, y = NULL, mu = 0.6)
# Does not significantly differ
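To see what t.test() computes in the one-sample case, the sketch below reproduces the test statistic t = (mean(x) - mu0) / (sd(x) / sqrt(n)) and its two-sided p-value with n - 1 degrees of freedom by hand. It uses simulated 0/1 conversions rather than A-B Testing.csv, and the names x and mu0 are illustrative assumptions.

```r
# Hypothetical simulated control-group conversions (0/1), not A-B Testing.csv
set.seed(1)
x <- rbinom(200, size = 1, prob = 0.6)
mu0 <- 0.6  # conversion rate stated by the null hypothesis

n <- length(x)
t_manual  <- (mean(x) - mu0) / (sd(x) / sqrt(n))    # one-sample t statistic
df_manual <- n - 1                                   # degrees of freedom
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)  # two-sided p-value

# The built-in test should agree with the hand calculation
res <- t.test(x, mu = mu0)
c(manual_t = t_manual, t_test_t = unname(res$statistic))
c(manual_p = p_manual, t_test_p = res$p.value)
```

If the p-value falls below the chosen significance level (commonly 0.05), you would reject the null hypothesis that the true conversion rate is 0.6.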
Task 1.3: Conduct a t-test on A/B Testing Data

Compare the control and treatment group conversion rates using t.test().

🔍 Hint

Segment the data by filtering on control.treatment, then apply t.test() on the conversion columns. Interpret the p-value relative to the 0.05 threshold.
🔑 Solution

control_group <- ab_data[ab_data$control.treatment == 0, ]
treatment_group <- ab_data[ab_data$control.treatment == 1, ]
t.test(control_group$conversion, treatment_group$conversion)
# The groups significantly differ
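Because conversion is coded 0/1, the difference in group means that t.test() compares is exactly the difference in conversion rates. Note that t.test() runs Welch's test by default (var.equal = FALSE). As a cross-check, the sketch below also uses base R's prop.test(), which tests equality of two proportions directly; this is a swapped-in alternative, not part of the lab's solution, and the data are simulated.

```r
# Hypothetical simulated A/B data (not A-B Testing.csv)
set.seed(2)
control   <- rbinom(300, size = 1, prob = 0.6)
treatment <- rbinom(300, size = 1, prob = 0.4)

# Welch two-sample t-test (the default, var.equal = FALSE)
welch <- t.test(control, treatment)

# Two-proportion test on the same data: successes and trials per group
props <- prop.test(x = c(sum(control), sum(treatment)),
                   n = c(length(control), length(treatment)))

# For 0/1 data, the difference in means is the difference in conversion rates
diff_means <- unname(welch$estimate[1] - welch$estimate[2])
diff_props <- unname(props$estimate[1] - props$estimate[2])
c(diff_means = diff_means, diff_props = diff_props)
c(welch_p = welch$p.value, prop_test_p = props$p.value)
```

The two p-values will usually be close but not identical, since the tests rely on different approximations.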
Challenge

Performing ANOVA in R

To review the concepts covered in this step, please refer to the Understanding Statistical Summaries module of the Building Statistical Summaries with R course.

ANOVA (analysis of variance) is important because it compares the means of a numeric outcome across groups defined by a categorical variable and tests whether any of those means differ significantly. It is especially useful when the categorical variable has more than two levels.

Step into the role of a data analyst by conducting a one-way ANOVA test in R. Use the aov() function to compare means across different categories. Interpret the F-statistic from the ANOVA output to understand if there are any significant differences between the groups.
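The F-statistic in a one-way ANOVA is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it by hand for simulated data with three hypothetical groups ("a", "b", "c") and checks it against aov(); the data frame dat and its columns are illustrative assumptions.

```r
# Hypothetical data: a numeric outcome measured in three groups
set.seed(3)
dat <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 30)),
  y     = c(rnorm(30, mean = 10), rnorm(30, mean = 11), rnorm(30, mean = 10.5))
)

k <- nlevels(dat$group)                # number of groups
n <- nrow(dat)                         # total number of observations
grand_mean   <- mean(dat$y)
fitted_means <- ave(dat$y, dat$group)  # each observation's group mean

# Between-group and within-group sums of squares
ss_between <- sum((fitted_means - grand_mean)^2)
ss_within  <- sum((dat$y - fitted_means)^2)

# F = between-group mean square / within-group mean square
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Compare with the ANOVA table produced by aov()
aov_table <- summary(aov(y ~ group, data = dat))[[1]]
c(manual_F = f_manual, aov_F = aov_table[["F value"]][1])
c(manual_p = p_manual, aov_p = aov_table[["Pr(>F)"]][1])
```

A large F (and a small p-value) means the group means vary more than you would expect from the spread within groups alone.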

Task 2.1: Loading the Dataset

Load the A-B Testing.csv dataset into R using the read.csv() function. Assign the loaded data to a variable named ab_data.

🔍 Hint

Use the read.csv() function and provide the file path as a string argument.
🔑 Solution

ab_data <- read.csv('A-B Testing.csv')
Task 2.3: Conducting One-Way ANOVA

Conduct a one-way ANOVA test on ab_data, comparing the conversion rates between the control and treatment groups. Assign the result to a variable named anova_result.

🔍 Hint

Use the aov() function with the formula conversion ~ control.treatment and the data argument set to ab_data.
🔑 Solution

# Conduct one-way ANOVA
anova_result <- aov(conversion ~ control.treatment, data = ab_data)
Task 2.4: Interpreting the ANOVA Result

Use the summary() function to display the ANOVA result stored in anova_result. Interpret the F-statistic and p-value to understand if there are significant differences between the groups.

🔍 Hint

Use the summary() function on anova_result and look for the F-statistic and p-value in the output.
🔑 Solution

# Print the summary
summary(anova_result)
# Are there significant differences between groups?
# Yes!
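With only two groups, a one-way ANOVA is equivalent to a pooled-variance (Student's) two-sample t-test: the F-statistic equals the squared t-statistic and the p-values match. The sketch below checks this on simulated data and also shows one way to pull the F value and p-value out of summary(). Wrapping the 0/1 column in factor() makes the grouping explicit; for two levels it gives the same F as the numeric coding. The data frame ab_sim is a hypothetical stand-in for ab_data.

```r
# Hypothetical simulated data that mirrors the lab's column names
set.seed(4)
ab_sim <- data.frame(
  control.treatment = rep(c(0, 1), each = 250),
  conversion = c(rbinom(250, 1, 0.6), rbinom(250, 1, 0.45))
)

# One-way ANOVA with the 0/1 indicator treated explicitly as a factor
fit <- aov(conversion ~ factor(control.treatment), data = ab_sim)
tab <- summary(fit)[[1]]          # ANOVA table as a data frame
f_value <- tab[["F value"]][1]    # F-statistic for the group term
p_value <- tab[["Pr(>F)"]][1]     # its p-value

# Student's (pooled-variance) t-test on the same two groups
tt <- t.test(conversion ~ control.treatment, data = ab_sim, var.equal = TRUE)
c(F = f_value, t_squared = unname(tt$statistic)^2)
c(anova_p = p_value, t_test_p = tt$p.value)
```

The Welch test used in Task 1.3 does not assume equal variances, so its p-value can differ slightly from the ANOVA p-value.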
Challenge

Linear and Logistic Regression in R

To review the concepts covered in this step, please refer to the Solving Problems Using Statistical Inference module of the Building Statistical Summaries with R course.

Regression Analysis is important because it allows for the prediction of a dependent variable based on one or more independent variables. It's a powerful tool for modeling and understanding relationships between variables.

Harness the power of regression analysis by building both a linear and a logistic regression model in R. For linear regression, use the lm() function to model a continuous outcome variable. For logistic regression, apply the glm() function with a binary outcome, interpreting the model's coefficients to understand the impact of each predictor. This step also involves evaluating the linear model with R-squared and interpreting the logistic model through odds ratios.

Task 3.1: Loading and Exploring the Dataset

Begin by reading the dataset A-B Testing.csv into R and assigning it to a data frame called ab_data. Print the first few rows of the dataset and produce a brief statistical summary of the variables.

🔍 Hint

Use the read.csv() function to load your dataset into a variable named ab_data. Then apply the head() and summary() functions to ab_data to explore its contents.
🔑 Solution

# Load the dataset
ab_data <- read.csv('A-B Testing.csv')
# View the first few rows of the dataset
head(ab_data)
# Get a statistical summary of the variables
summary(ab_data)
Task 3.2: Building a Linear Regression Model

Construct a linear regression model to explore the relationship between group assignment (control or treatment) and conversion rates in the ab_data dataset. Print out the model summary.

🔍 Hint

To create your linear regression model, use lm(). Ensure you reference the conversion column as your outcome variable and control.treatment as your independent variable. Set the data argument to ab_data. The syntax for the lm() function follows this structure: lm(outcome_variable ~ independent_variable, data=your_dataframe).
🔑 Solution

# Build a linear regression model examining the effect of group assignment on conversion rates
model_linear <- lm(conversion ~ control.treatment, data = ab_data)
# View the model's details to understand the relationship
summary(model_linear)
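With a single 0/1 predictor, the lm() coefficients have a direct reading: the intercept is the control group's mean conversion and the slope is the treatment mean minus the control mean. The sketch below checks this on simulated data; ab_sim and model_sim are hypothetical names, not objects from the lab.

```r
# Hypothetical simulated data that mirrors the lab's column names
set.seed(5)
ab_sim <- data.frame(
  control.treatment = rep(c(0, 1), each = 200),
  conversion = c(rbinom(200, 1, 0.6), rbinom(200, 1, 0.4))
)

model_sim <- lm(conversion ~ control.treatment, data = ab_sim)

# Observed conversion rate in each group (named "0" and "1")
group_means <- sapply(split(ab_sim$conversion, ab_sim$control.treatment), mean)

coef(model_sim)
# Intercept = control mean; slope = treatment mean minus control mean
c(control_mean = group_means[["0"]],
  difference   = group_means[["1"]] - group_means[["0"]])
```

This is why the slope's p-value in the lm() summary closely tracks the two-group comparisons from the earlier steps.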
Task 3.3: Evaluating the Linear Regression Model with R-Squared

Evaluate the performance of your linear regression model by examining the R-squared value from the model's summary. R-squared is the proportion of the variance in the outcome variable that the independent variable(s) explain.

🔍 Hint

Extract the r.squared element from the summary of your linear regression model using the $ operator.
🔑 Solution

summary(model_linear)$r.squared
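R-squared can also be computed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this on simulated data (ab_sim and model_sim are hypothetical) and compares it with the value stored in the model summary.

```r
# Hypothetical simulated data that mirrors the lab's column names
set.seed(6)
ab_sim <- data.frame(
  control.treatment = rep(c(0, 1), each = 200),
  conversion = c(rbinom(200, 1, 0.6), rbinom(200, 1, 0.4))
)
model_sim <- lm(conversion ~ control.treatment, data = ab_sim)

# R-squared = 1 - residual sum of squares / total sum of squares
ss_res <- sum(residuals(model_sim)^2)
ss_tot <- sum((ab_sim$conversion - mean(ab_sim$conversion))^2)
r2_manual <- 1 - ss_res / ss_tot

c(manual = r2_manual, from_summary = summary(model_sim)$r.squared)
```

With a binary outcome and a single binary predictor, R-squared is typically small even when the group difference is statistically significant, which is one reason to move to logistic regression next.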
Task 3.4: Building a Logistic Regression Model

In previous steps, we've treated conversion as a continuous variable, when in fact it is binary. Employ logistic regression to examine the influence of group assignment (control or treatment) on the binary conversion outcome.

🔍 Hint

For the logistic regression model, set family=binomial in the glm() function to indicate you're modeling a binary outcome. The correct format for your model should look like this: glm(binary_outcome ~ independent_variable, family=binomial, data=your_dataframe).
🔑 Solution

model_logistic <- glm(conversion ~ control.treatment, family = binomial, data = ab_data)
summary(model_logistic)
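Logistic regression coefficients are on the log-odds scale. With one binary predictor, the model's predicted probabilities (predict() with type = "response") reproduce each group's observed conversion rate, and the intercept equals the logit of the control group's rate. The sketch below checks both on simulated data; ab_sim, model_sim, and new_groups are hypothetical names.

```r
# Hypothetical simulated data that mirrors the lab's column names
set.seed(7)
ab_sim <- data.frame(
  control.treatment = rep(c(0, 1), each = 200),
  conversion = c(rbinom(200, 1, 0.6), rbinom(200, 1, 0.4))
)
model_sim <- glm(conversion ~ control.treatment, family = binomial, data = ab_sim)

# Predicted conversion probability for each group
new_groups <- data.frame(control.treatment = c(0, 1))
pred <- predict(model_sim, newdata = new_groups, type = "response")

# Observed conversion rate in each group (named "0" and "1")
observed <- sapply(split(ab_sim$conversion, ab_sim$control.treatment), mean)
rbind(predicted = unname(pred), observed = unname(observed))

# The intercept is the log-odds (logit) of conversion in the control group
c(intercept = unname(coef(model_sim)[1]), logit_control = qlogis(observed[["0"]]))
```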
Task 3.5: Interpreting the Logistic Regression Model

Interpret the coefficients of your logistic regression model to understand the impact of the predictor on the binary outcome. Convert the coefficients to odds ratios, which are easier to interpret.

🔍 Hint

Use the coef() function to extract the coefficients from your logistic regression model, then apply the exp() function to convert them to odds ratios. Finally, use the print() function to display the odds ratios.
🔑 Solution

# Calculate the odds ratios
coefficients <- coef(model_logistic)
odds_ratios <- exp(coefficients)
print(odds_ratios)
# The odds of conversion in the treatment condition are ____ times as high as the odds of conversion in the control condition
# Answer: ~0.4
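The exponentiated slope is the ratio of the treatment group's odds of conversion to the control group's odds, where odds = p / (1 - p). The sketch below computes that ratio directly from the observed group rates, compares it with exp(coef()), and adds an approximate 95% Wald confidence interval via confint.default(). The data are simulated and ab_sim and model_sim are hypothetical names.

```r
# Hypothetical simulated data that mirrors the lab's column names
set.seed(8)
ab_sim <- data.frame(
  control.treatment = rep(c(0, 1), each = 200),
  conversion = c(rbinom(200, 1, 0.6), rbinom(200, 1, 0.4))
)
model_sim <- glm(conversion ~ control.treatment, family = binomial, data = ab_sim)

# Odds of conversion in each group, computed from the observed rates
rates <- sapply(split(ab_sim$conversion, ab_sim$control.treatment), mean)
odds  <- rates / (1 - rates)
or_manual <- odds[["1"]] / odds[["0"]]   # treatment odds / control odds

# The exponentiated slope is the same odds ratio
or_model <- exp(coef(model_sim))[["control.treatment"]]
c(manual = or_manual, model = or_model)

# Approximate 95% Wald confidence interval for the odds ratio
exp(confint.default(model_sim))["control.treatment", ]
```

An odds ratio below 1 means conversion is less likely under treatment; an interval that excludes 1 points the same way as a significant slope.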
Challenge

Implementing Bayesian A/B Testing in R

To review the concepts covered in this step, please refer to the Implementing Bayesian A/B Testing module of the Building Statistical Summaries with R course.

Bayesian A/B Testing is important because it provides a probabilistic approach to comparing two or more versions of a webpage or app, allowing for more nuanced decision-making based on posterior probabilities.

Explore Bayesian statistics by performing a Bayesian A/B test using the provided dataset. Calibrate a beta distribution to model prior information and use the bayesAB package to compare two versions of a webpage based on conversion rates. Interpret the posterior probabilities to determine which version performs better.

Task 4.1: Loading the bayesAB Package

Begin by loading the bayesAB package, which will be used to perform Bayesian A/B testing.

🔍 Hint

Use the library function to load a package. The name of the package you need to load is bayesAB.
🔑 Solution

library(bayesAB)
Task 4.2: Reading and Inspecting the Dataset

Read the provided dataset A-B Testing.csv into R and assign it to a data frame called ab_data. Inspect the first few rows to understand its structure.

🔍 Hint

Use the read.csv function to load the dataset. Then, use the head function to display the first few rows of the data frame.
🔑 Solution

ab_data <- read.csv('A-B Testing.csv')
head(ab_data)
Task 4.3: Calibrating Prior Information

Create a vector of two named numbers that parameterizes a beta distribution modeling the prior information about conversion, and assign it to a variable called priors. Based on previous studies, we'll assume that the first parameter of the beta distribution (named alpha) is 6 and the second (named beta) is 4. This is a weak prior centered on a conversion rate of 0.6, since the mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta) = 6 / (6 + 4) = 0.6.

🔍 Hint

Create a vector of named numbers with c(). You need to specify the alpha and beta parameters.
🔑 Solution

priors <- c('alpha' = 6, 'beta' = 4)
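For Bernoulli (converted / not converted) data, the beta prior is conjugate: after s conversions out of n visitors, the posterior is Beta(alpha + s, beta + n - s). The sketch below computes the prior mean, a 95% central prior interval with qbeta(), and the posterior mean for hypothetical counts (120 conversions out of 200 visitors), which are illustrative assumptions rather than values from A-B Testing.csv.

```r
priors <- c('alpha' = 6, 'beta' = 4)

# Prior mean and a 95% central interval of the Beta(6, 4) prior
prior_mean <- priors[["alpha"]] / (priors[["alpha"]] + priors[["beta"]])
prior_interval <- qbeta(c(0.025, 0.975), priors[["alpha"]], priors[["beta"]])

# Hypothetical data: 120 conversions out of 200 visitors
conversions <- 120
visitors <- 200

# Conjugate update: add successes to alpha and failures to beta
post_alpha <- priors[["alpha"]] + conversions
post_beta  <- priors[["beta"]] + (visitors - conversions)
post_mean  <- post_alpha / (post_alpha + post_beta)

c(prior_mean = prior_mean, posterior_mean = post_mean)
prior_interval
```

Because alpha + beta = 10 is small relative to the data, the prior acts like roughly ten pseudo-observations and is quickly outweighed, which is what makes it "weak".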
Task 4.4: Performing Bayesian A/B Testing

Use the bayesTest function from the bayesAB package to perform Bayesian A/B testing on the control and treatment groups.

🔍 Hint

Use the bayesTest function with the control and treatment group data as inputs. You also need to specify the type of distribution, which in this case is 'bernoulli', and the priors you set up in the previous step.
🔑 Solution

control_group <- ab_data[ab_data$control.treatment == 0, ]
treatment_group <- ab_data[ab_data$control.treatment == 1, ]
ab_test_result <- bayesTest(A_data = control_group$conversion,
                            B_data = treatment_group$conversion,
                            priors = priors,
                            distribution = 'bernoulli')
summary(ab_test_result)
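To make the probability in the bayesTest() summary less of a black box, the sketch below draws from the two conjugate beta posteriors with rbeta() and estimates by Monte Carlo the probability that version A's conversion rate exceeds version B's. It illustrates the idea in base R with simulated data and the same Beta(6, 4) prior; it is not bayesAB's own implementation, and A, B, and n_draws are hypothetical names.

```r
set.seed(9)
# Hypothetical simulated conversions (0/1) for each version
A <- rbinom(300, 1, 0.60)   # control
B <- rbinom(300, 1, 0.45)   # treatment
prior_alpha <- 6            # same Beta(6, 4) prior as the lab
prior_beta  <- 4

n_draws <- 1e5
# Conjugate posteriors: Beta(alpha + successes, beta + failures)
post_A <- rbeta(n_draws, prior_alpha + sum(A), prior_beta + length(A) - sum(A))
post_B <- rbeta(n_draws, prior_alpha + sum(B), prior_beta + length(B) - sum(B))

# Monte Carlo estimate of the probability that A's conversion rate exceeds B's
prob_A_beats_B <- mean(post_A > post_B)
prob_A_beats_B
```

A value near 1 would favor the control version; a value near 0 would favor the treatment.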
Task 4.5: Interpreting the Results

Interpret the results of the Bayesian A/B test by examining the posterior probabilities and credible interval. Plot a histogram of the posterior samples of the control group's conversion probability and a histogram of the posterior samples for the treatment group. Then print a summary of the test and determine whether conversion is more or less likely in the treatment condition.

🔍 Hint

Use the hist function to plot the posterior probabilities. Use the summary function on the result of the Bayesian A/B test to view summary information about the posterior probabilities and credible intervals of the differences between A and B. Look for the version with the higher posterior probability of having a higher conversion rate.
🔑 Solution

hist(ab_test_result$posteriors$Probability$A)
hist(ab_test_result$posteriors$Probability$B)
summary(ab_test_result)
# Conversion is ___ likely under the treatment condition
# Answer: less
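A credible interval summarizes the posterior directly: it is a range that contains the quantity with a stated posterior probability. The sketch below computes a 90% central credible interval for the relative difference (A - B) / B from beta posterior draws, similar in spirit to the interval reported in the bayesAB summary. The data are simulated, the prior matches the lab's Beta(6, 4), and the names lift and ci_90 are hypothetical.

```r
set.seed(10)
# Hypothetical simulated conversions (0/1) for each version
A <- rbinom(300, 1, 0.60)   # control
B <- rbinom(300, 1, 0.45)   # treatment
prior_alpha <- 6
prior_beta  <- 4

n_draws <- 1e5
# Posterior draws of each version's conversion probability
post_A <- rbeta(n_draws, prior_alpha + sum(A), prior_beta + length(A) - sum(A))
post_B <- rbeta(n_draws, prior_alpha + sum(B), prior_beta + length(B) - sum(B))

# Relative difference (A - B) / B for every posterior draw
lift <- (post_A - post_B) / post_B

# 90% central credible interval and posterior mean of the relative difference
ci_90 <- quantile(lift, probs = c(0.05, 0.95))
ci_90
mean(lift)
```

If the whole interval lies above 0, the posterior strongly suggests that A (control) converts better than B (treatment).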


Get started with Pluralsight