Building Statistical Summaries with R Hands-on Practice
In this lab, "Exploring Advanced Statistical Techniques in R," you will begin with a deep dive into hypothesis testing, setting up and interpreting a one-sample t-test and a two-sample t-test on A/B testing data. Then you will use ANOVA to compare the means of a continuous outcome across groups defined by a categorical variable. Next, you will build and evaluate linear and logistic regression models to understand the relationships between variables. Finally, you will perform a Bayesian A/B test, applying a probabilistic model to decide which version of a webpage performs better. Together, these steps build practical statistical analysis skills for data-driven decision-making in R.

Challenge
Exploring Hypothesis Testing with R
RStudio Guide
To get started, click the 'workspace' folder in the bottom-right pane of RStudio, then click the file entitled "Step 1...". You may want to drag the console pane smaller so that you have more room to work. You'll complete each task for Step 1 in that R Markdown file. Remember, you must run the cells for a task, using the play button at the top right of each cell, before moving on to the next task in the R Markdown file. Continue until you have completed all tasks in this step. When you are ready to move on to the next step, come back and click the file for that step, and repeat until you have completed all tasks in all steps of the lab.
Exploring Hypothesis Testing with R
To review the concepts covered in this step, please refer to the Understanding Statistical Summaries module of the Building Statistical Summaries with R course.
Hypothesis Testing is important because it allows us to make inferences about populations based on sample data. It's a foundational concept in statistics that supports decision-making.
Dive into the world of statistics by practicing hypothesis testing in R. You'll start by setting up null and alternative hypotheses for a given scenario. Use the dataset provided to run a one-sample t-test, interpreting the results to determine whether you can reject the null hypothesis. Then conduct a t-test on A/B testing data. This step uses functions such as t.test() and concepts such as p-values and significance levels.
Task 1.1: Load the Dataset
Load the stats R package, which is already installed in this environment. Import A-B Testing.csv into R, assigning it to ab_data using read.csv().
Hint
Use library() to load a package. Invoke read.csv() with the file path as a string.
Solution
library(stats)  # attached by default in a standard R session; loading it again is harmless
ab_data <- read.csv('A-B Testing.csv')
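The group column is referenced as control.treatment throughout this lab. A likely reason is that read.csv() defaults to check.names = TRUE, which passes every header through make.names() and turns characters that are not valid in R names into dots. The sketch below shows that behavior; the raw header "control/treatment" is a hypothetical stand-in, since the lab's actual CSV header is not shown.

```r
# Hypothetical CSV text; the real header in 'A-B Testing.csv' is an assumption here
csv_text <- "control/treatment,conversion\n0,1\n1,0\n"

# Default check.names = TRUE sanitizes headers with make.names()
sanitized <- read.csv(text = csv_text)
names(sanitized)
#> [1] "control.treatment" "conversion"

# check.names = FALSE keeps the raw header, which then needs backticks in formulas
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)
#> [1] "control/treatment" "conversion"
```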
Task 1.2: Calculate a One-Sample Test Statistic and P-value
A previous study suggested that the conversion rate for the control group (control.treatment = 0) should be equal to 0.6. Use a one-sample t-test to check whether the observed control-group conversion rate differs significantly from that value. The null hypothesis is that the mean conversion rate equals 0.6; the alternative is that it does not (t.test() runs a two-sided test by default).
Hint
Filter the data to include only the control group. To perform a one-sample t-test, use the t.test() function, setting y = NULL and setting mu to the hypothesized value.
Solution
control_group <- ab_data[ab_data$control.treatment == 0, ]
t.test(x = control_group$conversion, y = NULL, mu = 0.6)
# Does not significantly differ
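To make the t.test() output less of a black box, here is a minimal sketch of the one-sample computation done by hand: t = (sample mean - mu) / (sd / sqrt(n)) with n - 1 degrees of freedom, and a two-sided p-value from pt(). The 0/1 vector is simulated and only stands in for control_group$conversion.

```r
# Hypothetical stand-in for control_group$conversion: 200 simulated 0/1 outcomes
set.seed(42)
conversion <- rbinom(200, size = 1, prob = 0.58)
mu0 <- 0.6  # value stated by the null hypothesis

n <- length(conversion)
se <- sd(conversion) / sqrt(n)            # standard error of the mean
t_stat <- (mean(conversion) - mu0) / se   # one-sample t statistic
deg_free <- n - 1                         # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), deg_free) # two-sided p-value

# Compare with the built-in test
builtin <- t.test(conversion, mu = mu0)
print(c(manual_t = t_stat, builtin_t = unname(builtin$statistic)))
print(c(manual_p = p_value, builtin_p = builtin$p.value))
```

If the p-value is at or above your significance level (commonly 0.05), you fail to reject the null hypothesis.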
Task 1.3: Conduct a t-test on A/B Testing Data
Compare the control and treatment group conversion rates using t.test().
Hint
Segment the data by filtering on control.treatment, then apply t.test() to the two conversion columns. Interpret the p-value relative to the 0.05 threshold.
Solution
control_group <- ab_data[ab_data$control.treatment == 0, ]
treatment_group <- ab_data[ab_data$control.treatment == 1, ]
t.test(control_group$conversion, treatment_group$conversion)  # Welch two-sample t-test by default
# The groups significantly differ
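When given two vectors, t.test() runs Welch's test by default (var.equal = FALSE), which does not assume equal group variances. The sketch below reproduces the Welch statistic, the Welch-Satterthwaite degrees of freedom and the p-value by hand, then applies the 0.05 decision rule. Both groups are simulated and hypothetical.

```r
set.seed(1)
# Hypothetical 0/1 conversions for two groups (simulated, not the lab's data)
control <- rbinom(300, size = 1, prob = 0.60)
treatment <- rbinom(300, size = 1, prob = 0.45)

# Squared standard errors of each group mean
v_c <- var(control) / length(control)
v_t <- var(treatment) / length(treatment)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch <- (mean(control) - mean(treatment)) / sqrt(v_c + v_t)
df_welch <- (v_c + v_t)^2 /
  (v_c^2 / (length(control) - 1) + v_t^2 / (length(treatment) - 1))
p_welch <- 2 * pt(-abs(t_welch), df_welch)

res <- t.test(control, treatment)  # var.equal = FALSE (Welch) is the default
decision <- if (res$p.value < 0.05) "reject H0" else "fail to reject H0"
print(c(t = t_welch, df = df_welch, p = p_welch))
print(decision)
```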
Challenge
Performing ANOVA in R
To review the concepts covered in this step, please refer to the Understanding Statistical Summaries module of the Building Statistical Summaries with R course.
ANOVA (Analysis of Variance) is important because it compares the means of a continuous outcome across multiple groups, identifying whether any significant differences exist. It is especially useful when the grouping variable is categorical with more than two levels, where a single two-sample t-test no longer covers every comparison.
Step into the role of a data analyst by conducting a one-way ANOVA test in R. Use the aov() function to compare means across different categories. Interpret the F-statistic from the ANOVA output to determine whether there are any significant differences between the groups.
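To make the F-statistic concrete, here is a sketch that computes it by hand for a hypothetical three-group dataset (simulated, not the lab's data) and checks it against aov(). F is the between-group mean square divided by the within-group mean square, with k - 1 and N - k degrees of freedom, where k is the number of groups and N the number of observations.

```r
set.seed(7)
# Hypothetical data: a continuous outcome measured in three groups
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 50)),
  y = c(rnorm(50, mean = 10), rnorm(50, mean = 10.5), rnorm(50, mean = 11))
)

k <- nlevels(dat$group)   # number of groups
N <- nrow(dat)            # total observations
grand_mean <- mean(dat$y)
group_means <- as.vector(tapply(dat$y, dat$group, mean))
group_sizes <- as.vector(table(dat$group))

ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((dat$y - ave(dat$y, dat$group))^2)  # ave() gives each row its group mean

f_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

anova_tab <- summary(aov(y ~ group, data = dat))[[1]]
print(c(F = f_manual, p = p_manual))
print(anova_tab)
```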
Task 2.1: Loading the Dataset
Load the A-B Testing.csv dataset into R using the read.csv() function. Assign the loaded data to a variable named ab_data.
Hint
Use the read.csv() function and provide the file path as a string argument.
Solution
ab_data <- read.csv('A-B Testing.csv')
Task 2.3: Conducting One-Way ANOVA
Conduct a one-way ANOVA test on ab_data, comparing the conversion rates between the control and treatment groups. Assign the result to a variable named anova_result.
Hint
Use the aov() function with the formula conversion ~ control.treatment and the data argument set to ab_data.
Solution
# Conduct one-way ANOVA
anova_result <- aov(conversion ~ `control.treatment`, data = ab_data)
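In ab_data, control.treatment is stored as a 0/1 number, so aov() fits it as a numeric term with one degree of freedom. Wrapping it in factor() is the more idiomatic way to request a one-way ANOVA, and it becomes necessary once a grouping variable has three or more numeric codes (otherwise R fits a single linear trend). With only two groups, both forms give the same F-test, as this sketch on simulated, hypothetical data checks.

```r
set.seed(3)
# Hypothetical 0/1 data shaped like ab_data
dat <- data.frame(
  control.treatment = rep(c(0, 1), each = 100),
  conversion = c(rbinom(100, 1, 0.60), rbinom(100, 1, 0.45))
)

fit_numeric <- aov(conversion ~ control.treatment, data = dat)        # numeric 0/1 term
fit_factor <- aov(conversion ~ factor(control.treatment), data = dat) # two-level factor

f_numeric <- summary(fit_numeric)[[1]][["F value"]][1]
f_factor <- summary(fit_factor)[[1]][["F value"]][1]
print(c(numeric = f_numeric, factor = f_factor))
```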
Task 2.4: Interpreting the ANOVA Result
Use the summary() function to display the ANOVA result stored in anova_result. Interpret the F-statistic and p-value to determine whether there are significant differences between the groups.
Hint
Use the summary() function on anova_result and look for the F-statistic and p-value in the output.
Solution
# Print the summary
summary(anova_result)
# Are there significant differences between groups?
# Yes!
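With exactly two groups, a one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances: F equals t squared and the p-values match. The sketch below checks this on simulated, hypothetical data. Note that the default t.test() call in Task 1.3 is Welch's test, so its statistic will generally be close to, but not exactly, the square root of this F.

```r
set.seed(3)
# Hypothetical 0/1 data shaped like ab_data
dat <- data.frame(
  control.treatment = rep(c(0, 1), each = 100),
  conversion = c(rbinom(100, 1, 0.60), rbinom(100, 1, 0.45))
)

anova_tab <- summary(aov(conversion ~ factor(control.treatment), data = dat))[[1]]
f_value <- anova_tab[["F value"]][1]
p_anova <- anova_tab[["Pr(>F)"]][1]

# Pooled-variance (equal variances) two-sample t-test on the same data
pooled <- t.test(conversion ~ control.treatment, data = dat, var.equal = TRUE)
print(c(F = f_value, t_squared = unname(pooled$statistic)^2))
print(c(p_anova = p_anova, p_t = pooled$p.value))
```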
Challenge
Linear and Logistic Regression in R
To review the concepts covered in this step, please refer to the Solving Problems Using Statistical Inference module of the Building Statistical Summaries with R course.
Regression Analysis is important because it allows for the prediction of a dependent variable based on one or more independent variables. It's a powerful tool for modeling and understanding relationships between variables.
Harness the power of regression analysis by building both a linear and a logistic regression model in R. For linear regression, use the lm() function to model a continuous outcome variable. For logistic regression, apply the glm() function to a binary outcome, interpreting the model's coefficients to understand the impact of each predictor. The linear model is evaluated with R-squared; the logistic model is interpreted through odds ratios, and ROC curves are a common way to evaluate its predictive performance.
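A ROC curve summarizes how well a logistic model's predicted probabilities separate converters from non-converters. Below is a base-R sketch (no ROC package assumed) on simulated data with a hypothetical continuous predictor x, because a single 0/1 predictor produces a curve with only one interior point. The AUC uses the rank-sum (Mann-Whitney) identity: it is the probability that a randomly chosen converter receives a higher predicted probability than a randomly chosen non-converter, with ties counted as one half.

```r
set.seed(11)
n <- 400
x <- rnorm(n)                                            # hypothetical continuous predictor
y <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.2 * x))  # hypothetical binary outcome
fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)                                     # predicted probabilities

# ROC points: true and false positive rates at each threshold, strictest first
thresholds <- sort(unique(p_hat), decreasing = TRUE)
tpr <- sapply(thresholds, function(th) mean(p_hat[y == 1] >= th))
fpr <- sapply(thresholds, function(th) mean(p_hat[y == 0] >= th))
# Uncomment in RStudio to draw the curve:
# plot(c(0, fpr), c(0, tpr), type = "s", xlab = "False positive rate", ylab = "True positive rate")

# AUC via the rank-sum identity (tied predictions get averaged ranks)
n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
auc <- (sum(rank(p_hat)[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
print(auc)
```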
Task 3.1: Loading and Exploring the Dataset
Begin by reading the dataset A-B Testing.csv into R and assigning it to a data frame called ab_data. Print the first few rows of the dataset and produce a brief statistical summary of the variables.
Hint
Use the read.csv() function to load your dataset into a variable named ab_data. Then apply the head() and summary() functions to ab_data to explore its contents.
Solution
# Load the dataset
ab_data <- read.csv('A-B Testing.csv')
# View the first few rows of the dataset
head(ab_data)
# Get a statistical summary of the variables
summary(ab_data)
Task 3.2: Building a Linear Regression Model
Construct a linear regression model to explore the relationship between group assignment (control or treatment) and conversion rates in the ab_data dataset. Print out the model summary.
Hint
To create your linear regression model, use lm(). Ensure you reference the conversion column as your outcome variable and control.treatment as your independent variable. Set the data argument to ab_data. The syntax for the lm() function follows this structure: lm(outcome_variable ~ independent_variable, data = your_dataframe).
Solution
# Build a linear regression model examining the effect of group assignment on conversion rates
model_linear <- lm(conversion ~ control.treatment, data = ab_data)
# View the model's details to understand the relationship
summary(model_linear)
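Because control.treatment is a 0/1 indicator and conversion is binary, this is a linear probability model with a simple reading: the intercept is the control group's mean conversion rate and the slope is the treatment-minus-control difference in means. The sketch below confirms that on simulated data; the group sizes and rates are hypothetical.

```r
set.seed(5)
# Hypothetical data shaped like ab_data: 120 control rows, 80 treatment rows
dat <- data.frame(
  control.treatment = rep(c(0, 1), times = c(120, 80)),
  conversion = c(rbinom(120, 1, 0.60), rbinom(80, 1, 0.45))
)
model_linear <- lm(conversion ~ control.treatment, data = dat)
b <- coef(model_linear)

control_mean <- mean(dat$conversion[dat$control.treatment == 0])
treatment_mean <- mean(dat$conversion[dat$control.treatment == 1])
print(c(intercept = unname(b[1]), control_mean = control_mean))
print(c(slope = unname(b[2]), difference = treatment_mean - control_mean))
```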
Task 3.3: Evaluating the Linear Regression Model with R-Squared
Evaluate the performance of your linear regression model by examining the R-squared value from the model's summary. R-squared is the proportion of the variance in the outcome variable that is explained by the independent variables.
Hint
Access the r.squared element of the summary of your linear regression model using the $ operator.
Solution
summary(model_linear)$r.squared
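For a model with an intercept, R-squared equals 1 - RSS / TSS, where RSS is the residual sum of squares and TSS the total sum of squares around the mean. This sketch recomputes it on simulated, hypothetical data and compares it with summary()$r.squared.

```r
set.seed(5)
# Hypothetical data shaped like ab_data
dat <- data.frame(
  control.treatment = rep(c(0, 1), times = c(120, 80)),
  conversion = c(rbinom(120, 1, 0.60), rbinom(80, 1, 0.45))
)
model_linear <- lm(conversion ~ control.treatment, data = dat)

rss <- sum(residuals(model_linear)^2)                   # residual sum of squares
tss <- sum((dat$conversion - mean(dat$conversion))^2)   # total sum of squares
r2_manual <- 1 - rss / tss
print(c(manual = r2_manual, from_summary = summary(model_linear)$r.squared))
```

A small R-squared here is not unusual: a single group indicator can only explain the between-group part of the variance in a 0/1 outcome.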
Task 3.4: Building a Logistic Regression Model
In previous steps, we've treated conversion as a continuous variable, when in fact it is binary. Employ logistic regression to examine the influence of group assignment (control or treatment) on the binary conversion outcome.
Hint
For the logistic regression model, set family = binomial in the glm() function to indicate you're modeling a binary outcome. The correct format for your model should look like this: glm(binary_outcome ~ independent_variable, family = binomial, data = your_dataframe).
Solution
model_logistic <- glm(conversion ~ control.treatment, family = binomial, data = ab_data)
summary(model_logistic)
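With a single 0/1 predictor, the logistic model's coefficients live on the log-odds scale: the intercept is the log-odds of conversion in the control group, and the slope is the difference in log-odds between treatment and control. This sketch checks that on simulated, hypothetical data, using qlogis(p), which computes log(p / (1 - p)).

```r
set.seed(5)
# Hypothetical data shaped like ab_data
dat <- data.frame(
  control.treatment = rep(c(0, 1), times = c(120, 80)),
  conversion = c(rbinom(120, 1, 0.60), rbinom(80, 1, 0.45))
)
model_logistic <- glm(conversion ~ control.treatment, family = binomial, data = dat)
b <- coef(model_logistic)

p_control <- mean(dat$conversion[dat$control.treatment == 0])
p_treatment <- mean(dat$conversion[dat$control.treatment == 1])

# qlogis(p) = log(p / (1 - p)), the log-odds
print(c(intercept = unname(b[1]), logodds_control = qlogis(p_control)))
print(c(slope = unname(b[2]), logodds_diff = qlogis(p_treatment) - qlogis(p_control)))
```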
Task 3.5: Interpreting the Logistic Regression Model
Interpret the coefficients of your logistic regression model to understand the impact of the predictor on the binary outcome. Convert the coefficients to odds ratios, which are easier to interpret: exponentiating the control.treatment coefficient gives the odds ratio for treatment versus control, while exponentiating the intercept gives the odds of conversion in the control group.
Hint
Use the coef() function to extract the coefficients from your logistic regression model, then apply the exp() function to convert them to odds ratios. Finally, use the print() function to display the odds ratios.
Solution
# Calculate the odds ratio
coefficients <- coef(model_logistic)
odds_ratios <- exp(coefficients)
print(odds_ratios)
# The odds of conversion in the treatment condition are ____ times as high as the odds of conversion in the control condition
# ~0.4
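As a cross-check on the exponentiated coefficients, the odds ratio can also be computed directly from a 2 x 2 table of group by outcome, and confint.default() gives Wald confidence intervals that can be exponentiated onto the odds-ratio scale. The sketch uses simulated, hypothetical data; on the lab's data you would use model_logistic and ab_data instead.

```r
set.seed(5)
# Hypothetical data shaped like ab_data
dat <- data.frame(
  control.treatment = rep(c(0, 1), times = c(120, 80)),
  conversion = c(rbinom(120, 1, 0.60), rbinom(80, 1, 0.45))
)
model_logistic <- glm(conversion ~ control.treatment, family = binomial, data = dat)
odds_ratios <- exp(coef(model_logistic))

# Cross-check against a 2 x 2 table of group by outcome
tab <- table(group = dat$control.treatment, converted = dat$conversion)
odds_control <- tab["0", "1"] / tab["0", "0"]
odds_treatment <- tab["1", "1"] / tab["1", "0"]
or_table <- odds_treatment / odds_control

# Wald 95% confidence intervals, exponentiated to the odds-ratio scale
or_ci <- exp(confint.default(model_logistic))
print(odds_ratios)
print(or_ci)
```

An odds ratio below 1 means lower odds of conversion under treatment; if the interval excludes 1, the difference is significant at the 5% level.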
Challenge
Implementing Bayesian A/B Testing in R
To review the concepts covered in this step, please refer to the Implementing Bayesian A/B Testing module of the Building Statistical Summaries with R course.
Bayesian A/B Testing is important because it provides a probabilistic approach to comparing two or more versions of a webpage or app, allowing for more nuanced decision-making based on posterior probabilities.
Explore Bayesian statistics by performing a Bayesian A/B test using the provided dataset. Calibrate a beta distribution to model prior information and use the bayesAB package to compare two versions of a webpage based on conversion rates. Interpret the posterior probabilities to determine which version performs better.
Task 4.1: Loading the bayesAB Package
Begin by loading the bayesAB package, which will be used for performing Bayesian A/B testing.
Hint
Use the library() function to load a package. The name of the package you need to load is bayesAB.
Solution
library(bayesAB)
Task 4.2: Reading and Inspecting the Dataset
Read the provided dataset A-B Testing.csv into R and assign it to a data frame called ab_data. Inspect the first few rows to understand its structure.
Hint
Use the read.csv() function to load the dataset. Then use the head() function to display the first few rows of the data frame.
Solution
ab_data <- read.csv('A-B Testing.csv')
head(ab_data)
Task 4.3: Calibrating Prior Information
Create a vector of two named numbers holding the parameters of a beta distribution that models prior information about conversion. Assign the vector to a variable called priors. Based on previous studies, we'll assume that the first parameter of the beta distribution (named alpha) is 6 and the second parameter (named beta) is 4. This encodes a weak prior belief that the conversion rate is around 0.6: the prior mean is 6 / (6 + 4) = 0.6, and alpha + beta = 10 carries roughly the weight of only ten prior observations.
Hint
Create a vector of named numbers with c(). You need to specify the alpha and beta parameters.
Solution
priors <- c('alpha' = 6, 'beta' = 4)
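To see why Beta(6, 4) counts as a weak prior, it helps to compute its mean, standard deviation and a central 95% interval in base R. The mean is alpha / (alpha + beta), and the interval comes from qbeta(); a wide interval around 0.6 means the prior leaves plenty of room for the data to move the estimate.

```r
priors <- c('alpha' = 6, 'beta' = 4)
a <- unname(priors['alpha'])
b <- unname(priors['beta'])

prior_mean <- a / (a + b)                                  # 6 / 10 = 0.6
prior_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))        # standard deviation of a Beta(a, b)
prior_interval <- qbeta(c(0.025, 0.975), shape1 = a, shape2 = b)  # central 95% prior interval
print(c(mean = prior_mean, sd = prior_sd))
print(prior_interval)
# Uncomment in RStudio to draw the prior density:
# curve(dbeta(x, a, b), from = 0, to = 1)
```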
Task 4.4: Performing Bayesian A/B Testing
Use the bayesTest() function from the bayesAB package to perform Bayesian A/B testing on the control and treatment groups.
Hint
Use the bayesTest() function with the control and treatment group data as inputs. You also need to specify the type of distribution, which in this case is 'bernoulli', and the priors you set up in the previous step.
Solution
control_group <- ab_data[ab_data$control.treatment == 0, ]
treatment_group <- ab_data[ab_data$control.treatment == 1, ]
ab_test_result <- bayesTest(A_data = control_group$conversion,
                            B_data = treatment_group$conversion,
                            priors = priors,
                            distribution = 'bernoulli')
summary(ab_test_result)
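bayesTest() handles the posterior computation for you. As an illustration of what a Beta prior implies for 0/1 data, the sketch below applies the closed-form conjugate update in base R, without calling bayesAB: each group's posterior is Beta(alpha + conversions, beta + non-conversions). The data are simulated stand-ins for the two conversion columns, and the helpers beta_posterior() and posterior_mean() are hypothetical names introduced only for this example.

```r
set.seed(9)
# Hypothetical 0/1 conversions (simulated, not the lab's data)
control_conv <- rbinom(500, size = 1, prob = 0.60)
treatment_conv <- rbinom(500, size = 1, prob = 0.50)
priors <- c(alpha = 6, beta = 4)

# Hypothetical helper: Beta prior + Bernoulli data -> Beta(alpha + successes, beta + failures)
beta_posterior <- function(x, priors) {
  c(alpha = unname(priors["alpha"]) + sum(x == 1),
    beta = unname(priors["beta"]) + sum(x == 0))
}
post_control <- beta_posterior(control_conv, priors)
post_treatment <- beta_posterior(treatment_conv, priors)

# Hypothetical helper: posterior mean, a weighted compromise between the prior mean and the observed rate
posterior_mean <- function(p) unname(p["alpha"] / (p["alpha"] + p["beta"]))
print(rbind(control = post_control, treatment = post_treatment))
print(c(control = posterior_mean(post_control), treatment = posterior_mean(post_treatment)))
```

With 500 observations per group against a prior worth about 10, the posterior means sit very close to the observed conversion rates.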
Task 4.5: Interpreting the Results
Interpret the results of the Bayesian A/B test by examining the posterior distributions and the credible interval. Plot a histogram of the posterior samples of the control group's conversion probability, and a histogram of the posterior samples of the treatment group's conversion probability. Then print out a summary of the test. Determine whether conversion is more or less likely in the treatment condition.
Hint
Use the hist() function to plot the posterior samples. Use the summary() function on the result of the Bayesian A/B test to view the posterior probability that one version outperforms the other and the credible interval for the difference between A and B. Look for the version with the higher posterior probability of having a higher conversion rate.
Solution
hist(ab_test_result$posteriors$Probability$A)
hist(ab_test_result$posteriors$Probability$B)
summary(ab_test_result)
# Conversion is ___ likely under the treatment condition
# Answer: less
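Here is a base-R sketch of the quantities this interpretation rests on, computed directly from Beta posterior samples drawn with rbeta() rather than through bayesAB: the share of draws in which the treatment rate exceeds the control rate, and a 95% credible interval for the relative lift (treatment - control) / control. The data are simulated and the Beta(6, 4) prior matches Task 4.3; bayesAB's own summary may define its lift differently, so treat this as an illustration only.

```r
set.seed(9)
control_conv <- rbinom(500, size = 1, prob = 0.60)    # hypothetical control data
treatment_conv <- rbinom(500, size = 1, prob = 0.50)  # hypothetical treatment data
alpha0 <- 6
beta0 <- 4
n_samples <- 1e5

# Draw from each group's Beta posterior
post_control <- rbeta(n_samples, alpha0 + sum(control_conv), beta0 + sum(1 - control_conv))
post_treatment <- rbeta(n_samples, alpha0 + sum(treatment_conv), beta0 + sum(1 - treatment_conv))

# Posterior probability that treatment converts better than control
prob_treatment_better <- mean(post_treatment > post_control)

# 95% credible interval for the relative lift of treatment over control
lift <- (post_treatment - post_control) / post_control
lift_ci <- quantile(lift, probs = c(0.025, 0.975))

print(prob_treatment_better)
print(lift_ci)
# Uncomment in RStudio to plot the posteriors:
# hist(post_control); hist(post_treatment)
```

A probability near 0 with a lift interval entirely below 0 would indicate that conversion is less likely under treatment.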