
Perform Basic Statistical Tests in R

This lab introduces you to the fundamentals of data analysis using the R programming language. Designed specifically for working with data, R allows you to quickly perform calculations, run analyses, and generate visualizations from datasets. With these powerful tools at your fingertips, you'll be able to work more efficiently and produce more accurate results.

Level: Beginner
Duration: 28m
Published: Mar 28, 2025


Table of Contents

  1. Challenge

    Challenge 1

    Introduction

    In this code lab you will learn how to summarize and analyze data using the R programming language. R is built around data analysis, so it makes most common data analysis tasks straightforward. To begin, you will learn how to import Excel files into an R program and run some basic grouping and analysis on the different columns. The last row of the Excel sheet contains some NA values, which simply means that the entries for that person are blank. Blank entries turn up in datasets from time to time and can cause problems during analysis, so you will learn how to deal with NA values later in this lab.
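As a rough sketch of what spotting those blank entries can look like, the example below builds a small made-up data frame in place of the lab's spreadsheet. The `scores` name, the column values, and the file name mentioned in the comment are assumptions for illustration only. In the lab the data would come from the Excel file, for instance through `read_excel()` from the readxl package.

```r
# Hypothetical stand-in for the lab's spreadsheet. With the real file you
# would load it instead, e.g. readxl::read_excel("scores.xlsx")
# (the file name here is an assumption).
scores <- data.frame(
  gender        = c("Female", "Male", "Female", "Male", "Male"),
  age           = c(15, 16, 17, 15, NA),
  math_score    = c(80, 70, 90, 60, NA),
  english_score = c(85, 65, 88, 72, NA)
)

# Look at the last row, where the blank entries show up as NA
last_row <- tail(scores, 1)
print(last_row)

# Count the missing values in each column
na_counts <- colSums(is.na(scores))
print(na_counts)

# Flag the rows that have no missing values at all
complete_rows <- complete.cases(scores)
print(sum(complete_rows))
```

Checking `colSums(is.na(...))` before any analysis is a quick way to see which columns will need NA handling.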

    Grouping Data

    Another thing you may want to do with a dataset is group its entries according to certain characteristics. Here the main characteristics of the participants are gender and age, so you will learn how to group the entries by those two columns. Note: Because the worksheet contains NA values, it is important to add the argument na.rm = TRUE when computing a mean. Otherwise the result will be NA, which is useless from an analysis point of view.

    To get the mean of a different score, reuse the same code and replace math_score with the column for whichever subject you are interested in, grouped by either gender or age.
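One possible way to compute such group means in base R is `tapply()`, sketched below on a small made-up data frame. The data values and the `scores` name are assumptions, and the lab's own code may use a different function. Only `math_score`, `gender`, and `age` come from the text above.

```r
# Hypothetical data; one math_score is deliberately missing
scores <- data.frame(
  gender     = c("Female", "Male", "Female", "Male", "Female"),
  age        = c(15, 15, 16, 16, 16),
  math_score = c(80, 70, 90, 60, NA)
)

# Without na.rm, any group that contains an NA gets a mean of NA
naive_by_gender <- tapply(scores$math_score, scores$gender, mean)
print(naive_by_gender)

# na.rm = TRUE is passed through to mean() and drops the missing value
mean_by_gender <- tapply(scores$math_score, scores$gender, mean, na.rm = TRUE)
print(mean_by_gender)

# The same pattern groups by age instead of gender
mean_by_age <- tapply(scores$math_score, scores$age, mean, na.rm = TRUE)
print(mean_by_age)
```

Swapping `scores$math_score` for another score column gives the group means for that subject.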

  2. Challenge

    Challenge 2

    In this challenge you learn how to perform a t-test, which checks whether the difference in means between two groups is statistically significant. For example, you will check whether the difference in the average english_score between males and females is statistically significant. Simply put, it tests whether one group truly scores higher in English or whether the difference in their means is more plausibly a matter of luck or chance. There are two things to look at in the output. First, R prints the line alternative hypothesis: true difference in means between group Female and group Male is not equal to 0. This line states the hypothesis being tested and appears whatever the outcome, so on its own it does not confirm a significant difference in English scores.

    Second, look at the p-value, which in this case is reported as < 2.2e-16, that is, smaller than 0.00000000000000022. Since this value is far below 0.05 (the threshold for a 5% significance level, equivalent to 95% confidence), it is strong evidence that the difference between the means of the two groups is statistically significant.
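A minimal sketch of such a test, using base R's `t.test()` with a formula, is shown below. The eight scores are invented for illustration, so the p-value will not match the lab's output. Only the `english_score` and gender names follow the text.

```r
# Hypothetical scores for two groups of four students each
scores <- data.frame(
  gender        = c("Female", "Female", "Female", "Female",
                    "Male", "Male", "Male", "Male"),
  english_score = c(88, 92, 85, 90, 70, 75, 68, 72)
)

# Compare mean english_score between the two gender groups
result <- t.test(english_score ~ gender, data = scores)
print(result)

# The pieces discussed above can also be read off the result object
group_means <- result$estimate   # mean of each group
p_value     <- result$p.value    # compare against 0.05
is_significant <- p_value < 0.05
print(is_significant)
```

By default `t.test()` runs Welch's version of the test, which does not assume the two groups have equal variances.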

  3. Challenge

    Challenge 3

    In this challenge you learn how to use ANOVA. Analysis of variance (ANOVA) is a statistical test used to assess the differences between the means of more than two groups. Here you will use it to compare scores across the three age groups. This code tests the effect of age and gender on science score. Based on the results, both age and gender have a significant effect on science score, as shown by their very small p-values and the three asterisks next to them. The final row, which covers age and gender together, is only marginally significant: its p-value of 0.0492 sits just below the 0.05 threshold, which is why it carries fewer asterisks than the other two rows.

    Overall, the test suggests that both age and gender have an impact on a student’s science score.

    Note: A line at the bottom of the output states that 1 observation deleted due to missingness. This refers to the row that had the NA entries.
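The sketch below shows one way a two-factor ANOVA like this could be set up with base R's `aov()`. The data are invented, age is treated as a three-level factor, and the formula `age * gender` (both main effects plus their combined term) is an assumption about how the lab's model is specified.

```r
# Hypothetical data: 3 age groups x 2 genders, plus one row with a missing score
scores <- data.frame(
  age           = factor(c(rep("15", 4), rep("16", 4), rep("17", 4), "15")),
  gender        = rep(c("Female", "Male"), length.out = 13),
  science_score = c(70, 62, 72, 65, 78, 70, 80, 69, 85, 74, 88, 79, NA)
)

# Fit the model; the row with the NA score is dropped automatically
model <- aov(science_score ~ age * gender, data = scores)

# The summary table lists one row per term plus the residuals, and notes
# how many observations were deleted due to missingness
model_summary <- summary(model)
print(model_summary)

# Number of observations that actually went into the fit
n_used <- length(residuals(model))
print(n_used)
```

The `Pr(>F)` column of the summary holds the p-values, and the asterisks printed beside it follow the significance codes shown in the legend under the table.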

  4. Challenge

    Challenge 4

    In this final challenge you learn how to calculate correlation coefficients in R. A correlation coefficient describes the relationship between two variables: as one variable increases in value, does the other also tend to increase, or does it tend to decrease? This code calculates the correlation between age and math score. The argument use = "complete.obs" is important because it tells the command to use only complete observations and to omit all entries that have NA values.

    Execute the code and you should get -0.4060934 as the output. This suggests a negative correlation, meaning that as one variable increases in value, the other tends to decrease.
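To illustrate the effect of `use = "complete.obs"`, here is a small sketch with base R's `cor()` on invented values. The numbers are assumptions, so the coefficient will differ from the lab's -0.4060934.

```r
# Hypothetical data with one missing age
scores <- data.frame(
  age        = c(15, 16, 17, 18, NA),
  math_score = c(90, 80, 85, 70, 75)
)

# With the default settings, a single NA makes the result NA
r_default <- cor(scores$age, scores$math_score)
print(r_default)

# complete.obs keeps only the rows where both values are present
r_complete <- cor(scores$age, scores$math_score, use = "complete.obs")
print(r_complete)
```

A value below zero, as here, indicates that higher ages go with lower math scores in this made-up sample.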

    Correlation Matrices

    In addition to computing individual correlation coefficients, you can create a correlation matrix, which shows the correlation coefficient for every pair of variables in a spreadsheet. However, with the method used here every column in the spreadsheet must be numeric (contain only numbers), or the command will throw an error. Therefore, you will use a slightly modified data sheet that has the non-numeric columns removed. The result shows the correlation coefficients of all variables in relation to one another, providing a useful overall snapshot of how the variables relate.
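As a sketch of the idea, passing a whole data frame to `cor()` returns the full matrix. The data below are invented, and filtering the columns with `is.numeric` is shown only as an assumed alternative to preparing a separate numeric-only sheet, which is what the lab does.

```r
# Hypothetical data with one non-numeric column and one incomplete row
scores <- data.frame(
  gender        = c("Female", "Male", "Female", "Male", "Female"),
  age           = c(15, 16, 17, 18, NA),
  math_score    = c(90, 80, 85, 70, 75),
  english_score = c(70, 78, 74, 88, 80)
)

# Keep only the numeric columns, since cor() rejects non-numeric input
numeric_scores <- scores[sapply(scores, is.numeric)]

# Correlation of every numeric column with every other one
cor_matrix <- cor(numeric_scores, use = "complete.obs")
print(round(cor_matrix, 2))
```

The matrix is symmetric with ones on the diagonal, because each variable correlates perfectly with itself.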

    With that you have completed this Code Lab. You have gained the ability to work with Excel data sheets in R and to perform basic analyses: reading and organizing data, calculating group means, performing t-tests and ANOVA tests, and calculating the correlation coefficient between different variables in the dataset.

Shimon Brathwaite is a seven-year cybersecurity professional with extensive experience in Incident Response, Vulnerability Management, Identity and Access Management and Consulting.
