Introduction

13

In this guide, you will learn about various statistical functions that are available in Excel 2019. By the end of this guide, you'll be familiar with some important statistical functions with examples. Later, you can explore the rest of the functions using the Excel provided documentation.

The functions which you are going to learn in this guide are:

- AVERAGE
- AVERAGEIF
- MEDIAN
- PERMUT
- COUNT
- COUNTIF
- FREQUENCY
- STDEV, and VAR
- PERCENTILE.INC, QUARTILE.INC, and RANK.EQ

To start with, you have the most commonly used function, the AVERAGE. It gives you an average of `n`

numbers by summing them up and dividing by the sum of the total numbers. Many beginners believe that they can only get the average of two numbers but this is not the case. Here's the syntax for the function:

`1`

`=AVERAGE(n1, n2, n3 ... nn)`

To elaborate this function with an example, consider the given tabular data which consists of three students' marks data. You need to find the average marks:

A | B | C | D |
---|---|---|---|

Subject 1 | Subject 2 | Subject 3 | Result |

26 | 38 | 78 | ? |

65 | 78 | 11 | ? |

12 | 88 | 50 | ? |

To find the average marks for each student, you need to deploy the given formula in **D1** and then stretch it all the way down for rest of the two, `=AVERAGE(A1, B1, C1)`

. This results in the following answer:

A | B | C | D |
---|---|---|---|

Subject 1 | Subject 2 | Subject 3 | Result |

26 | 38 | 78 | 47.33 |

65 | 78 | 11 | 51.33 |

12 | 88 | 50 | 50 |

Now, in the above example, consider that you only need to average those subject marks where the student has scored more than 50. Such a result can't be calculated using the `AVERAGE`

function, therefore, you need a new function, the `AVERAGEIF`

function. The syntax is given below:

`1`

`=AVERAGEIF(cell_range, condition, [opt_avg_range])`

Form a tabular data for a student whose marks, in four out of five subjects, are known. You need to find the average marks, considering only those subjects where she has scored more than 50.

A | B |
---|---|

Subjects | Marks |

Subject 1 | 26 |

Subject 2 | 65 |

Subject 3 | 12 |

Subject 4 | |

Subject 5 | 78 |

Now, in any new cell, you can write the given formula to arrive at the result:

`1`

`=AVERAGEIF(B1:B5, ">50")`

This gives you the result as **71.5** by avoiding the marks for the subjects 1, 3, and 4.

At the back-end, the `AVERAGE`

function uses the `mean`

concept in calculation. This has a disadvantage if their is an anomaly in the data. For example, consider a case where three doctors report their number of patients analyzed in five consecutive days, as shown:

A | B | C | D | E |
---|---|---|---|---|

Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |

8 | 12 | 16 | 10 | 87 |

12 | 15 | 10 | 11 | 9 |

8 | 4 | 1 | 0 | 2 |

If you try to average out the number of patients then the anomaly (87) for the first doctor can bring skewness in the result. Hence, in such scenarios, you should prefer to use `median`

; the syntax in Excel is shown below:

`1`

`=MEDIAN(n1, n2, n3 ... nn)`

To apply the median on the table, you can proceed as `=MEDIAN(A1, B1, C1, D1, E1)`

which results in the answer as **12**. If you compare this answer with the `=AVERAGE(A1, B1, C1, D1, E1)`

answer, **26.6 = 27** makes more sense.

Permutation is a widely used concept in the realm of statistics. You can perform the permutation on a given data using the given syntax in Excel:

`1`

`=PERMUT(n, n_chosen)`

To understand what permutation is and how you can implement it in Excel, consider a case of three athletes of which only two need to be selected for a marathon. Permutation helps in answering this question without counting all the possibilities by hand. If you represent this case in a tabular format, it will look like this:

A | B |
---|---|

Number of athletes | To be chosen |

3 | 2 |

You can implement the `PERMUT`

function as shown:

`1`

`=PERMUT(A1, B1)`

This gives the answer **6**, stating that there are six possibilities for choosing two athletes from a group of three.

The COUNTBLANK function returns "the number of blank cells in a given specified range of cells," according to Excel's documentation. This can be a very useful function when you need to know how many values are missing in data.

The syntax for the COUNTBLANK function is given below:

`1`

`=COUNTBLANK(range_of_cells)`

To implement it in Excel, consider the given tabular data:

A |
---|

Values |

9 |

5 |

8 |

7 |

So, if you implement `=COUNTBLANK(A1:A7)`

, it counts the number of blank values. The result of the formula is **3**. Three represents the total of the one blank value between **9** and **5** along with two blank values between **5** and **8**.

Assume you need to calculate the number of people whose marks are above a cut-off decided at value 88. To do this, you can use the `COUNTIF`

function whose syntax is given below:

`1`

`=COUNTIF(range_of_cells)`

To illustrate the scenario, consider the given tabular data:

A |
---|

Marks |

50 |

90 |

23 |

65 |

98 |

55 |

On the given tabular data, implement the following formula:

`1`

`=COUNTIF(A1:A7, ">88")`

This results in the value **2** for the values 90 and 98. Note that the function has also ignored the blank cell.

Consider a column with a large number of values ranging from 1-1000 and you need to count how many values fall in the bins 0-270, 271-350, 351-888, and 888-1000. If you have a small number of values available then this can be done manually. However, for a large set of numbers, it is suggested to use the FREQUENCY function. Its syntax is given below:

`1`

`=FREQUENCY(value_range, bin_range)`

Consider this tabular data to implement the function:

A | B |
---|---|

Values | Bins |

255 | 270 |

615 | 350 |

478 | 888 |

24 | 1000 |

11 | |

998 | |

548 | |

250 | |

341 | |

880 | |

1222 | |

650 | |

555 |

To implement the `FREQUENCY`

function, you first need to learn the array function. The given `FREQUENCY`

function doesn't result in only one item, rather it results in one plus the number of bin values. In our table, we have four values available in **Bins**, so the output of the `FREQUENCY`

function will have five values which form an array. Therefore, first, you need to select five vertical cells and write the given formula. The formula should be written in the first cell:

`1`

`=FREQUENCY(A1:A13, B1:B4)`

Once you write the formula, press `Ctrl + Shift + Enter`

. This will result in the following answer and will also write the above formula within the curly braces, `{}`

, of all the cells .

A | B | C |
---|---|---|

Values | Bins | Frequency |

255 | 270 | 4 |

615 | 350 | 1 |

478 | 888 | 6 |

24 | 1000 | 1 |

11 | 1 | |

998 | ||

548 | ||

250 | ||

341 | ||

880 | ||

1222 | ||

650 | ||

555 |

You can refer to this short YouTube video to learn how to use the array function.

So far, you have learned about the functions that either belong to the measure of central tendency or are commonly used. In this section, you will explore two statistical functions that belong to the dispersion class i.e., the variance and standard deviation of a sample collected from a population data.

Their respective syntax is presented below:

`1 2`

`=STDEV(data_array) =VAR(data_array)`

Consider a sample of five values taken from a population data as shown:

A |
---|

50 |

98 |

12 |

233 |

1 |

To find the variance and standard deviation in this data, you can implement the functions as shown:

`1 2`

`=VAR(A1:A5) =STDEV(A1:A5)`

This results in the values are **Variance = 8872.7** and **Standard Deviation = 94.19**.

The `PERCENTILE.INC`

function returns "the K'th percentile of values in a provided range, where K lies in the range 0 - 1 (both inclusive)," according to Excel's documentation.

The `QUARTILE.INC`

function returns "the specified quartile of a set range of provided numbers, based on percentile value 0 - 1 (both inclusive)," according to Excel's documentation.

Here's the list of available percentiles that you can attain through the `QUARTILE.INC`

function:

Argument | Result |
---|---|

0 | Minimum value |

1 | First quartile (25th percentile) |

2 | Second quartile (50th percentile). Also, the median. |

3 | Third quartile (75th percentile) |

4 | Maximum quartile |

The `RANK.EQ`

function returns the mode of a list of provided numbers. FYI, mode means the number with the most frequency. Also, if more than one value has the same rank, the top rank of that set is returned.

Here're the syntax of these functions:

`1 2 3`

`=PERCENTILE.INC(data, percentile) =QUARTILE.INC(data, quartile) =RANK.EQ(number_whose_rank_is_to_be_calculated, reference, [order])`

Consider the below tabular data to implement these functions:

A |
---|

5 |

5 |

8 |

9 |

7 |

1 |

11 |

4 |

2 |

5 |

To find the 80th percentile of the data, use the following:

`1`

`=PERCENTILE.INC(A1:A10, 0.8)`

This gives the value as **8.2**.

To find the 50th percentile using the `QUARTILE.INC`

function, pass the value **2** to the `quartile`

argument:

`1`

`=QUARTILE.INC(A1:A10, 2)`

That gives the value as **5**.

To find the rank of number **7**, proceed with the following formula:

`1`

`=RANK.EQ(A5, A1:A10)`

That gives you the rank of number **7** as **4**.

In this guide, you've learned various important statistical functions that are commonly used in Excel. So, go ahead and start exploring them in your own scenarios.

13