Author avatar

Gabriel Cánepa

Using Math and Statistical Functions in Microsoft Excel

Gabriel Cánepa

  • Jan 28, 2020
  • 11 Min read
  • 932 Views
  • Jan 28, 2020
  • 11 Min read
  • 932 Views
Business Professional
Microsoft Excel

Introduction

For many years, Microsoft Excel has been the de facto spreadsheet tool for all kinds of applications used to present and chart data. From inventory and project management to financial budgeting and forecasting, home users and professionals from all industries have used Excel extensively to accomplish their goals. Beyond the basic math and statistical operations (addition, subtraction, multiplication, division, average, and count, to name a few) in Excel, there are also several built-in, lesser-known functions that can make it easier to summarize and organize data.

To illustrate, we will use a spreadsheet of sales orders from the Northwind database. These datasets were originally provided by Microsoft in a SQL installation script and have been adapted for this guide in Excel format. The Orders and Orders Details tabs contain the raw data used in the examples below, and in the remaining tabs (Examples and Forecast) you will see each function in action. These can further be combined to create even more complex solutions, as shown in additional examples (examples 12-15) in the spreadsheet. For brevity, this last set of examples is not explained in this guide.

It is important to note that the functions listed below are classified as math or statistical as per the Microsoft Excel reference.

Math Functions

In this section we will review the following math functions: SUMPRODUCT, SUMIF, SUMIFS, ROUNDUP and ROUNDDOWN.

SUMPRODUCT

In the spreadsheet, a sales order total amount is calculated by adding each unit price multiplied by the corresponding quantity, whereas the revenue is the sum of all sales across a time period. To calculate total revenue, there are two methods: 1) inserting an extra column with the product of each unit price sold and the associated amount, or 2) using the SUMPRODUCT function. The latter approach is not only more straightforward, but also spares us adding the extra column.

Each sales order is broken down into a unit and quantity pair in Orders Details. The image above shows the formula used in cell C2 to calculate the total revenue. As you can see, SUMPRODUCT takes two ranges (or arrays) as inputs and by default returns the sum of all products between cells in the same row.

Note:: SUMPRODUCT can also be utilized to perform other math functions between the two arrays. To do so, we will need to replace the comma that separates the ranges with the required sign (+ for addition, - for subtraction or / for division).

SUMIF

As opposed to the well-known SUM function, SUMIF allows us to sum the values in a range that match a given criterion. The function takes at least two inputs: 1) a range of values, and 2) a condition. Alternatively, we can apply the condition to one range and sum the corresponding values in a separate column. For example, we can use SUMIF to see the total cost of shipping orders (already sent or under processing) to France. Here, C5 references columns G and F in Orders (Country and Freight, respectively):

In this case, SUMIF required three arguments: the lookup range (G2 through G831), the lookup value ("France" within double quotes), and the sum range (F2 through F831).

SUMIFS

This function is similar to SUMIF in that it provides a way to sum values in a range conditionally, but it uses a set of criteria instead of only one. For example, we can find out the cost of shipped orders to France. As opposed to the previous example, where all orders were taken into account, in this case we are only summing the cost of those orders where ShippedDate is not empty:

Cell C8 above shows that the sum range is the first argument, followed by each criteria range and the corresponding condition ("France" in G2:G831 and "<>", meaning a non-empty cell, in E2:E831, both within double quotes).

ROUNDDOWN and ROUNDUP

The purpose of rounding in math is to show a value in a simpler form while keeping its value close to the original. To accomplish this goal, Excel provides a number of functions: ROUND, ROUNDDOWN and ROUNDUP. While the first rounds up or down as per the usual rules, the last two can be used to always round down or up, respectively, to a specified number of digits.

Cells C11 and C14 use the following formulas to round the total sales down to two digits and up to one digit:

  • C11: =ROUNDDOWN(C2, 0)
  • C14: =ROUNDUP(C2, 1)

When the number of digits is less than zero, the number is rounded (up or down) to the left of the decimal point. Thus, =ROUNDUP(C20, -1) and =ROUNDUP(C20, -2) round up C20 to the nearest ten and hundred, respectively.

Statistical Functions

In this section we will review the following statistical functions: AVERAGEIF, AVERAGEIFS, COUNTIF, COUNTIFS, FREQUENCY and FORECAST.

AVERAGEIF

As you can probably guess, AVERAGEIF will return the average of a range of values that match a given condition. Among other things, it can help us determine whether our business with a given customer is profitable or not by looking at the average order amount for that client. In C17, we use this function to calculate the average amount of orders made by German clients:

In the above example, the lookup range is G2:G831 (Country) followed by the condition ("Germany" within double quotes) and the average range (D2:D831).

AVERAGEIFS

Notwithstanding how useful AVERAGEIF is, you will have to utilize AVERAGEIFS if you need to calculate an average based on two or more criteria. In this case, the first argument passed to the function is the average range, followed by two or more criteria range and condition pairs. For example, to find out the average order amount from US clients prepared by employee Nancy Davolio, use these inputs:

  • Average range: D2:D831
  • First criteria range: G2:G831
  • First condition: "USA"
  • Second criteria range: H2:831
  • Second condition: "Nancy Davolio"

COUNTIF and COUNTIFS

To count the number of values that match a condition in a given range, we can use COUNTIF. The required arguments are the count range and the lookup value, in that order. C23 shows the result of counting the number of cells where the name Margaret Peacock appears in the H2:H831 range:

If we are interested in adding another filter, such as orders not shipped yet, we can use COUNTIFS instead. In that case, each range/condition pair is added in order as arguments to the function:

As is the case with SUMIFS and AVERAGEIFS, COUNTIFS also admits as many conditions are needed—as long as each is accompanied by its corresponding criteria range.

FREQUENCY

To find out how often a series of values appear within a range, we can use FREQUENCY. This function requires two arrays as inputs: 1) a data range, and 2) a reference to interval breakpoints that will serve as groups for the data points in the first array.

Using FREQUENCY, we can answer questions such as "How many orders are there where the amount is above $2500?" or "How many below $500?" and so on. By partitioning the order amounts by intervals of 500, as shown in column F below, we get the following in G3:G8 after applying =FREQUENCY(Orders!D2:D831, F3:F7) to G3:

If you place the cursor on any cell between G4 and G8, you will notice that the formula is the same as in G3 but appears grayed out, meaning it cannot be edited because the output array originated in G3:

At first, it may be confusing to see that FREQUENCY returns an array whose length is equal to the number of interval breakpoints plus one. The last number in the function's output represents the number of orders where the amount is above the last breakpoint (2500 in this case).

FORECAST

Based on a known series of x's and y's, FORECAST returns the expected y for a given numeric x using linear regression.

If you inspect the records in Orders, you will notice that there are only a few rows for the month of May 1998. To see what the estimate of accumulated sales for May and June 1998 looks like, we will insert the following formulas to H17 and H18:

  • H17: =FORECAST(G17, H13:H16, G13:G16)
  • H18: =FORECAST(G18, H13:H17, G13:G17)

where:

  • G17 and G18 are the x values for which we want to predict the result in each case. They are dates that meet the requirement that the given x must be numeric, but they have been formatted as the first letters of the month in question for visualization purposes.

  • H13:H16 and H13:H17 are the known sets of y's.

  • G13:G16 and G13:G17 are the corresponding series of known x's.

If we draw a chart using the accumulated sales per month, we will see a linear trend that extends beyond January-April (our initial data set) to May and June (which we estimated above with FORECAST).

Note: In Excel 2016 and newer, FORECAST has been replaced with FORECAST.LINEAR. Although the former is available for backward compatibility and returns the same output as the latter, Microsoft recommends using FORECAST.LINEAR in the latest versions. This practice helps avoid confusion with the newer FORECAST functions, such as FORECAST.ETS, that use a machine learning-based algorithm for prediction.

Conclusion

In this guide we have explored how to leverage math and statistical functions in Microsoft Excel.

Using these tools, you will be able to derive meaning from large sets of records and to make decisions based on that information. For instance, you can easily identify where your most frequent customers live, when their next purchase might occur, and what your projected earnings across a time period are. In consumer goods manufacturing, you can predict how long it will take to make a certain amount of product given your current production rate, and what your expected losses are due to unplanned machine downtime, to name a few examples.

By anticipating these events through data analysis, your Microsoft Excel skills will help you save and earn money for your business.

14