Binomial Coefficient Analysis with R

Learn how to perform binomial coefficient analysis for applications like logistic regression.

Dec 15, 2019 • 5 Minute Read

Introduction

Binomial coefficients are the positive integers that appear as coefficients in the binomial theorem, an important result with applications in several machine learning algorithms.

The theorem starts with the concept of a binomial, which is an algebraic expression that contains two terms, such as a + b or x + y. The binomial theorem describes the algebraic expansion of powers of a binomial, such as (x + y)^n. The binomial coefficients are the numbers that multiply the individual terms of that expansion.
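As a minimal sketch, the R code below checks the binomial theorem, (x + y)^n = sum over k from 0 to n of C(n, k) * x^k * y^(n - k), for n = 4. It uses R's choose() function, introduced later in this guide, to produce the coefficients; the values x = 2 and y = 3 are arbitrary choices for illustration.

```r
# Coefficients of (x + y)^n for n = 4: C(4, 0), C(4, 1), ..., C(4, 4)
n <- 4
coefs <- choose(n, 0:n)
print(coefs)
# [1] 1 4 6 4 1

# Numerical check of the binomial theorem for illustrative values of x and y:
# (x + y)^n should equal the sum over k of C(n, k) * x^k * y^(n - k)
x <- 2
y <- 3
k <- 0:n
expansion <- sum(coefs * x^k * y^(n - k))
print(c(direct = (x + y)^n, expanded = expansion))
```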

Binomial coefficients are also the building blocks of the binomial distribution, one of the most important discrete probability distributions. It is used extensively in machine learning, most notably in modeling binary classification problems, and its generalization, the multinomial distribution, plays the same role for multi-class problems. A popular use case is logistic regression, where the response variable is assumed to follow the binomial distribution. These distributions are also used in text analytics applications, such as modeling the distribution of words in text.
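To show how the coefficients enter the binomial distribution, the sketch below computes the probability of k successes in n independent trials, P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), by hand and compares the result with base R's dbinom(). The values n = 10 and p = 0.3 are arbitrary assumptions chosen for illustration.

```r
# Binomial probability mass function built from binomial coefficients:
# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
# Illustrative (assumed) values: 10 trials, success probability 0.3
n <- 10
p <- 0.3
k <- 0:n

manual_pmf <- choose(n, k) * p^k * (1 - p)^(n - k)

# Base R's dbinom() computes the same probabilities
builtin_pmf <- dbinom(k, size = n, prob = p)

print(all.equal(manual_pmf, builtin_pmf))
# [1] TRUE

# The probabilities over all possible outcomes 0..n sum to 1
print(sum(manual_pmf))
```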

In this guide, the reader will learn how to perform binomial coefficient analysis in the statistical programming language R.

Factorials

Before working with binomial coefficients, you need to understand factorials, because binomial coefficients are calculated from them. The factorial of a positive integer n, denoted n!, is the product of all positive integers less than or equal to n. For example, 6! = 6 * 5 * 4 * 3 * 2 * 1 = 720. In R, factorials are computed with the factorial() function.
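To make the definition concrete, here is a minimal sketch of the recursive form n! = n * (n - 1)!, which stops at 0! = 1 (discussed below). The function my_factorial() is a hypothetical helper for illustration only; in practice, the built-in factorial() is the one to use.

```r
# Hypothetical helper for illustration only; in practice use factorial().
# Implements the recursive definition n! = n * (n - 1)!, with 0! = 1.
my_factorial <- function(n) {
  # Defined here for non-negative whole numbers only
  stopifnot(n >= 0, n == round(n))
  if (n == 0) {
    return(1)
  }
  n * my_factorial(n - 1)
}

my_factorial(6)
# [1] 720
```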

For example, 6! and 20! can be calculated in R using the syntax below.

      factorial(6)
      factorial(20)

Output:

      [1] 720
      [1] 2.432902e+18
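The sketch below illustrates two practical details of factorial(): its result agrees with the product definition, and because it returns a double-precision number, it overflows to Inf for whole-number inputs above 170. For larger inputs, base R's lfactorial() returns the natural logarithm of n! instead.

```r
# factorial(n) agrees with the product 1 * 2 * ... * n
all.equal(factorial(6), prod(1:6))

# factorial() returns a double; 170! is still representable, 171! is not
is.finite(factorial(170))
# suppressWarnings() silences any range warning R may issue here
is.infinite(suppressWarnings(factorial(171)))

# lfactorial() returns log(n!), which stays finite for much larger n
lfactorial(171)
```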

Note that the factorial of zero is defined to be one. The rationale is that there is exactly one way to arrange zero objects: the empty arrangement.

      factorial(0)

Output:

      [1] 1
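Two quick checks, sketched below in base R, show why 0! = 1 is the natural convention: it is the value of an empty product, and it is the only value consistent with the recurrence n! = n * (n - 1)! at n = 1. It also keeps the binomial coefficient formula valid at the edge case k = 0.

```r
# 1. 0! is an empty product; R, like mathematics, defines the product
#    of zero numbers as 1
prod(numeric(0))
# [1] 1

# 2. The recurrence n! = n * (n - 1)! at n = 1 gives 1! = 1 * 0!,
#    so 0! = 1! / 1 = 1
factorial(1) / 1

# 3. With 0! = 1, the formula C(5, 0) = 5! / (0! * 5!) correctly gives 1
factorial(5) / (factorial(0) * factorial(5))
```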

Binomial Coefficients

Binomial coefficients describe the number of combinations of k items that can be selected from a set of n items, where the order of selection does not matter. A binomial coefficient is denoted C(n,k) and read as "n choose k"; it is also known as a combination or combinatorial number. It is calculated from factorials as C(n,k) = n! / (k! * (n - k)!).
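Computing C(n, k) directly from factorials shows how the two concepts connect. The function binom_coef() below is a hypothetical helper written for illustration; it assumes non-negative whole-number inputs with k no larger than n.

```r
# Hypothetical helper: C(n, k) = n! / (k! * (n - k)!)
# Assumes non-negative whole numbers with k <= n
binom_coef <- function(n, k) {
  stopifnot(n >= 0, k >= 0, k <= n)
  factorial(n) / (factorial(k) * factorial(n - k))
}

binom_coef(7, 2)   # 5040 / (2 * 120)
# [1] 21
```

This direct formula breaks down once n! exceeds the range of a double (n above 170), even when the coefficient itself is small, so the built-in choose() is generally the better choice in practice.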

In R, binomial coefficients are calculated with the choose() function. For example, the number of ways to choose two items out of seven can be calculated using the code below.

      choose(7,2)

Output:

      [1] 21
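One way to confirm that choose(7, 2) counts what we expect is to list the combinations explicitly. The sketch below uses base R's combn(), which, when given a positive integer n, returns every combination of the elements of 1:n as the columns of a matrix.

```r
# combn(7, 2) lists every 2-element subset of 1:7, one subset per column
pairs <- combn(7, 2)
pairs[, 1:3]   # the first three pairs
ncol(pairs)    # the number of pairs matches choose(7, 2)
# [1] 21

# Symmetry: choosing 2 items to take is the same as choosing 5 to leave out
choose(7, 2) == choose(7, 5)
```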

Note that k must not exceed n. If k is greater than n, there is no way to make the selection, so choose() returns zero, as shown below.

      choose(2,7)

Output:

      [1] 0
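The edge cases are worth a quick check as well: k equal to n is allowed and gives 1, and choose() is vectorized over k, so every value of k beyond n yields zero. This minimal illustration uses only the built-in function.

```r
# k equal to n is valid: there is exactly one way to choose all n items
choose(7, 7)
# [1] 1

# choose() is vectorized over k; entries with k > n are 0
choose(5, 0:7)
# [1]  1  5 10 10  5  1  0  0
```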

Conclusion

In this guide, you have learned the basics of binomial coefficients and the binomial theorem. You also learned how to compute factorials and binomial coefficients using R. These coefficients are the building blocks of the binomial distribution, which is used in predictive modeling applications like binary classification and, through its multinomial generalization, multi-class classification. They also appear throughout probability theory, which forms the basis of powerful statistical algorithms like logistic regression and naïve Bayes. Understanding these concepts will help you understand the distribution of variables in your data, and thereby select an appropriate machine learning model.
