The Beauty of Custom Functions in R

Jun 11, 2019 • 7 Minute Read

Introduction

Working in R is working with functions. They are at the heart of the language, and that's a marvelous thing because it makes R work easily reproducible, organized, and scalable across teams. If your R script is more than 50 lines long, do future-you a favor and write some functions. If you're reading this, congratulate yourself on picking a foundational piece of the language to focus on; knowing these building blocks is critical to data craftsmanship. As John Chambers mentioned, everything that happens in R is a function call. While you've likely already used built-in functions, this guide will help you write your own.

For those who are curious, we'll build this via RStudio and R Markdown.

The Basics

What Is a Function?

What do we mean by a function, precisely? As in most programming languages, a function in R is a collection of statements that typically receives some input, does some computation, and provides an output.

There are hundreds of fabulous built-in functions in R. Check them out, learn them well, and get excited about writing your own functions. Even though R is a stats or data-related language, to work effectively in R you should bring all great software engineering principles with you.

How Do I Define a Function?

In order to focus on the structure and not the logic, here's a simple function that takes two inputs and does subtraction:

    subby <- function(a, b) {
      a - b
    }

And here's how it's called:

    subby(5, 3)

Note there are three parts to R functions:

  1. The formals, i.e. the arguments you pass into the function. Here, that's a and b. See ?formals for more.
  2. The body, i.e. the logic within the curly braces. See ?body for more.
  3. The environment, i.e. where the function was created and where it looks up any variable it doesn't define itself. See ?environment for more.

Note that if you're just starting out, try to build and use a few functions before diving deep into environments.

If you're the hands-on type, run the code above and then formals(subby), body(subby), and environment(subby) to make this stick.
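For the hands-on route, here is a minimal sketch of that inspection. It assumes subby is defined exactly as in the first example; the variable names arg_names, body_code, and fun_env are only illustrative.

```r
subby <- function(a, b) {
  a - b
}

# The formals: one entry per argument, so the names are the argument names
arg_names <- names(formals(subby))

# The body: the unevaluated code between the curly braces
body_code <- body(subby)

# The environment in which the function was created
# (the global environment when this runs at the top level of a script)
fun_env <- environment(subby)

print(arg_names)
print(body_code)
print(fun_env)
```

Printing the three pieces side by side makes it easier to see that a function is just arguments, code, and a place to look things up.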

Why Is That a Beautiful Thing?

Functions provide numerous benefits. They make it easier to

  • Automatically check your code (via unit tests)
  • Organize your work
  • Document your work
  • Share your work
  • Debug your work
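As a rough illustration of the first point, the checks below use base R's stopifnot() as a stand-in for a unit test. This is only a sketch; larger projects commonly use a dedicated testing package such as testthat, which is not shown here.

```r
subby <- function(a, b) {
  a - b
}

# stopifnot() raises an error as soon as one condition is not TRUE,
# so a script of checks like this fails loudly if subby() ever changes behavior.
stopifnot(
  subby(5, 3) == 2,
  subby(3, 5) == -2,
  subby(0, 0) == 0
)
```

Because the logic lives in one small function, each expectation can be written as a single line.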

What's Special About Functions in R?

These things apply across user-defined and built-in functions:

  1. R has first-class functions. As Hadley Wickham states:

"You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function."

  2. An explicit return statement isn't required in R. A function returns the value of the last expression it evaluates. Generally, return() is only included when you're returning out of a function early (e.g., if an error has occurred).
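Both points can be sketched in a few lines. All of the names below (operations, apply_operation, make_subtractor, safe_divide) are hypothetical and exist only for this illustration.

```r
# Store functions in a list, like any other value
operations <- list(
  add = function(a, b) a + b,
  subtract = function(a, b) a - b
)

# Pass a function as an argument to another function
apply_operation <- function(f, a, b) {
  f(a, b)
}

# Create a function inside a function and return it
make_subtractor <- function(b) {
  function(a) a - b
}
subtract_three <- make_subtractor(3)

# return() is used only for the early exit;
# otherwise the last expression is returned implicitly
safe_divide <- function(a, b) {
  if (b == 0) {
    return(NA_real_)
  }
  a / b
}

sum_result <- apply_operation(operations$add, 5, 3)  # 8
diff_result <- subtract_three(5)                     # 2
early_exit <- safe_divide(1, 0)                      # NA
normal_exit <- safe_divide(6, 3)                     # 2
```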

Ins and Outs

Occasionally, your function will need a default value, which supplies a value for an argument even if the function call doesn't specify it. Here's how that's done:

    subby <- function(a, b = 10) {
      a - b
    }

    subby(12)

Each time you call a function, R creates a new environment for that call. When a name isn't found there, R looks in the environment in which the function was created. In the following code snippet, z is a global variable (i.e., a variable that exists throughout the execution of the program). It can be read in any part of your script, including inside a function. Changing it from inside a function, however, takes the <<- operator, because a plain assignment creates a new local variable instead.

    z <- 4

    subby <- function(a) {
      a - z
    }

    subby(10)

Note that this is just to show how R function environments work; in your own functions, you shouldn't rely on global variables.
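To make the lookup rule concrete, here is a small sketch that contrasts reading a global with assigning to the same name inside a function. The function shadow_z and the result variables are hypothetical names used only for illustration.

```r
z <- 4

subby <- function(a) {
  a - z      # z is not defined locally, so R finds it in the enclosing environment
}

shadow_z <- function(a) {
  z <- 100   # plain assignment creates a local z; the outer z is untouched
  a - z
}

first <- subby(10)      # 10 - 4   = 6
second <- shadow_z(10)  # 10 - 100 = -90
z_after <- z            # still 4
```

Passing z in as an argument instead of reading it from outside would make the dependency explicit, which is the safer habit.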

Note that variables declared inside the function are local to that function. In other words, if we declare the function like this, then f is a local variable which is only accessible to the environment of that function.

    subby <- function(a, b) {
      f <- a - b
      f # Return the variable
    }

    subby(10, 2)

This means that f cannot be referenced in the main script.
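A quick way to see this is to check for f after the call. The sketch assumes no numeric variable named f has been defined elsewhere in the session; result and f_is_visible are illustrative names.

```r
subby <- function(a, b) {
  f <- a - b  # f lives only in the environment of this call
  f
}

result <- subby(10, 2)

# Back in the main script, there is no numeric f to find
f_is_visible <- exists("f", mode = "numeric")

print(result)
print(f_is_visible)
```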

Writing Great Functions

In R, as in any language, there are certain tactics that'll make your own functions more reliable and scalable. Whether you’re distributing these functions inside or outside of your company, this is how to make friends quickly.

  • Write functions that do one thing well and are named descriptively (ideally use verbs for function names). These two things give each function a unique identity and your code will be much cleaner and easier to debug.

  • Check argument inputs. Provide helpful messages if the input type is wrong or if the function is otherwise being used incorrectly.
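As one possible shape for the second point, the sketch below validates its inputs before doing any work. The function name subtract_numbers and the wording of the message are assumptions made for this example, not a prescribed convention.

```r
subtract_numbers <- function(a, b) {
  # Fail early, and say what was expected and what actually arrived
  if (!is.numeric(a) || !is.numeric(b)) {
    stop(
      "`a` and `b` must be numeric; got ",
      class(a)[1], " and ", class(b)[1], ".",
      call. = FALSE
    )
  }
  a - b
}

ok <- subtract_numbers(10, 2)

# tryCatch() captures the error so the message can be inspected
msg <- tryCatch(
  subtract_numbers("10", 2),
  error = function(e) conditionMessage(e)
)

print(ok)
print(msg)
```

The verb-based name and the single responsibility follow the first point; the explicit check follows the second.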

Anonymous Functions

When it's not worth the time to give your function a name, you're dealing with an anonymous function. Here's what this looks like. Note that the function definition sits within the curly braces (first the arguments, then the logic) and is followed by the function call, made via (2, 3).

    {function(x, y) x^y}(2, 3)

Note that anonymous functions in R are often used with the *apply family of functions (lapply, sapply, and so on).
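A minimal sketch of that pattern, using base R's sapply() and Map(); the variable names squares and powers are illustrative.

```r
# Square each element; the function exists only for the duration of this call
squares <- sapply(1:5, function(x) x^2)

# Map() walks two vectors in parallel and returns a list
powers <- Map(function(x, y) x^y, 2:4, 1:3)

print(squares)
print(unlist(powers))
```

Here squares holds 1, 4, 9, 16, 25, and powers holds 2^1, 3^2, and 4^3.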

Managing Your Functions

Where should you put your functions? Great question. There are two main options:

  1. Put all functions in the same script file as the code that uses them; bunching your function definitions after your library() statements at the top is fine for many tasks.
  2. Write a proper package. If you're serious about distributing your code internally or externally, this is a must. It forces fantastic best practices and isn't very difficult.

Avoid bunching all functions in a separate file from your main script, as that makes distribution difficult.

Conclusions

Whether you're a vet or a newbie, knowing the ins and outs of R functions will save you an inordinate amount of time in your R work. Think of them as one of the most important tools in your bag.