Working in R means working with functions. They are at the heart of the language, and that's a marvelous thing because it makes R work easily reproducible, organized, and scalable across teams. If your R script is more than 50 lines long, do future-you a favor and write some functions. If you're reading this, congratulate yourself that you've picked a foundational piece of the language to focus on - knowing these building blocks is critical to data craftsmanship. As John Chambers put it, everything that happens in R is a function call. While you've likely already used built-in functions, this guide will help you write your own.
For those who are curious, we'll build this via RStudio and R Markdown.
What do we mean by a function, precisely? As in most programming languages, a function in R is a collection of statements that typically receives some input, does some computation, and provides an output.
There are hundreds of fabulous built-in functions in R. Check them out, learn them well, and get excited about writing your own functions. Even though R is a stats or data-related language, to work effectively in R you should bring all great software engineering principles with you.
In order to focus on the structure and not the logic, here's a simple function that takes two inputs and does subtraction:
subby <- function(a, b) {
  a - b
}
And here's how it's called:
subby(5, 3)
Note there are three parts to R functions:
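The arguments (formals): the inputs that control how the function is called. In this case they are a and b. See ?formals for more.
The body: the code inside the function. See ?body for more.
The environment: the structure that determines how the function finds the values associated with names. See ?environment for more. Note that if you're just starting out, try to build and use a few functions before diving deep into environments.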
If you're the hands-on type, run the code above and then formals(subby), body(subby), and environment(subby) to make this stick.
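As a minimal sketch of that exercise, the snippet below redefines subby and inspects each of its three parts. The variable names arg_names, fn_body, and fn_env are only illustrative, and the environment that gets printed depends on where you define the function (the global environment if you run this at the top level of a script or the console).

```r
subby <- function(a, b) {
  a - b
}

arg_names <- names(formals(subby))  # the argument names: "a" "b"
fn_body <- body(subby)              # the unevaluated code inside the braces
fn_env <- environment(subby)        # the environment where subby was defined

print(arg_names)
print(fn_body)
print(fn_env)
```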
Functions provide numerous benefits. They make it easier to keep your work reproducible, organized, and scalable across teams.
These things apply across user-defined and built-in functions:
You can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.
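Here is a small sketch of what treating functions like any other value can look like. Every name in it (subtract, ops, apply_op, make_subtractor, subtract_two) is hypothetical and chosen only for illustration.

```r
# Assign a function to a variable
subtract <- function(a, b) a - b

# Store functions in a list
ops <- list(minus = subtract, plus = function(a, b) a + b)

# Pass a function as an argument to another function
apply_op <- function(op, a, b) op(a, b)

# Create a function inside a function and return it
make_subtractor <- function(n) {
  function(x) x - n
}
subtract_two <- make_subtractor(2)

apply_op(ops$minus, 5, 3)  # 2
apply_op(ops$plus, 5, 3)   # 8
subtract_two(10)           # 8
```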
return is only needed when you're returning out of a function early (e.g., if an error has occurred). Otherwise, a function returns the value of the last expression it evaluates.
Occasionally, your function will need a default value (which gives an argument a value even if the function call doesn't specify it). Here's how this is simply done:
subby <- function(a, b = 10) {
  a - b
}

subby(12)
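To round this out, the sketch below shows the same function called three ways: relying on the default, overriding it by position, and overriding it by name. The result variables are illustrative only.

```r
subby <- function(a, b = 10) {
  a - b
}

with_default <- subby(12)           # b falls back to 10, so 12 - 10 = 2
by_position <- subby(12, 2)         # b is supplied positionally, so 12 - 2 = 10
by_name <- subby(b = 2, a = 12)     # named arguments can come in any order: 10

print(c(with_default, by_position, by_name))
```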
When you call a function, R creates a new environment for that call. Its parent is the environment in which the function was created, so any name not found inside the function is looked up there. In the following code snippet, z is a global variable (i.e., a variable that exists throughout the execution of the program). It can be read in any part of your script, including inside of a function. Changing it from inside a function takes the <<- operator, since a plain assignment there only creates a local copy.
z <- 4

subby <- function(a) {
  a - z
}

subby(10)
Note that this is just to show how R function environments work - in your function, you shouldn't rely on global variables.
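One way to see why leaning on globals is fragile is to compare it with passing the value in explicitly. This is only a sketch, and subby_global and subby_explicit are hypothetical names introduced for the comparison.

```r
z <- 4

# Depends on the global z: the result silently changes whenever z changes.
subby_global <- function(a) {
  a - z
}

# Takes everything it needs as arguments, so the result depends only on the call.
subby_explicit <- function(a, b) {
  a - b
}

before <- subby_global(10)         # 10 - 4 = 6
z <- 1
after <- subby_global(10)          # same call, now 10 - 1 = 9
explicit <- subby_explicit(10, 4)  # always 6

print(c(before, after, explicit))
```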
Note that variables declared inside the function are local to that function. In other words, if we declare the function like this, then f is a local variable which is only accessible to the environment of that function.
subby <- function(a, b) {
  f <- a - b
  f # Return the variable
}

subby(10, 2)
This means that f cannot be referenced in the main script.
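A quick way to check this for yourself is sketched below. It uses a more distinctive local name, local_diff (a hypothetical stand-in for f), so that the check cannot accidentally match some other object on your search path.

```r
subby <- function(a, b) {
  local_diff <- a - b  # exists only inside this call
  local_diff
}

result <- subby(10, 2)

# After the call, the local name is not visible from the calling script.
visible_outside <- exists("local_diff")

print(result)           # 8
print(visible_outside)  # FALSE
```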
In R, as in any language, there are certain tactics that'll make your own functions more reliable and scalable. Whether you’re distributing these functions inside or outside of your company, this is how to make friends quickly.
Write functions that do one thing well and are named descriptively (ideally use verbs for function names). These two things give each function a unique identity and your code will be much cleaner and easier to debug.
When it's not worth the time to give your function a name, you're dealing with an anonymous function. Here's what this looks like. Note that the function definition is wrapped in curly braces: first the arguments and then the logic, followed by the function call being made via (2, 3).
{function(x, y) x^y}(2, 3)
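In practice, anonymous functions tend to show up as arguments to other functions rather than being called on the spot. The sketch below shows both uses; squares and power are illustrative names, and wrapping the definition in parentheses is an alternative to the curly braces used above.

```r
# Pass an anonymous function straight to sapply()
squares <- sapply(1:3, function(x) x^2)

# Define and call an anonymous function in one expression
power <- (function(x, y) x^y)(2, 3)

print(squares)  # 1 4 9
print(power)    # 8
```

R 4.1.0 and later also accept the shorthand \(x) x^2 in place of function(x) x^2.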
Where should you put your functions? Great question. There are two main options:
Keeping them in your main script, just below the library() statements at the top, is fine for many tasks.
Avoid bunching all functions in a separate file from your main script, as that makes distribution difficult.
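As a rough sketch of that layout, assuming a single-file script, the skeleton below loads packages first, defines functions next, and only then runs the analysis. library(stats) is a stand-in for whatever packages you actually need, and calculate_spread is a hypothetical helper.

```r
# ---- Packages ----
library(stats)  # stand-in; load whatever your analysis needs

# ---- Functions ----
# Hypothetical helper: difference between the largest and smallest value
calculate_spread <- function(x) {
  max(x) - min(x)
}

# ---- Main script ----
values <- c(3, 9, 4)
spread <- calculate_spread(values)
print(spread)  # 6
```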
Whether you're a vet or a newbie, knowing the ins and outs of R functions will save you an inordinate amount of time in your R work. Think of them as one of the most important tools in your bag.