At its core, R is a functional programming language, and the purrr package enhances the functional programming aspect of R by providing different methods to work with functions and vectors.
The concept of functional programming is a separate topic that needs its own discussion. You can find an overview here, but to give you a basic idea here is a short example.
Suppose you are working with a small data set that only contains numerical values, but due to a data entry error a few columns contain characters. If you were not following functional programming principles, you would go to each cell location to manually replace the data point. If you were following functional programming principles, you would create a function that takes the data set as argument and replaces characters with appropriate values.
As mentioned above, purrr enhances the functional programming aspect of R. You can use this package to avoid many loops and write simple code that is easy to understand. But before that, you need to install this package:
1 2 3 4
install.packages("purrr") # alternate way install.packages("tidyverse")
The map function of a purrr package applies a function to each element of a vector. If the function returns a vector, the map function returns a list. There are two more variants of map functions available:
map_at. These variants take an extra predicate function that determines which elements of the vector are transformed by the function.
1 2 3 4 5 6 7 8
# Applying map function on a vector library(purrr) library(dplyr) 1:10 %>% map(rnorm, n = 10)
Run the above code to get a list that contains normally distributed data.
The syntax of the map function is:
map(.x, .f, ...).
.x is the list of atomic vectors,
.f is a function or formula, and
... is an additional argument passed on to the mapped function.
There are a few map functions that always return an atomic vector. The type of the vector depends on which type of map function you use in the code. The following are map functions that return an atomic vector:
1 2 3
1:10 %>% map(rnorm, n = 10)%>% map_dbl(mean)
In the above code, a list is passed in
map_dbl() that is output of the
map() function. Run the code to understand the output.
The following example shows how the map() function can be used while building data science models using the
mtcars data set. First, the data is split based on the values of the first column. Then the linear model is applied on each split.
map() method will apply the
lm() function and generate the number of models, which will be equal to the number of splits. Llater, it applies summary functions on each model, and from that output
map_dbl is drawing the
You can follow the same pattern in different data sets to see whether dividing the data can improve the performance of the model or not.
1 2 3 4 5 6 7 8 9 10 11 12
# Loading data data(mtcars) mtcars %>% # Splitting data on cyl column split(.$cyl) %>% # Applying lm() to create linear models map(~ lm(mpg ~ wt, data = .x)) %>% # Generating summary of each model map(summary) %>% # Withdrawing r.squared value map_dbl("r.squared")
Functional programming is a very strong aspect of R that has been enhanced by the purrr package's
map() function. In this guide you have gone through a real life example where we are applying the
map() function to create different models without repeating code. There are different variants of map functions available in the purrr package that can be used in different scenarios. You can find more information here.
You can follow the same example and apply the
map() function on different datasets in R. If you are working on a data science project, then you can use purrr.