Author avatar

Nishant Kumar Singh

Explore R Libraries: Purrr

Nishant Kumar Singh

  • Sep 21, 2020
  • 4 Min read
  • 31 Views
  • Sep 21, 2020
  • 4 Min read
  • 31 Views
Data
Data Analytics
Machine Learning

Introduction

At its core, R is a functional programming language, and the purrr package enhances the functional programming aspect of R by providing different methods to work with functions and vectors.

The concept of functional programming is a separate topic that needs its own discussion. You can find an overview here, but to give you a basic idea here is a short example.

Suppose you are working with a small data set that only contains numerical values, but due to a data entry error a few columns contain characters. If you were not following functional programming principles, you would go to each cell location to manually replace the data point. If you were following functional programming principles, you would create a function that takes the data set as argument and replaces characters with appropriate values.

Introducing Purrr

As mentioned above, purrr enhances the functional programming aspect of R. You can use this package to avoid many loops and write simple code that is easy to understand. But before that, you need to install this package:

1
2
3
4
install.packages("purrr")

# alternate way
install.packages("tidyverse")
html

Introducing map() Function

The map function of a purrr package applies a function to each element of a vector. If the function returns a vector, the map function returns a list. There are two more variants of map functions available: map_if and map_at. These variants take an extra predicate function that determines which elements of the vector are transformed by the function.

1
2
3
4
5
6
7
8
# Applying map function on a vector

library(purrr)
library(dplyr)

1:10 %>%
  map(rnorm, n = 10)
 
html

Run the above code to get a list that contains normally distributed data. The syntax of the map function is: map(.x, .f, ...). .x is the list of atomic vectors, .f is a function or formula, and ... is an additional argument passed on to the mapped function.

Introducing a Map Function that Returns an Atomic Vector

There are a few map functions that always return an atomic vector. The type of the vector depends on which type of map function you use in the code. The following are map functions that return an atomic vector: map_lgl(), map_int(), map_dbl() and map_chr().

1
2
3
1:10 %>%
  map(rnorm, n = 10)%>%
map_dbl(mean)
html

In the above code, a list is passed in map_dbl() that is output of the map() function. Run the code to understand the output.

Realistic Example of map() Function

The following example shows how the map() function can be used while building data science models using the mtcars data set. First, the data is split based on the values of the first column. Then the linear model is applied on each split. The map() method will apply the lm() function and generate the number of models, which will be equal to the number of splits. Llater, it applies summary functions on each model, and from that output map_dbl is drawing the r.squared value. You can follow the same pattern in different data sets to see whether dividing the data can improve the performance of the model or not.

1
2
3
4
5
6
7
8
9
10
11
12
# Loading data 
data(mtcars)

mtcars %>%
# Splitting data on cyl column
  split(.$cyl) %>%
# Applying lm() to create linear models
  map(~ lm(mpg ~ wt, data = .x)) %>%
# Generating summary of each model
  map(summary) %>%
# Withdrawing r.squared value 
  map_dbl("r.squared")
html

Conclusion

Functional programming is a very strong aspect of R that has been enhanced by the purrr package's map() function. In this guide you have gone through a real life example where we are applying the map() function to create different models without repeating code. There are different variants of map functions available in the purrr package that can be used in different scenarios. You can find more information here.

You can follow the same example and apply the map() function on different datasets in R. If you are working on a data science project, then you can use purrr.

0