Dániel Szabó

Matrices, Lists, and Arrays in R

  • Apr 22, 2020
  • 11 Min read
  • 278 Views

Introduction

In this guide, we are going to learn about three of the most common data structures in R. These data structures will be familiar from your math studies, and most programming languages offer close equivalents. They are matrices, which are two-dimensional vectors; lists, which are one-dimensional generic vectors that can hold items of different types; and arrays, which are vectors with one or more dimensions. First the basic concept of each of these data containers will be introduced, and then we will look at their practical use cases. At the heart of each data structure is R's vector type: atomic vectors for matrices and arrays, generic vectors for lists.

Matrices

A matrix is a collection of data elements arranged in a two-dimensional, rectangular layout. An example 3x3 matrix looks like this.

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

The most important thing you need to remember to get started with matrices is the matrix() function. This function has the following skeleton.

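matrix(
   data = NA,      # vector of values to store
   nrow = 1,       # number of rows
   ncol = 1,       # number of columns
   byrow = FALSE)  # FALSE fills by column, TRUE fills by row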

The first argument is a vector that supplies the values stored in the matrix. The second argument sets the number of rows, and the third sets the number of columns. The length of the vector should match nrow * ncol; if the vector is shorter, R recycles its values to fill the matrix, and if it is longer, the extra values are dropped, in both cases possibly with a warning. The last argument defines whether the matrix is filled by rows or by columns. By default, byrow is FALSE, which means the matrix is filled column by column.

Let's try this one out. Your matrix definition looks like this.

matrix(
   c(1, 2, 3, 4, 5, 6, 7, 8),
   nrow = 4,
   ncol = 2,
   byrow = TRUE)

The output should look like this.

     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8

If you omit the byrow = TRUE argument and also shrink the matrix to nrow = 2 and ncol = 2, the following output greets you.

     [,1] [,2]
[1,]    1    3
[2,]    2    4

Where did the rest of the elements go? The vector holds eight values, but a 2x2 matrix has room for only four, so R keeps the first four and drops the rest. If you want to keep all the values from the vector, change the column count to ncol = 4. That way you have the following output.

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
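The fill order and the size of the matrix can be checked directly. The following is a minimal sketch; the variable names v, by_col, and by_row are illustrative assumptions and do not appear in the examples above.

```r
# Illustrative names (v, by_col, by_row) are assumptions for this sketch.
v <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Default fill order (byrow = FALSE): values run down each column.
by_col <- matrix(v, nrow = 2, ncol = 4)

# byrow = TRUE: values run across each row instead.
by_row <- matrix(v, nrow = 2, ncol = 4, byrow = TRUE)

print(by_col)
print(by_row)

# dim() reports the number of rows and columns.
print(dim(by_col))

# The matrix holds exactly nrow * ncol values, so nothing is dropped here.
print(length(v) == nrow(by_col) * ncol(by_col))
```

Comparing length(v) with nrow * ncol before building a matrix is a cheap way to catch values that would otherwise be dropped or recycled.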

Now that you have the foundations, how do you proceed?

Transpose

This concept comes from linear algebra. Transposing flips the matrix over its main diagonal, so rows become columns and columns become rows. In R, you can do this with the t() function. Let's see how it looks given the matrix below.

a <- matrix(
   c(1, 2, 3, 4, 5, 6, 7, 8),
   nrow = 4,
   ncol = 2,
   byrow = TRUE)

Before transposing, the output looks like this.

     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8

After transposing with t(a), the output looks like this.

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
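The transpose step itself can be sketched as follows. The name a_t is an assumption introduced only for this illustration.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, ncol = 2, byrow = TRUE)

# t() swaps rows and columns: a 4x2 matrix becomes a 2x4 matrix.
a_t <- t(a)   # a_t is a hypothetical name for the transposed copy
print(dim(a))
print(dim(a_t))

# Element [i, j] of the original is element [j, i] of the transpose.
print(a[3, 2] == a_t[2, 3])

# Transposing twice returns the original matrix.
print(identical(t(a_t), a))
```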

Combine Matrices

To combine two matrices side by side, use the cbind() function. It takes two or more matrices as arguments and binds them together column by column. When you combine matrices this way, make sure that they have the same number of rows; otherwise R raises an error.

Take the two matrices B and D.

B <- matrix(
   c(2, 4, 3, 1, 5, 7),
   nrow = 3,
   ncol = 2)

D <- matrix(
   c(1, 3, 2),
   nrow = 3,
   ncol = 1)

Their combination can be created the following way.

cbind(B, D)

The output looks like this.

     [,1] [,2] [,3]
[1,]    2    1    1
[2,]    4    5    3
[3,]    3    7    2
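Matrices can also be stacked vertically with rbind(), the row-wise counterpart of cbind(). The sketch below shows both, along with what happens when the dimensions do not match. The matrix E and the names side_by_side, stacked, and outcome are assumptions made up for this illustration.

```r
B <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 3, ncol = 2)
D <- matrix(c(1, 3, 2), nrow = 3, ncol = 1)

# cbind() places the matrices side by side; the row counts must match.
side_by_side <- cbind(B, D)
print(dim(side_by_side))

# rbind() stacks matrices on top of each other; the column counts must match.
E <- matrix(c(9, 8), nrow = 1, ncol = 2)   # hypothetical extra row for B
stacked <- rbind(B, E)
print(dim(stacked))

# Mismatched dimensions raise an error, which tryCatch() can intercept.
outcome <- tryCatch(
  cbind(B, E),                             # 3 rows versus 1 row
  error = function(e) "row counts differ"
)
print(outcome)
```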

Deconstruction

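This concept allows you to break the matrix down into a plain vector, which can come in handy in certain situations. The values are read column by column, which matches the original vector when the matrix was filled with the default byrow = FALSE. Take the following matrix called H.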

H <- matrix(
   c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
   nrow = 5,
   ncol = 2)

You are able to deconstruct it with the c() function.

c(H)

The output looks like this.

 [1]  1  2  3  4  5  6  7  8  9 10
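Flattening always reads the matrix column by column, which matters when the matrix was filled by row. The following sketch illustrates the difference; the names flat and by_row are assumptions for this example only.

```r
H <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), nrow = 5, ncol = 2)

# c() and as.vector() both drop the dimensions and read column by column.
flat <- c(H)
print(identical(flat, as.vector(H)))

# A matrix filled by row does not flatten back to its input order...
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
print(c(by_row))

# ...unless it is transposed first, which restores the row-wise order.
print(c(t(by_row)))
```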

Lists

Lists are objects that may contain elements of different types, unlike atomic vectors, whose elements must all share one type. The elements can be strings, numbers, vectors, matrices, and even other lists. A list is a general-purpose container for cases where a single type is not enough. The function that creates a list is called list().

An example list would look like this.

data <- list("Server", "Network Device", c(1, 2, 3, 4), FALSE, list(1, 2, 3, 4, 5, 6))

The content of your list is now as follows.

[[1]]
[1] "Server"

[[2]]
[1] "Network Device"

[[3]]
[1] 1 2 3 4

[[4]]
[1] FALSE

[[5]]
[[5]][[1]]
[1] 1

[[5]][[2]]
[1] 2

[[5]][[3]]
[1] 3

[[5]][[4]]
[1] 4

[[5]][[5]]
[1] 5

[[5]][[6]]
[1] 6

You can see that there is no real limit on how many elements, or what types of elements, you can store in a list. The names() function lets you name the elements of your list, which results in a dictionary-like data structure. A dictionary consists of key-value pairs. In this case, the keys are the names and the values are the actual elements.

Let's give names to the list elements.

names(data) <- c("Hardware", "Network", "vector", "boolean", "nestedlist")

After the function is executed, we can refer to the elements in the list by their names.

data$Hardware
[1] "Server"

data$Network
[1] "Network Device"

data$nestedlist
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

Named elements make code easier to read and maintain, because each value is referenced by a meaningful label rather than by its position. As with lists in other programming languages, you can access, manipulate, and merge lists. Indexing starts from 1.

To access the elements, refer to them by index. Single brackets, as in data[1], return a sub-list that keeps the element's name; double brackets, as in data[[1]], return the element itself.

Let's retrieve the first and second elements.

> data[1]
$Hardware
[1] "Server"

> data[2]
$Network
[1] "Network Device"
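The difference between single brackets, double brackets, and the $ operator is easy to check. The sketch below rebuilds the named list from above; the names sub and item are assumptions introduced for this illustration.

```r
data <- list("Server", "Network Device", c(1, 2, 3, 4), FALSE, list(1, 2, 3, 4, 5, 6))
names(data) <- c("Hardware", "Network", "vector", "boolean", "nestedlist")

# Single brackets return a list of length one that keeps the name.
sub <- data[1]
print(class(sub))

# Double brackets (or $) return the element itself.
item <- data[[1]]
print(class(item))
print(identical(item, data$Hardware))

# Double brackets can be chained to reach into nested structures.
print(data[["vector"]][2])      # second value of the numeric vector
print(data$nestedlist[[3]])     # third element of the nested list
```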

In order to remove a specific element, assign the NULL value to its index. This will reduce the length of your list.

Let's remove the nested list. You can do this in two ways. The second one will only work if you have named your list elements.

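# Option 1: remove the element by position
data[5] <- NULL

# Option 2: remove the element by name (requires named elements)
data$nestedlist <- NULL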
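The effect of the removal on the length of the list can be verified with length(). This sketch names the elements directly in the list() call, which is equivalent to calling names() afterwards; the location element and its value are hypothetical and serve only as an illustration.

```r
data <- list(Hardware = "Server", Network = "Network Device",
             vector = c(1, 2, 3, 4), boolean = FALSE,
             nestedlist = list(1, 2, 3, 4, 5, 6))
print(length(data))           # five elements before removal

data$nestedlist <- NULL       # remove by name
print(length(data))           # four elements afterwards
print(names(data))

# Appending works the other way round: assign to a new name.
data$location <- "Rack 12"    # hypothetical value, for illustration only
print(length(data))
```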

Suppose you have two lists from different data sources, and you have a function that needs data from both of them. You have the option to merge these two lists.

monthids <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- list("Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sep", "Oct", "Nov", "Dec")

The way to achieve this is to use the c() function.

merged.list <- c(monthids, months)

This will produce the following results.

[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

[[7]]
[1] 7

[[8]]
[1] 8

[[9]]
[1] 9

[[10]]
[1] 10

[[11]]
[1] 11

[[12]]
[1] 12

[[13]]
[1] "Jan"

[[14]]
[1] "Feb"

[[15]]
[1] "Mar"

[[16]]
[1] "Apr"

[[17]]
[1] "May"

[[18]]
[1] "June"

[[19]]
[1] "July"

[[20]]
[1] "Aug"

[[21]]
[1] "Sep"

[[22]]
[1] "Oct"

[[23]]
[1] "Nov"

[[24]]
[1] "Dec"

The unlist() function allows you to convert your lists to vectors.

myvector <- unlist(merged.list)

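Because merged.list mixes numbers and strings, unlist() coerces every element to a character string, so myvector is a character vector. The usual arithmetic operators work only on a numeric vector, such as the result of unlist(monthids).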
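The type of the vector that unlist() returns depends on the contents of the list. The following sketch compares an all-numeric list with a mixed one; the names ids and mixed are assumptions for this illustration.

```r
monthids <- list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
months <- list("Jan", "Feb", "Mar", "Apr", "May", "June",
               "July", "Aug", "Sep", "Oct", "Nov", "Dec")

# A list of numbers flattens to a numeric vector, so arithmetic works.
ids <- unlist(monthids)
print(class(ids))
print(ids * 2)

# Mixing numbers and strings coerces everything to character.
mixed <- unlist(c(monthids, months))
print(class(mixed))
print(mixed[1])               # the first element is now a string
```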

Arrays

An array is a vector with one or more dimensions. A one-dimensional array behaves much like a vector, and an array with two dimensions is a matrix. Behind the scenes, the data is stored as a single vector together with a dim attribute that describes its shape. The array() function can be used to create your own array. The only restriction is that an array can store only one data type.

You can create a simple array the following way.

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
result <- array(c(v1, v2), dim = c(3, 3, 2))

Now result holds an array made up of two matrices, each with three rows and three columns. The input supplies only nine values, so array() recycles them to fill the second matrix.

, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

, , 2

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

The key argument here is dim. It defines the size of each dimension: here three rows, three columns, and two matrices.
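The shape of the array and the recycling of its input can be inspected with dim() and length(). This is a minimal sketch that reuses the v1, v2, and result definitions from the example above.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
result <- array(c(v1, v2), dim = c(3, 3, 2))

# dim() returns the size of each dimension: rows, columns, matrices.
print(dim(result))

# The array holds 3 * 3 * 2 = 18 values, but only 9 were supplied,
# so array() recycles the input to fill the second matrix.
print(length(c(v1, v2)))
print(length(result))
print(identical(result[, , 1], result[, , 2]))
```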

There is a more general skeleton of the call that is worth keeping in mind.

my_array <- array(data, dim = c(rows, columns, matrices), dimnames = list(row_names, column_names, matrix_names))

You have the option to name your rows, columns and matrices in an array. Suppose you extend your above code with the following.

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
col.names <- c("Item", "Serial", "Size")
row.names <- c("Server", "Network", "Firewall")
matrix.names <- c("DataCenter EU", "DataCenter US")
result <- array(c(v1, v2), dim = c(3, 3, 2), dimnames = list(row.names, col.names, matrix.names))

Now the rows, columns, and matrices of the result array carry meaningful names, which makes the code cleaner and easier to maintain.

, , DataCenter EU

         Item Serial Size
Server      1      4    7
Network     2      5    8
Firewall    3      6    9

, , DataCenter US

         Item Serial Size
Server      1      4    7
Network     2      5    8
Firewall    3      6    9

Accessing the elements is a bit more tricky, but once you get the hang of it, it should become easy. The skeleton code you should keep in mind is the following.

result[row, column, matrix]

There is a neat trick with this. If you leave any of the indices empty, R returns every element along that dimension.

For example, to collect the serial of the first device, Server, from each datacenter, all you have to write is the following.

result[1, 2, ]

The output should be the following.

DataCenter EU DataCenter US
            4             4

If you were to collect the size of each device from every datacenter, the following code would do the job.

result[, 3, ]

The output should be the following.

         DataCenter EU DataCenter US
Server               7             7
Network              8             8
Firewall             9             9
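Because the array has dimnames, the same elements can also be selected by name instead of by position. The sketch below assumes the named result array defined earlier; the drop = FALSE line is an extra illustration and not part of the original examples.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6, 7, 8, 9)
col.names <- c("Item", "Serial", "Size")
row.names <- c("Server", "Network", "Firewall")
matrix.names <- c("DataCenter EU", "DataCenter US")
result <- array(c(v1, v2), dim = c(3, 3, 2),
                dimnames = list(row.names, col.names, matrix.names))

# Names can replace numeric positions in any dimension.
print(result["Server", "Serial", ])        # same as result[1, 2, ]
print(result["Firewall", "Size", "DataCenter US"])

# A single matrix slice keeps its row and column names.
print(result[, , "DataCenter EU"])

# drop = FALSE keeps all three dimensions even when some have length 1.
print(dim(result[1, 2, , drop = FALSE]))
```

Indexing by name keeps working if the order of rows or columns changes later, which positional indexing does not guarantee.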

You can also extract individual matrices from an array. Let's separate each datacenter into its own matrix.

DCEU <- result[, , 1]
DCUS <- result[, , 2]

The corresponding outputs will be as you expect.

#DCEU
         Item Serial Size
Server      1      4    7
Network     2      5    8
Firewall    3      6    9
#DCUS
         Item Serial Size
Server      1      4    7
Network     2      5    8
Firewall    3      6    9

Conclusion

This guide covered three crucial data structures that are used by statistical analysts and other data-mining folks. We built up the foundations to understand how these data structures build on each other and looked at practical examples of how we can manipulate their contents. I hope this guide has been informative to you, and I would like to thank you for reading it!
