Dániel Szabó

Vectors and Factors in R

  • Apr 15, 2020
  • 7 Min read
  • 377 Views

Introduction

In this guide, we're going to talk about vectors and factors. In short, a vector is an ordered collection of values that all share one atomic type, and a factor is a vector of integer codes paired with a set of labels (levels), used to represent categorical data. These two structures are among the most basic building blocks in R and a natural starting point for statistical analysis. First we'll clarify each concept, then we'll look at a demonstration of each of them.

Vectors

Vectors are the most basic data objects in R. There are six atomic types, and every element of a given vector shares the same one. Even a single value is a vector of length one.

Atomic types:

1. Character
2. Logical
3. Integer
4. Double
5. Complex
6. Raw

Let's create a small script to demonstrate each of these.

print("welcome")
print(3.14)
print(100L)
print(FALSE)
print(10+3i)
print(charToRaw('atomic raw'))
R

Executing them will result in the following output.

[1] "welcome"
[1] 3.14
[1] 100
[1] FALSE
[1] 10+3i
[1] 61 74 6f 6d 69 63 20 72 61 77
bash

The first line prints an atomic character vector, which may be familiar from other programming languages as a string. The second is the double type, and the third is the integer type; the L suffix is what makes 100 an integer rather than a double. The fourth is the logical type, which can be either TRUE or FALSE. The fifth is the complex type, with a real and an imaginary part. The last uses the charToRaw() function to convert a character value to the raw type, so the output is the byte representation of the character sequence in hexadecimal.
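To check which atomic type a value has, the typeof() function can be used. The following is a minimal sketch; the variable names samples and types are illustrative and not part of the original script.

```r
# One value of each atomic type, kept in a list so no coercion happens
samples <- list("welcome", 3.14, 100L, FALSE, 10+3i, charToRaw("atomic raw"))

# typeof() returns the name of the atomic type as a string
types <- vapply(samples, typeof, character(1))
print(types)
# [1] "character" "double"    "integer"   "logical"   "complex"   "raw"
```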

Integer and double vectors can also hold sequences. Suppose you need a sequence of double values for a task. If increments of 1 are fine, the colon operator does the job.

v <- 0.3:10.3
print(v)
R

The output should look like this.

 [1]  0.3  1.3  2.3  3.3  4.3  5.3  6.3  7.3  8.3  9.3 10.3
bash

If you need to change the increments to a custom value, the seq() function is there to help you.

v <- seq(0,10,by = 0.5)
print(v)
R

The output shows an increment of 0.5 in this case.

 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
[16]  7.5  8.0  8.5  9.0  9.5 10.0
bash
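Besides by, seq() also accepts a length.out argument that fixes the number of elements instead of the step size. The sketch below compares the two and also shows which atomic type the colon operator produces; the variable names are illustrative.

```r
# Fix the step size with `by`
by_step <- seq(0, 1, by = 0.25)
print(by_step)
# [1] 0.00 0.25 0.50 0.75 1.00

# Fix the number of elements with `length.out` instead
by_count <- seq(0, 1, length.out = 5)
print(isTRUE(all.equal(by_step, by_count)))
# [1] TRUE

# The colon operator yields integers for whole-number endpoints,
# and doubles when the starting value has a fractional part
print(typeof(1:5))
# [1] "integer"
print(typeof(0.3:10.3))
# [1] "double"
```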

Vectors behave very similarly to arrays in other languages. You can take a subset of a vector or grab elements by their index. Keep in mind that indexing starts from 1! Suppose you have an atomic vector of characters that represent IT equipment, and you need to grab the first two. You can do that the following way.

t <- c("Server","Switch","Router","Firewall","Monitor")
u <- t[c(1,2)]
print(u)
R

The output should look like the following.

[1] "Server" "Switch"
bash

Negative indexing is available as well, but it does not work as in many other programming languages. In R, a negative index excludes the element at that position instead of counting from the end. For example, t[-2] returns every element except the second one. To get the element before the last one, use t[length(t) - 1].
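In R specifically, a negative index drops the element at that position rather than counting from the end of the vector. The following minimal sketch illustrates this; the vector is named equipment here (an illustrative name) to avoid shadowing R's built-in t() function.

```r
equipment <- c("Server", "Switch", "Router", "Firewall", "Monitor")

# A negative index drops the element at that position
print(equipment[-2])
# [1] "Server"   "Router"   "Firewall" "Monitor"

# Several positions can be dropped at once
print(equipment[-c(1, 2)])
# [1] "Router"   "Firewall" "Monitor"

# To count from the end, compute the position from the length
second_to_last <- equipment[length(equipment) - 1]
print(second_to_last)
# [1] "Firewall"

# tail() returns the last n elements
last_item <- tail(equipment, 1)
print(last_item)
# [1] "Monitor"
```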

If you have vectors of the same length, you can combine them with the add, subtract, multiply, and divide operators. The operations are applied element by element, which can be handy when simulating or demonstrating element-wise matrix operations.

Suppose you have two vectors with three values and the type is double.

v2 <- c(1.1,2.2,3.3)
v1 <- c(4.4,5.5,6.6)
R

Perform the following operations in order.

v1 + v2
v1 - v2
v1 * v2
v1 / v2
R

You should get the following result.

[1] 5.5 7.7 9.9
[1] 3.3 3.3 3.3
[1]  4.84 12.10 21.78
[1] 4.0 2.5 2.0
bash
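Since matrix operations were mentioned, it may help to see that the * operator is element-wise, while R's %*% operator performs matrix multiplication. A minimal sketch with the same values; the names elementwise and inner are illustrative.

```r
v1 <- c(4.4, 5.5, 6.6)
v2 <- c(1.1, 2.2, 3.3)

# `*` multiplies position by position and returns a vector of the same length
elementwise <- v1 * v2
print(length(elementwise))
# [1] 3

# `%*%` treats the vectors as matrices; for two plain vectors of equal
# length it returns their inner product as a 1 x 1 matrix
inner <- v1 %*% v2
print(dim(inner))
# [1] 1 1

# The inner product equals the sum of the element-wise products
print(isTRUE(all.equal(as.numeric(inner), sum(elementwise))))
# [1] TRUE
```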

There is a concept called vector recycling that comes into play when you perform an arithmetic operation on two vectors with different lengths. The elements of the shorter vector are reused from the start until the operation has covered the longer vector. This works cleanly when the length of the longer vector is a multiple of the length of the shorter one. Otherwise R still returns a result, but it issues a warning that the longer object length is not a multiple of the shorter object length.

For example:

v1 <- c(1,2,3,4,5,6)
v2 <- c(7,8)
v1 * v2
R

Output:

[1]  7 16 21 32 35 48
bash

During the multiplication, v2 is treated as 7, 8, 7, 8, 7, 8.
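When the longer length is not a multiple of the shorter one, R still recycles but raises a warning. The sketch below captures that warning and then shows the result; long_vec, short_vec, and warning_text are illustrative names.

```r
long_vec  <- c(1, 2, 3, 4, 5)
short_vec <- c(10, 20)

# Capture the warning message instead of letting it print to the console
warning_text <- tryCatch(
  {
    long_vec + short_vec
    NA_character_  # only reached if no warning was raised
  },
  warning = function(w) conditionMessage(w)
)
print(warning_text)

# The arithmetic itself still completes; short_vec is used as 10, 20, 10, 20, 10
result <- suppressWarnings(long_vec + short_vec)
print(result)
# [1] 11 22 13 24 15
```

The last element of the result uses only the first element of short_vec, which is usually a sign that the two vectors were not meant to be combined.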

Last but not least, when you are working with vectors, you should remember the sort() function. It takes an atomic vector and sorts the elements in either decreasing order or increasing order as per your function call.

# sort increasing order
v1 <- sort(c(4,2,3,1,9,8,6))
# sort decreasing order
v1 <- sort(c(4,2,3,1,9,8,6), decreasing = TRUE)
R

The decreasing argument of the sort function is FALSE by default.
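Note that sort() returns a new vector and leaves its input untouched. A related base R function, order(), returns the positions that would sort the vector. The following is a small sketch; the name scores is illustrative.

```r
scores <- c(4, 2, 3, 1, 9, 8, 6)

ascending  <- sort(scores)
descending <- sort(scores, decreasing = TRUE)
print(ascending)
# [1] 1 2 3 4 6 8 9
print(descending)
# [1] 9 8 6 4 3 2 1

# The original vector is unchanged
print(scores)
# [1] 4 2 3 1 9 8 6

# order() gives the index of each element in sorted order
positions <- order(scores)
print(positions)
# [1] 4 2 3 1 7 6 5

# Indexing with those positions reproduces the sorted vector
print(identical(scores[positions], ascending))
# [1] TRUE
```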

Factors

Factors enjoy widespread popularity in statistical modeling and analysis. Conceptually, factors are variables that can take on a limited number of different values, which is why they are also referred to as categorical variables. Internally, a factor is stored as a vector of integer codes together with a corresponding set of character values that are used to display it. To create a factor, use the factor() function. The only argument you need to specify is a vector of atomic values, and the function returns a factor. The distinct values of a factor are called its levels, and the number of levels is the number of distinct elements.
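The integer storage described above can be inspected directly. The sketch below uses an illustrative factor named sizes, which is not part of the original example.

```r
sizes <- factor(c("small", "large", "medium", "small"))

# The levels are the distinct values, sorted alphabetically by default
print(levels(sizes))
# [1] "large"  "medium" "small"

# nlevels() counts them
print(nlevels(sizes))
# [1] 3

# Each element is stored as the integer position of its level
print(as.integer(sizes))
# [1] 3 1 2 3

# The underlying storage type is integer
print(typeof(sizes))
# [1] "integer"
```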

Let's take an example vector that holds atomic characters and convert it to a factor. The vector holds different types of drinks.

drinks <- factor(c("beer", "wine", "rum", "whiskey","cocktail","whiskey","rum"))
print(drinks)
R

The output should look like this.

[1] beer     wine     rum      whiskey  cocktail whiskey  rum
Levels: beer cocktail rum whiskey wine
bash

The first thing you note is that the elements of the factor keep the order of the original character vector and are printed without quotes. The levels, in contrast, are the unique values, sorted alphabetically by default. To get them, use the levels() function.

levels(drinks)
R

This returns the following result.

[1] "beer"     "cocktail" "rum"      "whiskey"  "wine"
bash

Note the double quotes around the items: levels() returns an ordinary character vector, not a factor.
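A common next step with a factor is counting how often each level occurs, which base R's table() function does. This is a minimal sketch using the same drinks as above; the name counts is illustrative.

```r
drinks <- factor(c("beer", "wine", "rum", "whiskey", "cocktail", "whiskey", "rum"))

# table() counts the occurrences of each level
counts <- table(drinks)

# The names of the result are the levels of the factor
print(names(counts))
# [1] "beer"     "cocktail" "rum"      "whiskey"  "wine"

# Look up a single count by level name
print(counts[["rum"]])
# [1] 2
```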

You are able to access elements of a factor by their indexes, which start from 1!

In order to access the third element, you would use this code.

drinks[3]
R

You can also access subsections of a factor. Suppose you need the first two elements.

drinks[c(1,2)]
R

You are also able to modify elements of a factor, but be aware that you can only assign values that already exist among its levels.

For example, this will work.

drinks[1] <- "wine"
R

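This will not work as intended: R issues an "invalid factor level, NA generated" warning and stores NA in the first position.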

drinks[1] <- "Coca Cola"
R
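In current versions of R, assigning a value that is not among the levels does not stop the script; the element becomes NA and a warning is raised. The sketch below demonstrates this on a shortened version of the factor, with the warning silenced for the demonstration.

```r
drinks <- factor(c("beer", "wine", "rum"))

# suppressWarnings() hides the "invalid factor level" warning for this demo
suppressWarnings(drinks[1] <- "Coca Cola")

# The first element is now missing
print(is.na(drinks[1]))
# [1] TRUE

# The set of levels has not changed
print(levels(drinks))
# [1] "beer" "rum"  "wine"
```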

In order to overcome this problem, a new level needs to be introduced.

levels(drinks) <- c(levels(drinks), "Coca Cola")
drinks[1] <- "Coca Cola"
print(drinks)
R

The output should be as follows.

[1] Coca Cola wine      rum       whiskey   cocktail  whiskey   rum
Levels: beer cocktail rum whiskey wine Coca Cola
bash
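If the full set of categories is known in advance, an alternative is to declare it when the factor is created, using the levels argument of factor(). The following is a minimal sketch on a shortened vector; it also shows droplevels(), which removes levels that no longer occur.

```r
# Declare all allowed values up front, including one that is not used yet
drinks <- factor(
  c("beer", "wine", "rum"),
  levels = c("beer", "wine", "rum", "Coca Cola")
)

# The assignment now succeeds without a warning
drinks[1] <- "Coca Cola"
print(as.character(drinks))
# [1] "Coca Cola" "wine"      "rum"

# All four declared levels are still present
print(nlevels(drinks))
# [1] 4

# droplevels() removes levels that no longer occur (here "beer")
trimmed <- droplevels(drinks)
print(levels(trimmed))
# [1] "wine"      "rum"       "Coca Cola"
```

Declaring levels explicitly also fixes their order, which is otherwise alphabetical.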

Conclusion

In this guide, we built up the knowledge to effectively use vectors and factors. We looked at the difference between these concepts and learned how they build upon each other to facilitate statistical analysis. I hope this guide has been informative to you and I would like to thank you for reading it!