R Crash Course Part II

5. Vectors

In R data can be represented in different structure types. Four main types are differentiated: vectors, matrices, lists and data frames. A conversion of these is possible with the corresponding functions as.vector(), as.matrix(), as.list(), or as.data.frame().
Let us have a look at the most basic of those data types: vectors.
There are six atomic vector types in R. In this context, “atomic” means that vectors contain only elements belonging to one of the following classes:

class examples
character “time”, “space”, “R is a lot of fun!”
numeric/ double 3, -8, -14.4, 56.8
integer 5, 9, 23, 654
logical TRUE, FALSE
complex 1 + 2i, 3-6i
raw raw sequences of bytes

Thus, a vector is a simple collection of elements (e.g., character, integer, …) containing only objects of the same class. Imagine a vector like a sequence of cells containing data. Vectors of different types can be generated via the c() function (“c” like combine):

a <- c("Hello", "World")    # character/ string 

class(a)
## [1] "character"

b <- c(1.3, 5.6, 3.0)       # numeric 
class(b)
## [1] "numeric"

c <- c(42L, 99L, 46L)       # integer 
class(c)
## [1] "integer"

d <- c(TRUE, TRUE, FALSE)   # logical 
class(d)
## [1] "logical"

e <- c(1 + 2i, 3-6i)        # complex 
class(e)
## [1] "complex"

length(a)                   # get number of elements in vector
## [1] 2

Integer values ​​must be explicitly declared with an “L”, otherwise they are considered numeric or double.
Everything in R is an object that belongs to certain classes. This is because R is subject to an object-oriented paradigm. However, within this course, we only focus on the relevant aspects of object orientation in order not to go beyond the scope.
Objects have different properties, or attributes, such as column names, dimensionality, length or class membership, and many more. These attributes can be queried using, for example, class() or length() function (like above).
The official cheat sheets also provide a very good overview of data types and their most important properties.

If several classes are combined in one vector, the conversion into a uniform class is automatically forced:

ab <- c("Hello", 1.3, "World", 5.6)   
class(ab)
## [1] "character"

ab
## [1] "Hello" "1.3"   "World" "5.6"


cd <- c(TRUE, 45L, FALSE, 13L)
class(cd)
## [1] "integer"

cd
## [1]  1 45  0 13

It should be noted that the elements are then partially modified (TRUEbecomes the numeric value 1)!

Conversions between classes is also possible. But when you try to convert a character vector to numeric values, NA values ​​(NA for not available) are introduced:

x <- c(0L, 1L, 2L, 3L, 4L, 5L) 
x
## [1] 0 1 2 3 4 5

x <- as.character(x)
x
## [1] "0" "1" "2" "3" "4" "5"

y <- c("R", "is", "fun")
y
## [1] "R"   "is"  "fun"

y <- as.numeric(y)
## Warning message: NAs introduced by coercion 

y
## [1] NA NA NA

Indexing in vectors

To address individual elements in our vector, we must “index” it. We index something in R by using square brackets [] . Counting starts at 1 – so we get the first element of the vector with [1]. If the index number is greater than the length of the vector, NA values ​​are obtained.
Since a vector is one-dimensional, we only need to write one number as an index between the square brackets in order to retrieve the respective entry. A minus sign in front of the index number leads to an exclusion of the respective entry:

x <- c(4, 2, 7, 9, 3) 
x
## [1] 4 2 7 9 3

x[1]           # first element
## [1] 4

x[-3]          # all but the third element
## [1] 4 2 9 3

x[2:4]         # all elements from second to fourth entry
## [1] 2 7 9

x[c(2, 5)]     # only the second and fifth element
## [1] 2 3

Calculate with vectors

Numerical vectors are calculated element by element. By multiplying a vector with the value 5, we again get a vector of the same length in which each element has been multiplied by 5:

v1 <- c( 1,  3,  5,  7)

v1 * 5
## [1]  5 15 25 35

v1 + 100
## [1] 101 103 105 107

If we have two vectors of the same length, we can compute them both element by element, so the first element of one vector is calculated with the first element of the other, and so on.

v1 <- c( 1,  3,  5,  7)
v2 <- c(20, 40, 60, 80)
  
v1 + v2
## [1] 21 43 65 87

v2 * v1
## [1]  20 120 300 560

However, if the two vectors are of different lengths, the shorter will be “recycled” until the length of the other vector is reached (“recycling rule”):

v3 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
v4 <- c(10, 1)  

v3 + v4
##  [1] 11  3 13  5 15  7 17  9 19 11

There are many useful functions in R that speed up data analysis. Here are some functions that are already implemented in the base package and thus in every basic installation of R:

v5 <- c(2, 4, 6, 8, 1, 3)

mean(v5)                    # arithmetisches Mittel des Vektors
## [1] 4

median(v5)                  # median
## [1] 3.5

max(v5)                     # maximum value
## [1] 8

min(v5)                     # minimum value
## [1] 1

sum(v5)                     # sum of all elements
## [1] 24

quantile(v5)                # quantiles
##   0%  25%  50%  75% 100% 
## 1.00 2.25 3.50 5.50 8.00

var(v5)                     # variance
## [1] 6.8

sd(v5)                      # standard deviation
## [1] 2.607681

sort(v5, decreasing=TRUE)   # sort elements
## [1] 8 6 4 3 2 1

Factors

Another class that you might will stumble upon when working with datasets is the factors. Factors represent categorical data and are beneficial for many statistical analyzes as well as plotting the data. When importing data into R, strings are automatically converted to factors by default:

# unordered factor vector
fa <- factor( c("good", "bad", "bad", "bad", "good", "good") )

class(fa)
## [1] "factor"

fa
## [1] good bad  bad  bad  good good
## Levels: bad good

str(fa)
##  Factor w/ 2 levels "bad","good": 2 1 1 1 2 2

as.integer(fa)
## [1] 2 1 1 1 2 2

levels(fa)
## [1] "bad"  "good"

Factors are based on integer values, which we can show with the function str(): Each individual expression in the vector represents a level and is assigned to an integer value (in this case good is 2 and bad is 1) ). The assignment of the integer values ​​takes place alphabetically for character vectors.

So much for vectors in R. It is time for you to become active: