R Crash Course Part I

1. Package Management

Packages are collections of functions and compiled code written by the R community or the R Development Core Team. R comes with a standard set of packages, e.g., base, stats, and graphics. However, R is enormously expandable in its (geo-)statistical functionality via the official network CRAN (The Comprehensive R Archive Network). There are over 11,000 different extensions, or packages, available free of charge. The directory, where installed packages are stored, is called library.
Your can install new packages to your library by clicking on the Packages-tab and then Install in the Files Pane of R-Studio. Anyway, download and installation of packages can also be done script-based via the function install.packages() (illustrated using the ggplot2 package, which is useful for creating elegant data visualisations):

install.packages("ggplot2")

library()

The function library() in line 3 lists all currently installed packages on your system. This list is also visible in the Files Pane of R-Studio (Packages tab). If problems arise, make sure VirtualBox has access to the internet and is not blocked by a firewall on your host system. Keep in mind that packages must be installed only once and remain permanently installed, even after a restart of R-Studio.
Once installed, you have to load the package into your current R-session before you can use its functionalities by using library() together with the package name:

library(ggplot2)    

You can list all packages, wich are loaded in your current R session by using search(). Activated packages also have a tick symbol next to their names in the Files pane (Packages tab) of R-Studio.


2. Calculate With R

Of course, R can be used as a simple calculator. Required operators can be entered directly into the Console pane of R-Studio or as a whole script in the Source Pane, from which you can send the commands to the Console pane with Ctrl+Enter. Results are then immediately printed to the console. In this online course, corresponding outputs are also shown with two hash tags at the beginning of the line for better transparency and readability:

# hash tags allow you to make valuable notes and reminders 

19 + 23      
## [1] 42

34 - 22     
## [1] 12

27 / 9       
## [1] 3

6 * 8        
## [1] 48

(2 + 3) * 4  
## [1] 20

You will see a number in square brackets [1] at the beginning of your output prompts. This number refers to the length of your output, i.e., the number of elements, which is 1 for all examples above. More on that in chapter Vectors. In addition to these standard operators, there were plenty of other operators commonly encountered in R:

  ?   help function
  +   –   /   *   ^   addition, subtraction, division, multiplication, potentiation
  !   negation sign
  <   >   <=   >=   ==   !=   lesser, greater, lesser or equal, greater or equal, equal, not equal
  &   |   boolean AND, boolean OR
  <-   variable assignment
  ~   separate left- and right-hand sides in a model formula
  :   generate regular sequences
  %%   modulo
  []  [[]]  $  @   indexing in vectors, matrices, and data frames

The help operator in R provides access to the documentation pages for R functions, data sets, and other objects, both for packages in the standard R distribution and for contributed packages. In order to access documentation for the sequence operator, for example, enter the command ?”:” or help(“:”).


3. Variables

In most cases, however, you will want to cache results of commands in order re-access them later on. Then, variables come into play. In R variables are defined using the  <-  operator. Although the output will not be printed to the Console pane directly, we store the variable in our temporary workspace. Thus, the variable should be visible under Values ​​in the Environment pane in R-Studio.

x <- 8 + 7         # assignment to variable x

y <- 4 * 2         # assignment to variable y

We only get the value of the variable when we call its name as a command or look into the Environment pane. Further calculations with the variables are also possible:

x            
## [1] 15

y  
## [1] 8

x + y        
## [1] 23

my.variable <- x + y
my.variable 
## [1] 23

A convention in R is to include points in variable names, e.g., my.variable. This is for the sake of clarity only, especially when many variables exist, and has no deeper meaning beyond that. Strictly avoid any other special symbols in variable names.


4. Functions

A function is a piece of code written to carry out a specific task. It can accept arguments and returns one or more values.
R standard packages offer several arithmetic built-in functions and constants, which make statistical analysis quite efficient. A function generally consists of a function name and two parentheses  () , in which arguments are given as input. Of course, previously created variables can serve as arguments for functions, too:

sqrt(64)           # square root 
## [1] 8

exp(3)             # exponential
## [1] 20.08554

cos(13)            # cosinus
## [1] 0.9074468

pi                 # constant number pi
## [1] 3.141593

round(pi)          # round values
## [1] 3

a <- 6
b <- 9.2
log10(a + b)       # logarithm (base 10)
## [1] 1.181844

So, the best way to learn about the internal workings of a function is to write your own one. R allows to create user defined functions, whereby the basic construct of each function is the following:

name.of.fun( arguments ) { body }

The code in between the curly braces is the body of the function. This is where you define all the commands your functions will perform. Let us write a function that calculates a normalized ratio of two numeric values! It is best to copy the following code into the script window, select everything and then execute the code. The function should then appear in the Environment window and can be called hereinafter.

my.fun <- function(var1, var2) {
  x <- (var1 - var2) / (var1 + var2)
  return(x)
}

Explanation: Use function() to create a new function and assign it to any variable, e.g.,my.fun. The two arguments var1 and var2 are placeholders for variables that are assigned when this function is called. Operations of the function are defined between the curly braces {}. Intermediate results, i.e., the x in our example, exist locally within the function – they do not appear in the Environment window. Only variables given to the return function can be saved as a variable. The function call is done via:

result <- my.fun(42, 13)
result 
## [1] 0.5272727

If you feel ready click the button below and check your understanding up to here!