{"id":1458,"date":"2018-06-21T23:49:36","date_gmt":"2018-06-21T21:49:36","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/reseda\/?page_id=1458"},"modified":"2018-09-25T10:31:51","modified_gmt":"2018-09-25T08:31:51","slug":"r-crash-course-4","status":"publish","type":"page","link":"https:\/\/blogs.fu-berlin.de\/reseda\/r-crash-course-4\/","title":{"rendered":"R Crash Course Part IV"},"content":{"rendered":"<p><a name=\"ch7\"><\/a><\/p>\n<h1>7. Lists<\/h1>\n<p>A list is a special form of a vector that allows multiple elements of different classes at once. It thus serves as a kind of container for other objects, such as numbers, strings, vectors or matrices. A list can be created using the function <span class=\"crayon-inline\">list()<\/span>. Element names can be given to existing lists via the <span class=\"crayon-inline\">names()<\/span> function so that they can later be indexed using these names:<\/p>\n<pre class=\"theme:amityreseda\">\r\nl &lt;- list(13L, &quot;Hello&quot;, matrix(1:6, 2, 3))\r\nl\r\n## [[1]]\r\n## [1] 13\r\n## \r\n## [[2]]\r\n## [1] &quot;Hello&quot;\r\n## \r\n## [[3]]\r\n##      [,1] [,2] [,3]\r\n## [1,]    1    3    5\r\n## [2,]    2    4    6\r\n\r\nnames(l) &lt;- c(&quot;my.integer&quot;, &quot;my.string&quot;, &quot;my.matrix&quot;)\r\nl\r\n## $my.integer\r\n## [1] 13\r\n## \r\n## $my.string\r\n## [1] &quot;Hello&quot;\r\n## \r\n## $my.matrix\r\n##      [,1] [,2] [,3]\r\n## [1,]    1    3    5\r\n## [2,]    2    4    6\r\n\r\nstr(l)\r\n## List of 3\r\n##  $ my.integer: int 13\r\n##  $ my.string: chr &quot;Hello&quot;\r\n##  $ my.matrix: int [1:2, 1:3] 1 2 3 4 5 6\r\n<\/pre>\n<p>When creating a list, however, the element names can also be assigned immediately:<\/p>\n<pre class=\"theme:amityreseda\">\r\nl &lt;- list(\"my.integer\"=13L,\r\n          \"my.string\"=&quot;Hello&quot;,\r\n          \"my.matrix\"=matrix(1:6, 2, 3)\r\n          )\r\n<\/pre>\n<p><strong>Indexing in lists<\/strong><\/p>\n<p>Using the respective index number or the 
assigned element name (if available), we can use double square brackets <span class=\"crayon-inline\">[[]]<\/span> to access the contents of a list element. Using single square brackets, we would only get a part of the list, which would still be of class list:<\/p>\n<pre class=\"theme:amityreseda\">\r\nl[1]                        # first part of the list\r\n## $my.integer\r\n## [1] 13\r\nclass(l[1])\r\n## [1] \"list\"\r\n\r\nl[[1]]                      # extract first element (integer value)\r\n## [1] 13\r\nclass(l[[1]])\r\n## [1] \"integer\"\r\n\r\nl[[\"my.string\"]]           # extract element by its name\r\n## [1] \"Hello\"\r\n\r\nl[[3]]                      # extract third element (matrix)\r\n##      [,1] [,2] [,3]\r\n## [1,]    1    3    5\r\n## [2,]    2    4    6\r\n<\/pre>\n<p><strong>Modify lists<\/strong><\/p>\n<p>Lists can be expanded (assign a value to a new index number or new element name), elements can be deleted (assign <span class=\"crayon-inline\">NULL<\/span>), and individual list elements can be overwritten (reassign an existing index or name):<\/p>\n<pre class=\"theme:amityreseda\">\r\nl[\"my.numeric\"] &lt;- 45.7325          # add new element to list\r\n\r\nl[1] &lt;- NULL                         # delete first element in list\r\n\r\nl[&quot;my.string&quot;] &lt;- &quot;World&quot;            # overwrite existing element\r\n<\/pre>\n<p><a name=\"ch8\"><\/a><\/br><\/p>\n<h1>8. Dataframe<\/h1>\n<p>The data frame is the most commonly used data type when working with data sets and lets you manage two-dimensional tabular data. How does it differ from a matrix? While a matrix can only contain elements of a single class, a data frame can hold several classes at once. 
Internally, a data frame is basically a list whose elements are the columns, each a vector of the same length.<br \/>\nWhen tabular external data is read into R, the result is usually a data frame.<\/p>\n<pre class=\"theme:amityreseda\">\r\ndf &lt;- data.frame(\r\n  &quot;name&quot;   = c(&quot;Ben&quot;, &quot;Hanna&quot;, &quot;Paul&quot;, &quot;Arthur&quot;), \r\n  &quot;size&quot;   = c(185, 166, 175, 190),\r\n  &quot;weight&quot; = c(110, 60, 76, 89)\r\n  )\r\n\r\ndf\r\n##     name size weight\r\n## 1    Ben  185    110\r\n## 2  Hanna  166     60\r\n## 3   Paul  175     76\r\n## 4 Arthur  190     89\r\n\r\nlength(df)                  # number of columns (variables)\r\n## [1] 3\r\n\r\ndim(df)                     # dimensions (4 rows, 3 columns)\r\n## [1] 4 3\r\n\r\nnrow(df)                    # number of rows (observations)\r\n## [1] 4\r\n\r\nncol(df)                    # number of columns (variables)\r\n## [1] 3\r\n\r\nstr(df)                     # shows structure of df\r\n## &#039;data.frame&#039;:    4 obs. of  3 variables:\r\n##  $ name  : Factor w\/ 4 levels &quot;Arthur&quot;,&quot;Ben&quot;,..: 2 3 4 1\r\n##  $ size  : num  185 166 175 190\r\n##  $ weight: num  110 60 76 89\r\n\r\nsummary(df)                 # statistical summary\r\n##      name       size            weight\r\n##  Arthur:1   Min.   :166.0   Min.   : 60.00  \r\n##  Ben   :1   1st Qu.:172.8   1st Qu.: 72.00  \r\n##  Hanna :1   Median :180.0   Median : 82.50  \r\n##  Paul  :1   Mean   :179.0   Mean   : 83.75  \r\n##             3rd Qu.:186.2   3rd Qu.: 94.25  \r\n##             Max.   :190.0   Max.   :110.00\r\n<\/pre>\n<p>The output of the function <span class=\"crayon-inline\">str()<\/span> is particularly interesting. 
It first shows us that we have 4 observations (obs., i.e. &#8220;Ben&#8221;, &#8220;Hanna&#8221;, &#8220;Paul&#8221;, &#8220;Arthur&#8221;) with 3 variables each (variables, i.e. &#8220;name&#8221;, &#8220;size&#8221;, &#8220;weight&#8221;). Furthermore, it indicates for each variable whether it is numeric (num) or categorical (Factor); for the latter, the number of distinct values (w\/ 4 levels) is displayed. Even more useful is the statistical summary of each column of the data frame, provided by the function <span class=\"crayon-inline\">summary()<\/span>.<\/p>\n<p><strong>Indexing in data frames<\/strong><\/p>\n<p>In a data frame, columns can be addressed either by index number using double square brackets <span class=\"crayon-inline\">[[]]<\/span> or directly by column name (if available) using the dollar sign <span class=\"crayon-inline\">$<\/span>. In addition, rows and columns can be addressed as in a matrix using single square brackets <span class=\"crayon-inline\">[]<\/span>:<\/p>\n<pre class=\"theme:amityreseda\">\r\ndf\r\n##     name size weight\r\n## 1    Ben  185    110\r\n## 2  Hanna  166     60\r\n## 3   Paul  175     76\r\n## 4 Arthur  190     89\r\n\r\ndf[2]                                  # output column 2 as data frame\r\n##   size\r\n## 1  185\r\n## 2  166\r\n## 3  175\r\n## 4  190\r\n\r\ndf[[2]]                                # output as numeric\r\n## [1] 185 166 175 190\r\n\r\ndf$size                                # output as numeric\r\n## [1] 185 166 175 190\r\n\r\ndf[ , 2]                               # column output as numeric\r\n## [1] 185 166 175 190\r\n\r\ndf[1,  ]                               # row output as data frame\r\n##   name size weight\r\n## 1  Ben  185    110\r\n\r\ndf[1, 2]                               # element in row 1, col 2 as numeric\r\n## [1] 185\r\n<\/pre>\n<p>Various queries are also possible using comparison and logical operators:<\/p>\n<pre 
class=\"theme:amityreseda\">\r\ndf\r\n##     name size weight\r\n## 1    Ben  185    110\r\n## 2  Hanna  166     60\r\n## 3   Paul  175     76\r\n## 4 Arthur  190     89\r\n\r\ndf$size &gt; 170\r\n## [1]  TRUE FALSE  TRUE  TRUE\r\n\r\ndf[df$size &gt; 170, ]                     \r\n##     name size weight\r\n## 1    Ben  185    110\r\n## 3   Paul  175     76\r\n## 4 Arthur  190     89\r\n\r\ndf[df$size &gt; 180 &amp; df$weight &lt; 100, ]        # AND condition\r\n##     name size weight\r\n## 4 Arthur  190     89\r\n\r\ndf[df$size &gt; 188 | df$weight &lt; 70, ]         # OR condition\r\n##     name size weight\r\n## 2  Hanna  166     60\r\n## 4 Arthur  190     89\r\n\r\ndf[df$name == &quot;Ben&quot; | df$name == &quot;Hanna&quot;, ]  # OR condition\r\n##    name size weight\r\n## 1   Ben  185    110\r\n## 2 Hanna  166     60\r\n<\/pre>\n<p>Explanation: The query <span class=\"crayon-inline\">df$size &gt; 170<\/span> returns a logical vector that contains <span class=\"crayon-inline\">TRUE<\/span> wherever the value of the respective observation is greater than 170. We use this vector in <span class=\"crayon-inline\">df[df$size &gt; 170, ]<\/span> to index the corresponding rows of the data frame (this outputs all observations with a <span class=\"crayon-inline\">TRUE<\/span>). When chaining conditions, either both conditions must be fulfilled at the same time by using AND <span class=\"crayon-inline\">&amp;<\/span>, or at least one of them by using OR <span class=\"crayon-inline\">|<\/span>.<\/p>\n<p><strong>Modify data frames<\/strong><\/p>\n<p>Often it is necessary to delete data from a data frame or to add further entries later. R offers several ways to do both. 
Here are a few simple solutions:<\/p>\n<pre class=\"theme:amityreseda\">\r\ndf2 &lt;- df[ , -2]                                # delete column by index\r\ndf2\r\n##     name weight\r\n## 1    Ben    110\r\n## 2  Hanna     60\r\n## 3   Paul     76\r\n## 4 Arthur     89\r\n\r\ndf3 &lt;- subset(df, select = -c(weight, size))    # delete columns by name\r\ndf3\r\n##     name\r\n## 1    Ben\r\n## 2  Hanna\r\n## 3   Paul\r\n## 4 Arthur\r\n\r\ndf4 &lt;- df[-3, ]                                 # delete row by index\r\ndf4\r\n##     name size weight\r\n## 1    Ben  185    110\r\n## 2  Hanna  166     60\r\n## 4 Arthur  190     89\r\n\r\ndf5 &lt;- subset(df, !name %in% c(&quot;Ben&quot;, &quot;Hanna&quot;)) # delete row by attribute\r\ndf5\r\n##     name size weight\r\n## 3   Paul  175     76\r\n## 4 Arthur  190     89\r\n<\/pre>\n<p>Excluding columns by name is possible with the <span class=\"crayon-inline\">subset()<\/span> function. Here we use the argument <span class=\"crayon-inline\">select=<\/span> and put a leading minus in front of the name of the column to be deleted (or in front of a vector built with <span class=\"crayon-inline\">c()<\/span> to remove several columns at once). The <span class=\"crayon-inline\">!<\/span> symbol is a logical operator and negates a condition (see <span class=\"crayon-inline\">? 
&quot;!&quot;<\/span>).<br \/>\nObservations and variables can of course also be added:<\/p>\n<pre class=\"theme:amityreseda\">\r\ndf$gender &lt;- c(\"m\", \"w\", \"m\", \"m\")        # add a column (variable)\r\ndf\r\n##     name size weight gender\r\n## 1    Ben  185    110      m\r\n## 2  Hanna  166     60      w\r\n## 3   Paul  175     76      m\r\n## 4 Arthur  190     89      m\r\n\r\nnewdata &lt;- data.frame(&quot;name&quot; = &#039;Lisa&#039;,    # add a row (observation)\r\n                      &quot;size&quot; = 180,\r\n                      &quot;weight&quot; = 70,\r\n                      &quot;gender&quot; = &quot;w&quot;\r\n                      )\r\n\r\ndf &lt;- rbind(df, newdata)\r\ndf\r\n##     name size weight gender\r\n## 1    Ben  185    110      m\r\n## 2  Hanna  166     60      w\r\n## 3   Paul  175     76      m\r\n## 4 Arthur  190     89      m\r\n## 5   Lisa  180     70      w\r\n<\/pre>\n<p>If a new row is to be added, the new data must have the same structure as the existing data frame, i.e. the same column names.<\/p>\n<p>Time for training session number IV:<\/p>\n<p><a target=\"_blank\" href=\"https:\/\/blogs.fu-berlin.de\/reseda\/e04\/\"><br \/>\n<button style=\"width:100%;text-align:center;padding: 0;background-color:#6b9e1f;color: white\"><\/p>\n<div style=\"font-family: 'Noto Sans',sans-serif\"><span style=\"font-size: 30px\"><strong>EXERCISE IV<\/strong><\/span><\/div>\n<p><\/button><\/a><\/p>\n<hr style=\"height: 4px;background-color: #6b9e1f\" \/>\n<a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/r-crash-course-5\/\"><br \/>\n<button style=\"width:100%;text-align:right;padding: 10 0;background-color:white;margin:-55px 0 0 0\"><\/p>\n<div style=\"font-family: 'Noto Sans',sans-serif;line-height: 1.2\">\n<span style=\"font-size: 12px;color:#bfbfbf\"><strong><em>NEXT<\/em><\/strong><\/span><br \/>\n<span style=\"font-size: 30px;color:#6b9e1f\"><strong><em>R Crash Course Part 
V<\/em><\/strong><\/span>\n<\/div>\n<p><\/button><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>7. Lists A list is a special form of a vector that allows multiple elements of different classes at once. It thus serves as a kind of container for other objects, such as numbers, strings, vectors or matrices. A list can be created using the function list(). Element names can be given to existing lists &hellip; <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/r-crash-course-4\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;R Crash Course Part IV&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3237,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-1458","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/1458","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/users\/3237"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/comments?post=1458"}],"version-history":[{"count":5,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/1458\/revisions"}],"predecessor-version":[{"id":2474,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/1458\/revisions\/2474"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/media?parent=1458"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}