{"id":2173,"date":"2018-07-21T13:10:24","date_gmt":"2018-07-21T11:10:24","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/reseda\/?page_id=2173"},"modified":"2018-09-25T19:35:31","modified_gmt":"2018-09-25T17:35:31","slug":"create-samples-in-r","status":"publish","type":"page","link":"https:\/\/blogs.fu-berlin.de\/reseda\/create-samples-in-r\/","title":{"rendered":"Create Samples in R"},"content":{"rendered":"<p>Creating validation samples is a crucial step in validation workflows. Certainly, sampling is not a trivial task. Many publications deal with the optimal sampling method, trying to maintain heterogeneity and avoid autocorrelation within validation samples, e.g., <a href=\"https:\/\/www.springer.com\/de\/book\/9783540325710\" rel=\"noopener\" target=\"_blank\">K\u00f6hl et al. 2006<\/a>, <a href=\"http:\/\/www.mdpi.com\/2072-4292\/7\/12\/15817\/htm\" rel=\"noopener\" target=\"_blank\">Mu et al. 2015<\/a>, <a href=\"http:\/\/reddcr.go.cr\/sites\/default\/files\/centro-de-documentacion\/olofsson_et_al._2014_-_good_practices_for_estimating_area_and_assessing_accuracy_of_land_change.pdf\" rel=\"noopener\" target=\"_blank\">Olofsson et al. 2014<\/a>.<\/p>\n<p>In general, there are several random sampling strategies commonly found in remote sensing studies and software solutions:<\/p>\n<ul>\n<li><strong>random<\/strong>: Validation samples are picked completely at random. Each pixel within the study area has the same probability of being picked.<\/li>\n<li><strong>stratified random<\/strong>: The proportions of the classes in the validation samples correspond to their area proportions on the classification map. That is, the larger the area of a class on the map, the more samples will be drawn from it.<\/li>\n<li><strong>equalized stratified random<\/strong>: Ensures that each class&#8217;s sample size is exactly the same, regardless of the area of the class on the map.<\/li>\n<\/ul>\n<p>We will use <strong>equalized stratified random<\/strong> sampling for this example. 
This is the complete R script you need in order to automatically generate exactly 50 samples per class within your study extent:<\/p>\n<pre class=\"theme:amityreseda\">\r\n# import package\r\nlibrary(raster)\r\n \r\n# import classification image (last chapter)\r\nsetwd(\"\/media\/sf_exchange\/landsatdata\/\")\r\nimg.classified &lt;- raster(&quot;classification_RF.tif&quot;)\r\n \r\n# create 50 test samples per class\r\nsamplesperclass &lt;- 50\r\nsmp.test &lt;- sampleStratified(img.classified, size = samplesperclass, na.rm = TRUE, sp = TRUE)\r\n# shuffle test samples\r\nsmp.test &lt;- smp.test[sample(nrow(smp.test)), ]\r\n# delete the first two attribute columns\r\nsmp.test &lt;- smp.test[ , -c(1, 2)]\r\n# create standard ID attribute\r\nsmp.test$ID &lt;- 1:nrow(smp.test)\r\n \r\n# save test samples as point shapefile\r\nshapefile(smp.test,\r\n          filename = &quot;validation_RFnew.shp&quot;,\r\n          overwrite = TRUE\r\n          )\r\n<\/pre>\n<p><\/br><a name=\"1\"><\/a><\/p>\n<h1>In-depth Guide<\/h1>\n<p>All we need is the <span class=\"crayon-inline theme:amityreseda\">raster<\/span> package, so make sure you&#8217;ve imported it.<\/p>\n<pre class=\"theme:amityreseda\">\r\n#install.packages(\"raster\")\r\nlibrary(raster)\r\n<\/pre>\n<p>Furthermore, you should have your classification map ready (<a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/analyse\/\" rel=\"noopener\" target=\"_blank\">Chapter Analysis<\/a>), located in your working directory. 
Import it with the <span class=\"crayon-inline theme:amityreseda\">raster()<\/span> function:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsetwd(\"\/media\/sf_exchange\/landsatdata\/\")\r\nimg.classified &lt;- raster(&quot;classification_RF.tif&quot;)\r\n<\/pre>\n<p>The <span class=\"crayon-inline theme:amityreseda\">raster<\/span> package provides a function called <span class=\"crayon-inline theme:amityreseda\">sampleStratified()<\/span>, which does all the work for us:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsmp.test &lt;- sampleStratified(x = img.classified,\r\n                             size = 50,\r\n                             na.rm = TRUE,\r\n                             sp = TRUE)\r\n<\/pre>\n<p>This function needs a Raster object as the <span class=\"crayon-inline theme:amityreseda\">x<\/span> argument and a positive integer value as the <span class=\"crayon-inline theme:amityreseda\">size<\/span> argument. The latter is the number of sample points per class. Additionally, we can exclude all <span class=\"crayon-inline theme:amityreseda\">NA<\/span> values by setting <span class=\"crayon-inline theme:amityreseda\">na.rm = TRUE<\/span>. <span class=\"crayon-inline theme:amityreseda\">NA<\/span> values can arise if your scene lies diagonally in space and points are placed in the border areas. 
Line 4 (<span class=\"crayon-inline theme:amityreseda\">sp = TRUE<\/span>) ensures that the returned object is a SpatialPointsDataFrame, which is easier to handle.<\/p>\n<p>We can now check the class labels of our newly extracted validation points:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsmp.test$classification_RF\r\n##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\r\n##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2\r\n##  [71] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3\r\n## [106] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3\r\n## [141] 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4\r\n## [176] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5\r\n## [211] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5\r\n## [246] 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6\r\n## [281] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6\r\n<\/pre>\n<p>Note: the name of the column within <span class=\"crayon-inline theme:amityreseda\">smp.test<\/span> depends on the file name of your classification map.<\/p>\n<p>Because the samples are labeled manually in the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/label-samples-in-qgis\/\" rel=\"noopener\" target=\"_blank\">next section<\/a>, it makes sense to shuffle them so that their order is random. 
Again, we use the <span class=\"crayon-inline theme:amityreseda\">sample()<\/span> function for this:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsmp.test &lt;- smp.test[sample(nrow(smp.test)), ]\r\n<\/pre>\n<p>By looking at the class labels again, we see that the order is now random:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsmp.test$classification_RF\r\n##   [1] 3 1 4 4 6 4 5 3 6 2 1 5 4 2 2 4 4 3 4 5 1 3 4 2 6 3 3 6 1 4 6 6 5 6 2\r\n##  [36] 3 1 1 3 3 1 2 5 4 3 4 2 4 1 5 1 1 4 1 3 5 2 1 5 4 5 2 3 6 2 1 5 3 5 2\r\n##  [71] 2 6 4 1 5 1 6 4 6 6 6 6 1 1 1 2 4 5 6 4 5 3 5 4 2 2 1 2 2 3 5 6 6 1 1\r\n## [106] 2 1 2 1 1 3 4 6 1 6 6 5 4 6 2 1 6 6 5 1 5 2 3 6 4 2 6 6 3 4 6 4 2 6 3\r\n## [141] 2 2 2 3 2 6 6 5 3 3 4 2 6 1 2 5 1 3 6 1 3 6 1 3 4 2 2 5 4 1 1 6 5 4 4\r\n## [176] 4 2 5 1 6 1 5 5 6 4 5 2 2 2 5 3 2 6 2 5 3 6 4 1 5 5 3 1 3 5 4 4 4 4 2\r\n## [211] 3 5 3 4 1 1 3 3 4 2 2 3 3 1 4 4 6 2 1 1 2 6 3 4 5 3 6 1 6 5 2 3 6 3 6\r\n## [246] 5 1 1 4 1 2 4 3 4 5 5 3 5 2 3 2 2 6 3 6 5 5 3 2 4 3 1 3 6 5 6 5 5 5 6\r\n## [281] 2 3 5 5 1 1 3 4 4 1 4 5 6 4 2 3 5 3 5 4\r\n<\/pre>\n<p>In addition, we can delete all variables in our dataframe <span class=\"crayon-inline theme:amityreseda\">smp.test<\/span> and append a consecutive ID variable called <span class=\"crayon-inline theme:amityreseda\">ID<\/span>, which will then be displayed to us in QGIS:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsmp.test &lt;- smp.test[, -c(1, 2)]\r\nsmp.test$ID &lt;- 1:nrow(smp.test)\r\n\r\nsmp.test\r\n## class       : SpatialPointsDataFrame \r\n## features    : 300 \r\n## extent      : 369870, 400650, 5812410, 5827500  (xmin, xmax, ymin, ymax)\r\n## coord. ref. 
: +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 \r\n## variables   : 1\r\n## names       :  ID \r\n## min values  :   1 \r\n## max values  : 300\r\n<\/pre>\n<p>To visualize the distribution of our validation points, we can plot the SpatialPointsDataFrame <span class=\"crayon-inline theme:amityreseda\">smp.test<\/span> on top of our classification map in one plot:<\/p>\n<pre class=\"theme:amityreseda\">\r\nplot(img.classified, \r\n     axes = FALSE, \r\n     box = FALSE,\r\n     col = c(\"#fbf793\", \"#006601\", \"#bfe578\", \"#d00000\", \"#fa6700\", \"#6569ff\")\r\n     )\r\n\r\npoints(smp.test)\r\n<\/pre>\n<p><a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009.png\" alt=\"\" width=\"1096\" height=\"487\" class=\"aligncenter size-full wp-image-2251\" srcset=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009.png 1096w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009-300x133.png 300w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009-768x341.png 768w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/07\/valid_009-1024x455.png 1024w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/p>\n<p>Finally, we need to save the SpatialPointsDataFrame <span class=\"crayon-inline theme:amityreseda\">smp.test<\/span> as a shapefile to your hard drive. This is also very easy with the <span class=\"crayon-inline theme:amityreseda\">shapefile()<\/span> function from the raster package. 
Choose an appropriate <span class=\"crayon-inline theme:amityreseda\">filename = <\/span> for the new shapefile.<\/p>\n<pre class=\"theme:amityreseda\">\r\nshapefile(smp.test,\r\n          filename = \"validation_RF.shp\",\r\n          overwrite = TRUE\r\n          )\r\n<\/pre>\n<p><\/br><\/br><\/p>\n<hr style=\"height:4px;background-color:#6b9e1f\">\n<a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/label-samples-in-qgis\/\"><br \/>\n<button style=\"width:100%;text-align:right;padding: 10 0;background-color:white;margin:-55px 0 0 0\"><\/p>\n<div style=\"font-family: 'Noto Sans',sans-serif;line-height: 1.2\">\n<span style=\"font-size: 12px;color:#bfbfbf\"><strong><em>NEXT<\/em><\/strong><\/span><br \/>\n<span style=\"font-size: 30px;color:#6b9e1f\"><strong><em>Label Samples in QGIS<\/em><\/strong><\/span>\n<\/div>\n<p><\/button><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Creating validation samples is a crucial step in validation workflows. Certainly, sampling is not a trivial task. Many publications deal with the optimal sampling method, trying to maintain heterogeneity and avoid autocorrelation within validation samples, e.g., K\u00f6hl et al. 2006, Mu et al. 2015, Olofsson et al. 2014. 
In general, there are several random sampling &hellip; <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/create-samples-in-r\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Create Samples in R&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3237,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2173","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/users\/3237"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/comments?post=2173"}],"version-history":[{"count":25,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2173\/revisions"}],"predecessor-version":[{"id":2878,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2173\/revisions\/2878"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/media?parent=2173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}