{"id":2618,"date":"2018-09-27T15:14:55","date_gmt":"2018-09-27T13:14:55","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/reseda\/?page_id=2618"},"modified":"2018-10-01T12:34:45","modified_gmt":"2018-10-01T10:34:45","slug":"svm-regression","status":"publish","type":"page","link":"https:\/\/blogs.fu-berlin.de\/reseda\/svm-regression\/","title":{"rendered":"SVM Regression"},"content":{"rendered":"<p>There are several R packages that provide support for SVM regression, also known as Support Vector Regression (SVR), e.g., <a href=\"https:\/\/cran.r-project.org\/web\/packages\/caret\/caret.pdf\" rel=\"noopener\" target=\"_blank\">caret<\/a>, <a href=\"https:\/\/cran.r-project.org\/web\/packages\/e1071\/e1071.pdf\" rel=\"noopener\" target=\"_blank\">e1071<\/a>, or <a href=\"https:\/\/cran.r-project.org\/web\/packages\/kernlab\/kernlab.pdf\" rel=\"noopener\" target=\"_blank\">kernlab<\/a>. We will use the e1071 package, as it offers <a href=\"https:\/\/cran.r-project.org\/web\/packages\/e1071\/vignettes\/svmdoc.pdf\" rel=\"noopener\" target=\"_blank\">an interface to the well-known libsvm<\/a> implementation. <\/p>\n<p>Below you can see a complete code implementation. Read the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/prepare-samples-for-regression\/\" rel=\"noopener\" target=\"_blank\">previous section<\/a> to learn about pre-processing the training data. 
While this is already executable with the given input data, you should read the in-depth guide below to understand the code in detail.<\/p>\n<pre class=\"theme:amityreseda\">\r\n# import\r\nlibrary(raster)\r\nlibrary(rgeos)\r\nlibrary(e1071)\r\n\r\nsetwd(\"\/media\/sf_exchange\/regression\/\")\r\nimg &lt;- brick(&quot;LC081930232017060201T1-SC20180613160412_subset.tif&quot;)\r\nshp &lt;- shapefile(&quot;reg_train_data.shp&quot;)\r\n\r\n# training sample preprocessing (see last section for details)\r\nnames(img) &lt;- c(&quot;b1&quot;, &quot;b2&quot;, &quot;b3&quot;, &quot;b4&quot;, &quot;b5&quot;, &quot;b6&quot;, &quot;b7&quot;)\r\nshp &lt;- gBuffer(shp, byid=TRUE, width=0)\r\nimg.subset &lt;- crop(img, shp)\r\nimg.mask &lt;- rasterize(shp, img.subset, getCover = TRUE)\r\nimg.mask[img.mask &lt; 100] &lt;- NA\r\nimg.subset &lt;- mask(img.subset, img.mask)\r\ngrid &lt;- rasterToPolygons(img.subset)\r\ngrid$ID &lt;- seq.int(nrow(grid))\r\n\r\nsmp &lt;- data.frame()\r\nfor (i in 1:length(grid)) {\r\n  cell &lt;- intersect(grid[i, ], shp)\r\n  cell &lt;- cell[cell$Class_name == &quot;building&quot; | cell$Class_name == &quot;impervious&quot;, ]\r\n  if (length(cell) &gt; 0) {\r\n    areaPercent &lt;- sum(area(cell) \/ area(grid)[1])\r\n  } else {\r\n    areaPercent &lt;- 0\r\n  }\r\n  newsmp &lt;- cbind(grid@data[i, 1:nlayers(img)], areaPercent)\r\n  smp &lt;- rbind(smp, newsmp)\r\n}\r\n\r\n# model training\r\n# define search ranges\r\ngammas = 2^(-8:3)\r\ncosts = 2^(-5:8)\r\nepsilons = c(0.1, 0.01, 0.001)\r\n# start training via gridsearch\r\nsvmgs &lt;- tune(svm,\r\n              train.x = smp[-ncol(smp)],\r\n              train.y = smp[ncol(smp)],\r\n              type = &quot;eps-regression&quot;,\r\n              kernel = &quot;radial&quot;, \r\n              scale = TRUE,\r\n              ranges = list(gamma = gammas, cost = costs, epsilon = epsilons),\r\n              tunecontrol = tune.control(cross = 5)\r\n              )\r\n\r\n# pick best 
model\r\nsvrmodel &lt;- svmgs$best.model\r\n\r\n# use best model for prediction\r\nresult &lt;- predict(img, svrmodel)\r\nresult[result &gt; 1] = 1\r\nresult[result &lt; 0] = 0\r\n\r\n# export regression raster\r\nwriteRaster(result, filename=&quot;regression.tif&quot;)\r\n<\/pre>\n<p><\/br><a name=\"1\"><\/a><\/p>\n<h1><font color=\"003366\">In-depth Guide<\/font><\/h1>\n<p>In order to be able to use the regression functions of the e1071 package, we must additionally load the library into the current session via <span class=\"crayon-inline theme:amityreseda\">library()<\/span>. If you do not use our VM, you must first download and install the packages with <span class=\"crayon-inline theme:amityreseda\">install.packages()<\/span>:<\/p>\n<pre class=\"theme:amityreseda\">\r\n#install.packages(\"raster\")\r\n#install.packages(\"rgeos\")\r\n#install.packages(\"e1071\")\r\nlibrary(raster)\r\nlibrary(rgeos)\r\nlibrary(e1071)\r\n<\/pre>\n<p>First, it is necessary to process the training samples in the form of a data frame. The necessary steps are shown in the first part of the script above and described in detail in the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/prepare-samples-for-regression\/\" rel=\"noopener\" target=\"_blank\">previous section<\/a>. <\/p>\n<p>After the preprocessing, we can train our Support Vector Regression with the training dataset <span class=\"crayon-inline theme:amityreseda\">smp<\/span>. We will utilize an epsilon Support Vector Regression, which requires three parameters: a gamma \\(\\gamma\\) value, a cost \\(C\\) value, and an epsilon \\(\\varepsilon\\) value (for more details refer to the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/support-vector-machine\/#2\" rel=\"noopener\" target=\"_blank\">SVM section<\/a>). These hyperparameters significantly determine the performance of the model. Finding the best hyperparameters is not trivial, and the best combination cannot be determined in advance. Thus, we try to find the best combination iteratively by trial and error. 
Therefore, we create three vectors comprising all values that should be tested:<\/p>\n<pre class=\"theme:amityreseda\">\r\ngammas = 2^(-8:3)\r\ncosts = 2^(-5:8)\r\nepsilons = c(0.1, 0.01, 0.001)\r\n<\/pre>\n<p>So we have 12 different values for \\(\\gamma\\), 14 different values for \\(C\\) and three different values for \\(\\varepsilon\\). Thus, the whole training process is tested for 504 (12 * 14 * 3) models. Accordingly, the more parameter values we check, the longer the training process takes.<br \/>\nWe start the training with the <span class=\"crayon-inline theme:amityreseda\">tune()<\/span> function. We need to specify the training samples as <span class=\"crayon-inline theme:amityreseda\">train.x<\/span>, i.e., all columns of our <span class=\"crayon-inline theme:amityreseda\">smp<\/span> dataframe except the last one, and the corresponding target values as <span class=\"crayon-inline theme:amityreseda\">train.y<\/span>, i.e., the last column of our <span class=\"crayon-inline theme:amityreseda\">smp<\/span> dataframe, which holds the <span class=\"crayon-inline theme:amityreseda\">areaPercent<\/span> of our target class <em>imperviousness<\/em>:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsvmgs &lt;- tune(svm,\r\n              train.x = smp[-ncol(smp)],\r\n              train.y = smp[ncol(smp)],\r\n              type = &quot;eps-regression&quot;,\r\n              kernel = &quot;radial&quot;, \r\n              scale = TRUE,\r\n              ranges = list(gamma = gammas, cost = costs, epsilon = epsilons),\r\n              tunecontrol = tune.control(cross = 5)\r\n              )\r\n<\/pre>\n<p>We have to set the <span class=\"crayon-inline theme:amityreseda\">type<\/span> of the SVM to <span class=\"crayon-inline theme:amityreseda\">&#8220;eps-regression&#8221;<\/span> in order to perform a regression task. 
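The size of the search grid follows directly from the lengths of these three vectors; a quick sanity check in R (a purely illustrative snippet using the same vectors as above):

```r
# count the hyperparameter combinations the grid search will test
gammas   <- 2^(-8:3)             # 12 values
costs    <- 2^(-5:8)             # 14 values
epsilons <- c(0.1, 0.01, 0.001)  #  3 values

length(gammas) * length(costs) * length(epsilons)  # 504 models
```

Every one of these 504 combinations is additionally evaluated with 5-fold cross-validation, so the number of model fits is five times higher still.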
Furthermore, we set the kernel used in training and predicting to an <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/support-vector-machine\/\" rel=\"noopener\" target=\"_blank\">RBF kernel<\/a> via <span class=\"crayon-inline theme:amityreseda\">&#8220;radial&#8221;<\/span> (have a look at the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/support-vector-machine\/\" rel=\"noopener\" target=\"_blank\">SVM section<\/a> for more details). We set the argument <span class=\"crayon-inline theme:amityreseda\">scale<\/span> to <span class=\"crayon-inline theme:amityreseda\">TRUE<\/span> in order to initiate the <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/support-vector-machine\/\" rel=\"noopener\" target=\"_blank\">z-transformation of our data<\/a>. The argument <span class=\"crayon-inline theme:amityreseda\">ranges<\/span> takes a named list of parameter vectors spanning the sampling range. We put our vectors <span class=\"crayon-inline theme:amityreseda\">gammas<\/span>, <span class=\"crayon-inline theme:amityreseda\">costs<\/span> and <span class=\"crayon-inline theme:amityreseda\">epsilons<\/span> in this list. Via the <span class=\"crayon-inline theme:amityreseda\">tunecontrol<\/span> argument, you can set k for <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/support-vector-machine\/\" rel=\"noopener\" target=\"_blank\">the k-fold cross validation on the training data<\/a>, which is necessary to assess the model performance. <\/p>\n<p>Depending on the complexity of the data, this step may take some time. 
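To see the tuning mechanics without waiting on real image data, you can run the same call pattern on a small synthetic dataset. The data and the reduced search grid below are made up purely for illustration:

```r
library(e1071)

# toy regression data: a noisy sine curve (purely illustrative)
set.seed(1)
toy   <- data.frame(x1 = runif(80, 0, 10))
toy$y <- sin(toy$x1) + rnorm(80, sd = 0.1)

# same call pattern as in the tutorial, with a much smaller search grid
toygs <- tune(svm,
              train.x = toy["x1"],
              train.y = toy$y,
              type = "eps-regression",
              kernel = "radial",
              scale = TRUE,
              ranges = list(gamma = 2^(-2:2), cost = 2^(0:4),
                            epsilon = c(0.1, 0.01)),
              tunecontrol = tune.control(cross = 5))

# besides $best.model, the tune object stores the cross-validation
# error of every tested combination in a data frame
head(toygs$performances[order(toygs$performances$error), ])
```

Inspecting `toygs$performances` this way is a useful habit: it shows whether the best combination sits at the edge of your search ranges, in which case the ranges should be widened.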
Once completed, you can check the output by calling the resultant object name:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsvmgs \r\n##\r\n## Parameter tuning of \u2018svm\u2019:\r\n##\r\n## - sampling method: 5-fold cross validation \r\n##\r\n## - best parameters:\r\n##       gamma cost epsilon\r\n##  0.00390625    8     0.1\r\n## \r\n## - best performance: 0.03157376 \r\n<\/pre>\n<p>In the course of the cross-validation, the prediction errors of all parameter combinations were compared and the best combination was determined: in our example, those are 0.0039, 8 and 0.1 for \\(\\gamma\\), \\(C\\) and \\(\\varepsilon\\), respectively. Furthermore, the mean squared error (MSE) of the best model is displayed, which is 0.0316 in our case.<\/p>\n<p>We can extract the best model out of our <span class=\"crayon-inline theme:amityreseda\">svmgs<\/span> to use for image prediction:<\/p>\n<pre class=\"theme:amityreseda\">\r\nsvrmodel &lt;- svmgs$best.model\r\nsvrmodel \r\n## \r\n## Call:\r\n## best.tune(method = svm, train.x = smp[-ncol(smp)], train.y = smp[ncol(smp)], ranges = list(gamma = gammas, cost = costs, \r\n##     epsilon = epsilons), tunecontrol = tune.control(cross = 5), type = &quot;eps-regression&quot;, kernel = &quot;radial&quot;, scale = TRUE)\r\n## \r\n## \r\n## Parameters:\r\n##    SVM-Type:  eps-regression \r\n##  SVM-Kernel:  radial \r\n##        cost:  8 \r\n##       gamma:  0.00390625 \r\n##     epsilon:  0.1 \r\n## \r\n## \r\n## Number of Support Vectors:  78\r\n<\/pre>\n<p>Save the best model by using the <span class=\"crayon-inline theme:amityreseda\">save()<\/span> function. This function saves the model object <span class=\"crayon-inline theme:amityreseda\">svrmodel<\/span> to your working directory, so that you have it permanently stored on your hard drive. 
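The save()\/load() mechanics are independent of the model type; here is a minimal self-contained sketch with a stand-in model object (an lm() fit on the built-in cars dataset, with an arbitrary temporary file name):

```r
# minimal sketch of save()/load() round-tripping a model object;
# an lm() model stands in here, the mechanics for an svm model are identical
m <- lm(dist ~ speed, data = cars)
f <- tempfile(fileext = ".RData")
save(m, file = f)

rm(m)     # simulate a fresh R session without the object
load(f)   # restores the object under its original name, 'm'
coef(m)
```

Note that load() always restores the object under the name it was saved with; if you prefer to choose the name at load time, the saveRDS()\/readRDS() pair behaves that way instead.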
If needed, you can load it any time with <span class=\"crayon-inline theme:amityreseda\">load()<\/span>.<\/p>\n<pre class=\"theme:amityreseda\">\r\nsave(svrmodel, file = \"svrmodel.RData\")\r\n#load(\"svrmodel.RData\")\r\n<\/pre>\n<p>Since your SVR model is now completely trained, you can use it to predict all the pixels in your image. The <span class=\"crayon-inline theme:amityreseda\">predict()<\/span> function takes a lot of work off your hands: it recognizes that a raster image is passed and processes it pixel by pixel. As with the training samples, each image pixel is regressed individually, and the results are reassembled into your final regression image. Save the output as raster object <span class=\"crayon-inline theme:amityreseda\">result<\/span> and have a look at its minimum and maximum values:<\/p>\n<pre class=\"theme:amityreseda\">\r\nresult &lt;- predict(img, svrmodel)\r\n\r\nresult\r\n## class       : RasterLayer \r\n## dimensions  : 504, 1030, 519120  (nrow, ncol, ncell)\r\n## resolution  : 30, 30  (x, y)\r\n## extent      : 369795, 400695, 5812395, 5827515  (xmin, xmax, ymin, ymax)\r\n## coord. ref. : +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 \r\n## data source : in memory\r\n## names       : layer \r\n## values      : -0.7562021, 1.15675  (min, max)\r\n<\/pre>\n<p>You may notice some &#8220;super-positive&#8221; (values above 1.0 or 100%) and\/or some negative (values below 0.0 or 0%) values. Such values are not uncommon, since the SVR implementation of the e1071 package does not constrain its predictions to the value range of the training targets. 
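The usual fix relies on R's logical indexing, which works the same way on a raster object as on a plain numeric vector; a minimal illustration on made-up values:

```r
# logical-indexing clamp on a plain numeric vector,
# mirroring the raster fix applied to the regression result
x <- c(-0.25, 0.10, 0.80, 1.15)
x[x > 1] <- 1   # cap "super-positive" values at 100 %
x[x < 0] <- 0   # raise negative values to 0 %
x               # 0.00 0.10 0.80 1.00
```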
You can simply fix this issue by clipping all out-of-range values to meaningful minimum and maximum values (0 for 0% and 1 for 100%) with the following two lines:<\/p>\n<pre class=\"theme:amityreseda\">\r\nresult[result &gt; 1] = 1\r\nresult[result &lt; 0] = 0\r\n<\/pre>\n<p>Finally, save your regression raster output using the <span class=\"crayon-inline theme:amityreseda\">writeRaster()<\/span> function and plot your result in R:<\/p>\n<pre class=\"theme:amityreseda\">\r\nwriteRaster(result, filename=&quot;regression.tif&quot;)\r\n\r\nplot(result, col=gray.colors(100))\r\n<\/pre>\n<p><a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010.png\" alt=\"\" width=\"1192\" height=\"678\" class=\"aligncenter size-full wp-image-2670\" srcset=\"https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010.png 1192w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010-300x171.png 300w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010-768x437.png 768w, https:\/\/blogs.fu-berlin.de\/reseda\/files\/2018\/09\/reg_010-1024x582.png 1024w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/a><\/p>\n<p>Done! 
You now have a map indicating the percentage of imperviousness, i.e., subpixel information, for every single pixel in your image data.<\/p>\n<p><\/br><\/br><\/p>\n<hr style=\"height:4px;background-color:#6b9e1f\">\n<a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/validate\/\"><br \/>\n<button style=\"width:100%;text-align:right;padding: 10 0;background-color:white;margin:-55px 0 0 0\"><\/p>\n<div style=\"font-family: 'Noto Sans',sans-serif;line-height: 1.2\">\n<span style=\"font-size: 12px;color:#bfbfbf\"><strong><em>NEXT<\/em><\/strong><\/span><br \/>\n<span style=\"font-size: 30px;color:#6b9e1f\"><strong><em>Validate<\/em><\/strong><\/span>\n<\/div>\n<p><\/button><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are several R packages that provide support for SVM regression, also known as Support Vector Regression (SVR), e.g., caret, e1071, or kernlab. We will use the e1071 package, as it offers an interface to the well-known libsvm implementation. Below you can see a complete code implementation. Read the previous section to learn about pre-processing the training data. 
&hellip; <a href=\"https:\/\/blogs.fu-berlin.de\/reseda\/svm-regression\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;SVM Regression&#8221;<\/span><\/a><\/p>\n","protected":false},"author":3237,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2618","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2618","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/users\/3237"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/comments?post=2618"}],"version-history":[{"count":17,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2618\/revisions"}],"predecessor-version":[{"id":2695,"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/pages\/2618\/revisions\/2695"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/reseda\/wp-json\/wp\/v2\/media?parent=2618"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}