Validate

By now, you have generated your classification map, great! But can you rely on the map’s information? To show that your results are meaningful, you need to validate them.
The validation of remote sensing data is the last step in our workflow. The purpose of this chapter is to describe standard and advanced methods for validating a classification map.

Chapter in a Box

In this chapter, the following content awaits you:

Validation Intro
– training, testing, validation – use the right terminology
– best validation practice for remote sensing
Create Samples in R
– stratified random sampling in R
– generation and export of point coordinates as a shapefile for use in QGIS
Label Samples in QGIS
– import of point shapefile
– label points according to their class membership
– use Landsat and very high resolution basemaps as validation basis for labeling
– save labeled point shapefile
Accuracy Statistics in R
– generate a complete accuracy matrix in R
– calculate confidence intervals for overall accuracies
– calculate kappa statistics
Area Adjusted Accuracies
– calculate area-weighted accuracy statistics according to Olofsson et al. 2014

Validation Intro

Training dataset: A model is initially fit on a training sample dataset. The model iteratively learns from those training samples and tries to map input data \(x\) to the output response \(y\).

Testing dataset: During training, algorithms often use a testing dataset for an unbiased evaluation of the model fit while tuning the model’s hyperparameters, e.g., \(mtry\) for RF, or \(\gamma\) and \(C\) for SVM. The testing dataset is generated internally, e.g., in the form of out-of-bag (OOB) samples in RF, or through cross-validation for SVM.

Validation dataset: Finally, a validation dataset is completely independent of the other two datasets and provides an unbiased evaluation of the final model fit.
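
To make the idea of an internally generated testing dataset more concrete, here is a minimal base R sketch of how k-fold cross-validation splits samples into training and testing parts. The sample size `n`, the number of folds `k`, and all variable names are assumptions chosen for illustration; packages that tune RF or SVM models handle this step for you and may do it differently.

```r
# Hypothetical setup: 20 training samples, split into 5 folds
set.seed(42)  # make the random fold assignment reproducible
n <- 20
k <- 5

# Assign each sample to exactly one fold (4 samples per fold here)
fold <- sample(rep(seq_len(k), length.out = n))

# Each fold serves once as testing data; the remaining folds are used for training
for (i in seq_len(k)) {
  test_idx  <- which(fold == i)
  train_idx <- which(fold != i)
  cat("Fold", i, "- training:", length(train_idx),
      "testing:", length(test_idx), "\n")
}
```

Note that every sample in this sketch takes part in fitting or tuning the model at some point, which is why none of them can serve as validation data.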

All right, so what is the best validation practice for remote sensing studies?

  1. automatically create multiple point coordinates all over your study area or your classification extent
  2. manually attribute the corresponding class labels to all of those point coordinates (labeling)
  3. statistically examine the matches and mismatches between the manually assigned class labels and the labels assigned by the classifier at each point coordinate
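
The third step above boils down to a cross-tabulation of two label vectors. The following base R sketch shows the idea with made-up data: the six points, the three class names, and the variable names are all assumptions for illustration and not results from a real classification.

```r
# Hypothetical class labels at six validation points
classes   <- c("forest", "urban", "water")

# Labels assigned manually during labeling (step 2)
reference <- factor(c("forest", "forest", "urban", "water", "water", "urban"),
                    levels = classes)

# Labels taken from the classification map at the same points
predicted <- factor(c("forest", "urban", "urban", "water", "water", "urban"),
                    levels = classes)

# Cross-tabulate both label sets: rows are map labels, columns are reference labels
confusion <- table(predicted = predicted, reference = reference)
print(confusion)

# Overall accuracy: share of points where both labels agree (5 of 6 here)
overall_accuracy <- sum(diag(confusion)) / sum(confusion)
print(overall_accuracy)
```

The diagonal of the table holds the matches and the off-diagonal cells hold the confusions between classes. With real data, far more than six points are needed for a reliable estimate.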

In the following sections, we present a best-practice workflow for validating a classification in detail.