Accuracy Statistics in R

In this section we will focus on creating a confusion matrix in R. Additionally, we will perform a significance test and calculate confidence intervals as well as the Kappa coefficient.


Accuracy Matrix

A confusion matrix, also known as an error or accuracy matrix, is a specific type of table showing the performance of a classification algorithm. The name confusion matrix comes from the fact that you can quickly see where the algorithm confuses two classes, which indicates misclassifications.

Several metrics can be derived from the table:

  • User’s accuracies: Depict how often a class shown in the classification map is actually present on the ground. The User’s Accuracy is the complement of the Commission Error (User’s Accuracy = 100% - Commission Error).
  • Producer’s accuracies: Depict how often real features in the study area are correctly shown on the classification map. The Producer’s Accuracy is the complement of the Omission Error (Producer’s Accuracy = 100% - Omission Error).
  • Overall accuracy: Characterizes the proportion of all correctly classified validation points, in percent.
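
To make these definitions concrete, here is a minimal numeric sketch with a purely hypothetical two-class table (the values are made up for illustration and are not part of the workflow below); rows represent the prediction, columns the reference:

m <- matrix(c(40, 10,
               5, 45), nrow = 2, byrow = TRUE,
            dimnames = list(pred = c("A", "B"), ref = c("A", "B")))
diag(m) / rowSums(m) * 100    # user's accuracies: 80 and 90 %
diag(m) / colSums(m) * 100    # producer's accuracies: ~88.9 and ~81.8 %
sum(diag(m)) / sum(m) * 100   # overall accuracy: 85 %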

Implementation in R

You need your final classification map, as well as both the training and validation shapefiles created in the course of the Classification in R section.
Good to go?

Fine, then load the raster package in R as usual:

#install.packages("raster")
library(raster)

Then import your classification image classification_RF.tif, your training shapefile training_data.shp and your validation shapefile validation_RF.shp. In this example we use the output of the RF classification. However, the workflow is applicable to every classifier!

setwd("/media/sf_exchange/landsatdata/")
img.classified <- raster("classification_RF.tif")
shp.train <- shapefile("training_data.shp")
shp.valid <- shapefile("validation_RF.shp")
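
Before extracting any values, it can be worth a quick sanity check that the classification raster and the validation points share the same coordinate reference system, for example with compareCRS() from the raster package (optional, not required for the workflow):

compareCRS(img.classified, shp.valid)   # should return TRUE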

Our goal is first to generate two factor vectors, which we then compare in our confusion matrix:

  1. reference : class labels you assigned manually in the previous section
  2. predicted : class labels produced by the automatic RF (or SVM) classification

For the reference vector, we need to access the validclass column of our shapefile (we created this column during the validation point labeling in QGIS). As already mentioned, we need to transform the vector into a factor using as.factor() (it will become clear why in a second):

reference <- as.factor(shp.valid$validclass)
reference
##   [1] 3    1    4    4    6    4    5    3    6    2    1    5    4    2   
##  [15] 2    4    4    5    4    5    1    3    <NA> 2    6    3    3    6   
##  [29] 1    4    6    6    5    6    2    3    1    1    3    2    4    2   
##  [43] 5    4    2    5    2    4    5    5    3    1    4    1    3    5   
##  [57] 2    3    5    4    5    2    5    6    2    1    5    2    5    2   
##  [71] 2    6    4    1    5    3    6    4    6    6    6    6    1    5   
##  [85] 1    2    5    5    6    4    5    3    5    5    2    2    5    6   
##  [99] 2    3    5    6    6    1    1    2    1    2    1    <NA> 3    4   
## [113] 6    1    6    6    5    5    6    2    3    6    6    5    3    5   
## [127] 2    3    6    4    3    6    6    5    4    6    4    2    6    3   
## [141] 2    2    2    3    2    6    6    5    3    3    4    2    6    <NA>
## [155] 2    5    <NA> 3    6    1    3    6    1    2    5    2    2    5   
## [169] 5    <NA> 4    6    5    5    4    4    2    5    1    6    3    5   
## [183] 5    <NA> 5    5    2    2    2    5    3    2    6    2    5    3   
## [197] 6    4    1    3    5    3    3    3    5    4    4    4    4    2   
## [211] 3    5    2    4    1    1    3    2    4    2    2    3    3    5   
## [225] 4    4    6    2    1    1    2    6    3    4    5    3    6    1   
## [239] 6    5    2    2    6    3    6    5    5    1    4    1    2    5   
## [253] 3    4    5    5    3    5    2    3    2    2    6    3    6    5   
## [267] 5    5    3    4    3    <NA> 3    6    5    6    5    5    5    6   
## [281] 2    3    4    5    1    1    3    4    4    1    4    5    6    4   
## [295] 2    3    4    3    5    4   
## Levels: 1 2 3 4 5 6

You may notice some NA entries in your reference values. These arise, for example, if you skipped some validation points during the labeling in QGIS because you could not make a clear class assignment based on the available image data. We simply ignore them later on in the statistics, so this is no problem!
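
If you want to know how many of your points are affected, you can simply count the NA entries, for example:

sum(is.na(reference))   # number of validation points without a clear label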

For the predicted vector, we need to extract the classification values at the validation coordinates. Here again, we want a factor object via as.factor():

predicted <- as.factor(extract(img.classified, shp.valid))
predicted
##   [1] 3 1 4 4 6 4 5 3 6 2 1 5 4 2 2 4 4 3 4 5 1 3 4 2 6 3 3 6 1 4 6 6 5 6 2
##  [36] 3 1 1 3 3 1 2 5 4 3 4 2 4 1 5 1 1 4 1 3 5 2 1 5 4 5 2 3 6 2 1 5 3 5 2
##  [71] 2 6 4 1 5 1 6 4 6 6 6 6 1 1 1 2 4 5 6 4 5 3 5 4 2 2 1 2 2 3 5 6 6 1 1
## [106] 2 1 2 1 1 3 4 6 1 6 6 5 4 6 2 1 6 6 5 1 5 2 3 6 4 2 6 6 3 4 6 4 2 6 3
## [141] 2 2 2 3 2 6 6 5 3 3 4 2 6 1 2 5 1 3 6 1 3 6 1 3 4 2 2 5 4 1 1 6 5 4 4
## [176] 4 2 5 1 6 1 5 5 6 4 5 2 2 2 5 3 2 6 2 5 3 6 4 1 5 5 3 1 3 5 4 4 4 4 2
## [211] 3 5 3 4 1 1 3 3 4 2 2 3 3 1 4 4 6 2 1 1 2 6 3 4 5 3 6 1 6 5 2 3 6 3 6
## [246] 5 1 1 4 1 2 4 3 4 5 5 3 5 2 3 2 2 6 3 6 5 5 3 2 4 3 1 3 6 5 6 5 5 5 6
## [281] 2 3 5 5 1 1 3 4 4 1 4 5 6 4 2 3 5 3 5 4
## Levels: 1 2 3 4 5 6

Once we have prepared both factor vectors, we can use the table() function to build a contingency table of the counts at each combination of factor levels. We can additionally name the vectors so that they are displayed in the table accordingly:

accmat <- table("pred" = predicted, "ref" = reference)
accmat
##     ref
## pred  1  2  3  4  5  6
##    1 31  0  7  2  5  0
##    2  0 47  2  0  0  1
##    3  0  7 39  0  4  0
##    4  0  0  0 40  9  0
##    5  0  0  1  2 47  0
##    6  0  0  0  0  0 49

The numbers thus reflect the number of validation pixels. All pixels that have an NA value in either reference or predicted were ignored here.
Excellent! This output already visualizes if and where there are misclassifications in our map: all pixels located on the diagonal are correctly classified, all pixels off the diagonal are not.
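
One caveat: table() only yields a square matrix if both factors contain exactly the same levels. Should a class ever be missing entirely from one of the vectors (e.g., a class that was never predicted), you can enforce the full set of levels explicitly; a small defensive sketch, assuming the six class codes 1 to 6 used here:

reference <- factor(shp.valid$validclass, levels = 1:6)
predicted <- factor(extract(img.classified, shp.valid), levels = 1:6)
accmat <- table("pred" = predicted, "ref" = reference)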

We can now calculate the user’s accuracies UA, producer’s accuracies PA, and the overall accuracy OA:

UA <- diag(accmat) / rowSums(accmat) * 100
UA
##         1         2         3         4         5         6 
##  68.88889  94.00000  78.00000  81.63265  94.00000 100.00000

PA <- diag(accmat) / colSums(accmat) * 100
PA
##         1         2         3         4         5         6 
## 100.00000  87.03704  79.59184  90.90909  72.30769  98.00000

OA <- sum(diag(accmat)) / sum(accmat) * 100
OA
## [1] 86.34812
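
Because the user’s and producer’s accuracies are the complements of the commission and omission errors (see the definitions above), the per-class errors follow directly, should you need them:

CE <- 100 - UA   # commission error per class (in percent)
OE <- 100 - PA   # omission error per class (in percent)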

We have actually already extracted all the information needed for a complete confusion matrix, so let us build a nicely formatted matrix in R:

accmat.ext <- addmargins(accmat)
accmat.ext <- rbind(accmat.ext, "PA" = c(PA, NA))
accmat.ext <- cbind(accmat.ext, "UA" = c(UA, NA, OA))
accmat.ext <- round(accmat.ext, digits = 1)
dimnames(accmat.ext) <- list("Prediction" = c(levels(as.factor(shp.train$classes)), "Sum", "PA"),
                             "Reference" = c(levels(as.factor(shp.train$classes)), "Sum", "UA"))
class(accmat.ext) <- "table"
accmat.ext
accmat.ext
##            Reference
## Prediction  baresoil forest grassland urban_hd urban_ld water   Sum    UA
##   baresoil      31.0    0.0       7.0      2.0      5.0   0.0  45.0  68.9
##   forest         0.0   47.0       2.0      0.0      0.0   1.0  50.0  94.0
##   grassland      0.0    7.0      39.0      0.0      4.0   0.0  50.0  78.0
##   urban_hd       0.0    0.0       0.0     40.0      9.0   0.0  49.0  81.6
##   urban_ld       0.0    0.0       1.0      2.0     47.0   0.0  50.0  94.0
##   water          0.0    0.0       0.0      0.0      0.0  49.0  49.0 100.0
##   Sum           31.0   54.0      49.0     44.0     65.0  50.0 293.0      
##   PA           100.0   87.0      79.6     90.9     72.3  98.0        86.3


Significance Test

Furthermore, we can check whether the result is purely coincidental, i.e., whether a random assignment of classes could have led to a comparable result. We can use a binomial test for this, which needs only two values:
x = the number of correctly classified validation points, and
n = the total number of validation points in our confusion matrix:

sign <- binom.test(x = sum(diag(accmat)),
                   n = sum(accmat),
                   alternative = c("two.sided"),
                   conf.level = 0.95
                   )

pvalue <- sign$p.value
pvalue
## [1] 5.30128e-39

CI95 <- sign$conf.int[1:2]
CI95
## [1] 0.8187710 0.9006459

The p-value is much lower than 0.05, so the classification result is highly significant. If the classification were repeated under the same conditions, we can assume with 95% confidence that the OA lies in the range of 81.9% to 90.1%.
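
If you want to relate the test back to the overall accuracy, the test object also stores the estimated proportion of correctly classified points (simply x divided by n, which should match the OA computed above):

sign$estimate * 100   # estimated proportion of correct points in percent (= OA)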


Kappa

The Kappa coefficient can be used to evaluate the accuracy of a classification. It evaluates how well the classification performs compared to a map in which all values are assigned randomly.
The Kappa coefficient can range from -1 to 1:

  • A value of 0 indicates that the classification is no better than a random assignment.
  • A value below 0 indicates that the classification is worse than random.
  • A value greater than 0 indicates that the classification is better than random.

When you have the accuracy matrix as a table \(m_{i, j}\) with \(c\) different classes, then Kappa is

\begin{equation} \label{1}\tag{1}
\kappa = \frac{N_{o} - N_{e}}{N - N_{e}}, \quad \text{with} \\
N = \sum_{i,j = 1}^{c} m_{i, j}\\
N_{o} = \sum_{i=1}^{c} m_{i, i}\\
N_{e} = \frac{1}{N}\cdot\sum_{l=1}^{c}\left(\sum_{j=1}^{c} m_{l, j} \cdot \sum_{i=1}^{c} m_{i, l}\right)
\end{equation}

That looks pretty complicated. Using R, we can write our own function, which calculates \(\kappa\) for us! The calculation also looks much friendlier:

kappa <- function(m) {
  N <- sum(m)                                  # total number of validation points
  No <- sum(diag(m))                           # observed agreement (correctly classified points)
  Ne <- 1 / N * sum(colSums(m) * rowSums(m))   # agreement expected by chance
  return( (No - Ne) / (N - Ne) )
}

Use the accuracy matrix accmat as the argument of our new function to calculate the Kappa coefficient:

kappacoefficient <- kappa(accmat)
kappacoefficient
## [1] 0.8359646
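
As an optional cross-check, the caret package (assuming it is installed; it is not needed for this workflow) provides a confusionMatrix() function that reports the overall accuracy, its confidence interval, and Kappa in one call:

# install.packages("caret")
library(caret)
ok <- !is.na(reference) & !is.na(predicted)   # drop points without a clear label
confusionMatrix(data = predicted[ok], reference = reference[ok])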

However, the use of Kappa has been questioned in many articles and is therefore not recommended (see Pontius Jr and Millones 2011).