Impact of Training Set Size on Object-Based Land Cover Classification: A Comparison of Three Classifiers

Gerhard Myburgh (Department of Geography and Environmental Studies, Stellenbosch University, Stellenbosch, South Africa) and Adriaan van Niekerk (Department of Geography and Environmental Studies, Stellenbosch University, Stellenbosch, South Africa)
Copyright: © 2014 | Pages: 19
DOI: 10.4018/ijagr.2014070104

Abstract

Supervised classifiers are commonly employed in remote sensing to extract land cover information, but various factors affect their accuracy. The number of available training samples, in particular, is known to have a significant impact on classification accuracy. Obtaining a sufficient number of samples is, however, not always practical. The support vector machine (SVM) is a supervised classifier known to perform well with limited training samples and has compared favourably with other classifiers for various problems in pixel-based land cover classification. Very little research on training-sample size and classifier performance has been done in a geographical object-based image analysis (GEOBIA) environment. This paper compares the performance of SVM, nearest neighbour (NN) and maximum likelihood (ML) classifiers in a GEOBIA environment, with a focus on the influence of training-set size. Training-set sizes ranging from 4 to 20 samples per land cover class were tested. Classification tree analysis (CTA) was used for feature selection. The results indicate that the performance of all the classifiers improved significantly as the size of the training set increased. The ML classifier performed poorly when few (<10 per class) training samples were used, and the NN classifier performed poorly compared to SVM throughout the experiment. SVM was the superior classifier for all training-set sizes, although ML achieved competitive results for sets of 12 or more training areas per class.
Article Preview

Introduction

Detailed, accurate and up-to-date land cover information is critical for environmental and socio-economic research (Heinl et al., 2009; Lu & Weng, 2007). A large number of operational satellite platforms can provide remotely sensed imagery at various spatial and temporal scales (Foody, 2002). This abundance of available data offers great potential for generating frequently updated thematic maps, as remotely sensed images cover large areas, are acquired at regular intervals and are less costly than traditional ground-survey methods (Foody, 2009; Gao, 2009; Pal & Mather, 2004; Szuster et al., 2011). Current image-processing techniques are, however, limited in their ability to extract accurate land cover features automatically (Baraldi et al., 2010). Many factors also affect the accuracy of image classification (Lu & Weng, 2007), and the quality of land cover maps is often perceived as insufficient for operational use (Foody, 2002).

Supervised classification, an approach commonly used for the classification of remote sensing images, requires samples of known identity (training samples) to construct a model capable of classifying unknown samples. Apart from selecting a suitable classifier, the number and quality of training samples are key to a successful classification (Hubert-Moy et al., 2001; Lillesand et al., 2008; Lu & Weng, 2007). A sufficient number of training samples is generally required to perform a successful classification and the samples need to be well distributed and sufficiently representative of the land cover classes being evaluated (Campbell, 2006; Gao, 2009; Lu & Weng, 2007; Mather, 2004). In remote sensing applications, the availability of labelled training samples is often limited (Gehler & Schölkopf, 2009; Mountrakis et al., 2011) as their collection is time-consuming, expensive and tedious, often requiring the study of maps and aerial photographs and carrying out field visits (Campbell, 2006).

Support vector machines (SVM) have been shown to improve the reliability and accuracy of supervised classifications (Oommen et al., 2008). SVM are known to generalize well even when few training samples are available, and it has been suggested that they produce superior results compared to other statistical classifiers under such conditions (Foody & Mathur, 2004b; Li et al., 2010; Lizarazo, 2008; Mountrakis et al., 2011; Pal & Mather, 2005).

The introduction of SVM to remote sensing has led to a number of comparative studies involving SVM and other classifiers of land cover (Camps-Valls et al., 2004; Camps-Valls & Bruzzone, 2005; Dixon & Candade, 2008; Foody & Mathur, 2004a; Gualtieri & Cromp, 1998; Huang et al., 2002; Kavzoglu & Colkesen, 2009; Keuchel et al., 2003; Melgani & Bruzzone, 2002; 2004; Mercier & Lennon, 2003; Oommen et al., 2008; Pal & Mather, 2004; 2005; Szuster et al., 2011; Tzotsos & Argialas, 2008). Although the results of such studies depend on the data and classification scheme used in each case, it was generally found that SVM produced either superior or equivalent classification accuracies when compared with methods such as maximum likelihood (ML), nearest neighbour (NN), artificial neural networks (ANN) and decision trees.
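To make the idea of a training-set-size comparison concrete, the following is a minimal illustrative sketch and not the authors' procedure or data. It assumes synthetic two-feature "image objects" for three hypothetical classes, trains a 1-nearest-neighbour classifier and a Gaussian maximum likelihood classifier on 4 to 20 samples per class, and scores each on a fixed test set. SVM is left out only to keep the sketch within the Python standard library; all function names, class names and parameters here are assumptions made for illustration.

```python
import math
import random

# Hypothetical class centres in a 2-D feature space (illustrative only).
CENTRES = {"water": (0.0, 0.0), "vegetation": (2.5, 1.0), "bare": (1.0, 3.0)}


def make_objects(n_per_class, rng):
    """Draw n synthetic 2-feature objects per class as ((x, y), label) pairs."""
    return [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
            for label, (cx, cy) in CENTRES.items()
            for _ in range(n_per_class)]


def nn_predict(train, x):
    """1-nearest neighbour: label of the closest training object (Euclidean)."""
    return min(train, key=lambda sample: math.dist(sample[0], x))[1]


def ml_fit(train):
    """Estimate a mean vector and a full 2x2 covariance matrix per class."""
    params = {}
    for label in sorted({lab for _, lab in train}):
        pts = [p for p, lab in train if lab == label]
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        params[label] = (mx, my, sxx, syy, sxy)
    return params


def ml_predict(params, x):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    def log_likelihood(p):
        mx, my, sxx, syy, sxy = p
        det = sxx * syy - sxy ** 2
        dx, dy = x[0] - mx, x[1] - my
        # Squared Mahalanobis distance using the closed-form 2x2 inverse.
        maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return -0.5 * (math.log(det) + maha)
    return max(params, key=lambda lab: log_likelihood(params[lab]))


def accuracy(predict, test):
    """Fraction of test objects whose predicted label matches the true one."""
    return sum(predict(x) == lab for x, lab in test) / len(test)


def run_experiment(sizes=(4, 8, 12, 16, 20), seed=0):
    """Return {samples per class: {"NN": accuracy, "ML": accuracy}}."""
    rng = random.Random(seed)
    test = make_objects(200, rng)
    results = {}
    for n in sizes:
        train = make_objects(n, rng)
        params = ml_fit(train)
        results[n] = {
            "NN": accuracy(lambda x: nn_predict(train, x), test),
            "ML": accuracy(lambda x: ml_predict(params, x), test),
        }
    return results


if __name__ == "__main__":
    for n, scores in run_experiment().items():
        print(n, scores)
```

The maximum likelihood classifier has to estimate a covariance matrix per class from the training samples, which is one plausible reason such classifiers are sensitive to small training sets. A single run on synthetic data like this is noisy, so any trend it shows should be read as illustrative rather than as evidence.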
