Support Vector Machines (SVM) are powerful tools for data classification. This article describes the functionality of SVM, including their design and operation. SVM have been shown to achieve high classification accuracy and good generalization. They can classify both linearly separable and nonlinearly separable data through the use of kernel functions. The advantages of using SVM are discussed along with the standard types of kernel functions, as is the effectiveness of applying SVM to large spatial datasets derived from Geographic Information Systems (GIS). Future trends and applications are also discussed. The extracted dataset described here contains seven independent variables related to urban development plus a class label that distinguishes urban areas from rural areas. With over a million instances, this large dataset demonstrates the generalization capability of SVM methods, and its spatial character allows experts to analyze the error signal.
The SVM Approach and Critical Issues
The SVM algorithms of Cristianini and Shawe-Taylor (2000) and Schölkopf and Smola (2002), inspired by statistical learning theory (Vapnik, 1999), combine kernel methods with convex optimization to solve classification and regression problems. With numerous advantages (Table 1) and several available computer implementations (Table 2), SVM have become a viable and efficient tool.
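To make the convex-optimization side of this concrete, the soft-margin linear SVM can be trained by subgradient descent on the regularized hinge loss. The following pure-Python sketch is only an illustration of that idea, not any implementation from the chapter's tables; the learning rate, regularization constant, and toy data are illustrative choices.

```python
# Minimal soft-margin linear SVM trained by subgradient descent on the
# regularized hinge loss: (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))).
# All hyperparameters below are illustrative, not from the chapter.

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: step toward it.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization step.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy linearly separable data: class +1 around (2, 2), class -1 around (-2, -2).
X = [(2, 2), (3, 2), (2, 3), (-2, -2), (-3, -2), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

On such well-separated toy data the learned hyperplane classifies every training point correctly; production uses would rely on a dedicated solver rather than this sketch.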
Key Terms in this Chapter
Maximal Margin Hyperplane: A hyperplane that separates two clouds of points and lies at equal distance from each, so that the margin between the hyperplane and the clouds is maximal.
Linearly Separable Data: Two sets of data points in a two-dimensional space are said to be linearly separable when they can be completely separated by a single straight line. In general, two groups of data points in an n-dimensional space are linearly separable if they can be separated by an (n-1)-dimensional hyperplane.
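The definition above can be checked mechanically: a hyperplane w·x + b = 0 separates two point sets exactly when all points of one set fall on its positive side and all points of the other on its negative side. A small sketch (the sets and the line are illustrative):

```python
# Test whether the hyperplane w.x + b = 0 separates two point sets:
# every "positive" point must satisfy w.x + b > 0 and every
# "negative" point w.x + b < 0.

def separates(w, b, positives, negatives):
    def side(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return all(side(p) > 0 for p in positives) and all(side(n) < 0 for n in negatives)

# The line x1 + x2 = 0, i.e. w = (1, 1) and b = 0, separates these two clouds.
A = [(1, 2), (2, 1), (3, 3)]
B = [(-1, -2), (-2, -1), (-3, -3)]
```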
Support Vector Machines: A supervised learning technique based on statistical learning theory.
Support Vectors: A subset of the training examples that are closest to the maximum margin hyperplane.
Kernel: The similarity measure between members of a dataset, used together with a mapping function F to translate the initial dataset into a higher-dimensional space.
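A standard kernel of this kind is the Gaussian (RBF) kernel, which measures similarity as a function of squared distance: identical points score 1, and the score decays toward 0 as points move apart. A minimal sketch (the gamma value is an illustrative choice):

```python
import math

# Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2).
# Returns 1.0 for identical points and decays toward 0 with distance,
# so it behaves as a similarity measure between dataset members.

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

Because the kernel depends only on the pair of points, it is symmetric: k(x, z) = k(z, x).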
Spatial Data: Geographic information related to the location and distribution of objects on the Earth.
Noise: The random component of a measurement error.
Regression: The task of learning a target function that maps one or more input attributes to a single numeric attribute.