The Support Vector Machine (SVM) (Cortes and Vapnik, 1995; Vapnik, 1995; Burges, 1998) generates an optimal separating hyperplane by minimizing the generalization error, without assuming class probabilities as a Bayesian classifier does. The decision hyperplane of an SVM is determined by the most informative data instances, called Support Vectors (SVs). In practice, these SVs are a subset of the entire training data. SVMs have been successfully applied in many areas, such as face detection, handwritten digit recognition, text classification, and data mining. Osuna et al. (1997) applied SVMs to face detection. Heisele et al. (2004) achieved a high face detection rate using a 2nd-degree polynomial SVM; they applied hierarchical classification and feature reduction methods to speed up SVM-based face detection.
Feature extraction and feature reduction are two primary issues in feature selection, which is essential in pattern classification. Whether the goal is storage, searching, or classification, the way the data are represented can significantly influence performance. Feature extraction is the process of deriving a more effective representation of objects from raw data to achieve high classification rates. For image data, many kinds of features have been used, such as raw pixel values, Principal Component Analysis (PCA), Independent Component Analysis (ICA), wavelet features, Gabor features, and gradient values. Feature reduction is the process of selecting a subset of features while preserving or improving classification rates. In general, it aims to speed up classification by keeping the most important class-relevant features.
Principal Component Analysis (PCA) is a multivariate procedure that rotates the data so that the directions of maximum variability are aligned with the axes. Essentially, a set of correlated variables is transformed into a set of uncorrelated variables ordered by decreasing variability. The uncorrelated variables are linear combinations of the original variables, and the last of them can be removed with minimal loss of information. PCA has been widely used in image representation for dimensionality reduction. To obtain m principal components, an m × N transformation matrix is multiplied by an N × 1 input pattern. This computation is costly for high-dimensional data.
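The projection step described above can be sketched as follows. This is a minimal illustration using NumPy, not the procedure of any cited work; the function names `fit_pca` and `project`, the random toy data, and the choice of m = 2 are all assumptions made for the example.

```python
import numpy as np


def fit_pca(X, m):
    """Return (mean, W), where W is the m x N transformation matrix.

    X holds one sample per row (shape: samples x N).
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)      # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest
    W = eigvecs[:, order].T                   # rows are principal directions
    return mean, W


def project(x, mean, W):
    """Map one N-dimensional pattern to its m principal components."""
    return W @ (x - mean)                     # (m x N) times (N x 1)


rng = np.random.default_rng(0)
# Hypothetical data: 200 samples with 5 correlated features.
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
mean, W = fit_pca(X, m=2)
Y = (X - mean) @ W.T                          # all samples projected at once
```

Each call to `project` performs on the order of m × N multiplications, which is the per-pattern cost that grows with the input dimension N.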
Another well-known method of feature reduction uses Fisher’s criterion to choose a subset of features that possess a large between-class variance and a small within-class variance. For a two-class classification problem, the within-class variance for the i-th dimension is defined as

$$\sigma_i^2 = \frac{\sum_{j=1}^{l} (g_{j,i} - m_i)^2}{l - 1}, \qquad (1)$$

where $l$ is the total number of samples, $g_{j,i}$ is the i-th dimensional attribute value of sample j, and $m_i$ is the mean value of the i-th dimension for all samples. The Fisher’s score for between-class measurement can be calculated as

$$S_i = \frac{\left| m_{i,\mathrm{class1}} - m_{i,\mathrm{class2}} \right|}{\sigma_{i,\mathrm{class1}}^2 + \sigma_{i,\mathrm{class2}}^2}. \qquad (2)$$
By selecting the features with the highest Fisher’s scores, the most discriminative features between class 1 and class 2 are retained.
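Feature ranking by Fisher’s score can be sketched as follows. The sketch assumes a common form of the score, the absolute difference of the two class means divided by the sum of the two within-class sample variances (each with denominator l − 1); the function names and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def fisher_scores(class1, class2):
    """Return one Fisher score per feature for two lists of samples.

    Assumed form: |m1 - m2| / (var1 + var2), with sample variances
    (denominator l - 1) computed separately within each class.
    A feature that is constant in both classes would divide by zero.
    """
    scores = []
    # zip(*samples) turns rows of samples into columns of feature values.
    for col1, col2 in zip(zip(*class1), zip(*class2)):
        m1, m2 = mean(col1), mean(col2)
        v1, v2 = variance(col1), variance(col2)
        scores.append(abs(m1 - m2) / (v1 + v2))
    return scores


def select_top_features(class1, class2, k):
    """Return the indices of the k features with the highest scores."""
    scores = fisher_scores(class1, class2)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
class1 = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
class2 = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
scores = fisher_scores(class1, class2)
kept = select_top_features(class1, class2, k=1)
```

In this toy case the class means of feature 1 coincide, so its score is zero and only feature 0 is retained.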
Weston et al. (2000) developed a feature reduction method for SVMs by minimizing bounds on the leave-one-out error. Evgeniou et al. (2003) introduced a feature reduction method for SVMs based on the observation that the most important features are the ones that contribute most to the separating hyperplane. Shih and Cheng (2005) proposed an improved feature reduction method in the input and feature spaces for 2nd-degree polynomial SVMs.