The typical recognition/classification framework in Artificial Vision uses a set of object features for discrimination. Features can be either numerical measures or nominal values. Once obtained, these feature values are used to classify the object. The output of the classification is a label for the object (Mitchell, 1997). The classifier is usually built from a set of “training” samples. This is a set of examples that comprise feature values and their corresponding labels. Once trained, the classifier can produce labels for new samples that are not in the training set. Obviously, the extracted features must be discriminative. Finding a good set of features, however, may not be an easy task. Consider for example, the face recognition problem: recognize a person using the image of his/her face. This is currently a hot topic of research within the Artificial Vision community, see the surveys (Chellappa et al, 1995), (Samal & Iyengar, 1992) and (Chellappa & Zhao, 2005). In this problem, the available features are all of the pixels in the image. However, only a number of these pixels are normally useful for discrimination. Some pixels are background, hair, shoulders, etc. Even inside the head zone of the image some pixels are less useful than others. The eye zone, for example, is known to be more informative than the forehead or cheeks (Wallraven et al, 2005). This means that some features (pixels) may actually increase recognition error, for they may confuse the classifier. Apart from performance, from a computational cost point of view it is desirable to use a minimum number of features. If fed with a large number of features, the classifier will take too long to train or classify.
Feature Selection aims at identifying the most informative features. Once we have a measure of “informativeness” for each feature, a subset of them can be used for classifying. In this case, the features remain the same, only a selection is made. The topic of feature selection has been extensively studied within the Machine Learning community (Duda et al, 2000). Alternatively, in Feature Extraction a new set of features is created from the original set. In both cases the objective is both reducing the number of available features and using the most discriminative ones.
The following sections describe two techniques for Feature Extraction: Principal Component Analysis and Independent Component Analysis. Linear Discriminant Analysis (LDA) is a similar dimensionality reduction technique that will not be covered here for space reasons, we refer the reader to the classical text (Duda et al., 2000).
As an example problem we will consider face recognition. The face recognition problem is particularly interesting here because of a number of reasons. First, it is a topic of increasingly active research in Artificial Vision, with potential applications in many domains. Second, it has images as input, see Figure 1 from the Yale Face Database (Belhumeur et al, 1997), which means that some kind of feature processing/selection must be done previous to classification.Top
Principal Component Analysis
Principal Component Analysis (PCA), see (Turk & Pentland, 1991), is an orthogonal linear transformation of the input feature space. PCA transforms the data to a new coordinate system in which the data variances in the new dimensions is maximized. Figure 2 shows a 2-class set of samples in a 2-feature space. These data have a certain variance along the horizontal and vertical axes. PCA maps the samples to a new orthogonal coordinate system, shown in bold, in which the sample variances are maximized. The new coordinate system is centered on the data mean.
Key Terms in this Chapter
Classifier: Algorithm that produces class labels as output, from a set of features of an object. A classifier, for example, is used to classify certain features extracted from a face image and provide a label (an identity of the individual).
Independent Component Analysis: Feature extraction technique in which the statistical independence of the data is maximized.
Feature Extraction: The process by which a new set of discriminative features is obtained from those available. Classification is performed using the new set of features.
Face Recognition: The AV problem of recognizing an individual from one or more images of his/her face.
Feature Selection: The process by which a subset of the available features (usually the most discriminative ones) is selected for classification.
Principal Component Analysis: Feature extraction technique in which the variance of the data is maximized. It provides a new feature space in which the dimensions are ordered by sample correlation. Thus, a subset of these dimensions can be chosen in which samples are minimally correlated.
Eigenface: A basis vector of the PCA transform, when applied to face images.