Feature Selection Based on Minimizing the Area Under the Detection Error Tradeoff Curve

Feature Selection Based on Minimizing the Area Under the Detection Error Tradeoff Curve

Liau Heng Fui (University of Nottingham Malaysia, Malaysia) and Dino Isa (University of Nottingham Malaysia, Malaysia)
DOI: 10.4018/978-1-4666-3628-6.ch002
OnDemand PDF Download:
$37.50

Abstract

Feature selection is crucial to select an “optimized” subset of features from the original feature set based on a certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. This paper proposes a feature selection algorithm that aims to minimize the area under the curve of detection error trade-off (DET) curve. Particle swarm optimization (PSO) is employed to search for the optimal feature subset. The proposed method is implemented in face recognition and iris recognition systems. The result shows that the proposed method is able to find an optimal subset of features that sufficiently describes iris and face images by removing unwanted and redundant features and at the same time improving the classification accuracy in terms of total error rate (TER).
Chapter Preview
Top

1. Introduction

In real world application, the feature set is generally large in term of dimensionality. The features may be noisy and may contain irrelevant or redundant information about the target concept. This may cause performance degradation on classifiers. Besides that, large feature set also increases the storage cost and require more computation time to process. Feature selection is crucial to select an “optimized” subset of features from the original feature set based on certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. Although there has been a great deal of work in different areas to address this issue (Guyon & Elissee, 2003), these results have not been fully explored or exploited in emerging computer vision applications. Only recently, there has been an increased interest in implementing feature selection for applications such as face detection (Sun, Bebis & Miller, 2003; Viola & Jones, 2001), face recognition (Shen & JI, 2009; Kanan & Faez, 2008a, 2008b), auditory brainstem responses tracking (Acir, Ozdamar & Guzelisc, 2005), gene expression data (Chuang et al., 2008) etc.

Sequential forward selection (SFS) and sequential backward selection (SBS) are the two well-known feature selection methods. SFS starts with an empty feature set; the feature that benefits the performance is added iteratively. In contrast, SBS starts with the full feature set and at each step, feature whose absence causes least decreases in terms of classification performance is dropped. Combining SFS and SBS gives birth to the “plus l-take away r” feature selection method (Stearns, 1976), which first enlarges the feature subset by adding l using SFS and then deletes r features using SBS. Sequential forward floating search (SFFS) and sequential backward floating search (SBFS) (Pudil, Novovicova & Kittler, 1994) are generalizations of the Stearn’s method since they make decisions based on single feature iteratively. Hence, they cannot be expected to find globally optimal solutions. Another famous feature selection method is the relief algorithm (Kira & Rendell, 1992) and extension of it (Wiskott et al., 1994). Features are ranked according to the hypothesis margin. Features that give large hypothesis margin are selected.

Recently, evolutionary computing methods, such as genetic algorithms (GA) (Siedlecki & Sklansky, 1989), ant colony optimization (ACO) (Dorigo & Caro, 1999) and particle swarm optimization (PSO) (Kennedy & Eberhart, 1995, 1997) have gained more popularity in the feature selection area. Genetic algorithms (GA’s) are optimization techniques based on the mechanism of natural selection. They try to mimic the operations found in natural genetics to guide itself through the paths in the search space (Siedlecki & Sklansky, 1989). Sun, Bebis, & Miller (2003) employed GA to select a subset of principal components analysis (PCA) features for vehicle and face detection. The fitness evaluation is based on accuracy and number of principle components. The subset that consists of fewer numbers of features and gives comparable result is favored. Kanan & Faez (2008a) proposed a scale, rotation and translation invariant face recognition system using Pseudo Zernike moment invariant (PZMI). PZMI feature set is very large and an optimal subset of PZMI features subset is selected using GAs. ACO algorithm was inspired by ant’s social behavior. Ants are capable of finding the shortest route between a food source and their nest by chemical materials called pheromone that they release when moving. Kanan & Faez (2008b) employed ACO to perform feature selection on a PZMI and discrete wavelet transform (DWT) features that represent facial image. PSO was inspired by the graceful but unpredictable movements of a flock of birds and a school of fishes. Inertia weight was later introduced into the particle swarm optimizer to produce the standard PSO (Shi & Eberhart, 1998). Huang and Dun (2008) employed PSO to performed feature selection and at the same time tuning the parameter of SVM classifier. The binary PSO is incorporated to select the features sets while a common PSO is used to tune SVM's parameter and the kernel's parameter. PSO has been used to perform feature selection for gene expression data (Chuang et al, 2008).

Complete Chapter List

Search this Book:
Reset