Wrapper Feature Selection

Kyriacos Chrysostomou (Brunel University, UK)
Copyright: © 2009 | Pages: 6
DOI: 10.4018/978-1-60566-010-3.ch322


Introduction

It is well known that features which add no value to a learning task can degrade the performance of most data mining algorithms. Feature selection limits the effect of such features by seeking only the relevant subset of the original features (de Souza et al., 2006). This subset is found by removing features that are considered irrelevant or redundant. Reducing the number of features in this way significantly reduces the time taken to perform classification. The reduced dataset is also easier to handle, because fewer training instances are needed when fewer features are present, and it tends to produce simpler classifiers, which are often more accurate.

Because of these benefits, feature selection has been widely applied in data mining applications where the data have hundreds or even thousands of features. Many approaches to feature selection exist, including filters (Kira & Rendell, 1992), wrappers (Kohavi & John, 1997), and embedded methods (Quinlan, 1993). Among these, the wrapper appears to be the most widely used. Wrappers have proven popular in many research areas, including bioinformatics (Ni & Liu, 2004), image classification (Puig & Garcia, 2006) and web page classification (Piramuthu, 2003).

One reason for this popularity is that wrappers use a classifier to help select the most relevant feature subset (John et al., 1994). The remaining methods, especially filters, instead evaluate the merit of a feature subset from the characteristics of the data and statistical measures such as chi-square, rather than from the classifier intended for use (Huang et al., 2007). Leaving the classifier out of feature selection can result in poor classification performance, because the selected subset does not reflect the classifier’s specific characteristics and so may omit the features most relevant to the classifier and the learning task. The wrapper is therefore superior to other feature selection methods such as filters, since it finds feature subsets that are better suited to the data mining problem.
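To make the filter side of this contrast concrete, the following is a minimal sketch of a chi-square filter score, written in plain Python. The function name `chi_square_score` and the toy data are hypothetical and chosen only for illustration; they do not come from any of the cited works. The point to notice is that no classifier appears anywhere in the computation: each feature is ranked purely from its co-occurrence with the class labels.

```python
from collections import Counter

def chi_square_score(feature_values, labels):
    """Chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))
    score = 0.0
    for value, value_count in feature_counts.items():
        for label, label_count in label_counts.items():
            # Count expected if the feature and the class were independent.
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score

# Hypothetical toy data: feature "a" mirrors the class, feature "b" is unrelated to it.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
feature_a = [1, 1, 1, 1, 0, 0, 0, 0]
feature_b = [1, 0, 1, 0, 1, 0, 1, 0]

scores = {
    "a": chi_square_score(feature_a, labels),
    "b": chi_square_score(feature_b, labels),
}
# A filter keeps the highest-scoring features, whatever classifier is used later.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because the score depends only on the data, the same ranking would be handed to any classifier, which is the limitation the paragraph above attributes to filters.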

These differences between wrappers and other feature selection techniques have been reviewed in a number of studies (e.g. Huan & Lei, 2005). However, such studies have primarily focussed on providing a holistic view of all the different types of technique. Although these works are comprehensive, there has yet to appear an in-depth review that focuses solely on the most popular feature selection technique: the wrapper. To this end, this paper presents an in-depth survey of the wrapper. Particular attention is given to improvements made to wrappers, since they are known to be much slower than other feature selection techniques. This is primarily because a wrapper must run the classifier repeatedly when determining the accuracy of candidate feature subsets, and must perform feature selection again each time a different classifier is used. To overcome these problems, researchers have spent considerable effort on improving the performance of wrappers (Yu & Cho, 2006). These improvements follow two trends: one focuses on reducing the time taken to perform feature selection, and the other emphasises improving the accuracy of the selected feature subset. The two trends are closely related, because one can influence the other: decreasing the time taken to perform feature selection with a wrapper may affect the accuracy of the final output. This relationship has been investigated by several studies and will be reviewed in this paper.
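The cost described above can be illustrated with a small sketch of a wrapper, again in plain Python. Everything specific here is an assumption made for illustration rather than the method of any cited study: the search strategy is greedy forward selection, the classifier is 1-nearest-neighbour, subset accuracy is estimated by leave-one-out, and the names `loo_accuracy_1nn` and `forward_wrapper` and the toy data are hypothetical. The `evaluations` counter records how many times the classifier had to be re-evaluated.

```python
def loo_accuracy_1nn(rows, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen features."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # hold out the instance being classified
            dist = sum((row[f] - other[f]) ** 2 for f in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / len(rows)

def forward_wrapper(rows, labels, n_features):
    """Greedy forward selection: add the feature that most improves classifier accuracy."""
    selected, best_score, evaluations = [], 0.0, 0
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = []
        for f in candidates:
            # Each candidate subset costs one full classifier evaluation.
            scored.append((loo_accuracy_1nn(rows, labels, selected + [f]), f))
            evaluations += 1
        if not scored:
            break  # every feature has already been selected
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(feature)
        best_score = score
    return selected, best_score, evaluations

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 mislead.
rows = [
    (0.0, 5.0, 1.0), (0.1, 1.0, 9.0), (0.2, 8.0, 4.0),
    (1.0, 5.1, 9.1), (1.1, 1.1, 4.1), (1.2, 8.1, 1.1),
]
labels = [0, 0, 0, 1, 1, 1]

selected, best_score, evaluations = forward_wrapper(rows, labels, n_features=3)
print(selected, best_score, evaluations)
```

Even on three features, the search needs five classifier evaluations to settle on a single feature, and each evaluation is itself a full leave-one-out pass. Swapping in a different classifier would mean repeating the whole search, which is the second source of cost noted above.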

The paper first formally defines the feature selection process, with an emphasis on the wrapper approach. It then discusses improvements made to the wrapper that reduce the time taken to perform feature selection and increase the accuracy of the selected feature subset. Future directions for wrapper feature selection are discussed next, and conclusions are drawn at the end of the paper.
