Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000), mainly because such algorithms are computationally intensive. This obstacle is sometimes known as the "curse of dimensionality" (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard all other features as irrelevant or redundant. Since feature selection reduces the dimensionality of the data, it allows data mining algorithms to operate faster and more effectively. In some cases, the predictive performance of the data mining method can also be improved as a result of feature selection, mainly because the reduced feature set yields a more compact, easily interpreted representation of the target concept.

There are three main approaches to feature selection: wrapper, filter and embedded. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box, along with a statistical re-sampling technique such as cross-validation, to select the best feature subset according to some predictive measure. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. Filter algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, referred to here as rankers, comprises methods that employ some criterion to score each feature and provide a ranking; from this ordering, several feature subsets can be chosen by manually setting a threshold. The embedded approach (see, for instance, Guyon and Elisseeff, 2003) is similar to the wrapper approach in that the features are selected for a specific inducer, but it selects the features during the learning process itself.
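The ranker idea described above can be sketched briefly: score each feature independently, sort by score, and keep the features above a manually chosen cut-off. The scoring criterion used here (absolute Pearson correlation between a feature and the class label) is only one illustrative choice made for this sketch; rankers differ mainly in which criterion they employ.

```python
# Minimal filter-style "ranker" sketch. The correlation criterion is an
# illustrative assumption; any per-feature relevance score would fit.

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_features(rows, labels):
    """Return feature indices ordered from most to least relevant."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data: feature 0 tracks the label, feature 1 is noise.
rows = [[1.0, 0.3], [2.0, 0.1], [3.0, 0.4], [4.0, 0.2]]
labels = [0, 0, 1, 1]
ranking = rank_features(rows, labels)
top_k = ranking[:1]   # the manually set threshold: keep the single best feature
```

Note that the ranking is computed without consulting any learning algorithm, which is what makes this a filter method.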
Feature selection algorithms search through the space of feature subsets in order to find the best subset. This subset search has four major properties (Langley, 1994): starting point, search organization, evaluation strategy, and stopping criterion.
Starting Point: Selecting a point in the feature subset space from which to begin the search can affect the direction of the search.
Search Organization: An exhaustive search of the feature subset space is prohibitive for all but a small number of features, since a data set with n features has 2^n candidate subsets; heuristic search strategies are therefore typically used.
Evaluation Strategy: How candidate feature subsets are evaluated (e.g., by a filter, wrapper or embedded approach).
Stopping Criterion: A feature selector must decide when to stop searching through the space of feature subsets.
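The four properties above can be made concrete with a greedy forward-selection sketch. The evaluation function `merit` is a stand-in (an assumption of this sketch); in practice it would be a filter criterion or a wrapper's cross-validation estimate.

```python
# Greedy forward selection, annotated with the four search properties.
# `merit` maps a feature subset to a score and is supplied by the caller.

def forward_select(all_features, merit, min_gain=1e-6):
    selected = set()                  # starting point: the empty subset
    best = merit(selected)
    while True:
        # search organization: greedily try adding one feature at a time
        candidates = [(merit(selected | {f}), f)
                      for f in all_features - selected]
        if not candidates:
            break
        score, f = max(candidates)
        # stopping criterion: halt when no candidate improves the merit
        if score - best < min_gain:
            break
        selected.add(f)
        best = score
    return selected
```

For example, with a toy merit function that rewards features 0 and 2 but penalizes subset size, `forward_select({0, 1, 2, 3}, lambda s: len(s & {0, 2}) - 0.1 * len(s))` returns `{0, 2}`. A different starting point (e.g., the full feature set with greedy removal) would give backward elimination instead.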
Feature Selection Techniques
This section provides a survey of techniques for each of the strategies described in the previous sections.
The filter methods were the earliest approaches to feature selection. All filter methods use general properties of the data to evaluate the merit of feature subsets. As a result, filter methods are generally much faster and more practical than wrapper methods, especially on data of high dimensionality. Detailed experiments for each of the methods presented below can be found in Hall's work (1999) and in Liu and Motoda's (1998) book on feature selection.
Almuallim and Dietterich (1991) describe an algorithm, originally designed for Boolean domains, called FOCUS. FOCUS exhaustively searches the space of feature subsets until it finds a minimal subset for which every combination of feature values is associated with a single class value. The selected subset is then passed to ID3 (Quinlan, 1986), which constructs a decision tree.
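The core idea behind FOCUS can be sketched as follows: try subsets of increasing size and return the first one that is consistent, i.e., under which no two examples agree on every selected feature yet disagree on the class. This is a simplified sketch assuming Boolean features, as in the original algorithm; the helper names are illustrative, not the authors' own.

```python
# Sketch of the consistency search at the heart of FOCUS.
from itertools import combinations

def is_consistent(rows, labels, subset):
    """True if no two rows share the projected feature values but differ in class."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        if seen.setdefault(key, label) != label:
            return False          # same feature values, different class
    return True

def focus(rows, labels):
    """Return the first (hence smallest) consistent feature subset."""
    n = len(rows[0])
    for size in range(n + 1):     # smallest subsets tried first
        for subset in combinations(range(n), size):
            if is_consistent(rows, labels, subset):
                return subset
    return tuple(range(n))        # the full set is always consistent on clean data

# Toy Boolean data: the class equals feature 0, so {0} is a consistent subset.
rows = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
labels = [0, 0, 1, 1]
```

Because the search enumerates all subsets of each size, its cost grows exponentially with the number of features, which is why FOCUS is practical only for small Boolean domains.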