On Interacting Features in Subset Selection

On Interacting Features in Subset Selection

Zheng Zhao (Arizona State University, USA)
Copyright: © 2009 |Pages: 6
DOI: 10.4018/978-1-60566-010-3.ch167
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

The high dimensionality of data poses a challenge to learning tasks such as classification. In the presence of many irrelevant features, classification algorithms tend to overfit training data (Guyon & Elisseeff, 2003). Many features can be removed without performance deterioration, and feature selection is one effective means to remove irrelevant features (Liu & Yu, 2005). Feature selection, also known as variable selection, feature reduction, attribute selection or variable subset selection, is the technique of selecting a subset of relevant features for building robust learning models. Usually a feature is relevant due to two reasons: (1) it is strongly correlated with the target concept; or (2) it forms a feature subset with other features and the subset is strongly correlated with the target concept. Optimal feature selection requires an exponentially large search space (O(2n), where n is the number of features) (Almual-lim & Dietterich, 1994). Researchers often resort to various approximations to determine relevant features, and in many existing feature selection algorithms, feature relevance is determined by correlation between individual features and the class (Hall, 2000; Yu & Liu, 2003). However, a single feature can be considered irrelevant based on its correlation with the class; but when combined with other features, it can become very relevant. Unintentional removal of these features can result in the loss of useful information and thus may cause poor classification performance, which is studied as attribute interaction in (Jakulin & Bratko, 2003). Therefore, it is desirable to consider the effect of feature interaction in feature selection.
Chapter Preview
Top

Background

The goal of feature selection is to remove irrelevant features and retain relevant ones. We first give the definition of feature relevance as in (John et al., 1994).

  • Definition 1 (Feature Relevance):

Let F be the full set of features, Fi be a feature and Si = F – {Fi}. Let P(C|S) denote the conditional probability of class C given a feature sets. A feature Fi is relevant iff

(1)

Definition 1 suggests that a feature can be relevant, if its removal from a feature set reduces the prediction power of the feature set. A feature, whose removal does not reduce the prediction power of any feature set, is an irrelevant feature and can be removed from the whole feature set without any side-effect. From Definition 1, it can be shown that a feature can be relevant due to two reasons: (1) it is strongly correlated with the target concept; or (2) it forms a feature subset with other features and the subset is strongly correlated with the target concept. If a feature is relevant because of the second reason, there exists feature interaction. Feature interaction is characterized by its irreducibility (Jakulin & Bratko, 2004). We give the definition of kth-order below.

  • Definition 2 (kth order Feature Interaction):

Let F be a feature subset with k features F1, F2, ..., Fk. Let ℑ denote a metric that measures the relevance of a feature or a feature subset with the class label. Features F1, F2, ... Fk are said to interact with each other iff: for an arbitrary partition S = {S1, S2, S3, ..., Sl} of F, where 2 ≤ lk and Si ≠ ϕ, we have ∀i ∈ [1, l], ℑ(F) > ℑ(Si).

Complete Chapter List

Search this Book:
Reset