Feature Reduction with Inconsistency

Yong Liu (Institute of Cyber-Systems and Control of Zhejiang University, China), Yunliang Jiang (Huzhou Teachers College, China) and Jianhua Yang (SCI-Tech Academy of Zhejiang University, China)
DOI: 10.4018/978-1-4666-1743-8.ch014


Feature selection is a classical problem in machine learning. A central challenge is to design a method whose selected features preserve the internal semantic correlation of the original feature set. The authors present a general approach to feature selection via rough set based reduction, which keeps the same semantic correlation in the selected features as in the original feature set. A new concept named inconsistency is proposed, which allows the positive region to be calculated easily and quickly, with only linear time complexity. Some properties of inconsistency, such as its monotonicity, are also given. The authors also propose three inconsistency based attribute reduction generation algorithms with different search policies. Finally, a “mini-saturation” bias is presented for choosing a proper reduction for subsequent predictive modeling.

Some related definitions and concepts are presented as follows:

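Definition 1 Positive region. Let P and Q be two attribute sets in the information system U(C, D), with $P, Q \subseteq C \cup D$. The positive region of Q in P, denoted as $POS_P(Q)$, can be calculated as:

$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$$

where $U/Q$ is the partition of U induced by Q and $\underline{P}X$ is the P-lower approximation of X.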

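Definition 2 Attribute dependency. Let P and Q be two attribute sets in the information system U(C, D), with $P, Q \subseteq C \cup D$. The attribute dependency of attribute set Q on attribute set P, denoted as $\gamma_P(Q)$, can be calculated as:

$$\gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}$$

where $POS_P(Q)$ is the positive region of Q in P and $|\cdot|$ denotes set cardinality.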
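The two definitions above can be illustrated with a minimal Python sketch. It assumes the standard rough-set reading (the positive region is the union of P-equivalence classes that fall entirely inside one Q-equivalence class); the function names and the toy table `TABLE` are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def positive_region(rows, p, q):
    """Union of the P-classes that are fully contained in a single Q-class."""
    q_blocks = partition(rows, q)
    pos = set()
    for block in partition(rows, p):
        if any(block <= q_block for q_block in q_blocks):
            pos |= block
    return pos

def dependency(rows, p, q):
    """Attribute dependency of Q on P: |positive region| / |U|."""
    return len(positive_region(rows, p, q)) / len(rows)

# Hypothetical toy decision table: condition attributes a, b; decision d.
TABLE = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

print(positive_region(TABLE, ["a", "b"], ["d"]))
print(dependency(TABLE, ["a", "b"], ["d"]))
```

In this toy table the last two rows agree on a and b but disagree on d, so they fall outside the positive region and the dependency is below 1. This pairwise-containment version is written for clarity rather than speed.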

The attribute dependency describes how strongly one set of attributes is related to another. For example, if $P = C$ and $Q = D$, then $\gamma_C(D)$ can be viewed as the measure between the decision attributes and the condition attributes, which can be used in further predictive modeling.

With the definition of attribute dependency, the attribute reduct can be defined as follows:

Definition 3 Attribute reduct. In the information system U(C, D), let $R \subseteq C$. R is a reduct of C if and only if $\gamma_R(D) = \gamma_C(D)$ and $\forall R' \subset R,\ \gamma_{R'}(D) \neq \gamma_R(D)$, or equivalently $POS_R(D) = POS_C(D)$ and $\forall R' \subset R,\ POS_{R'}(D) \neq POS_R(D)$.
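A reduct test can be sketched in Python as follows. This is only an illustration under the standard rough-set assumptions (a reduct preserves the dependency degree of the full condition set and no proper subset does); the helper names and the toy table are hypothetical. Because the dependency degree never decreases when attributes are added, it is enough to check the subsets obtained by dropping one attribute at a time.

```python
from collections import defaultdict

def gamma(rows, attrs, decision):
    """Dependency degree: share of rows whose attrs-class has one decision value."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(n for key, n in sizes.items() if len(decisions[key]) == 1)
    return consistent / len(rows)

def is_reduct(rows, r, conditions, decision):
    """True if r keeps the dependency of the full condition set and is minimal."""
    target = gamma(rows, conditions, decision)
    if gamma(rows, r, decision) != target:
        return False
    # Minimality: removing any single attribute must lose dependency.
    return all(
        gamma(rows, [a for a in r if a != dropped], decision) != target
        for dropped in r
    )

# Hypothetical toy table: d depends on a and b together; c duplicates a.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": "no"},
]

print(is_reduct(TABLE, ["a", "b"], ["a", "b", "c"], "d"))
print(is_reduct(TABLE, ["a", "b", "c"], ["a", "b", "c"], "d"))
```

Here {a, b} and {b, c} both qualify, which shows that a reduct need not be unique, while the full set {a, b, c} fails the minimality condition.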

The essence of an attribute reduct is to find a subset P of the condition set that maintains the same discriminability over the instance space. We can therefore judge whether a set is a reduct by its discriminability over the instance space. The positive region, whose size counts the instances that can be discriminated with the attribute set, can thus be used to find the reduct.

From the definition of attribute reduct, we can see that a reduct keeps the internal correlation of the attributes. We therefore apply reduction to feature selection, since the reduct maintains the same discriminability as the original data set (Jensen & Shen, 2004).

Definition 4 Inconsistent condition. In the information system U(C, D), where C is the condition attribute set and D is the decision attribute set, let $x_i, x_j \in U$. If $C(x_i) = C(x_j)$ and $D(x_i) \neq D(x_j)$, where $C(x)$ and $D(x)$ denote the values of instance x on C and D, then there is an inconsistent condition between instance $x_i$ and instance $x_j$.
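As an illustration of this definition, the Python sketch below finds the instances involved in an inconsistent condition by hashing each instance on its condition values, so the table is scanned a fixed number of times rather than compared pair by pair. It is not the authors' formula for the inconsistent instance number; the function name and the toy table are hypothetical, and the reading "same condition values, different decision value" is an assumption.

```python
from collections import defaultdict

def inconsistent_rows(rows, conditions, decision):
    """Indices of rows that share all condition values with a row
    carrying a different decision value."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in conditions)].append(i)
    bad = set()
    for members in groups.values():
        # A group is inconsistent if it holds more than one decision value.
        if len({rows[i][decision] for i in members}) > 1:
            bad.update(members)
    return bad

# Hypothetical toy decision table: the last two rows conflict.
TABLE = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

print(inconsistent_rows(TABLE, ["a", "b"], "d"))
```

Under the standard rough-set definitions, the rows not returned by this function are exactly the positive region of the decision attribute with respect to the chosen condition attributes, which is why a hash-based pass of this kind can run in time linear in the number of instances.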

Definition 5 Inconsistent instance number. In the information system U(C, D), where C is the condition attribute set and D is the decision attribute set, if 978-1-4666-1743-8.ch014.m19, 978-1-4666-1743-8.ch014.m20, 978-1-4666-1743-8.ch014.m21, the inconsistent instance number of set P is denoted as 978-1-4666-1743-8.ch014.m22, and calculated as follows:
