Information Veins and Resampling with Rough Set Theory

Information Veins and Resampling with Rough Set Theory

Benjamin Griffiths (Cardiff University, UK)
Copyright: © 2009 |Pages: 7
DOI: 10.4018/978-1-60566-010-3.ch160
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Rough Set Theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in data mining. Within a set theoretical structure, its remit is closely concerned with the classification of objects to decision attribute values, based on their description by a number of condition attributes. With regards to RST, this classification is through the construction of ‘if .. then ..’ decision rules. The development of RST has been in many directions, amongst the earliest was with the allowance for miss-classification in the constructed decision rules, namely the Variable Precision Rough Sets model (VPRS) (Ziarko, 1993), the recent references for this include; Beynon (2001), Mi et al. (2004), and Slezak and Ziarko (2005). Further developments of RST have included; its operation within a fuzzy environment (Greco et al., 2006), and using a dominance relation based approach (Greco et al., 2004). The regular major international conferences of ‘International Conference on Rough Sets and Current Trends in Computing’ (RSCTC, 2004) and ‘International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing’ (RSFDGrC, 2005) continue to include RST research covering the varying directions of its development. This is true also for the associated book series entitled ‘Transactions on Rough Sets’ (Peters and Skowron, 2005), which further includes doctoral theses on this subject. What is true, is that RST is still evolving, with the eclectic attitude to its development meaning that the definitive concomitant RST data mining techniques are still to be realised. Grzymala-Busse and Ziarko (2000), in a defence of RST, discussed a number of points relevant to data mining, and also made comparisons between RST and other techniques. Within the area of data mining and the desire to identify relationships between condition attributes, the effectiveness of RST is particularly pertinent due to the inherent intent within RST type methodologies for data reduction and feature selection (Jensen and Shen, 2005). That is, subsets of condition attributes identified that perform the same role as all the condition attributes in a considered data set (termed ß-reducts in VPRS, see later). Chen (2001) addresses this, when discussing the original RST, they state it follows a reductionist approach and is lenient to inconsistent data (contradicting condition attributes - one aspect of underlying uncertainty). This encyclopaedia article describes and demonstrates the practical application of a RST type methodology in data mining, namely VPRS, using nascent software initially described in Griffiths and Beynon (2005). The use of VPRS, through its relative simplistic structure, outlines many of the rudiments of RST based methodologies. The software utilised is oriented towards ‘hands on’ data mining, with graphs presented that clearly elucidate ‘veins’ of possible information identified from ß-reducts, over different allowed levels of missclassification associated with the constructed decision rules (Beynon and Griffiths, 2004). Further findings are briefly reported when undertaking VPRS in a resampling environment, with leave-one-out and bootstrapping approaches adopted (Wisnowski et al., 2003). The importance of these results is in the identification of the more influential condition attributes, pertinent to accruing the most effective data mining results.
Chapter Preview
Top

Introduction

Rough Set Theory (RST), since its introduction in Pawlak (1982), continues to develop as an effective tool in data mining. Within a set theoretical structure, its remit is closely concerned with the classification of objects to decision attribute values, based on their description by a number of condition attributes. With regards to RST, this classification is through the construction of ‘if .. then ..’ decision rules. The development of RST has been in many directions, amongst the earliest was with the allowance for miss-classification in the constructed decision rules, namely the Variable Precision Rough Sets model (VPRS) (Ziarko, 1993), the recent references for this include; Beynon (2001), Mi et al. (2004), and Ślęzak and Ziarko (2005). Further developments of RST have included; its operation within a fuzzy environment (Greco et al., 2006), and using a dominance relation based approach (Greco et al., 2004).

The regular major international conferences of ‘International Conference on Rough Sets and Current Trends in Computing’ (RSCTC, 2004) and ‘International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing’ (RSFDGrC, 2005) continue to include RST research covering the varying directions of its development. This is true also for the associated book series entitled ‘Transactions on Rough Sets’ (Peters and Skowron, 2005), which further includes doctoral theses on this subject. What is true, is that RST is still evolving, with the eclectic attitude to its development meaning that the definitive concomitant RST data mining techniques are still to be realised. Grzymala-Busse and Ziarko (2000), in a defence of RST, discussed a number of points relevant to data mining, and also made comparisons between RST and other techniques.

Within the area of data mining and the desire to identify relationships between condition attributes, the effectiveness of RST is particularly pertinent due to the inherent intent within RST type methodologies for data reduction and feature selection (Jensen and Shen, 2005). That is, subsets of condition attributes identified that perform the same role as all the condition attributes in a considered data set (termed β-reducts in VPRS, see later). Chen (2001) addresses this, when discussing the original RST, they state it follows a reductionist approach and is lenient to inconsistent data (contradicting condition attributes - one aspect of underlying uncertainty). This encyclopaedia article describes and demonstrates the practical application of a RST type methodology in data mining, namely VPRS, using nascent software initially described in Griffiths and Beynon (2005). The use of VPRS, through its relative simplistic structure, outlines many of the rudiments of RST based methodologies.

The software utilised is oriented towards ‘hands on’ data mining, with graphs presented that clearly elucidate ‘veins’ of possible information identified from β-reducts, over different allowed levels of miss-classification associated with the constructed decision rules (Beynon and Griffiths, 2004). Further findings are briefly reported when undertaking VPRS in a resampling environment, with leave-one-out and bootstrapping approaches adopted (Wisnowski et al., 2003). The importance of these results is in the identification of the more influential condition attributes, pertinent to accruing the most effective data mining results.

Complete Chapter List

Search this Book:
Reset