Rare Class Association Rule Mining with Multiple Imbalanced Attributes

Huaifeng Zhang (University of Technology, Australia), Yanchang Zhao (University of Technology, Australia), Longbing Cao (University of Technology, Australia), Chengqi Zhang (University of Technology, Australia) and Hans Bohlscheid (Projects Section, Business Integrity Programs Branch)
DOI: 10.4018/978-1-60566-754-6.ch005

Abstract

In this chapter, the authors propose a novel framework for rare class association rule mining. In each class association rule, the right-hand side is a target class, while the left-hand side may contain one or more attributes. The proposed algorithm focuses on multiple imbalanced attributes on the left-hand side. In the proposed framework, rules with and without imbalanced attributes are processed in parallel. Rules without imbalanced attributes are mined with a standard algorithm, while rules with imbalanced attributes are mined using newly defined measurements. Through a simple transformation, these measurements can be mapped into a uniform space, so that only a few parameters need to be specified by the user. In the case study, the proposed algorithm is applied in the social security field. Although some attributes are severely imbalanced, rules involving the minority of imbalanced attributes have been mined efficiently.
Chapter Preview

The data imbalance problem has attracted growing research interest in data mining and machine learning. Algorithms that tackle data imbalance can be categorized as data-level or algorithm-level. At the data level, the solution is to resample the dataset by oversampling the minority instances, under-sampling the majority instances (Liu, 2006), or combining the two techniques (Chawla, 2002). At the algorithm level, the solution is to adapt existing classifier learning algorithms so that they are biased towards the minority, as in cost-sensitive learning (Liu, 2006a; Sun, 2006; Sun, 2007) and recognition-based learning (Japkowicz, 2001).
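The data-level idea can be illustrated with a minimal sketch of random resampling. This is not the method of any of the cited papers; the function names, the `label_of` accessor, and the toy data are assumptions made for illustration, and simple random duplication or removal stands in for the more refined sampling schemes in the literature.

```python
import random
from collections import Counter


def group_by_class(rows, label_of):
    """Group rows into lists keyed by their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest class (illustrative helper)."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def random_undersample(rows, label_of, seed=0):
    """Randomly drop rows of larger classes until all classes are as small
    as the smallest class (illustrative helper)."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Toy dataset (assumed): 8 majority rows and 2 minority rows.
data = [("a", "majority")] * 8 + [("b", "minority")] * 2


def label(row):
    return row[1]


over_counts = Counter(label(r) for r in random_oversample(data, label))
under_counts = Counter(label(r) for r in random_undersample(data, label))
print(dict(over_counts), dict(under_counts))
```

Oversampling keeps all original rows at the cost of duplicates, whereas under-sampling discards majority rows; combining the two trades off these effects.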

Recently, some researchers have addressed data imbalance in class association rule mining. Gu et al. (2003) proposed an algorithm to deal with imbalanced class distribution in association rule mining, defining a set of criteria to measure the interestingness of association rules. Arunasalam and Chawla (2006) presented an algorithm for association rule mining in imbalanced data. Their paper studied the anti-monotonic property of the Complement Class Support (CCS) and applied it in the association rule mining procedure. Verhein and Chawla (2007) proposed a novel measure, Class Correlation Ratio (CCR), as the principal measure in association mining to tackle the data imbalance problem in class association rule mining. Their algorithm outperforms the previous algorithms on imbalanced datasets. However, all three algorithms focus on the imbalance of the target class, with the aim of improving the performance of the so-called associative classifier (Liu, 1998).
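To make concrete why class imbalance matters for rule measures, the following sketch computes the standard support, confidence, and lift of a class association rule on a toy dataset. It does not implement CCS, CCR, or the criteria of Gu et al.; the function name `rule_measures`, the record layout, and the data are assumptions made for illustration.

```python
def rule_measures(records, antecedent, target_class):
    """Return (support, confidence, lift) of the rule antecedent -> target_class.

    records: list of (set_of_items, class_label) pairs (assumed layout).
    antecedent: set of items forming the left-hand side of the rule.
    """
    n = len(records)
    n_x = sum(1 for items, _ in records if antecedent <= items)
    n_c = sum(1 for _, c in records if c == target_class)
    n_xc = sum(
        1 for items, c in records if antecedent <= items and c == target_class
    )
    support = n_xc / n
    confidence = n_xc / n_x if n_x else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy dataset (assumed): 100 records, of which only 5 belong to the rare class.
records = (
    [({"x"}, "rare")] * 4
    + [({"y"}, "rare")] * 1
    + [({"x"}, "common")] * 6
    + [({"y"}, "common")] * 89
)

support, confidence, lift = rule_measures(records, {"x"}, "rare")
print(support, confidence, lift)
```

In this toy data the rule {x} -> rare covers 4 of the 5 rare records, yet its support is only 4/100, so a minimum support threshold of, say, 0.05 would prune it, even though its lift of about 8 indicates a strong association with the rare class.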
