1.1 Associative Classification
Associative Classification (AC) (Vishwakarma et al. 2013; Phan-Luong, 2013) integrates association rule mining and classification in order to build a classifier that describes the classes of the input training data set. Class Association Rules (CARs) (Mabu et al. 2011) are used to build the classification model: each rule relates an itemset to a class label, and prediction relies on these relationships. AC is mainly applied to medical datasets, where the generated rules make it possible to identify populations at high risk for a particular disease.
Constructing an associative classifier involves two steps. First, class association rules are generated from the training dataset. Then, pruning methods are applied to obtain a small subset of high-quality rules, from which an accurate classifier is built for the training data. However, in large or correlated data sets, rule mining may yield a huge number of classification rules. Hence, pruning techniques, in particular support-based pruning, are exploited to reduce the complexity of the extraction task.
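The rule generation step with support-based pruning can be illustrated with a minimal Python sketch. It enumerates candidate itemsets by brute force rather than with an optimized miner, and the function name mine_cars, the max_len limit, the thresholds and the toy records are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def mine_cars(records, min_support, min_confidence, max_len=2):
    """Brute-force CAR mining sketch.

    records: list of (set_of_items, class_label) pairs.
    Returns a list of (itemset, label, support, confidence) tuples.
    """
    n = len(records)
    itemset_counts = Counter()  # occurrences of each itemset
    rule_counts = Counter()     # occurrences of each (itemset, label) pair
    for items, label in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    cars = []
    for (itemset, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[itemset]
        # Support-based pruning first, then the confidence threshold.
        if support >= min_support and confidence >= min_confidence:
            cars.append((itemset, label, support, confidence))
    return cars


# Hypothetical toy training data: (items, class label).
records = [
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker"}, "risk"),
    ({"high_bp"}, "healthy"),
    ({"active"}, "healthy"),
    ({"active", "high_bp"}, "healthy"),
]

cars = mine_cars(records, min_support=0.3, min_confidence=0.8)
for itemset, label, support, confidence in cars:
    print(itemset, "->", label, round(support, 2), round(confidence, 2))
```

On this toy data, rules with the antecedent high_bp alone are rejected because their confidence is only 0.5, which shows how the two thresholds together shrink the rule set before a classifier is built.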
1.2 Motivation
A class association rule is a special case of an association rule in which the rule consequent contains only a class label (Nguyen et al., 2016). The basic objective in mining class association rules (CARs) is to find the complete set of CARs that satisfy user-specified minimum support and minimum confidence thresholds. Several methods and techniques have been proposed in the literature to address this problem. CAR mining has been widely used in various practical domains, namely tourism management (Rong et al., 2012; Leung et al., 2013), social media security (Zhang et al., 2007), healthcare (Nahar et al., 2013), and education (Luna et al., 2015). However, previous researchers have also observed that the results of CAR mining contain redundant rules.
Many real-time applications are dynamic in nature, in that data is incrementally added to the database. The major problem in incremental mining is updating the knowledge that was mined from the original database. Methods for mining frequent itemsets from incremental datasets have been proposed by Cheung et al. (1996), Hong et al. (2001), Le et al. (2011, 2012) and Vo et al. (2014). These methods reduce execution time and memory usage compared with rescanning the original dataset. The techniques of incremental frequent itemset mining can therefore be integrated into constraint CAR mining, so that the incremental method generates all CARs satisfying the given constraint whenever new records are added to the original dataset. However, duplicate and redundant Constraint Class Association Rules (CCARs) may still be generated. This can be addressed by applying post-processing techniques such as rule pruning and rule selection.
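The idea of updating mined knowledge without rescanning the original data can be sketched as follows. This is a deliberately naive illustration, not one of the cited algorithms: it keeps the counts of all candidate itemsets in memory and updates them from the new records only. The class name IncrementalCarMiner, its parameters and the toy records are hypothetical, and the user constraint on the rules is omitted for brevity.

```python
from collections import Counter
from itertools import combinations


class IncrementalCarMiner:
    """Naive incremental CAR mining: counts are kept and updated in place."""

    def __init__(self, min_support, min_confidence, max_len=2):
        self.min_support = min_support
        self.min_confidence = min_confidence
        self.max_len = max_len
        self.n = 0
        self.itemset_counts = Counter()
        self.rule_counts = Counter()

    def add_records(self, records):
        """Scan only the newly added records and update the stored counts."""
        for items, label in records:
            self.n += 1
            for k in range(1, self.max_len + 1):
                for combo in combinations(sorted(items), k):
                    self.itemset_counts[combo] += 1
                    self.rule_counts[(combo, label)] += 1

    def rules(self):
        """Re-derive the rules from the counts; the dict keys prevent duplicates."""
        result = {}
        for (itemset, label), count in self.rule_counts.items():
            support = count / self.n
            confidence = count / self.itemset_counts[itemset]
            if support >= self.min_support and confidence >= self.min_confidence:
                result[(itemset, label)] = (support, confidence)
        return result


# Hypothetical original dataset and a later increment.
original = [
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker"}, "risk"),
    ({"high_bp"}, "healthy"),
]
increment = [
    ({"active"}, "healthy"),
    ({"active", "high_bp"}, "healthy"),
]

miner = IncrementalCarMiner(min_support=0.3, min_confidence=0.8)
miner.add_records(original)
rules_before = miner.rules()
miner.add_records(increment)   # the original records are not rescanned
rules_after = miner.rules()
print(sorted(rules_before))
print(sorted(rules_after))
```

Storing every candidate count trades memory for simplicity; the cited incremental methods are designed to avoid exactly this cost, so the sketch should be read only as an illustration of the update step.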
In the post-processing step, the Constraint Class Association Rules (CCARs) are pruned and ranked, and high-quality rules are then selected in order to build a compact and accurate classifier. The classifier is evaluated by measuring the number of correct classifications made for a given set of test instances. Finally, each test instance is classified by selecting the single rule or the multiple rules whose antecedent is satisfied by that instance.
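The post-processing step can be sketched in Python as follows. The ranking order (confidence, then support, then shorter antecedent) and the coverage-based selection are assumptions loosely modeled on the database-coverage heuristic common in associative classifiers, not the specific procedure of this work. Prediction uses the single best matching rule; the function names, the rule list and the test instances are hypothetical.

```python
from collections import Counter


def rank_rules(rules):
    """Sort by confidence (desc), support (desc), then antecedent length (asc)."""
    return sorted(rules, key=lambda r: (-r[3], -r[2], len(r[0])))


def select_rules(ranked, records):
    """Keep a rule only if it correctly covers at least one uncovered record."""
    remaining = list(records)
    selected = []
    for rule in ranked:
        itemset, label = set(rule[0]), rule[1]
        covered = [rec for rec in remaining if itemset <= rec[0]]
        if any(rec[1] == label for rec in covered):
            selected.append(rule)
            remaining = [rec for rec in remaining if not itemset <= rec[0]]
    # Default class: majority label of uncovered records (or of all records).
    pool = remaining or records
    default = Counter(label for _, label in pool).most_common(1)[0][0]
    return selected, default


def predict(selected, default, items):
    """Return the label of the highest-ranked rule matching the instance."""
    for itemset, label, _, _ in selected:
        if set(itemset) <= items:
            return label
    return default


def accuracy(selected, default, test):
    """Fraction of test instances classified correctly."""
    correct = sum(predict(selected, default, items) == label
                  for items, label in test)
    return correct / len(test)


# Hypothetical candidate rules: (itemset, label, support, confidence).
rules = [
    (("high_bp", "smoker"), "risk", 2 / 6, 1.0),
    (("smoker",), "risk", 3 / 6, 1.0),
    (("active",), "healthy", 2 / 6, 1.0),
    (("high_bp",), "healthy", 2 / 6, 0.5),
]
train = [
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker", "high_bp"}, "risk"),
    ({"smoker"}, "risk"),
    ({"high_bp"}, "healthy"),
    ({"active"}, "healthy"),
    ({"active", "high_bp"}, "healthy"),
]
test = [
    ({"smoker"}, "risk"),
    ({"active", "high_bp"}, "healthy"),
    ({"high_bp"}, "risk"),
]

selected, default = select_rules(rank_rules(rules), train)
print([rule[:2] for rule in selected])
print(accuracy(selected, default, test))
```

In this toy run, the rule with antecedent (high_bp, smoker) is dropped as redundant, because every record it covers is already covered by the higher-ranked rule on smoker alone. A multiple-rule prediction scheme would instead combine the labels of all matching rules, for example by voting.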