Article Preview
Top1. Introduction
Association rule mining discovers interesting relations among sets of items in databases. It is composed of two steps: finding all frequent itemsets and generating association rules from the itemsets discovered. The number of occurrences of an itemset is called its support count. An itemset becomes frequent when its support count exceeds a predefined threshold. Finding frequent itemsets in a given dataset is non-trivial because datasets can be very large and may contain many items. On the other hand, the second step of the association rule mining is straightforward. Therefore, the general performance of any association rule mining algorithm is determined by the first step (Han & Kamber, 2005).
Apriori and FP-Growth are known to be the two important association rule mining algorithms each having a different approach to find frequent itemsets (Agrawal & Srikant, 1994; Han, Pei, & Yin, 2000). The Apriori Algorithm uses Apriori Property in order to improve the efficiency of the level-wise generation of frequent itemsets. On the other hand, candidate itemsets generation and multiple database scans are the drawbacks of the algorithm. FP-Growth creates signatures of transactions on a tree structure to eliminate the database scans and outperforms Apriori (Han et al., 2000). A recent algorithm called Matrix Apriori, which combines the advantages of Apriori and FP-Growth, was proposed by Pavón, Paulo and Viana (2006). The algorithm eliminates multiple database scans by creating signatures of itemsets in the form of a matrix. Yildiz and Ergenc (2010) showed that Matrix Apriori provides a better overall performance than FP-Growth for the specified datasets and decreasing minimum support values.
Although all these algorithms handle the problem of association rule mining, they ignore the dynamic nature of the databases. When new transactions arrive, the entire process needs to be done from the beginning. The solution to this problem is dynamic itemset mining which proposes the idea of keeping frequent itemsets up-to-date when the database is updated. Dynamic itemset mining has four challenges: i) handling database updates without re-running the frequent itemset mining algorithms, ii) allowing new item appearances in updates, iii) being flexible to support changes during entire process and iv) handling deletions as well as additions in updates.
Dynamic itemset mining algorithms can be categorized in four groups. The first group is Apriori based algorithms (Cheung, Han, Vincent, & Wong, 1996; Cheung, Lee, & Kao, 1997). The main goal of these algorithms is to reduce the number of candidate sets and the need of scanning the original database when new transactions arrive. The second group can be dedicated to FP-Growth based algorithms (Cheung & Zaïane, 2003; Hong, Lin, & Wu, 2008; Adnan, Alhajj, & Barker, 2008; Li & Li, 2010; Pradeepini & Jyothi, 2010). These algorithms try to keep every itemset of the original database in a tree and modify the tree with each update. The third group is Border based algorithms (Aumann, Feldman, Lipshtat, & Manilla, 1999; Taha, Gharib, & Nassar, 2011) where the idea is to keep track of potential itemsets which can be frequent at anytime. The last group of the dynamic itemset mining algorithms use different data structures as tries (Woon, Ng, & Das, 2001) or matrices (Oguz & Ergenc, 2012) to keep the signatures of the transactions in the original database and modify them when the new updates arrive.
In this paper, we focus on improving our previous work (Oguz & Ergenc, 2012) in which the Incremental Matrix Apriori (IMA) Algorithm was presented. IMA is capable of providing solutions to the first three challenges of dynamic itemset mining algorithms; keeping frequent itemsets up-to-date without scanning the original dataset, being flexible to support changes and appearance of new items in the updates. However, it does not cover the last challenge which is handling deletions. Therefore, the enhancement of this paper can be summarized as;