Article Preview
Top1. Introduction
Since the explosion of very large database in recent decade, it is a critical issue to efficiently mine the desired knowledge or information to aid managers in decision-making. Depending on specific applications from database or Web data (Han & Chang, 2002; Mitra, Pal, & Mitra, 2002), various knowledge are then arisen which can be mostly classified as association-rule mining (R. Agrawal, Imielinski, & Swami, 1993; Rakesh Agrawal & Srikant, 1994; Ashrafi, Taniar, & Smith, 2004; Chen, Han, & Yu, 1996; Daly & Taniar, 2004; Taniar, Rahayu, Lee, & Daly, 2008), classification (Kotsiantis, 2007), clustering (Berkhin, 2006; Kwok, Smith, Lozano, & Taniar, 2002), and sequential pattern mining (Rakesh Agrawal & Srikant, 1995; Srikant & Agrawal, 1996; Taniar, Leung, Rahayu, & Goel, 2008), among others. Analyzing the ordered sequence data from Web-click logs, DNA sequences or network flow logs are the important and critical applications in sequential mining.
In market basket analysis, sequential patterns can be used to find purchased sequences of customers for predicting whether there is a high probability that when customers buy some products, they will buy some other products in later transactions. For example, a customer buys a CD player in one transaction at electronics store, he or she will buy several CDs in a later transaction. Note that the transaction sequences are unnecessary to be consecutive. Agrawal et al. proposed AprioriAll algorithm (Rakesh Agrawal & Srikant, 1995) for mining sequential patters in a level-wise way. It is extended from Apriori algorithm (Rakesh Agrawal & Srikant, 1994) to generate-and-test the potential candidates for deriving the sequential patterns (large sequences). Various extensions for mining sequential patterns have been proposed (Rakesh Agrawal & Srikant, 1995; Pei et al., 2004; Srikant & Agrawal, 1996) to assist managers in making decisions from a static database. The discovered sequential patterns may, however, become invalid since sequences are inserted (C. W. Lin, Hong, Lu, & Lin, 2008; M. Y. Lin & Lee, 1998), deleted (C. W. Lin, Hong, & Lu, 2009) or modified (C. W. Lin, Hong, Lu, & Chen, 2008) for real-world applications in dynamic databases. An intuitive way (Rakesh Agrawal & Srikant, 1995; Srikant & Agrawal, 1996) for handling dynamic databases is to re-scan and re-mine the entire updated database for the valid sequential patterns after the database is updated. When database is very large and massive, however, considerable computations are required to re-scan the whole database without using the previous discovered information. Developing an efficient approach to maintain and update the discovered sequential patterns is thus a critical issue in real-world applications. Few algorithms are, however, designed to handle the sequential patterns in dynamic databases compared to those on maintaining association rules. Lin et al. proposed the FASTUP algorithm (M. Y. Lin & Lee, 1998) to maintain sequential patterns by extending the Fast UPdate (FUP) concept (Cheung, Han, Ng, & Wong, 1996) of association-rule mining. Hong et al. extended the prelarge concept of association-rule mining to maintain the sequential patterns in dynamic databases (Hong, Wang, & Tseng, 2011). Lin et al. designed a fast updated sequential pattern (FUSP)-tree and developed the algorithms for efficiently handling sequential patterns mining (C. W. Lin et al., 2009; C. W. Lin, Hong, Lu, & Chen, 2008; C. W. Lin, Hong, Lu, & Lin, 2008) in dynamic databases. The FP-growth-like algorithm (Han, Pei, Yin, & Mao, 2004) can thus be used to mine the desired sequential patterns from the built FUSP tree. A database of the FUSP tree is, however, required to be re-scanned if the small sequences are necessary to be maintained and updated.