Updating the Built Prelarge Fast Updated Sequential Pattern Trees with Sequence Modification

Updating the Built Prelarge Fast Updated Sequential Pattern Trees with Sequence Modification

Jerry Chun-Wei Lin (Innovative Information Industry Research Center (IIIRC) & School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China), Wensheng Gan (Innovative Information Industry Research Center (IIIRC) & School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, China), Tzung-Pei Hong (Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan & Department of Computer Science and Engineering, National SunYat-sen University, Kaohsiung, Taiwan) and Jingliang Zhang (Shandong Sport University, Jinan, China)
Copyright: © 2015 |Pages: 22
DOI: 10.4018/ijdwm.2015010101
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

Mining useful information or knowledge from a very large database to aid managers or decision makers to make appropriate decisions is a critical issue in recent years. Sequential patterns can be used to discover the purchased behaviors of customers or the usage behaviors of users from Web log data. Most approaches process a static database to discover sequential patterns in a batch way. In real-world applications, transactions or sequences in databases are frequently changed. In the past, a fast updated sequential pattern (FUSP)-tree was proposed to handle dynamic databases whether for sequence insertion, deletion or modification based on FUP concepts. Original database is required to be re-scanned if it is necessary to maintain the small sequences which was not kept in the FUSP tree. In this paper, the prelarge concept was adopted to maintain and update the built prelarge FUSP tree for sequence modification. A prelarge FUSP tree is modified from FUSP tree for preserving not only the frequent 1-sequences but also the prelarge 1-sequences in the tree structure. The PRELARGE-FUSP-TREE-MOD maintenance algorithm is proposed to reduce the rescans of the original database due to the pruning properties of prelarge concept. When the number of modified sequences is smaller than the safety bound of the prelarge concept, better results can be obtained by the proposed PRELARGE-FUSP-TREE-MOD maintenance algorithm for sequence modification in dynamic databases.
Article Preview

1. Introduction

Since the explosion of very large database in recent decade, it is a critical issue to efficiently mine the desired knowledge or information to aid managers in decision-making. Depending on specific applications from database or Web data (Han & Chang, 2002; Mitra, Pal, & Mitra, 2002), various knowledge are then arisen which can be mostly classified as association-rule mining (R. Agrawal, Imielinski, & Swami, 1993; Rakesh Agrawal & Srikant, 1994; Ashrafi, Taniar, & Smith, 2004; Chen, Han, & Yu, 1996; Daly & Taniar, 2004; Taniar, Rahayu, Lee, & Daly, 2008), classification (Kotsiantis, 2007), clustering (Berkhin, 2006; Kwok, Smith, Lozano, & Taniar, 2002), and sequential pattern mining (Rakesh Agrawal & Srikant, 1995; Srikant & Agrawal, 1996; Taniar, Leung, Rahayu, & Goel, 2008), among others. Analyzing the ordered sequence data from Web-click logs, DNA sequences or network flow logs are the important and critical applications in sequential mining.

In market basket analysis, sequential patterns can be used to find purchased sequences of customers for predicting whether there is a high probability that when customers buy some products, they will buy some other products in later transactions. For example, a customer buys a CD player in one transaction at electronics store, he or she will buy several CDs in a later transaction. Note that the transaction sequences are unnecessary to be consecutive. Agrawal et al. proposed AprioriAll algorithm (Rakesh Agrawal & Srikant, 1995) for mining sequential patters in a level-wise way. It is extended from Apriori algorithm (Rakesh Agrawal & Srikant, 1994) to generate-and-test the potential candidates for deriving the sequential patterns (large sequences). Various extensions for mining sequential patterns have been proposed (Rakesh Agrawal & Srikant, 1995; Pei et al., 2004; Srikant & Agrawal, 1996) to assist managers in making decisions from a static database. The discovered sequential patterns may, however, become invalid since sequences are inserted (C. W. Lin, Hong, Lu, & Lin, 2008; M. Y. Lin & Lee, 1998), deleted (C. W. Lin, Hong, & Lu, 2009) or modified (C. W. Lin, Hong, Lu, & Chen, 2008) for real-world applications in dynamic databases. An intuitive way (Rakesh Agrawal & Srikant, 1995; Srikant & Agrawal, 1996) for handling dynamic databases is to re-scan and re-mine the entire updated database for the valid sequential patterns after the database is updated. When database is very large and massive, however, considerable computations are required to re-scan the whole database without using the previous discovered information. Developing an efficient approach to maintain and update the discovered sequential patterns is thus a critical issue in real-world applications. Few algorithms are, however, designed to handle the sequential patterns in dynamic databases compared to those on maintaining association rules. Lin et al. proposed the FASTUP algorithm (M. Y. Lin & Lee, 1998) to maintain sequential patterns by extending the Fast UPdate (FUP) concept (Cheung, Han, Ng, & Wong, 1996) of association-rule mining. Hong et al. extended the prelarge concept of association-rule mining to maintain the sequential patterns in dynamic databases (Hong, Wang, & Tseng, 2011). Lin et al. designed a fast updated sequential pattern (FUSP)-tree and developed the algorithms for efficiently handling sequential patterns mining (C. W. Lin et al., 2009; C. W. Lin, Hong, Lu, & Chen, 2008; C. W. Lin, Hong, Lu, & Lin, 2008) in dynamic databases. The FP-growth-like algorithm (Han, Pei, Yin, & Mao, 2004) can thus be used to mine the desired sequential patterns from the built FUSP tree. A database of the FUSP tree is, however, required to be re-scanned if the small sequences are necessary to be maintained and updated.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing