Time-Constrained Sequential Pattern Mining

Time-Constrained Sequential Pattern Mining

Ming-Yen Lin (Feng Chia University, Taiwan)
Copyright: © 2009 |Pages: 5
DOI: 10.4018/978-1-60566-010-3.ch301
OnDemand PDF Download:
$37.50

Abstract

Sequential pattern mining is one of the important issues in the research of data mining (Agrawal & Srikant, 1995; Ayres, Gehrke, & Yiu, 2002; Han, Pei, & Yan, 2004; Lin & Lee, 2004; Lin & Lee, 2005b; Roddick & Spiliopoulou, 2002). A typical example is a retail database where each record corresponds to a customer’s purchasing sequence, called data sequence. A data sequence is composed of all the customer’s transactions ordered by transaction time. Each transaction is represented by a set of literals indicating the set of items (called itemset) purchased in the transaction. The objective is to find all the frequent sub-sequences (called sequential patterns) in the sequence database. Whether a sub-sequence is frequent or not is determined by its frequency, named support, in the sequence database. An example sequential pattern might be that 40% customers bought PC and printer, followed by the purchase of scanner and graphics-software, and then digital camera. Such a pattern, denoted by , has three elements where each element is an itemset. Although the issue is motivated by the retail industry, the mining technique is applicable to domains bearing sequence characteristics, including the analysis of Web traversal patterns, medical treatments, natural disasters, DNA sequences, and so forth. In order to have more accurate results, constraints in addition to the support threshold need to be specified in the mining (Pei, Han, & Wang, 2007; Chen & Yu, 2006; Garofalakis, Rastogi, & Shim, 2002; Lin & Lee, 2005a; Masseglia, Poncelet, & Teisseire, 2004). Most time-independent constraints can be handled, without modifying the fundamental mining algorithm, by a postprocessing on the result of sequential pattern mining without constraints. Time-constraints, however, cannot be managed by retrieving patterns because the support computation of patterns must validate the time attributes for every data sequence in the mining process. Therefore, time-constrained sequential pattern mining (Lin & Lee, 2005a; Lin, Hsueh, & Chang, 2006; Masseglia, Poncelet, & Teisseire, 2004;) is more challenging, and more important in the aspect of temporal relationship discovery, than conventional pattern mining.
Chapter Preview
Top

Background

The issue of mining sequential patterns with time constraints was first addressed by Srikant and Agrawal in 1996 (Srikant & Agrawal 1996). Three time constraints including minimum gap, maximum gap and sliding time-window are specified to enhance conventional sequence discovery. For example, without time constraints, one may find a pattern <(b, d, f)(a, e)>. However, the pattern could be insignificant if the time interval between (b, d, f) and (a, e) is too long. Such patterns could be filtered out if the maximum gap constraint is specified.

Analogously, one might discover the pattern <(b, c, e)(d, g)> from many data sequences consisting of itemset (d, g) occurring one day after the occurrence of itemset (b, c, e). Nonetheless, such a pattern is a false pattern in discovering weekly patterns, i.e. the minimum gap of 7 days. In other words, the sale of (b, c, e) might not trigger the sale of (d, g) in next week. Therefore, time constraints including maximum gap and minimum gap should be incorporated in the mining to reinforce the accuracy and significance of mining results.

Complete Chapter List

Search this Book:
Reset