Sequential Pattern Mining

Sequential Pattern Mining

Florent Masseglia (INRIA Sophia Antipolis, France), Maguelonne Teisseire (University of Montpellier II, France) and Pascal Poncelet (Ecole des Mines d’ Alès, France)
Copyright: © 2009 |Pages: 6
DOI: 10.4018/978-1-60566-010-3.ch274
OnDemand PDF Download:
$37.50

Abstract

Sequential pattern mining deals with data represented as sequences (a sequence contains sorted sets of items). Compared to the association rule problem, a study of such data provides “inter-transaction” analysis (Agrawal & Srikant, 1995). Applications for sequential pattern extraction are numerous and the problem definition has been slightly modified in different ways. Associated to elegant solutions, these problems can match with reallife timestamped data (when association rules fail) and provide useful results.
Chapter Preview
Top

Main Thrust

Definitions related to the sequential pattern extraction will first be given. They will help understanding the various problems and methods presented hereafter.

Definitions

The item is the basic value for numerous data mining problems. It can be considered as the object bought by a customer, or the page requested by the user of a website, etc. An itemset is the set of items that are grouped by timestamp (e.g. all the pages requested by the user on June 04, 2004). A data sequence is a sequence of itemsets associated to a customer. In table 1, the data sequence of C2 is the following: “(Camcorder, MiniDV) (DVD Rec, DVD-R) (Video Soft)” which means that the customer bought a camcorder and miniDV the same day, followed by a DVD recorder and DVD-R the day after, and finally a video software a few days later.

Table 1.
Data sequences of four customers over four days
CustJune 04, 2004June 05, 2004June 06, 2004June 07, 2004
C1Camcorder, MiniDVDigital CameraMemCardUSB Key
C2Camcorder, MiniDVDVD Rec, DVD-RVideo Soft
C3DVD Rec, DVD-RMemCardVideo SoftUSB Key
C4Camcorder, MiniDVLaptopDVD Rec, DVD-R

A sequential pattern is included in a data sequence (for instance “(MiniDV) (Video Soft)” is included in the data sequence of C2, whereas “(DVD Rec) (Camcorder)” is not included according to the order of the timestamps). The minimum support is specified by the user and stands for the minimum number of occurrences of a sequential pattern to be considered as frequent. A maximal frequent sequential pattern is included in at least “minimum support” data sequences and is not included in any other frequent sequential pattern. Table 1 gives a simple example of 4 customers and their activity over 4 days in a shop. With a minimum support of “50%” a sequential pattern can be considered as frequent if it occurs at least in the data sequences of 2 customers (2/4). In this case a maximal sequential pattern mining process will find three patterns:

  • S1: “(Camcorder, MiniDV) (DVD Rec, DVD-R)”

  • S2: “(DVD Rec, DVD-R) (Video Soft)”

  • S3: “(Memory Card) (USB Key)”

One can observe that S1 is included in the data sequences of C2 and C4, S2 is included in those of C2 and C3, and S3 in those of C1 and C2. Furthermore the sequences do not have the same length (S1 has length 4, S2 has length 3 and S3 has length 2).

Complete Chapter List

Search this Book:
Reset