Maintenance of Frequent Patterns: A Survey

Maintenance of Frequent Patterns: A Survey

Mengling Feng, Jinyan Li, Guozhu Dong, Limsoon Wong
DOI: 10.4018/978-1-60566-404-0.ch014
OnDemand:
(Individual Chapters)
Available
$37.50
No Current Special Offers
TOTAL SAVINGS: $37.50

Abstract

This chapter surveys the maintenance of frequent patterns in transaction datasets. It is written to be accessible to researchers familiar with the field of frequent pattern mining. The frequent pattern maintenance problem is summarized with a study on how the space of frequent patterns evolves in response to data updates. This chapter focuses on incremental and decremental maintenance. Four major types of maintenance algorithms are studied: Apriori-based, partition-based, prefix-tree-based, and conciserepresentation- based algorithms. The authors study the advantages and limitations of these algorithms from both the theoretical and experimental perspectives. Possible solutions to certain limitations are also proposed. In addition, some potential research opportunities and emerging trends in frequent pattern maintenance are also discussed.
Chapter Preview
Top

Preliminaries And Problem Description

Discovery of Frequent Patterns

Let I = {i1, i2, ..., im} be a set of distinct literals called ‘items’. A ‘pattern’, or an ‘itemset’, is a set of items. A ‘transaction’ is a non-empty set of items. A ‘dataset’ is a non-empty set of transactions. A pattern P is said to be contained or included in a transaction T if PT. A pattern P is said to be contained in a dataset D, denoted as PD, if there is TD such that PT. The ‘support count’ of a pattern P in a dataset D, denoted count(P,D), is the number of transactions in D that contain P. The ‘support’ of a pattern P in a dataset D, denoted sup(P,D), is calculated as sup(P,D) = count(P,D)/|D|. Figure 1(a) shows a sample dataset, and all the patterns contained in the sample dataset are enumerated in Figure 1(b) with their support counts.

Figure 1.

(a) An example of transaction dataset. (b) The space of frequent patterns for the sample dataset in (a) when ms%=25% and the concise representations of the space. (c) Decomposition of frequent pattern space into equivalence classes.

978-1-60566-404-0.ch014.f01

A pattern P is said to be frequent in a dataset D if sup(P,D) is greater than or equal to a pre-specified threshold ms%. Given a dataset D and a support threshold ms%, the collection of all frequent itemsets in D is called the ‘space of frequent patterns’, and is denoted by F(ms%, D). The task of frequent pattern mining is to discover all the patterns in the space of frequent patterns. In real-life applications, the size of the frequent pattern space is often tremendous. According to the definition, suppose the dataset has l distinct items, the size of the frequent pattern space can go up to 2l. To increase computational efficiency and reduce memory usage, concise representations are developed to summarize the frequent pattern space.

Complete Chapter List

Search this Book:
Reset