Reasoning about Frequent Patterns with Negation

Reasoning about Frequent Patterns with Negation

Marzena Kryszkiewicz (Warsaw University of Technology, Poland)
Copyright: © 2009 |Pages: 8
DOI: 10.4018/978-1-60566-010-3.ch254
OnDemand PDF Download:
$37.50

Abstract

Discovering of frequent patterns in large databases is an important data mining problem. The problem was introduced in (Agrawal, Imielinski & Swami, 1993) for a sales transaction database. Frequent patterns were defined there as sets of items that are purchased together frequently. Frequent patterns are commonly used for building association rules. For example, an association rule may state that 80% of customers who buy fish also buy white wine. This rule is derivable from the fact that fish occurs in 5% of sales transactions and set {fish, white wine} occurs in 4% of transactions. Patterns and association rules can be generalized by admitting negation. A sample association rule with negation could state that 75% of customers who buy coke also buy chips and neither beer nor milk. The knowledge of this kind is important not only for sales managers, but also in medical areas (Tsumoto, 2002). Admitting negation in patterns usually results in an abundance of mined patterns, which makes analysis of the discovered knowledge infeasible. It is thus preferable to discover and store a possibly small fraction of patterns, from which one can derive all other significant patterns when required. In this chapter, we introduce first lossless representations of frequent patterns with negation.
Chapter Preview
Top

Background

Let us analyze sample transactional database D presented in Table 1, which we will use throughout the chapter. Each row in this database reports items that were purchased by a customer during a single visit to a supermarket.

Table 1.
Sample database D

As follows from Table 1, items a and b were purchased together in four transactions. The number of transactions in which set of items {x1, ..., xn} occurs is called its support and denoted by sup({x1, ..., xn}). A set of items is called a frequent pattern if its support exceeds a user-specified threshold (minSup). Otherwise, it is called an infrequent pattern. In the remainder of the chapter, we assume minSup = 1. One can discover 27 frequent patterns from D, which we list in Figure 1.

Figure 1.

Frequent positive patterns discovered from database D. Values provided in square brackets in the subscript denote supports of patterns.

One can easily note that the support of a pattern never exceeds the supports of its subsets. Hence, subsets of a frequent pattern are also frequent, and supersets of an infrequent pattern are infrequent.

Aside from searching for only statistically significant sets of items, one may be interested in identifying frequent cases when purchase of some items (presence of some symptoms) excludes purchase of other items (presence of other symptoms). A pattern consisting of items x1, …, xm and negations of items xm+1, …, xn will be denoted by {x1, …, xm, −xm+1, …, −xn}. The support of pattern {x1, …, xm, −xm+1, …, −xn} is defined as the number of transactions in which all items in set {x1, …, xm} occur and no item in set {xm+1, …, xn}) occurs. In particular, {a(–b)} is supported by two transactions in D, while {a(–b)(–c)} is supported by one transaction. Hence, {a(–b)} is frequent, while {a(–b)(–c)} is infrequent.

Complete Chapter List

Search this Book:
Reset