Mining Conditional Contrast Patterns

Guozhu Dong (Wright State University, USA), Jinyan Li (Nanyang Technological University, Singapore), Guimei Liu (National University of Singapore, Singapore) and Limsoon Wong (National University of Singapore, Singapore)
DOI: 10.4018/978-1-60566-404-0.ch015

This chapter considers the problem of “conditional contrast pattern mining.” It is related to contrast mining, which concerns the mining of patterns/models that contrast two or more datasets, classes, conditions, time periods, and so forth. Roughly speaking, conditional contrasts capture situations where a small change in patterns is associated with a big change in the matching data of the patterns. More precisely, a conditional contrast is a triple (B, F1, F2) of patterns; B is the condition/context pattern of the conditional contrast, and F1 and F2 are its contrasting factors. Such a conditional contrast is of interest if the difference between F1 and F2 as itemsets is relatively small, and the difference between the matching dataset of B∪F1 and that of B∪F2 is relatively large. It offers insights on “discriminating” patterns for a given condition B. Conditional contrast mining is related to frequent pattern mining and analysis in general, and to the mining and analysis of closed patterns and minimal generators in particular. It can also be viewed as a new direction for the analysis (and mining) of frequent patterns. After formalizing the concepts of conditional contrast, the chapter provides some theoretical results on conditional contrast mining. These results (i) relate conditional contrasts to closed patterns and their minimal generators, (ii) provide a concise representation for conditional contrasts, and (iii) establish a so-called dominance-beam property. An efficient algorithm based on these results is proposed, and experimental results are reported. Related work is also discussed.
Chapter Preview


This chapter formalizes the notions of conditional contrast patterns (C2Ps) and conditional contrast factors (C2Fs), and studies the associated data mining problem. These concepts are formulated in the abstract space of patterns and their matching datasets.

Roughly speaking, C2Ps aim to capture situations or contexts (the conditional contrast bases, or C2Bs) where small changes in patterns relative to the base make big differences in matching datasets. The small changes are the C2Fs, and their cost is measured by the average number of items in the C2Fs. The big differences are the differences among the matching datasets of the C2Fs; we use the average size of these differences to measure the impact of the C2Fs. Combining cost and impact allows us to find those C2Fs that are very effective difference makers. Formally, a C2P is a pair 〈B, {F1, ..., Fk}〉, where k > 1, and B and the Fi are itemsets; B is the C2B and the Fi’s are the C2Fs.

For k = 2, Figure 1 (a) shows that F1 and F2 are small itemset changes to B, and Figure 1 (b) shows that the matching datasets of BF1 and BF2 are significantly different from each other. The k > 2 case is similar.

Figure 1. Conditional contrast patterns/factors: (a) F1 and F2 are small itemset changes to B, and (b) the matching dataset of BF1 is very different from that of BF2.

We use the impact-to-cost ratio, defined as the impact divided by the cost, as well as other measures, to evaluate the goodness of C2Ps and C2Fs. Observe that one can also consider other factors involving class, financial benefit or utility in defining this ratio.
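The cost, impact, and impact-to-cost ratio described above can be made concrete with a small sketch. The excerpt does not give the exact definition of the "difference" between matching datasets, so the code below assumes it is the symmetric difference between the matching datasets of B∪Fi and B∪Fj, averaged over all pairs of C2Fs. The function names (`matching`, `cost`, `impact`, `impact_to_cost`) and the toy transactions are hypothetical and are not taken from the chapter.

```python
from itertools import combinations


def matching(dataset, pattern):
    """Return the indices of transactions that contain every item of `pattern`."""
    return {i for i, transaction in enumerate(dataset) if pattern <= transaction}


def cost(factors):
    """Average number of items per contrast factor (C2F)."""
    return sum(len(f) for f in factors) / len(factors)


def impact(dataset, base, factors):
    """Average size of the pairwise differences between matching datasets.

    Assumption: 'difference' is taken to be the symmetric difference of the
    matching datasets of base|Fi and base|Fj; the chapter may define it
    differently.
    """
    mats = [matching(dataset, base | f) for f in factors]
    pairs = list(combinations(mats, 2))
    return sum(len(a ^ b) for a, b in pairs) / len(pairs)


def impact_to_cost(dataset, base, factors):
    """Impact divided by cost, as described in the text."""
    return impact(dataset, base, factors) / cost(factors)


# Toy transactions (illustrative only).
dataset = [
    frozenset({"a", "x"}),
    frozenset({"a", "x"}),
    frozenset({"a", "y"}),
    frozenset({"a", "x", "y"}),
    frozenset({"b", "x"}),
]
base = frozenset({"a"})                          # the C2B
factors = [frozenset({"x"}), frozenset({"y"})]   # the C2Fs, k = 2

ratio = impact_to_cost(dataset, base, factors)
print(cost(factors), impact(dataset, base, factors), ratio)
```

In this toy dataset each C2F has a single item, so the cost is 1, while the matching datasets of the two extended patterns differ in three transactions; a higher ratio under this reading means a cheaper change producing a larger split.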

  • Example 1.1. C2Ps can give new insights in many applications, especially medical and business ones. We illustrate the concepts using a medical dataset. From a microarray gene expression dataset used in an acute lymphoblastic leukemia subtype study [Yeoh et al., 2002], we obtained a number of C2Ps, including the following:

PL = 〈{gene-38319-at ≥ 15975.6}, {{gene-33355-at < 10966}, {gene-33355-at ≥ 10966}}〉

Here {gene-38319-at ≥ 15975.6} is the C2B, {gene-33355-at < 10966} is F1, and {gene-33355-at ≥ 10966} is F2. This C2P says that the samples satisfying gene-38319-at ≥ 15975.6 (which are the samples of B-lineage type) are split into two disjoint parts: the first part consists of the E2A-PBX1 subtype (18 samples), and the second part consists of the other B-lineage subtypes (169 samples). Expressed as a rule, PL says: among the samples satisfying gene-38319-at ≥ 15975.6, if the expression of gene-33355-at is less than 10966, then the sample is E2A-PBX1; otherwise, it belongs to one of the other B-lineage subtypes.

This C2P nicely illustrates how the regulation of gene-38319-at and gene-33355-at splits patients into different acute lymphoblastic leukemia subtypes.
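As a minimal sketch of how PL acts on data, the code below applies the base condition and the two contrast factors to a handful of samples. The thresholds 15975.6 and 10966 come from the example above; the sample records, their expression values, and the helper name `split_by_factors` are invented purely for illustration and do not come from the Yeoh et al. dataset.

```python
def split_by_factors(samples, base, f1, f2):
    """Select the samples matching `base`, then split them by the two factors."""
    in_base = [s for s in samples if base(s)]
    part1 = [s for s in in_base if f1(s)]
    part2 = [s for s in in_base if f2(s)]
    return in_base, part1, part2


# Conditions of PL, written as predicates over a sample record.
def base(s):
    return s["gene-38319-at"] >= 15975.6


def f1(s):
    return s["gene-33355-at"] < 10966


def f2(s):
    return s["gene-33355-at"] >= 10966


# Hypothetical expression values (not real measurements).
samples = [
    {"id": "s1", "gene-38319-at": 20000.0, "gene-33355-at": 5000.0},
    {"id": "s2", "gene-38319-at": 18000.0, "gene-33355-at": 12000.0},
    {"id": "s3", "gene-38319-at": 9000.0, "gene-33355-at": 4000.0},
    {"id": "s4", "gene-38319-at": 15975.6, "gene-33355-at": 10966.0},
]

in_base, part1, part2 = split_by_factors(samples, base, f1, f2)
ids_base = [s["id"] for s in in_base]
ids_f1 = [s["id"] for s in part1]
ids_f2 = [s["id"] for s in part2]
print(ids_base, ids_f1, ids_f2)
```

Because F1 and F2 are complementary conditions on the same gene, the two parts are disjoint and together cover every sample that matches the base, which is the split the rule form of PL describes.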

Typically, an individual C2F of a C2P does not by itself account for the big differences between matching datasets; the differences arise from two or more C2Fs of the C2P taken together. For example, in a C2P with two C2Fs F1 and F2, it is the set of items in F1F2 that makes the differences.
