Post-Processing for Rule Reduction Using Closed Set

Huawen Liu (Jilin University, P.R. China), Jigui Sun (Jilin University, P.R. China) and Huijie Zhang (Northeast Normal University, P.R. China)
DOI: 10.4018/978-1-60566-404-0.ch005

Abstract

In data mining, rule management is becoming increasingly important. A large number of rules is usually induced from large databases in many fields, especially when the data are dense. As a direct consequence, the gained knowledge is hard to understand and interpret. Many efforts have been made to eliminate redundant rules from the rule base, and various efficient algorithms have been proposed. However, end-users are often still unable to complete a mining task because insignificant rules remain. An efficient technique is therefore needed to discard as many useless rules as possible without losing information. To achieve this goal, in this paper we propose an efficient method to filter superfluous rules from a knowledge base in a post-processing manner. The main characteristic of our method is that it eliminates redundant rules by means of dependency relations, which can be discovered by closed set mining techniques. The performance evaluations show that the compression degree achieved by our proposed method is better, and its efficiency higher, than those of other techniques.
Chapter Preview

Introduction

Knowledge discovery in databases (KDD) refers to the overall process of mapping low-level data in large databases into high-level forms that might be more compact, more abstract, or more useful (Fayyad et al., 1996). KDD can be viewed as a multidisciplinary activity because it draws on several research disciplines of artificial intelligence, such as machine learning, pattern recognition, expert systems and knowledge acquisition, as well as on mathematical disciplines (Bruha, 2000). Its objective is to extract previously unknown and potentially useful information hidden in the data of usually very large databases. As one of the core components of KDD, data mining refers to the process of extracting interesting patterns from data by specific algorithms. Because rules have an intuitive meaning and are easy to understand, they have become one of the major representation forms of extracted knowledge or patterns. In this context, the result produced by data mining techniques is a set of rules, i.e., a rule base.

Currently, the major challenge of data mining lies not in its efficiency but in the interpretability of the discovered results. During the mining stage, a considerable number of rules may be discovered when the real-world database is large. If the data are highly correlated, the situation becomes worse and quickly gets out of control. The sheer quantity of rules makes them difficult to explore and thus hampers global analysis of the discovered knowledge. Furthermore, monitoring and managing these rules turns out to be extremely costly and difficult. The immediate consequence for users is that they cannot effectively interpret or understand such an overwhelming number of rules. Consequently, users may be buried once again under masses of gained knowledge, and nobody will directly benefit from the results of such data mining techniques (Berzal & Cubero, 2007). Hence, intelligent techniques are urgently required to handle useless rules and to help users understand the results obtained from the rapidly growing volumes of digital data.

Post-processing, whose purpose is to enhance the quality of the mined knowledge, plays a vital role in circumventing this dilemma. Its main advantage is that it can effectively help end-users understand and interpret the meaning of knowledge nuggets (Baesens et al., 2000). The post-processing procedure usually consists of four main steps: quality processing, summarizing, grouping and visualization. Among these routines, rule quality processing (e.g., pruning and filtering) is considered the most important one (Bruha, 2000), because it can eliminate many noisy, redundant or insignificant rules and provide users with compact and precise knowledge derived from databases by data mining methods. From the end-user's point of view, a concise or condensed rule base is preferable, because it allows decision-makers to respond quickly and precisely to unseen data without being distracted by noisy information.

In the data mining community, much attention has been paid to dealing with noisy knowledge by measuring similarity or redundancy. For example, distance metrics such as Euclidean distance (Waitman et al., 2006) are often used to measure the similarity between rules, and rules with high similarity are discarded. In addition, Chi-square tests (Liu et al., 1999) and entropy (Jaroszewicz and Simovici, 2002) have been used to analyze the distance between rules in the post-processing phase. Some classifiers also exploit efficient data structures, such as the bitmap technique (Jacquene et al., 2006) and the prefix tree (Li et al., 2001), to store and retrieve rules. Moreover, various interestingness measures, both objective and subjective, have been considered in studying the importance of rules (Geng and Hamilton, 2006). As a representative example, Brin et al. (1997) outlined a conviction measure to express rule interestingness.
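
To make the conviction measure mentioned above concrete, the following is a minimal sketch in Python; it is not taken from the chapter. It assumes the commonly used definition conviction(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y)). The toy transactions and the function names support, confidence and conviction are illustrative assumptions only.

```python
from fractions import Fraction

# Hypothetical toy transaction database, used only for illustration.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(transactions, antecedent, consequent):
    """supp(X and Y) / supp(X) for the rule X -> Y."""
    supp_x = support(transactions, antecedent)
    if supp_x == 0:
        raise ValueError("antecedent never occurs in the transactions")
    supp_xy = support(transactions, frozenset(antecedent) | frozenset(consequent))
    return supp_xy / supp_x


def conviction(transactions, antecedent, consequent):
    """(1 - supp(Y)) / (1 - conf(X -> Y)); infinite when the rule never fails."""
    conf = confidence(transactions, antecedent, consequent)
    if conf == 1:
        return float("inf")
    return (1 - support(transactions, consequent)) / (1 - conf)


if __name__ == "__main__":
    print(conviction(TRANSACTIONS, {"bread"}, {"milk"}))
    print(conviction(TRANSACTIONS, {"butter"}, {"bread"}))
```

On this toy data, the rule bread -> milk has confidence 3/4 while milk alone has support 4/5, so its conviction is 4/5. A value below 1 indicates that the rule fails more often than it would if antecedent and consequent were independent, whereas butter -> bread never fails and therefore receives infinite conviction.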

Complete Chapter List

Editorial Advisory Board
Table of Contents
Foreword
David Bell
Acknowledgment
Yanchang Zhao, Chengqi Zhang, Longbing Cao
Chapter 1
Association Rules: An Overview
Paul D. McNicholas, Yanchang Zhao
Association rules present one of the most versatile techniques for the analysis of binary data, with applications in areas as diverse as retail...
Chapter 2
From Change Mining to Relevance Feedback: A Unified View on Assessing Rule Interestingness
Mirko Boettcher, Georg Ruß, Detlef Nauck, Rudolf Kruse
Association rule mining typically produces large numbers of rules, thereby creating a second-order data mining problem: which of the generated rules...
Chapter 3
Combining Data-Driven and User-Driven Evaluation Measures to Identify Interesting Rules
Solange Oliveira Rezende, Edson Augusto Melanda, Magaly Lika Fujimoto, Roberta Akemi Sinoara, Veronica Oliveira de Carvalho
Association rule mining is a data mining task that is applied in several real problems. However, due to the huge number of association rules that...
Chapter 4
Semantics-Based Classification of Rule Interestingness Measures
Julien Blanchard, Fabrice Guillet, Pascale Kuntz
Assessing rules with interestingness measures is the cornerstone of successful applications of association rule discovery. However, as numerous...
Chapter 5
Post-Processing for Rule Reduction Using Closed Set
Huawen Liu, Jigui Sun, Huijie Zhang
In data mining, rule management is getting more and more important. Usually, a large number of rules will be induced from large databases in many...
Chapter 6
A Conformity Measure Using Background Knowledge for Association Rules: Application to Text Mining
Hacène Cherfi, Amedeo Napoli, Yannick Toussaint
A text mining process using association rules generates a very large number of rules. According to experts of the domain, most of these rules...
Chapter 7
Continuous Post-Mining of Association Rules in a Data Stream Management System
Hetal Thakkar, Barzan Mozafari, Carlo Zaniolo
The real-time (or just-on-time) requirement associated with online association rule mining implies the need to expedite the analysis and validation...
Chapter 8
QROC: A Variation of ROC Space to Analyze Item Set Costs/Benefits in Association Rules
Ronaldo Cristiano Prati
Receiver Operating Characteristics (ROC) graph is a popular way of assessing the performance of classification rules. However, as such graphs are...
Chapter 9
Variations on Associative Classifiers and Classification Results Analyses
Maria-Luiza Antonie, David Chodos, Osmar Zaïane
The chapter introduces the associative classifier, a classification model based on association rules, and describes the three phases of the model...
Chapter 10
Selection of High Quality Rules in Associative Classification
Silvia Chiusano, Paolo Garza
In this chapter the authors make a comparative study of five well-known classification rule pruning methods with the aim of understanding their...
Chapter 11
Meta-Knowledge Based Approach for an Interactive Visualization of Large Amounts of Association Rules
Sadok Ben Yahia, Olivier Couturier, Tarek Hamrouni, Engelbert Mephu Nguifo
Providing efficient and easy-to-use graphical tools to users is a promising challenge of data mining, especially in the case of association rules....
Chapter 12
Visualization to Assist the Generation and Exploration of Association Rules
Claudio Haruo Yamamoto, Maria Cristina Ferreira de Oliveira, Solange Oliveira Rezende
Miners face many challenges when dealing with association rule mining tasks, such as defining proper parameters for the algorithm, handling sets of...
Chapter 13
Frequent Closed Itemsets Based Condensed Representations for Association Rules
Nicolas Pasquier
After more than one decade of researches on association rule mining, efficient and scalable techniques for the discovery of relevant association...
Chapter 14
Maintenance of Frequent Patterns: A Survey
Mengling Feng, Jinyan Li, Guozhu Dong, Limsoon Wong
This chapter surveys the maintenance of frequent patterns in transaction datasets. It is written to be accessible to researchers familiar with the...
Chapter 15
Mining Conditional Contrast Patterns
Guozhu Dong, Jinyan Li, Guimei Liu, Limsoon Wong
This chapter considers the problem of “conditional contrast pattern mining.” It is related to contrast mining, where one considers the mining of...
Chapter 16
Multidimensional Model-Based Decision Rules Mining
Qinrong Feng, Duoqian Miao, Ruizhi Wang
Decision rules mining is an important technique in machine learning and data mining, it has been studied intensively during the past few years....
About the Contributors