A Distributed Algorithm for Mining Fuzzy Association Rules in Traditional Databases

Wai-Ho Au (Microsoft Corporation, USA)
Copyright: © 2008 | Pages: 21
DOI: 10.4018/978-1-59904-853-6.ch027

Abstract

The mining of fuzzy association rules has recently been proposed in the literature. Many of the resulting algorithms are designed to run on a single processor or machine, and they can be enhanced by exploiting the scalability of parallel or distributed computer systems. As the ability to collect data grows and data volumes become huge, exploiting such systems becomes increasingly important to the success of fuzzy association rule mining algorithms. This chapter proposes a new distributed algorithm, called DFARM, for mining fuzzy association rules from very large databases. Many existing algorithms adopt the support-confidence framework, in which an association is considered interesting if it satisfies user-specified minimum percentage thresholds. DFARM instead uses an objective measure to distinguish interesting associations from uninteresting ones. This measure is defined as a function of the difference between the actual and the expected number of tuples characterized by different linguistic variables (attributes) and linguistic terms (attribute values). Given a database, DFARM first divides it into several horizontal partitions and assigns them to different sites in a distributed system. Each site then scans its own partition to obtain the number of tuples characterized by different linguistic variables and linguistic terms (i.e., the local counts) and exchanges these local counts with all the other sites to find the global counts. From the global counts, the sites compute the values of the interestingness measure and uncover interesting associations. By repeating this process of counting, exchanging counts, and calculating the interestingness measure, DFARM unveils the interesting associations hidden in the data. We implemented DFARM in a distributed system and used a popular benchmark data set to evaluate its performance. The results show that it has very good size-up, speedup, and scale-up performance. We also evaluated the effectiveness of the proposed interestingness measure on two synthetic data sets. The experimental results show that it is very effective in differentiating between interesting and uninteresting associations.
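
The counting and exchange steps described in the abstract can be illustrated with a small sketch. This is not the chapter's implementation: the triangular fuzzy sets in `TERMS`, the round-robin partitioning, the helper names, and the choice to count a tuple by its membership degree are all assumptions made for illustration, and the exchange between sites is simulated in a single process rather than over a network.

```python
from collections import Counter

# Hypothetical triangular fuzzy sets (left, peak, right) for one numeric attribute.
TERMS = {
    "low": (0.0, 25.0, 50.0),
    "medium": (25.0, 50.0, 75.0),
    "high": (50.0, 75.0, 100.0),
}

def triangular(x, left, peak, right):
    """Degree to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def horizontal_partitions(values, n_sites):
    """Split the tuples row-wise into n_sites partitions (round-robin, an assumption)."""
    return [values[i::n_sites] for i in range(n_sites)]

def local_counts(partition):
    """One site's scan: sum the membership degrees per linguistic term."""
    counts = Counter()
    for value in partition:
        for term, (left, peak, right) in TERMS.items():
            counts[term] += triangular(value, left, peak, right)
    return counts

def global_counts(all_local_counts):
    """Combine the local counts of every site into global counts."""
    total = Counter()
    for counts in all_local_counts:
        total.update(counts)  # Counter.update adds the values key by key
    return total

values = [25.0, 50.0, 75.0, 37.5]
sites = horizontal_partitions(values, n_sites=2)
merged = global_counts(local_counts(p) for p in sites)
print(dict(merged))
```

Because the counts are additive, summing the local counts gives the same totals as a single scan of the whole database, which is why only counts need to be exchanged between sites and not the tuples themselves.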

Key Terms in this Chapter

Fuzzy Partitioning: It is a methodology for generating fuzzy sets to represent the underlying data. Fuzzy partitioning techniques can be classified into three categories: grid partitioning, tree partitioning, and scatter partitioning. Of the different fuzzy partitioning methods, grid partitioning is the most commonly used in practice, particularly in system control applications. Grid partitioning forms a partition by dividing the input space into several fuzzy slices, each of which is specified by a membership function for each feature dimension.

Negative Association Rule: A negative association rule’s antecedent and consequent show a negative association. If its antecedent is satisfied, it is unlikely that its consequent will be satisfied.

Fuzzy Association Rule: A fuzzy association rule involves linguistic terms (fuzzy sets) in its antecedent and/or consequent.

Positive Association Rule: A positive association rule’s antecedent and consequent show a positive association. If its antecedent is satisfied, it is likely that its consequent will be satisfied.

Adjusted Residual: It is a statistic defined as a function of the difference between the actual and the expected number of tuples characterized by different linguistic variables (attributes) and linguistic terms (attribute values). It can be used as an interestingness measure.

Interestingness Measure: An interestingness measure represents how interesting an association is. Support is one example of an interestingness measure.

Associative Classification: It is a classification method based on association rules. An association rule with the class label as its consequent provides a clue that a tuple satisfying its antecedent belongs to a specific class. It can therefore be used as the basis of classification.
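
As an illustration of the adjusted residual and of positive and negative associations, the sketch below uses the standard adjusted residual from contingency table analysis. This page does not give the chapter's exact formula, so the formula, the function name `adjusted_residual`, and the example counts are assumptions. The sketch also assumes that the expected count is positive and that neither marginal total equals the total number of tuples.

```python
import math

def adjusted_residual(observed, row_total, col_total, n):
    """Adjusted residual of one cell of a contingency table.

    observed  -- actual number of tuples having both characteristics
    row_total -- number of tuples having the first characteristic
    col_total -- number of tuples having the second characteristic
    n         -- total number of tuples
    """
    # Count expected if the two characteristics were independent.
    expected = row_total * col_total / n
    standardized = (observed - expected) / math.sqrt(expected)
    # Estimated variance of the standardized residual.
    variance = (1 - row_total / n) * (1 - col_total / n)
    return standardized / math.sqrt(variance)

# Hypothetical counts: 100 tuples, 50 with each characteristic.
print(adjusted_residual(40, 50, 50, 100))  # more co-occurrences than expected
print(adjusted_residual(10, 50, 50, 100))  # fewer co-occurrences than expected
```

Under this formulation, a large positive value suggests a positive association and a large negative value suggests a negative one. A magnitude above 1.96, the 95% point of the standard normal distribution, is a common cutoff, though whether the chapter uses that threshold is not stated here.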

Complete Chapter List

Editorial Advisory Board
Program Committee
Table of Contents
Foreword
Maria Amparo Vila, Miguel Delgado
Preface
José Galindo
Acknowledgment
Chapter 1
José Galindo
This chapter presents an introduction to fuzzy logic and to fuzzy databases. With regard to the first topic, we have introduced the main concepts in...
Introduction and Trends to Fuzzy Logic and Fuzzy Databases
Chapter 2
Slawomir Zadrozny, Guy de Tré, Rita de Caluwe, Janusz Kacprzyk
In reality, a lot of information is available only in an imperfect form. This might be due to imprecision, vagueness, uncertainty, incompleteness...
An Overview of Fuzzy Approaches to Flexible Database Querying
Chapter 3
Balazs Feil, Janos Abonyi
This chapter aims to give a comprehensive view about the links between fuzzy logic and data mining. It will be shown that knowledge extracted from...
Introduction to Fuzzy Data Mining Methods
Chapter 4
Didier Dubois, Henri Prade
The chapter advocates the interest of distinguishing between negative and positive preferences in the processing of flexible queries. Negative...
Handling Bipolar Queries in Fuzzy Information Processing
Chapter 5
Noureddine Mouaddib, Guillaume Raschia, W. Amenel Voglozin, Laurent Ughetto
This chapter presents a discussion on fuzzy querying. It deals with the whole process of fuzzy querying, from the query formulation to its...
From User Requirements to Evaluation Strategies of Flexible Queries in Databases
Chapter 6
P Bosc, A Hadjali, O Pivert
The idea of extending the usual Boolean queries with preferences has become a hot topic in the database community. One of the advantages of this...
On the Versatility of Fuzzy Sets for Modeling Flexible Queries
Chapter 7
Guy De Tré, Marysa Demoor, Bert Callens, Lise Gosseye
In case-based reasoning (CBR), a new untreated case is compared to cases that have been treated earlier, after which data from the similar cases (if...
Flexible Querying Techniques Based on CBR
Chapter 8
Gloria Bordogna, Giuseppe Psaila
In this chapter, we present the Soft-SQL project whose goal is to define a rich extension of SQL aimed at effectively exploiting flexibility offered...
Customizable Flexible Querying in Classical Relational Databases
Chapter 9
Cornelia Tudorie
The topic presented in this chapter refers to qualifying objects in some kinds of vague queries sent to relational databases. We want to compute a...
Qualifying Objects in Classical Relational Database Querying
Chapter 10
Ludovic Liétard, Daniel Rocacher
This chapter is devoted to the evaluation of quantified statements which can be found in many applications as decision making, expert systems, or...
Evaluation of Quantified Statements Using Gradual Numbers
Chapter 11
Angélica Urrutia, Leonid Tineo, Claudia Gonzalez
Actually, FSQL and SQLf are the main fuzzy logic based proposed extensions to SQL. It would be very interesting to integrate them with a standard...
FSQL and SQLf: Towards a Standard in Fuzzy Databases
Chapter 12
Rallou Thomopoulos, Patrice Buche, Ollivier Haemmerlé
Within the framework of flexible querying of possibilistic databases, based on the fuzzy set theory, this chapter focuses on the case where the...
Hierarchical Fuzzy Sets to Query Possibilistic Databases
Chapter 13
Troels Andreasen, Henrik Bulskov
The use of taxonomies and ontologies as a foundation for enhancing textual information base access has recently gained increased attention in the...
Query Expansion by Taxonomy
Chapter 14
Mohamed Ali Ben Hassine, Amel Grissa Touzi, José Galindo, Habib Ounelli
Fuzzy relational databases have been introduced to deal with uncertain or incomplete information demonstrating the efficiency of processing fuzzy...
How to Achieve Fuzzy Relational Databases Managing Fuzzy Data and Metadata
Chapter 15
Geraldo Xexéo, André Braga
We present CLOUDS, which stands for C++ Library Organizing Uncertainty in Database Systems, a tool that allows the creation of fuzzy reasoning...
A Tool for Fuzzy Reasoning and Querying
Chapter 16
Aleksandar Takači, Srđan Škrbić
This chapter introduces a way to extend the relational model with mechanisms that can handle imprecise, uncertain, and inconsistent attribute values...
Data Model of FRDB with Different Data Types and PFSQL
Chapter 17
Carlos D. Barranco, Jesús R. Campaña, Juan M. Medina
This chapter introduces a fuzzy object-relational database model including fuzzy extensions of the basic object-relational databases constructs, the...
Towards a Fuzzy Object-Relational Database Model
Chapter 18
Radim Belohlavek
Formal concept analysis is a particular method of analysis of relational data. Also, formal concept analysis provides elaborate mathematical...
Relational Data, Formal Concept Analysis, and Graded Attributes
Chapter 19
Markus Schneider
Spatial database systems and geographical information systems are currently only able to support geographical applications that deal with crisp...
Fuzzy Spatial Data Types for Spatial Uncertainty Management in Databases
Chapter 20
Yauheni Veryha, Jean-Yves Blot, Joao Coelho
There are many well-known applications of fuzzy sets theory in various fields of science and technology. However, we think that the area of maritime...
Fuzzy Classification in Shipwreck Scatter Analysis
Chapter 21
Yan Chen, Graham H. Rong, Jianhua Chen
A Web-based fabric database is introduced in terms of its physical structure, software system architecture, basic and intelligent search engines...
Fabric Database and Fuzzy Logic Models for Evaluating Fabric Performance
Chapter 22
R. A. Carrasco, F. Araque, A. Salguero, M. A. Vila
Soaring is a recreational activity and a competitive sport where individuals fly un-powered aircrafts known as gliders. The soaring location...
Applying Fuzzy Data Mining to Tourism Area
Chapter 23
Andreas Meier, Günter Schindler, Nicolas Werro
In practice, information systems are based on very large data collections mostly stored in relational databases. As a result of information...
Fuzzy Classification on Relational Databases
Chapter 24
Shyue-Liang Wang, Ju-Wen Shen, Tzung-Pei Hong
Mining functional dependencies (FDs) from databases has been identified as an important database analysis technique. It has received considerable...
Incremental Discovery of Fuzzy Functional Dependencies
Chapter 25
Radim Belohlavek, Vilem Vychodil
This chapter deals with data dependencies in Codd’s relational model of data. In particular, we deal with fuzzy logic extensions of the relational...
Data Dependencies in Codd's Relational Model with Similarities
Chapter 26
Awadhesh Kumar Sharma, A. Goswami, D. K. Gupta
In this chapter, the concept of fuzzy inclusion dependencies (FIDas) in fuzzy databases is introduced and inference rules on such FIDas are derived....
Fuzzy Inclusion Dependencies in Fuzzy Databases
Chapter 27
Wai-Ho Au
The mining of fuzzy association rules has been proposed in the literature recently. Many of the ensuing algorithms are developed to make use of only...
A Distributed Algorithm for Mining Fuzzy Association Rules in Traditional Databases
Chapter 28
Yi Wang
This chapter applies fuzzy logic to a dynamic causal mining (DCM) algorithm and argues that DCM, a combination of association mining and system...
Applying Fuzzy Logic in Dynamic Causal Mining
Chapter 29
Céline Fiot
The explosive growth of collected and stored data has generated a need for new techniques transforming these large amounts of data into useful...
Fuzzy Sequential Patterns for Quantitative Data Mining
Chapter 30
Hamid Haidarian Shahri
Entity resolution (also known as duplicate elimination) is an important part of the data cleaning process, especially in data integration and...
A Machine Learning Approach to Data Cleaning in Databases and Data Warehouses
Chapter 31
Malcolm Beynon
The general fuzzy decision tree approach encapsulates the benefits of being an inductive learning technique to classify objects, utilising the...
Fuzzy Decision-Tree-Based Analysis of Databases
Chapter 32
Malcolm Beynon
Outranking methods are a family of techniques concerned with ranking the preference for alternatives based on the criteria values that describe...
Fuzzy Outranking Methods Including Fuzzy PROMETHEE
Chapter 33
J. I. Peláez, J. M. Doña, D. La Red
Missing data is often an actual problem in real data sets, and different imputation techniques are normally used to alleviate this problem....
Fuzzy Imputation Method for Database Systems
Chapter 34
Safiye Turgay
In this chapter, an agent-based fuzzy data mining structure was developed to process and evaluate data with an enlargement in the knowledge...
Intelligent Fuzzy Database Management Systems
About the Editor
About the Contributors