The Discovery of Interesting Nuggets Using Heuristic Techniques

Beatriz de la Iglesia (University of East Anglia, UK) and Victor J. Rayward-Smith (University of East Anglia, UK)
Copyright: © 2002 | Pages: 24
DOI: 10.4018/978-1-930708-25-9.ch004

Knowledge Discovery in Databases (KDD) is an iterative and interactive process involving many steps (Debuse, de la Iglesia, Howard & Rayward-Smith, 2000), and data mining (DM) is one of those steps. According to Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy (1996), data mining tasks include classification, clustering, regression, summarisation, dependency modeling, and change and deviation detection. None of these tasks, however, properly addresses an important problem identified by Riddle, Segal and Etzioni (1994) that is highly relevant to commercial databases: nugget discovery, also known as partial classification (Ali, Manganaris & Srikant, 1997). Nugget discovery can be defined as the search for relatively rare, but potentially important, patterns or anomalies relating to some pre-determined class or classes. Patterns of this type are called nuggets.

This chapter presents and justifies the use of heuristic algorithms, namely Genetic Algorithms (GAs), Simulated Annealing (SA) and Tabu Search (TS), for the data mining task of nugget discovery. It first introduces the concept of nugget discovery and then discusses what makes a nugget interesting. It presents the properties that an interest measure for nugget discovery must have, including a partial ordering of nuggets based on those properties. Some existing measures are reviewed against these properties and shown not to satisfy them. Finally, a suitable evaluation function for nugget discovery, the fitness measure, is discussed and justified according to the required properties.
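To make the idea of a nugget concrete, the following minimal sketch treats a nugget as a rule whose antecedent selects a small group of records and whose consequent is a pre-determined class. It scores the rule with two common rule-quality measures, accuracy and coverage. These measures, the function name `evaluate_nugget`, and the toy records are assumptions chosen for illustration; they are not the fitness measure developed in the chapter, which this abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def evaluate_nugget(
    records: List[Record],
    antecedent: Callable[[Record], bool],
    target_attr: str,
    target_value: object,
) -> Tuple[float, float]:
    """Return (accuracy, coverage) of the rule 'antecedent => target class'.

    accuracy: fraction of records matching the antecedent that belong
              to the target class.
    coverage: fraction of target-class records that the antecedent matches.
    Both measures are illustrative assumptions, not the chapter's own.
    """
    matched = [r for r in records if antecedent(r)]
    class_total = sum(1 for r in records if r[target_attr] == target_value)
    correct = sum(1 for r in matched if r[target_attr] == target_value)
    accuracy = correct / len(matched) if matched else 0.0
    coverage = correct / class_total if class_total else 0.0
    return accuracy, coverage


# Hypothetical toy data: which customers belong to the class churn == "yes"?
records = [
    {"age": 25, "region": "north", "churn": "yes"},
    {"age": 30, "region": "north", "churn": "yes"},
    {"age": 45, "region": "north", "churn": "no"},
    {"age": 50, "region": "south", "churn": "no"},
    {"age": 28, "region": "south", "churn": "yes"},
    {"age": 60, "region": "south", "churn": "no"},
]

# Candidate nugget: age < 35 AND region == "north"  =>  churn == "yes"
accuracy, coverage = evaluate_nugget(
    records,
    lambda r: r["age"] < 35 and r["region"] == "north",
    "churn",
    "yes",
)
print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

A heuristic search such as a GA, SA or TS would explore many candidate antecedents and keep those that score well under whichever interest measure is chosen. This sketch covers only the evaluation of a single candidate.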

Complete Chapter List

Table of Contents
Hussein A. Abbass, Ruhul Sarker, Charles S. Newton
Chapter 1
Vladimir Estivill-Castro, Michael Houle
Distance-based clustering results in optimization problems that typically are NP-hard or NP-complete and for which only approximate solutions are...
Approximating Proximity to Fast and Robust Distance-Based Clustering
Chapter 2
Erick Cantu-Paz
With computers becoming more pervasive, disks becoming cheaper, and sensors becoming ubiquitous, we are collecting data at an ever-increasing pace....
On the Use of Evolutionary Algorithms in Data Mining
Chapter 3
Beatriz de la Iglesia, Victor J. Rayward-Smith
Knowledge Discovery in Databases (KDD) is an iterative and interactive process involving many steps (Debuse, de la Iglesia, Howard & Rayward-Smith...
The Discovery of Interesting Nuggets Using Heuristic Techniques
Chapter 4
Jay T. Rodstein, Katherine S. Watters
Safety and health issues in virtual offices are part of progressive telecommuting programs. Telecommuting agreements between employers and employees...
From Evolution to Immune to Swarm to? A Simple Introduction to Modern Heuristics
Chapter 5
Inaki Inza, Pedro Larranaga, Basilio Sierra
Feature Subset Selection (FSS) is a well-known task of Machine Learning, Data Mining, Pattern Recognition or Text Learning paradigms. Genetic...
Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains
Chapter 6
Jorge Muruzabal
Evolutionary algorithms are by now well-known and appreciated in a number of disciplines including the emerging field of data mining. In the last...
Towards the Cross-Fertilization of Multiple Heuristics: Evolving Teams of Local Bayesian Learners
Chapter 7
Neil Dunstan, Michael de Raadt
Sensing devices are commonly used for the detection and classification of subsurface objects, particularly for the purpose of eradicating Unexploded...
Evolution of Spatial Data Templates for Object Classification
Chapter 8
Peter W.H. Smith
Genetic Programming (GP) has increasingly been used as a data-mining tool. For example, it has successfully been used for decision tree induction...
Genetic Programming as a Data-Mining Tool
Chapter 9
Andries P. Engelbrecht, L. Schoeman, Sonja Rouwhorst
Genetic programming has recently been used successfully to extract knowledge in the form of IF-THEN rules. For these genetic programming approaches...
A Building Block Approach to Genetic Programming for Rule Discovery
Chapter 10
Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas
This work proposes an algorithm for rule discovery called Ant-Miner (Ant Colony-Based Data Miner). The goal of Ant-Miner is to extract...
An Ant Colony Algorithm for Classification Rule Discovery
Chapter 11
Jonathan Timmis, Thomas Knight
The immune system is highly distributed, highly adaptive, self-organising in nature, maintains a memory of past encounters and has the ability to...
Artificial Immune Systems: Using the Immune System as Inspiration for Data Mining
Chapter 12
Leandro Nunes de Castro, Fernando J. Von Zuben
This chapter shows that some of the basic aspects of the natural immune system discussed in the previous chapter can be used to propose a novel...
aiNet: An Artificial Immune Network for Data Analysis
Chapter 13
David Taniar, J. Wenny Rahayu
Data mining refers to a process on nontrivial extraction of implicit, previously unknown and potential useful information (such as knowledge rules...
Parallel Data Mining
About the Authors