The Discovery of Interesting Nuggets Using Heuristic Techniques

The Discovery of Interesting Nuggets Using Heuristic Techniques

Beatriz de la Iglesia (University of East Anglia, UK) and Victor J. Rayward-Smith (University of East Anglia, UK)
Copyright: © 2002 |Pages: 24
DOI: 10.4018/978-1-930708-25-9.ch004
OnDemand PDF Download:
$37.50

Abstract

Knowledge Discovery in Databases (KDD) is an iterative and interactive process involving many steps (Debuse, de la Iglesia, Howard & Rayward-Smith, 2000). Data mining (DM) is defined as one of the steps in the KDD process. According to Fayyad, Piatetsky-Shapiro, Smyth and Uthurusamy (1996), there are various data mining tasks including: classification, clustering, regression, summarisation, dependency modeling, and change and deviation detection. However, there is a very important data mining problem identified previously by Riddle, Segal and Etzioni (1994) and very relevant in the context of commercial databases, which is not properly addressed by any of those tasks: nugget discovery. This task has also been identified as partial classification (Ali, Manganaris & Srikant, 1997). Nugget discovery can be defined as the search for relatively rare, but potentially important, patterns or anomalies relating to some pre-determined class or classes. Patterns of this type are called nuggets. This chapter will present and justify the use of heuristic algorithms, namely Genetic Algorithms (GAs), Simulated Annealing (SA) and Tabu Search (TS), on the data mining task of nugget discovery. First, the concept of nugget discovery will be introduced. Then the concept of the interest of a nugget will be discussed. The necessary properties of an interest measure for nugget discovery will be presented. This will include a partial ordering of nuggets based on those properties. Some of the existing measures for nugget discovery will be reviewed in light of the properties established, and it will be shown that they do not display the required properties. A suitable evaluation function for nugget discovery, the fitness measure, will then be discussed and justified according to the required properties.

Complete Chapter List

Search this Book:
Reset