An Efficient Method for Discretizing Continuous Attributes

Kelley M. Engle, Aryya Gangopadhyay
DOI: 10.4018/978-1-61350-474-1.ch005

Abstract

In this paper the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain found during its search. The second step is a hill-climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested using fifteen attributes from two data sets. The results show that the method drastically reduces the number of searches while identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
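
The two steps summarized above can be illustrated with a small Python sketch. The abstract does not say exactly how the bisecting region method partitions the search space, so the partitioning rule here is a hypothetical reading, not the chapter's algorithm. Candidate split points are taken as the boundaries between distinct sorted values. Each round evaluates the midpoint of the left and right halves of the current region and keeps the half whose midpoint has the higher information gain. The best point seen is then handed to a hill climber that moves to a better neighboring candidate until none improves. The function names (`find_split`, `bisecting_search`, `hill_climb`, and so on) are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(pairs, i):
    """Information gain of splitting sorted (value, label) pairs before index i."""
    labels = [y for _, y in pairs]
    left, right = labels[:i], labels[i:]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def candidate_indices(pairs):
    """Split positions that lie between two distinct attribute values."""
    return [i for i in range(1, len(pairs)) if pairs[i - 1][0] != pairs[i][0]]


def bisecting_search(cands, gain):
    """Step 1 (HYPOTHETICAL partitioning rule): halve the region each round,
    keep the half whose midpoint has the higher gain, return the best seen."""
    lo, hi = 0, len(cands) - 1
    best_k, best_g = None, -math.inf
    while lo <= hi:
        mid = (lo + hi) // 2
        left_k = (lo + mid) // 2            # midpoint of left half [lo, mid]
        g_left = gain(cands[left_k])
        if g_left > best_g:
            best_k, best_g = left_k, g_left
        if mid + 1 > hi:                    # region has shrunk to one point
            break
        right_k = (mid + 1 + hi) // 2       # midpoint of right half [mid+1, hi]
        g_right = gain(cands[right_k])
        if g_right > best_g:
            best_k, best_g = right_k, g_right
        if g_left >= g_right:
            hi = mid
        else:
            lo = mid + 1
    return best_k


def hill_climb(k, cands, gain):
    """Step 2: greedily move to a better neighboring candidate until none improves."""
    while True:
        current = gain(cands[k])
        neighbors = [j for j in (k - 1, k + 1) if 0 <= j < len(cands)]
        best_j = max(neighbors, key=lambda j: gain(cands[j]), default=None)
        if best_j is None or gain(cands[best_j]) <= current:
            return k
        k = best_j


def find_split(values, labels):
    """Return (threshold, gain, number of distinct gain calculations)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cands = candidate_indices(pairs)
    if not cands:
        return None, 0.0, 0
    cache = {}

    def gain(i):
        # Memoize so each split position is scored at most once.
        if i not in cache:
            cache[i] = info_gain(pairs, i)
        return cache[i]

    k = hill_climb(bisecting_search(cands, gain), cands, gain)
    i = cands[k]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return threshold, gain(i), len(cache)
```

For example, `find_split([1, 2, 3, 4, 5, 6, 7, 8], ['a'] * 4 + ['b'] * 4)` returns the threshold 4.5 with an information gain of 1.0. Because gains are memoized, the third return value counts distinct information gain calculations, the quantity in which the abstract reports its reduction. Like any hill climber, this sketch may stop at a local optimum, which is consistent with the abstract's "optimal or near-optimal" wording.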
Chapter Preview

There are a number of different methods for discretizing continuous attributes. Dougherty et al. (1995) present three ways of classifying discretization methods: (1) global vs. local; (2) supervised vs. unsupervised; and (3) static vs. dynamic. Alternatively, Liu et al. (2002) present a hierarchical framework to describe the various discretization methods. Their framework first divides the methods into merging and splitting approaches, and then further divides each of those categories into supervised and unsupervised methods.
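
To make the supervised vs. unsupervised distinction concrete, here is a minimal sketch of equal-width binning, a standard unsupervised technique. It places cut points using only the range of the attribute values and never consults the class labels, whereas a supervised method such as an entropy-based split chooses its cut points from the labels. The function names `equal_width_cut_points` and `to_interval` are hypothetical helpers, not taken from the chapter or the cited papers.

```python
import bisect


def equal_width_cut_points(values, k):
    """Unsupervised discretization: cut points for k equal-width intervals.

    Only the attribute values are used; class labels play no role.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * j for j in range(1, k)]


def to_interval(value, cuts):
    """Index of the interval containing value (cuts must be sorted ascending)."""
    return bisect.bisect_right(cuts, value)


values = [0, 2, 4, 6, 8, 10]
cuts = equal_width_cut_points(values, 5)
print(cuts)                                    # prints [2.0, 4.0, 6.0, 8.0]
print([to_interval(v, cuts) for v in values])  # prints [0, 1, 2, 3, 4, 4]
```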
