An Efficient Method for Discretizing Continuous Attributes

Kelley M. Engle, Aryya Gangopadhyay
Copyright: © 2010 | Pages: 21
DOI: 10.4018/jdwm.2010040101

In this paper, the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method, which partitions the search space and returns the point with the highest information gain found during its search. The second step is a hill-climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested using fifteen attributes from two data sets. The results show that it drastically reduces the number of searches while still identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
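
The abstract does not give the exact bisecting rule or the neighbourhood used for hill climbing, so the following Python sketch is only an illustration of the two-step idea under stated assumptions, not the authors' implementation. It assumes candidate split points are midpoints between consecutive distinct values, that the bisecting step keeps the half of the candidate range whose probe point has the higher information gain, and that hill climbing moves to an adjacent candidate while the gain strictly improves. All function names (`entropy`, `info_gain`, `candidate_splits`, `bisect_then_climb`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels, split):
    """Information gain of the binary split `value <= split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def candidate_splits(values):
    """Assumed candidate set: midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def bisect_then_climb(values, labels):
    """Return (split, gain, evaluations) from a two-step heuristic search."""
    cands = candidate_splits(values)
    if not cands:
        raise ValueError("need at least two distinct values to split")
    cache = {}  # candidate index -> information gain, so each is computed once

    def gain(i):
        if i not in cache:
            cache[i] = info_gain(values, labels, cands[i])
        return cache[i]

    # Step 1 (assumed form): bisect the candidate range, probing the middle
    # of each half and keeping the half whose probe has the higher gain.
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        left_probe = (lo + mid) // 2
        right_probe = (mid + 1 + hi) // 2
        if gain(left_probe) >= gain(right_probe):
            hi = mid
        else:
            lo = mid + 1
    # Start from the best point seen during bisection.
    best = max(cache, key=cache.get) if cache else lo

    # Step 2: greedy hill climbing over adjacent candidates.
    while True:
        neighbours = [j for j in (best - 1, best + 1) if 0 <= j < len(cands)]
        better = max(neighbours, key=gain, default=best)
        if gain(better) <= gain(best):
            break
        best = better

    return cands[best], gain(best), len(cache)
```

Calling `bisect_then_climb(values, labels)` on a numeric attribute and its class labels returns the chosen split, its information gain, and the number of gain evaluations performed, which can be compared against `len(candidate_splits(values))` for an exhaustive search. Because both steps are heuristic, the sketch can stop at a local optimum when the gain curve has several peaks.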

There are a number of different methods for discretizing continuous attributes. Dougherty et al. (1995) present three ways of classifying discretization methods: (1) global vs. local; (2) supervised vs. unsupervised; and (3) static vs. dynamic. Alternatively, Liu et al. (2002) present a hierarchical framework to describe the various discretization methods. Their framework first divides the methods into merging vs. splitting, and then breaks each of those categories down further into supervised vs. unsupervised.
