Statistical Entropy Measures in C4.5 Trees


Aldo Ramirez Arellano (Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico), Juan Bory-Reyes (Escuela Superior de Ingeniería Mecánica y Eléctrica Zacatenco, Instituto Politécnico Nacional, Mexico City, Mexico) and Luis Manuel Hernandez-Simon (Escuela Superior de Ingeniería Mecánica y Eléctrica Zacatenco, Instituto Politécnico Nacional, Mexico City, Mexico)
Copyright: © 2018 | Pages: 14
DOI: 10.4018/IJDWM.2018010101

Abstract

The main goal of this article is to present a statistical study of decision tree learning algorithms based on different parametric entropy measures. Partial empirical evidence is presented to support the conjecture that adjusting the parameter of different entropy measures might bias the classification. Here, receiver operating characteristic (ROC) curve analysis, specifically the area under the ROC curve (AURC), gives the best criterion to evaluate decision trees based on parametric entropies. The authors emphasize that the improvement of the AURC depends on the type of each dataset. The results support the hypothesis that parametric algorithms are useful for datasets whose attributes are all numeric or all nominal, but not for datasets with mixed attributes; thus, four hybrid approaches are proposed. The hybrid algorithm based on Renyi entropy is suitable for nominal, numeric, and mixed datasets. Moreover, it requires less time when the number of nodes is reduced while the AURC is maintained or increased; thus, it is preferable for large datasets.

1. Introduction

Classification is a data mining technique that maps data into predefined groups or classes. The decision tree is a supervised method used to predict group membership for data instances. The C4.5 algorithm (Quinlan, 1993) has been widely used in different areas, such as medicine, manufacturing and production, financial analysis, astronomy, and molecular biology (Han, Kamber, & Pei, 2011). This algorithm computes the gain ratio, based on the Shannon entropy (Shannon, 1948), to select the attribute that best partitions the instances into distinct classes; then, the decision tree is built.
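The attribute-selection step described above can be illustrated with a minimal sketch. It assumes a single nominal attribute and the usual textbook definition of gain ratio (information gain divided by split information); the function names and the toy data are hypothetical, and the full C4.5 algorithm also handles numeric thresholds, missing values, and pruning, none of which are shown here.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one nominal attribute (textbook form, an assumption here)."""
    n = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    remainder = sum(len(p) / n * shannon_entropy(p) for p in partitions.values())
    gain = shannon_entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = shannon_entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: the attribute separates the two classes perfectly.
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(gain_ratio(outlook, play))
```

An attribute with a single value yields zero split information, so the sketch returns 0.0 for it rather than dividing by zero.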

The word entropy finds its roots in the Greek entropia, which means “a turning toward” or “transformation.” The concept of entropy comes from a principle of thermodynamics dealing with energy, and contains information related to the general organization of a system, such as the level of disorder (Abe, 1997; Aczel & Daróczy, 1963; Havrda & Charvat, 1967; Landsberg & Vedral, 1998; Renyi, 1961; Sharma & Mittal, 1975; Sharma & Taneja, 1975; Taneja, 1975; Tsallis, 1988). Havrda-Charvat, Renyi, Sharma-Taneja, Taneja, and Tsallis entropies are well-known generalizations of the Shannon entropy and have been used in decision trees. They are parametric entropy measures; this means that the user must set the value of the parameter in order to compute the given entropy. Several studies show that, by adjusting the value of the parameter, parametric entropy-based trees outperform the classification ratio of Shannon entropy-based trees on domains such as customers, network intrusion detection, and colon tumor (Gajowniczek, Orłowski, & Ząbkowski, 2016; Gajowniczek, Ząbkowski, & Orłowski, 2015; Lima, Assis, & Souza, 2012; Lima, Assis, & Souza, 2010; Maszczyk & Duch, 2008).
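To make the role of the parameter concrete, the following sketch computes the Renyi and Tsallis entropies of a discrete distribution for a user-chosen q. It assumes the standard textbook definitions with natural logarithms, which may differ in base or normalization from the conventions used later in the article; the function names are hypothetical. Both measures reduce to the Shannon entropy as q approaches 1, which the sketch treats as a special case.

```python
from math import log


def shannon(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(x * log(x) for x in p if x > 0)


def renyi(p, q):
    """Renyi entropy of order q: log(sum p_i^q) / (1 - q)."""
    if q == 1:
        return shannon(p)  # limiting case as q -> 1
    return log(sum(x ** q for x in p if x > 0)) / (1 - q)


def tsallis(p, q):
    """Tsallis entropy of index q: (1 - sum p_i^q) / (q - 1)."""
    if q == 1:
        return shannon(p)  # limiting case as q -> 1
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


# Hypothetical class distribution at a tree node.
p = [0.7, 0.2, 0.1]
for q in (0.5, 1, 2):
    print(q, renyi(p, q), tsallis(p, q))
```

Because the value of each measure changes with q, the ranking of candidate attributes can change as well, which is why the choice of q can affect the resulting tree.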

This article states the hypothesis that parametric algorithms are useful for datasets with numeric and nominal attributes, but not for datasets with mixed attributes. Its main goal is to present a statistical analysis comparing decision trees based on the Renyi (Renyi, 1961), Tsallis (Tsallis, 1988), Abe (Abe, 1997), Landsberg and Vedral (Landsberg & Vedral, 1998), and Shannon (Shannon, 1948) entropies in order to support this hypothesis. The comparison considers different values of the parameter q and the following measures: classification accuracy, area under the receiver operating characteristic (ROC) curve (AURC), and the number of nodes in a tree as a measure of the complexity of the obtained model. The time complexity of decision tree induction increases exponentially with respect to tree height; thus, trees with a small number of nodes (less complex) are preferable. Also, less complex models have less risk of overfitting the data (Han et al., 2011, p. 379).
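As a rough illustration of the AURC measure for a two-class problem, the sketch below uses the well-known equivalence between the area under the ROC curve and the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the example scores are hypothetical, and this is not necessarily how the authors computed the AURC.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for the positive class and 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of the positive class from a tree.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75: three of the four pairs are ranked correctly
```

The pairwise form is quadratic in the number of instances, so it is suited to small examples; for large datasets a rank-based computation would be the usual choice.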

The remaining sections describe the related work, the parametric entropies, and the experimental results. Finally, the paper provides discussion, conclusions, and further topics for investigation.
