1. Introduction
Classification is a data mining technique that maps data into predefined groups or classes. A decision tree is a supervised method used to predict group membership for data instances. The C4.5 algorithm (Quinlan, 1993) has been widely used in areas such as medicine, manufacturing and production, financial analysis, astronomy, and molecular biology (Han, Kamber, & Pei, 2011). This algorithm computes the gain ratio, based on the Shannon entropy (Shannon, 1948), to select the attribute that best partitions the instances into distinct classes; the decision tree is then built.
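As an illustration of this attribute-selection step, the following minimal sketch computes the Shannon entropy of a class distribution and the gain ratio of a candidate split as defined in C4.5. The function names and the dictionary-based dataset layout are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(instances, attr, label):
    """C4.5 gain ratio of splitting `instances` (a list of dicts) on `attr`."""
    n = len(instances)
    base = shannon_entropy([x[label] for x in instances])
    # Partition the instances by the candidate attribute's value.
    parts = {}
    for x in instances:
        parts.setdefault(x[attr], []).append(x[label])
    # Information gain: reduction in entropy achieved by the split.
    gain = base - sum(len(p) / n * shannon_entropy(p) for p in parts.values())
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0
```

C4.5 evaluates this ratio for every candidate attribute and splits on the one with the highest value, repeating the process recursively on each partition.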
The word entropy finds its roots in the Greek entropia, which means “a turning toward” or “transformation.” The concept of entropy comes from a principle of thermodynamics dealing with energy, and conveys information about the general organization of a system, such as its level of disorder (Abe, 1997; Aczel & Daróczy, 1963; Havrda & Charvat, 1967; Landsberg & Vedral, 1998; Renyi, 1961; Sharma & Mittal, 1975; Sharma & Taneja, 1975; Taneja, 1975; Tsallis, 1988). The Havrda-Charvat, Renyi, Sharma-Taneja, Taneja, and Tsallis entropies are well-known generalizations of the Shannon entropy and have been used in decision trees. They are parametric entropy measures, which means that the user must set the value of a parameter in order to compute the given entropy. Several studies show that, by adjusting the value of the parameter, parametric entropy-based trees outperform Shannon entropy-based trees in classification accuracy on domains such as customers, network intrusion detection, and colon tumors (Gajowniczek, Orłowski, & Ząbkowski, 2016; Gajowniczek, Ząbkowski, & Orłowski, 2015; Lima, Assis, & Souza, 2012; Lima, Assis, & Souza, 2010; Maszczyk & Duch, 2008).
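The parametric character of these measures can be illustrated with the Renyi and Tsallis entropies, two of the generalizations considered here. The sketch below (the function names are our own) computes both for a class-probability vector given a parameter q; the Renyi entropy is taken in base 2 here, while the q → 1 limit of the Tsallis entropy recovers the Shannon entropy in natural-log units.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (base 2) of a probability vector p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def renyi_entropy(p, q):
    """Renyi entropy (base 2) of probability vector p, parameter q != 1."""
    return math.log2(sum(pi ** q for pi in p if pi > 0)) / (1 - q)

def tsallis_entropy(p, q):
    """Tsallis entropy of probability vector p, parameter q != 1.

    As q -> 1 this tends to -sum(p_i * ln(p_i)), i.e. Shannon entropy in nats.
    """
    return (1 - sum(pi ** q for pi in p if pi > 0)) / (q - 1)
```

For a uniform two-class distribution, the Renyi entropy is 1 bit for every q, while the Tsallis entropy varies with q; it is this dependence on the parameter that the tree-induction experiments exploit.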
We state the hypothesis that parametric algorithms are useful for datasets with numeric and nominal attributes, but not for datasets with mixed attributes. The main goal of this article is to present a statistical analysis comparing decision trees based on the Renyi (Renyi, 1961), Tsallis (Tsallis, 1988), Abe (Abe, 1997), Landsberg-Vedral (Landsberg & Vedral, 1998), and Shannon (Shannon, 1948) entropies in order to support this hypothesis. The comparison is carried out by considering different values of q and the following measures: classification accuracy, area under the receiver operating characteristic (ROC) curve (AURC), and the number of nodes in a tree as a complexity measure of the obtained model. The time complexity of decision tree induction increases exponentially with respect to tree height; thus, trees with a small number of nodes (less complex) are preferable. Also, less complex models have less risk of overfitting the data (Han et al., 2011, p. 379).
The remaining sections describe the related work, the parametric entropies, and the experimental results. Finally, the paper provides discussion, conclusions, and further topics for investigation.