1. Introduction
With the immense growth in the amount of stored data, there has been a focus on the effective use of Data Mining (DM) techniques to identify information and patterns in these data stores. Research has shown how data mining techniques, through intelligent methods, can extract relevant information from a data set (Alcalá-Fdez et al. 2008). For the mining process as a whole to be considered successful, the discovered information should be correct, comprehensible, relevant, and interesting (Kamath & Cantu-Paz 2005).
Data classification is a classical DM technique used to find rules and properties that assign the data to disjoint groups (Romero & Ventura 2010). The goal of classification is to correctly predict the class label of each instance in a data set. In the past, researchers have used a range of conventional and evolutionary tools and techniques for data mining. Conventional techniques suffer from the drawback of producing suboptimal results, and they tend to increase computational cost by producing a larger number of features than is actually required. These methods also assume a priori knowledge about the data set, which in most cases is not available. Evolutionary algorithms, on the other hand, tend to produce much better results because they are domain independent and handle attribute interaction better than conventional data mining techniques (Alcalá-Fdez et al. 2008).
Evolutionary algorithms are optimization algorithms inspired by natural evolution and genetic processes (Sharma 2015). In the past few years, many evolutionary algorithms have been developed, such as genetic algorithms (GA), backtracking search optimization algorithms, differential search algorithms, the multi-objective bat algorithm, and hybrid particle swarm optimization algorithms (Engelbrecht 2007). Genetic algorithms are used for pre-processing data and post-processing the discovered knowledge, whereas genetic programming is used for rule discovery and data pre-processing (Freitas 2003). However, evolutionary algorithms must be designed and implemented by programmers, which requires considerable time, effort, knowledge, and experience (Alcalá-Fdez et al. 2009).
KEEL (Knowledge Extraction based on Evolutionary Learning) is an open-source Java software tool designed for tasks such as classification, regression, and pattern mining (Alcalá-Fdez et al. 2008). It makes efficient use of evolutionary algorithms and follows a strictly object-oriented approach. It manages both the data and the design of experiments through a dataset repository and a number of built-in evolutionary algorithms. KEEL offers several features. First, it contains many evolutionary algorithms, organized into different categories (Alcalá-Fdez et al. 2008, Alcalá-Fdez et al. 2009). Second, it includes pre-processing techniques, post-processing methods, and visualization modules for various purposes. Furthermore, the tool has the great advantage of extending the range of potential users (Alcalá-Fdez et al. 2008): its user-friendly interface makes the software easy to use and requires less knowledge and experience.
In this paper we carry out a comparative analysis of five evolutionary algorithms available in the KEEL tool, namely CPSO-C, SSMA-C, FURIA-C, GFS-MaxLogitBoost-C, and DROP3PSO-C, on eight datasets: bupa, ecoli, glass, haberman, iris, monk, vehicle, and wine. The analysis compares the maximum and minimum efficiencies of the selected algorithms on these eight datasets. In addition, the Clas-Wilcoxon-ST module, which implements the Wilcoxon signed-rank test (Wilcoxon 1992), is used for pairwise comparison; it outputs the sum of positive ranks, the sum of negative ranks, and the p-value.
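The pairwise comparison described above can be illustrated with a minimal sketch of the Wilcoxon signed-rank test in plain Python. This is not KEEL's own implementation; it is a sketch under stated assumptions: it takes two paired lists of scores (for example, accuracies of two algorithms on the same datasets), discards zero differences, gives tied absolute differences their mean rank, and computes an exact two-sided p-value by enumerating all sign assignments, which is only practical for a small number of datasets. The function names `average_ranks` and `wilcoxon_signed_rank` and the accuracy values are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values ascending (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Return (R+, R-, exact two-sided p-value) for paired scores a and b.

    Zero differences are discarded. The p-value enumerates all 2**n sign
    assignments, so this sketch is only meant for small n (e.g. 8 datasets).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    observed = min(r_plus, r_minus)  # the test statistic T
    total = r_plus + r_minus         # always n(n+1)/2
    # Under H0 each rank is equally likely to carry a + or - sign.
    # Ranks are multiples of 0.5, so these float sums are exact.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return r_plus, r_minus, extreme / 2 ** n


# Hypothetical accuracies (%) of two classifiers on the same eight datasets.
acc_a = [70, 80, 65, 72, 95, 90, 68, 93]
acc_b = [66, 78, 67, 70, 94, 85, 60, 92]
print(*wilcoxon_signed_rank(acc_a, acc_b))  # 32.0 4.0 0.0546875
```

For these made-up numbers, R+ = 32 and R- = 4; the smaller rank sum is the test statistic, and the exact p-value is 14/256, about 0.055. With many datasets, a normal approximation of the rank-sum distribution is usually used instead of exhaustive enumeration.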
The main contribution of this work is to compare families of classification algorithms rather than individual classification algorithms. Further, the selected evolutionary algorithms are compared against two statistical classifiers using the Wilcoxon signed-rank test and the Friedman test on the same eight datasets.
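The Friedman test mentioned above ranks all algorithms on each dataset and checks whether their average ranks differ more than chance would allow. The following is a minimal plain-Python sketch, not the KEEL implementation, under these assumptions: the results form a matrix with one row per dataset and one column per algorithm, higher scores (e.g. accuracy) are better, tied scores share their mean rank, and the classic statistic chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4) is computed without tie correction, where N is the number of datasets, k the number of algorithms, and R_j the average rank of algorithm j. The function names `rank_descending` and `friedman` and the scores are hypothetical.

```python
def rank_descending(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Advance j over the run of scores tied with scores[order[i]].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(results):
    """results[d][j] is the score of algorithm j on dataset d.

    Returns (average rank per algorithm, Friedman chi-square statistic),
    using the formula without tie correction.
    """
    n = len(results)      # number of datasets (N)
    k = len(results[0])   # number of algorithms (k)
    rank_sums = [0.0] * k
    for row in results:
        for j, r in enumerate(rank_descending(row)):
            rank_sums[j] += r
    avg = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2


# Hypothetical accuracies (%) of three algorithms (columns) on four datasets (rows).
scores = [
    [85, 80, 70],
    [90, 88, 91],
    [75, 70, 72],
    [60, 65, 55],
]
avg, stat = friedman(scores)
print(avg, stat)  # [1.5, 2.25, 2.25] 1.5
```

The statistic is then compared with a chi-square distribution with k - 1 degrees of freedom; in SciPy, for example, `scipy.stats.chi2.sf(stat, k - 1)` gives the corresponding p-value. With only a few datasets this chi-square approximation is rough, so the result should be read with care.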
Section 2 reviews related work. Section 3 describes the methodology used in this paper: the evolutionary algorithms, the datasets, and the experimental setup. Section 4 presents the functional blocks and an experimental case study with its results. Finally, Section 5 presents conclusions and directions for future work.