A Hybrid Feature Selection Method for Effective Data Classification in Data Mining Applications

Ilangovan Sangaiya (K.L.N. College of Engineering, Madurai, India) and A. Vincent Antony Kumar (PSNA College of Engineering and Technology, Dindigul, India)
Copyright: © 2019 | Pages: 16
DOI: 10.4018/IJGHPC.2019010101

Abstract

In data mining, feature selection is needed to select relevant features and to remove unimportant, irrelevant features from an original data set based on some evaluation criteria. Filter and wrapper are the two methods commonly used, but here the authors propose a hybrid feature selection method that takes advantage of both. The proposed method uses symmetrical uncertainty and genetic algorithms to select the optimal feature subset. The aim is to improve processing time by reducing the dimension of the data set without compromising classification accuracy. The proposed hybrid algorithm is much faster than most existing algorithms and scales well to the data set in terms of selected features, classification accuracy and running time.

1. Introduction

Across a wide variety of fields, huge amounts of data are being collected and stored in real-world databases at a phenomenal rate. As the volume of stored information grows, the ability to understand and make use of it does not grow in proportion, while users demand increasingly sophisticated information. Feature selection in data mining addresses this by extracting the relevant data from a very large data set: only a subset of relevant features, out of all the available features, is selected from the data being mined. Reducing dimensionality and removing irrelevant and redundant features in this way improves the predictive accuracy of the data mining algorithm.

There are three general methods used in feature selection: filter, wrapper and embedded. The filter approach preserves as much of the relevant information in the entire set of attributes as possible without applying a classification algorithm. Because of its computational efficiency, this method is popular even for large data sets; its disadvantage is that the lower computational effort can come at the cost of the quality of the selected features. In the wrapper method, by contrast, attribute selection is driven by a classification algorithm, which is applied to the selected attributes. This method selects an attribute subset that is optimized for the given algorithm, but it is too expensive for high-dimensional data in terms of computational complexity and time. Finally, the embedded approach uses the advantages of both approaches by applying different evaluation criteria in different search phases. The embedded approach is capable of achieving the accuracy of a wrapper method at the speed of a filter method.
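
To make the filter idea concrete, the following is a minimal sketch of how a filter-style score such as symmetrical uncertainty (the measure named in the abstract) could be computed for discrete features. It uses the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)); the preview does not give the authors' exact formulation, so the function names (`entropy`, `symmetrical_uncertainty`, `rank_features`) and the toy data are assumptions made purely for illustration, not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard definition: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y))."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; treat as uninformative
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_features(columns, labels):
    """Filter step: score each feature against the class, highest score first.

    No classifier is trained here, which is what keeps a filter cheap.
    """
    scores = {
        name: symmetrical_uncertainty(col, labels)
        for name, col in columns.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "outlook" tracks the class label, "noise" does not.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "outlook": ["sun", "sun", "rain", "rain", "sun", "rain"],
    "noise": ["a", "b", "a", "b", "a", "b"],
}
ranking = rank_features(columns, labels)
```

In this sketch the ranking places `outlook` ahead of `noise`, because the score depends only on the statistical relationship between each feature and the class. A wrapper, by contrast, would train and evaluate a classifier on each candidate subset, which is where the additional cost described above comes from.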
