Introduction
Machine learning algorithms in pattern recognition, image processing, and data mining are mainly concerned with classification and clustering. These algorithms operate on huge volumes of high-dimensional data, much of which is insignificant for a specific domain. An important concept that aids classification, clustering, and a better understanding of the domain is feature selection (Kohavi and George 1997). Feature selection is the process of selecting a subset of features from the full feature set without losing the characteristics and identity of the original object. Two kinds of features motivate feature selection: irrelevant features and redundant features. Irrelevant features provide no useful information in a given context, while redundant features provide no more information than the features already selected.
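The distinction between irrelevant and redundant features can be illustrated with a small sketch. In this toy dataset (invented for illustration, not taken from any study cited here), one feature correlates with the class label, a second is an exact linear copy of the first (redundant), and a third is unrelated noise (irrelevant). A simple Pearson correlation exposes all three cases:

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: f1 is informative, f2 = 2*f1 (redundant), f3 is noise (irrelevant).
f1 = [1.0, 2.0, 3.0, 4.0, 5.0]
f2 = [2.0, 4.0, 6.0, 8.0, 10.0]
f3 = [0.9, 0.1, 0.8, 0.2, 0.5]
y  = [0, 0, 1, 1, 1]           # class labels

print(pearson(f1, y))   # high magnitude: f1 is relevant to the label
print(pearson(f3, y))   # near zero: f3 is irrelevant
print(pearson(f1, f2))  # near 1: f2 adds nothing beyond f1 (redundant)
```

A practical selector would keep f1, drop f3 for irrelevance, and drop f2 for redundancy even though f2 on its own correlates strongly with the label.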
Numerous studies have shown feature selection to be an indispensable part of building a classifier. In real-world scenarios, many candidate features are introduced to represent the domain better, which results in features that are irrelevant or redundant to the target concept (Dash and Liu 1997). In many classification problems, the sheer size of the data makes it difficult to build good classifiers before these unwanted features are removed. Reducing the number of irrelevant/redundant features can drastically cut the running time of the learning algorithms and yields a more general classifier. Feature selection facilitates data visualization and data understanding, reduces training and utilization times, lowers measurement and storage requirements, and defies the curse of dimensionality, all of which help elevate classification performance (Guyon and Elisseeff 2003).
Feature selection can be performed with various techniques, such as mutual information (Battiti 1994; Chandrashekar and Sahin 2014), genetic algorithms (Chandrashekar and Sahin 2014; Sun, Babbs and Delp 2005; Puch, Goodman, Pei, Chia-Shun, Hovland and Enbody 1993), Bayesian networks (Inza, Larranaga and Sierra 2001), and Artificial Neural Networks (ANNs) (Ledesma, Cerda, Avina, Hernandez and Torres 2008). All of these techniques have limitations. Mutual information is hard to compute between features with continuous values. In Bayesian networks, the number of possible structures grows super-exponentially with the number of features, and the approach focuses more on dependencies among features than on identifying the most important ones. Genetic algorithms involve a degree of randomness, making it hard to assign greater importance to more significant features. Among these techniques, ANNs are the most widely used for feature selection and classification. They are well-known massively parallel computing models that exhibit excellent input-output mapping behavior and resolve complex artificial intelligence problems in classification tasks.
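For discrete-valued features, the mutual-information criterion mentioned above reduces to counting co-occurrences. The sketch below (an illustrative implementation, with invented feature names; it is not the method of any cited paper) estimates I(X;Y) = Σ p(x,y) log₂[p(x,y)/(p(x)p(y))] from frequency counts and ranks two candidate features by their information about the class label:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two aligned discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y)/(p(x)p(y)) = (c/n) / ((px[x]/n)*(py[y]/n)) = c*n/(px[x]*py[y])
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical discrete features: f_good predicts the label perfectly,
# f_noise is uninformative.
y       = [0, 0, 0, 1, 1, 1]
f_good  = [0, 0, 0, 1, 1, 1]
f_noise = [0, 1, 0, 1, 0, 1]

scores = {"f_good": mutual_information(f_good, y),
          "f_noise": mutual_information(f_noise, y)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # features ordered by relevance to the label
```

A filter-style selector would keep the top-ranked features from such a list. Note that this counting estimator only works for discrete values; applying it to continuous features requires discretization or density estimation, which is exactly the difficulty noted above.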