Recent Neuro-Fuzzy Approaches for Feature Selection and Classification

Heisnam Rohen Singh, Saroj Kr Biswas, Monali Bordoloi
Copyright © 2019 | Pages: 19
DOI: 10.4018/978-1-5225-5832-3.ch001

Abstract

Classification is the task of assigning objects to one of several predefined categories. However, the development of a classification system is often hampered by the size of the data: as the dimensionality of the data increases, so does the chance of encountering irrelevant, redundant, and noisy features or attributes. Feature selection acts as a catalyst in reducing computation time and dimensionality, enhancing prediction performance or accuracy, and curtailing irrelevant or redundant data. The neuro-fuzzy approach is used for feature selection and classification with better insight, as it represents knowledge in symbolic form. It combines the merits of neural networks and fuzzy logic to solve many complex machine learning problems. The objective of this article is to provide a generic introduction to, and a recent survey of, neuro-fuzzy approaches for feature selection and classification across a wide range of machine learning problems. Some of the existing neuro-fuzzy models are also applied to standard datasets to demonstrate their applicability and performance.

1. Introduction

The focus of this era is not simply on accomplishing a task but on optimizing the process involved, in order to minimize time and space complexity. Machine learning algorithms in pattern recognition, image processing, and data mining are mainly concerned with classification. These algorithms operate on huge amounts of data with multiple dimensions, from which knowledge is extracted. However, the entire dataset at hand does not always prove significant to every domain. An important concept that contributes extensively to classification and to a better understanding of the domain is feature selection (Kohavi and John, 1997). Feature selection is the process of selecting a subset of features from a set of features in a balanced manner, without losing most of the characteristics and identity of the original object. Two kinds of features affect feature selection: irrelevant features and redundant features (Dash and Liu, 1997). Irrelevant features provide no useful information in the given context, while redundant features provide the same information as the currently selected features; the sketch below illustrates both.
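
To make the distinction concrete, the following is a minimal Python sketch on a synthetic dataset containing one relevant, one redundant, and one irrelevant feature. The feature construction and the 0.9 correlation threshold are illustrative assumptions, not values prescribed by the chapter.

```python
import numpy as np

# Illustrative synthetic data: the construction below is an assumption
# made for this sketch, not part of the chapter's method.
rng = np.random.default_rng(0)
n = 200
relevant = rng.normal(size=n)                            # carries the class signal
redundant = 2.0 * relevant + 0.01 * rng.normal(size=n)   # near-duplicate of relevant
irrelevant = rng.normal(size=n)                          # pure noise
y = (relevant > 0).astype(float)

X = np.column_stack([relevant, redundant, irrelevant])

# Relevance check: correlation of each feature with the class label.
for i in range(X.shape[1]):
    r = np.corrcoef(X[:, i], y)[0, 1]
    print(f"feature {i}: |corr with label| = {abs(r):.2f}")

# Redundancy check: flag feature pairs that are almost perfectly correlated.
corr = np.corrcoef(X, rowvar=False)
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        if abs(corr[i, j]) > 0.9:   # illustrative threshold
            print(f"features {i} and {j} look redundant (r = {corr[i, j]:.2f})")
```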

Selecting an optimal number of distinct features contributes substantially to improving the performance of a classification system with lower computational effort, to data visualization, and to an improved understanding of computational models. Feature selection also reduces the running time of the learning algorithm, the risk of data overfitting, the dimensionality of the problem, and the cost of future data acquisition (Guyon and Elisseeff, 2003). Thus, in order to cope with rapidly evolving data, many researchers have proposed different feature selection techniques for classification tasks.

The main goals of feature selection are to select the smallest feature subset that yields the minimum generalization error, to reduce time complexity, and to reduce the memory and monetary costs of handling large datasets (Vergara and Estévez, 2014). In most common scenarios, feature selection methods are used to solve classification problems or form part of a classification problem. Many classical techniques exist for feature selection, such as Mutual Information (MI), decision trees, Bayesian networks, genetic algorithms, Support Vector Machines (SVM), K-nearest neighbor (K-nn), Pearson correlation criteria, Linear Discriminant Analysis (LDA), Artificial Neural Networks (ANN), and fuzzy sets. Choosing a specific algorithm is a critical step, as no single best algorithm exists that fits every scope and solves every feature selection and classification problem.
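
As a concrete illustration of one such classical technique, the following is a minimal sketch, assuming scikit-learn and the standard Iris dataset, that ranks features by mutual information with the class label, keeps the top two, and feeds them to a K-nn classifier. The dataset, k = 2, and the classifier are illustrative choices, not the chapter's prescription.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Filter-style selection: score each feature by mutual information
# with the class label and keep the two highest-scoring ones.
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

# Compare a K-nn classifier before and after feature selection.
knn = KNeighborsClassifier()
print("all features  :", cross_val_score(knn, X, y, cv=5).mean())
print("top-2 features:", cross_val_score(knn, X_reduced, y, cv=5).mean())
```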

The use of Mutual Information (MI) for feature selection can be found in contributions by many researchers (Vergara and Estévez, 2014; Peng et al., 2005; Grande et al., 2007; Chandrashekar and Sahin, 2014; Battiti, 1994). Mutual information quantifies the dependency between variables in terms of their probability density functions. However, if one of the two variables is continuous, the limited number of samples obtained after feature selection makes the computation of the integral in the continuous space challenging (Peng et al., 2005). It has also been found that MI does not work efficiently in high-dimensional spaces and that no standard theory for MI normalization exists (Vergara and Estévez, 2014).
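
For two discrete variables, the dependency mentioned above is commonly estimated as I(X;Y) = Σ_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]. The sketch below implements this estimate directly from paired samples; the toy data are invented for illustration.

```python
import numpy as np

def mutual_information(x, y):
    """Estimate I(X;Y) in nats from paired samples of two discrete variables."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xi in np.unique(x):
        px = np.mean(x == xi)                     # marginal p(x)
        for yi in np.unique(y):
            py = np.mean(y == yi)                 # marginal p(y)
            pxy = np.mean((x == xi) & (y == yi))  # joint p(x, y)
            if pxy > 0.0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

# Toy data: y mostly follows x, so I(X;Y) should be clearly positive.
x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]
print(f"I(X;Y) = {mutual_information(x, y):.3f} nats")
```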
