A Hybrid Approach for Feature Selection Based on Correlation Feature Selection and Genetic Algorithm

Pooja Rani, Rajneesh Kumar, Anurag Jain
Copyright: © 2022 | Pages: 17
DOI: 10.4018/IJSI.292028

Abstract

In today's world, machine learning has become a vital part of our lives. When applied to real-world applications, machine learning encounters the difficulty of high-dimensional data. Data often contain unnecessary and redundant features, and these superfluous features harm the performance of the classification algorithms employed in prediction. The primary step in developing any decision support system is to identify critical features. In this paper, the authors propose a hybrid feature selection method, CFGA, that integrates CFS (correlation feature selection) and GA (genetic algorithm). The efficiency of the proposed method is analyzed using a Logistic Regression classifier on the parameters of accuracy, sensitivity, specificity, precision, F-measure, and execution time. The proposed CFGA method is also compared with six other feature selection methods. Results demonstrate that the proposed method improves the performance of the classification system by removing irrelevant and redundant features.

Introduction

Machine learning is an emerging trend in computer science that is being utilized to develop a variety of decision support systems in a multitude of sectors. When decision support systems are used in real-world applications, high-dimensional data is a common problem: it can increase the complexity of the system and diminish its accuracy. This problem is also known as the curse of dimensionality. Feature selection reduces the number of features by selecting relevant features and removing irrelevant ones. A reduced number of features increases the accuracy of the system and lowers its complexity, and removing redundant and noisy features also helps decrease computation time (Rao et al., 2019).

Feature selection methods can be categorized into three types:

  • Filter method: The filter method filters features before they are passed to the classification algorithm. It ranks the features using general characteristics of the data. The criteria used for ranking are independent of the machine learning classifier, which makes this method fast compared to the others and therefore best suited to larger datasets. However, features are ranked individually and interaction among features is not considered, so important features may be missed (Miao & Niu, 2016).

  • Wrapper method: The wrapper method selects features by training the model multiple times on different subsets of features and choosing the best subset. Because interaction among features is considered, it ensures the selection of the most important features, but it is computationally expensive. Feature selection is an integral part of learning, so the result depends on the classification method, and this method is prone to overfitting (Jain & Singh, 2018).

  • Embedded method: In embedded methods, feature selection and training are performed together: features are selected while the model is being trained. Different feature subsets are created from the full set of features, and their efficiency is evaluated by training the model and testing it on them. The limitation of this method is that the selected features depend on the machine learning algorithm used; the set of features will therefore change if the training algorithm is changed (Chandrashekar & Sahin, 2014).
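As a concrete illustration of the filter approach described above, a minimal classifier-independent ranking by absolute Pearson correlation with the class label might look like the following (a generic sketch only; the function and feature names are illustrative, not from the paper):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def filter_rank(features, target):
    """Rank feature names by |correlation with target|, highest first.

    Each feature is scored independently of the others and of any
    classifier -- the defining property of a filter method.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# A feature perfectly aligned with the target outranks a noisy one.
features = {"f1": [2, 4, 6, 8, 10], "f2": [5, 1, 4, 2, 3]}
target = [1, 2, 3, 4, 5]
print(filter_rank(features, target))  # ['f1', 'f2']
```

Because each score ignores feature interactions, two highly redundant features can both rank near the top, which is precisely the weakness the filter category is criticized for above.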

In this paper, the authors propose a hybrid feature selection method, CFGA, that incorporates the best characteristics of the CFS and GA methods. The main contribution of this paper is an optimal feature selection method that can be coupled with any classifier. The aim of the present research is to select a smaller subset of features and obtain higher accuracy in the classification process. This paper is focused on investigating the following:

  • Whether combining CFS and GA can achieve better classification performance than other feature selection methods.

  • Whether the proposed method helps in reducing the execution time required for training.

The remaining sections of this paper are organized as follows: Section 2 contains a literature review of related work. The methodology of the proposed method is given in Section 3. Experimental results are discussed in Section 4. Conclusion and future scope are discussed in Section 5.
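The abstract identifies CFGA as an integration of CFS and GA; one plausible way the two can fit together is to use the CFS merit of a candidate feature subset as the GA's fitness function. The sketch below is a hypothetical illustration under that assumption (the population size, operators, and precomputed correlation inputs are all illustrative choices, not the paper's actual implementation):

```python
import math
import random

def cfs_merit(subset, corr_class, corr_pair):
    """CFS merit of a feature subset: k*rcf / sqrt(k + k*(k-1)*rff),
    where rcf is the mean feature-class correlation and rff the mean
    pairwise feature-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    rcf = sum(corr_class[i] for i in subset) / k
    if k == 1:
        return rcf
    pairs = [(i, j) for i in subset for j in subset if i < j]
    rff = sum(corr_pair[i][j] for i, j in pairs) / len(pairs)
    return (k * rcf) / math.sqrt(k + k * (k - 1) * rff)

def ga_select(n_feats, corr_class, corr_pair,
              pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bit-mask feature subsets with a simple GA; CFS merit is the fitness."""
    rng = random.Random(seed)

    def fitness(mask):
        return cfs_merit([i for i, b in enumerate(mask) if b],
                         corr_class, corr_pair)

    pop = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]              # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)          # two parents
            cut = rng.randrange(1, n_feats)          # one-point crossover
            children.append([bit ^ (rng.random() < p_mut)  # bit-flip mutation
                             for bit in a[:cut] + b[cut:]])
        pop = survivors + children
    best = max(pop, key=fitness)
    return [i for i, b in enumerate(best) if b]
```

The merit rewards subsets whose members correlate strongly with the class but weakly with each other, so adding a redundant feature lowers the score; that property is what lets a GA guided by this fitness discard redundant features rather than only weak ones.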

Literature Review

Various researchers have used different methods of feature selection with different datasets, as given in Table 1.
