A Comprehensive Feature Selection Approach for Machine Learning

Sumit Das, Manas Kumar Sanyal, Debamoy Datta
Copyright: © 2021 | Pages: 14
DOI: 10.4018/IJDAI.2021070102

Abstract

In machine learning, the important underlying input variables must be known; otherwise, the predicted value of the outcome variable will never match its target value. Machine learning tools are used in many applications where the underlying scientific model is inadequate. Unfortunately, establishing any kind of mathematical relationship among the variables is difficult, so deciding which variables to incorporate during training becomes a major issue that affects the accuracy of the results. Another important issue is identifying the cause behind a phenomenon and the major factors that affect the outcome variable. The aim of this article is to develop an approach that is not specific to any particular tool but gives accurate results under all circumstances. This paper proposes a model that filters out irrelevant variables irrespective of the type of dataset the researcher uses. The approach also provides parameters for assessing the quality of the data used for mining purposes.

2 Literature Survey

Any typical machine-learning algorithm involves extracting the output from a given set of input variables. These variables are also called features; the input space can be represented as a vector in $\mathbb{R}^d$. The algorithms are developed assuming a functional relationship $y = f(\mathbf{x})$ between the input vector and the output variable. This assumption may not always be true: some variables may have no influence on the output variable, and numerous algorithms have been proposed for ranking variables so that such variables can be filtered out. In a paper by Alain Rakotomamonjy, new methods of variable selection were proposed for support vector machines (SVMs). Initial developments came from Guyon and Elisseeff (Guyon & Elisseeff, 2003); this work contained an algorithm for selecting genes that are relevant to a cancer classification problem. Their goal was to find the subset of size r among the d variables that maximizes the performance of the predictor. The criterion $\lVert \mathbf{w} \rVert^2$ is used, where the decision function is $f(\mathbf{x}) = \mathbf{w} \cdot \phi(\mathbf{x}) + b$; here $\phi$ is the kernel mapping used in the SVM, and $\mathbf{w}$ and $b$ are the parameters of a particular model. The ranking criterion used for the k-th variable is $w_k^2$, and the algorithm runs in linear time.
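To make this concrete, the following is a minimal sketch in Python of recursive feature elimination driven by the $w_k^2$ ranking criterion, in the linear-kernel case. It assumes scikit-learn's LinearSVC and a binary classification task; the function name and the n_keep parameter are illustrative, not taken from the cited paper.

import numpy as np
from sklearn.svm import LinearSVC

def svm_rank_features(X, y, n_keep):
    """Return the indices of the n_keep highest-ranked features."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # Refit on the surviving features (binary labels assumed).
        clf = LinearSVC(C=1.0, max_iter=10_000).fit(X[:, remaining], y)
        scores = clf.coef_.ravel() ** 2   # ranking criterion w_k^2
        worst = int(np.argmin(scores))    # least informative feature
        del remaining[worst]              # eliminate it and refit
    return remaining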

Similarly, other variable-selection methods have been proposed for neural networks. The starting development in this area was due to Garson (Beck, 2018) and was later modified by Goh (Goh, 1995) to rank variable importance; a simple equation based on the connection weights was proposed, in which Qik determined the relative importance of the i-th input on the k-th output. However, the main disadvantage of Garson's algorithm was that it used the absolute values of the weights, which sometimes led to erroneous results; this disadvantage was removed by Olden (Olden et al., 2004). Both measures are sketched below. In a recent paper by Liu and Zhao (Liu & Zhao, 2017), a variable-importance-weighted random forest is used for classification and regression, addressing the problem that the performance of a random forest falls as the number of features increases. In another paper, Kvalheim et al. (Kvalheim et al., 2014) proposed measures of variable importance in latent-variable regression models, presenting new graphical tools for improved interpretation of such models that can assist in variable selection. Therefore, different AI tools require different selection algorithms. In this paper, we present a novel algorithm that works with any type of tool and provides a criterion for deciding whether the results are meaningful.
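The following is a minimal NumPy sketch of the two connection-weight measures just described, for a network with one hidden layer. The weight-matrix shapes (inputs-by-hidden and hidden-by-outputs) and the single-output default are assumptions made for illustration, not taken from the cited papers.

import numpy as np

def garson_importance(W_ih, W_ho, out=0):
    # Garson/Goh measure Qik: the share of |input-hidden| x |hidden-output|
    # weight products attributable to each input, for output unit `out`.
    # W_ih has shape (d, h); W_ho has shape (h, m).
    contrib = np.abs(W_ih) * np.abs(W_ho[:, out])   # shape (d, h)
    contrib = contrib / np.abs(W_ih).sum(axis=0)    # normalise within each hidden unit
    q = contrib.sum(axis=1)                         # aggregate over hidden units
    return q / q.sum()                              # relative importance; sums to 1

def olden_importance(W_ih, W_ho, out=0):
    # Olden's connection-weight measure keeps the signs of the weights,
    # so opposing effects cancel instead of inflating the score; this is
    # what removes the absolute-value drawback noted above.
    return (W_ih * W_ho[:, out]).sum(axis=1)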
