Application of Metaheuristic Approaches for the Variable Selection Problem

Myung Soon Song, Francis J. Vasko, Yun Lu, Kyle Callaghan
Copyright: © 2022 | Pages: 22
DOI: 10.4018/IJAMC.298309

Abstract

Variable selection is a long-standing topic in regression modeling. Besides many conventional approaches, metaheuristic approaches from the realm of optimization, such as the genetic algorithm (GA) and simulated annealing, have been proposed. These methods have considerable advantages over the classical methods on many problems, but they require fine-tuning of parameters, such as those associated with crossover or mutation, which can be difficult and time-consuming. In this paper, Jaya, one of several parameter-free approaches, is proposed and explored. Several metaheuristic methods are compared using results from a real-world dataset and a simulated dataset, and the impact of using local search is analyzed.

Introduction

Variable selection is a classical topic in regression with applications in many areas, including but not limited to engineering, medicine, psychology, and business.

Among the numerous variable selection methods developed, classical sequential methods such as stepwise selection (Desboulets, 2018; Lindsey and Sheather, 2010) have been widely used because they are simple and, when there are not too many variables, work very well with low prediction error. These methods nevertheless have drawbacks. The two most serious are that (1) they tend to converge to local optima (Hans et al., 2012; Hocking, 1976; Kiezun et al., 2009; Meiri and Zahavi, 2006; Paterlini and Minerva, 2010) and (2) they do not work very well in high-dimensional spaces (Hans et al., 2012; Kapetanios, 2007). Later in this section, it is explained how these problems can be addressed with metaheuristics from optimization research.

The selection of the most adequate variables in a regression model can be stated as a combinatorial optimization problem whose objective is to select the explanatory variables that maximize the adequacy of the model according to a statistical criterion, which serves as the objective function (Meiri and Zahavi, 2006; Paterlini and Minerva, 2010). Several methods from optimization research have been used for variable selection, including genetic algorithms (Broadhurst et al., 1997; Kapetanios, 2007; Kiezun et al., 2009; Jirapech-Umpai and Aitken, 2005; Mohan et al., 2018; Paterlini and Minerva, 2010; Peng et al., 2005; Sinha et al., 2015), simulated annealing (Kiezun et al., 2009; Meiri and Zahavi, 2006), and iterated local search (Hans et al., 2012). These methods are characterized as metaheuristics: stochastic search strategies dedicated to solving difficult (NP-hard) problems in optimization research.
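To make the combinatorial formulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes ordinary least squares with an intercept and uses BIC, in the form n ln(RSS/n) + k ln(n), as the statistical criterion; the criterion, the helper names (`solve_linear`, `rss`, `bic`, `best_subset`), and the simulated data are all illustrative assumptions. Each candidate model is a binary mask over the p explanatory variables, and the search space has 2^p points, which is why exhaustive enumeration is feasible only for small p.

```python
import itertools
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(X, y, mask):
    """Residual sum of squares of an OLS fit (with intercept) on the
    columns of X whose mask entry is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    # Normal equations (Z'Z) beta = Z'y; adequate for a small sketch.
    ztz = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    zty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve_linear(ztz, zty)
    return sum(
        (yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2
        for z, yi in zip(Z, y)
    )


def bic(X, y, mask):
    """Assumed objective function: smaller BIC means a more adequate model."""
    n = len(y)
    k = sum(mask) + 1  # selected variables plus the intercept
    return n * math.log(rss(X, y, mask) / n) + k * math.log(n)


def best_subset(X, y):
    """Enumerate all 2^p masks and return the one with the smallest BIC."""
    p = len(X[0])
    return min(itertools.product((0, 1), repeat=p), key=lambda m: bic(X, y, m))


# Hypothetical simulated data: only variables 0 and 2 influence y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] - 3.0 * row[2] + rng.gauss(0, 0.5) for row in X]
best = best_subset(X, y)
print(best, bic(X, y, best))
```

With p = 4 there are only 16 masks to score. At p = 50 there would be more than 10^15, which is the regime in which stochastic search over the masks becomes attractive.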

In particular, genetic algorithms (GA hereafter) and simulated annealing (SA hereafter) are known to be very effective at resolving the two issues mentioned above: (1) convergence to local optima (Kapetanios, 2007; Kiezun et al., 2009; Meiri and Zahavi, 2006; Paterlini and Minerva, 2010) and (2) handling high-dimensional spaces (Kapetanios, 2007; Meiri and Zahavi, 2006). Brief descriptions of GA and SA can be found in the Appendix.
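As an illustration of how SA can move past a local optimum, the following is a generic textbook-style sketch over binary variable masks. It is not the version described in the Appendix. The single bit-flip neighborhood, the geometric cooling schedule, and the parameter names (`t0`, `alpha`, `n_iter`) are assumptions, and `toy_cost` is a hypothetical stand-in for a statistical criterion such as AIC or BIC so that the block runs without a dataset.

```python
import math
import random


def simulated_annealing(cost, p, n_iter=2000, t0=1.0, alpha=0.995, seed=0):
    """Minimize cost(mask) over binary masks of length p.

    A worse neighbor is accepted with probability exp(-delta / t), which is
    what lets the search leave a local optimum while t is still large.
    """
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(p)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    t = t0
    for _ in range(n_iter):
        # Neighbor: include or exclude one randomly chosen variable.
        j = rng.randrange(p)
        candidate = current[:]
        candidate[j] = 1 - candidate[j]
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        t *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


# Hypothetical stand-in objective: Hamming distance to a hidden "true" model.
TARGET = [1, 0, 1, 0, 0, 1, 0, 0]


def toy_cost(mask):
    return sum(a != b for a, b in zip(mask, TARGET))


best, best_cost = simulated_annealing(toy_cost, p=len(TARGET))
print(best, best_cost)
```

The toy objective has no local optima of its own, so it only exercises the mechanics. In practice `cost` would wrap a model-fitting routine, and `t0`, `alpha`, and `n_iter` are exactly the kind of tuning parameters that parameter-free methods aim to avoid.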
