Modified Firefly Algorithm With Chaos Theory for Feature Selection: A Predictive Model for Medical Data

Modified Firefly Algorithm With Chaos Theory for Feature Selection: A Predictive Model for Medical Data

Sujata Dash (University of Manitoba, Manitoba, Canada), Ruppa Thulasiram (University of Manitoba, Manitoba, Canada) and Parimala Thulasiraman (University of Manitoba, Manitoba, Canada)
Copyright: © 2019 |Pages: 20
DOI: 10.4018/IJSIR.2019040101

Abstract

Conventional algorithms such as gradient-based optimization methods usually struggle to deal with high-dimensional non-linear problems and often land up with local minima. Recently developed nature-inspired optimization algorithms are the best approaches for finding global solutions for combinatorial optimization problems like microarray datasets. In this article, a novel hybrid swarm intelligence-based meta-search algorithm is proposed by combining a heuristic method called conditional mutual information maximization with chaos-based firefly algorithm. The combined algorithm is computed in an iterative manner to boost the sharing of information between fireflies, enhancing the search efficiency of chaos-based firefly algorithm and reduces the computational complexities of feature selection. The meta-search model is implemented using a well-established classifier, such as support vector machine as the modeler in a wrapper approach. The chaos-based firefly algorithm increases the global search mobility of fireflies. The efficiency of the model is studied over high-dimensional disease datasets and compared with standard firefly algorithm, particle swarm optimization, and genetic algorithm in the same experimental environment to establish its superiority of feature selection over selected counterparts.
Article Preview
Top

1. Introduction

Feature Subset Selection (FSS) is an important process of selecting the input variables in a supervised classification problem. It removes irrelevant, redundant, and noisy features and enhances the performance of classifiers by selecting significant features. In addition, it reduces the number of features and thereby eliminates the limitation of classification algorithms in terms of time and space complexity. FSS is more desirable for high-dimensional datasets, such as microarray data with thousands of features, otherwise enormous number of instances will be needed to build reliable models, which is computationally quite expensive. Another type of feature selection method that is computationally cheaper is feature ranking which rank features using a selection metric, such as gain ratio, symmetric uncertainty etc., and top ranked features are selected by a pre-defined threshold value. The drawback here is that it does not deal with redundant features. Some of the widely used algorithms, which deal with redundancy includes Correlation based Feature Selection method (CFS), First Correlation based Feature selection (FCBF), and Conditional Mutual Information Maximization (CMIM) (Song et al., 2013).

Essentially, FSS combines two sub-processes; (i) the search strategy for searching the subsets, and (ii) evaluating the subsets. The search strategy could be complete/exhaustive or approximate. Theoretically, the exhaustive search guarantees optimal solution but practically does not work for combinatorial optimization problems which are NP-hard (Feige & Reichman, 2004). Branch and bound (Wang, Yan & Li, 2003), Greedy Sequential (Dash & Patra, 2014) and Best-First search are few examples of exhaustive search strategy which failed to find optimal feature sets for CO problems. However, approximate search strategy develops all possible near-optimal solutions with no assurance of finding a global optimal solution (Talbi, 2009).

For the past few decades, nature-inspired meta-heuristic algorithms have attracted all attentions of researchers for finding acceptable solutions for CO problems being used as a search strategy within a reasonable period (Dash & Patra, 2013; Ke et al., 2008; Wang et. al., 2007; Dash, 2014) and considered as reliable approximation methods. Metaheuristics are broadly categorized into two types, namely neighborhood based local search (Hoos & Stutzle, 2004), and population-based global search (Back et al., 1997). Some of the representative algorithms of neighbourhood based local search are Simulated Annealing (SA), Tabu Search (TS) and population based are Firefly Algorithm (FA) (Yang, 2010; Yang, 2009), Ant Colony Optimization (ACO) (Dorigo & Birattari, 2010), the Genetic Algorithm (GA) (Deb et al., 2002), Particle Swarm Optimization (PSO) (Du et al., 2015) etc. They find an optimal solution by generating a comparatively small number of feature subsets for evaluation. The candidate subsets are then evaluated using some pre-determined evaluation measure. Moreover, the performance of the algorithm depends on the proper setting of dependent parameters and the number of iterations. The applications of FSS are profound in various fields, such as bioinformatics, text categorization, natural language processing, web page classification and much more.

Complete Article List

Search this Journal:
Reset
Open Access Articles: Forthcoming
Volume 11: 4 Issues (2020): 2 Released, 2 Forthcoming
Volume 10: 4 Issues (2019)
Volume 9: 4 Issues (2018)
Volume 8: 4 Issues (2017)
Volume 7: 4 Issues (2016)
Volume 6: 4 Issues (2015)
Volume 5: 4 Issues (2014)
Volume 4: 4 Issues (2013)
Volume 3: 4 Issues (2012)
Volume 2: 4 Issues (2011)
Volume 1: 4 Issues (2010)
View Complete Journal Contents Listing