Robust Feature Selection Using Rough Set-Based Ant-Lion Optimizer for Data Classification

Ahmad Taher Azar, P. K. Nizar Banu
Copyright: © 2022 | Pages: 21
DOI: 10.4018/IJSKD.301263

Abstract

The selection of an algorithm to tackle a given problem is a vital undertaking that requires both time and knowledge. Non-functional requirements, such as the size, quality, and nature of the data, must frequently be taken into account. To develop a generalized machine learning model for any domain, the most relevant features must be chosen, because noisy and irrelevant features degrade data mining performance. However, the selection of the dominant features still depends on the search technique. When there is a large number of input features, stochastic optimization can be applied to explore the search space. In this research, we investigate Ant Lion Optimization (ALO), a nature-inspired algorithm that mimics the hunting process of ant lions, adapted here to identify the smallest reducts. We also investigate a rough set-based ant lion optimizer for feature selection. The experimental results reveal that the ant lion-based rough set reduct selection chooses a better feature subset and classifies the data more accurately.
Article Preview

1. Introduction

The dimensionality of data of every type grows as the field of computational intelligence progresses. Dealing with balanced and imbalanced data is one of the difficulties in classification. This investigation contributes to a better understanding of variable importance in a dataset. Feature selection helps train models faster by reducing complexity and making them easier to interpret. When dealing with real-time data, feature selection is regarded as a vital component of pre-processing. Feature selection is also referred to by related names such as data selection, attribute selection, feature creation, variable selection, and instance selection. Medical and bio-science datasets in particular require careful identification of relevant variables in order to accurately predict human behaviors (Remeseiro & Bolon-Canedo, 2019; Inbarani et al., 2020, 2018, 2014a,b,c,d, 2015a,b). Dimensionality reduction approaches are broadly classified into feature selection and feature extraction methods (Lee & Verleysen, 2007; Rehman et al., 2016). Feature selection picks a subset of the existing features that satisfies a certain cost function, whereas feature extraction constructs a new, low-dimensional set of features from linear or non-linear combinations of the existing ones (Bennasar et al., 2015). Furthermore, feature selection approaches are divided into wrappers, which are classifier-dependent methods, and filters, which are classifier-independent methods (Bolón-Canedo et al., 2013). Wrapper-based approaches select a subset of the features and test the accuracy of a specific classifier with the chosen features; the subset with the highest accuracy is retained for further examination, but the fundamental downside of wrappers is overfitting to the classification model. Filter-based approaches choose a feature subset using statistical measures, resulting in high scalability and low computational cost (Chandrashekar & Sahin, 2014; Vergara & Estévez, 2014). Search algorithms play an important part in picking the subset of features, as the feature selection process attempts to discover a robust subset of features with high accuracy.
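To make the wrapper/filter distinction concrete, the following is a minimal, illustrative sketch (not the authors' implementation). It assumes scikit-learn is available; the breast cancer dataset, the k-nearest-neighbour classifier, and the choice of the top 10 features are arbitrary placeholders.

```python
# Sketch: filter vs. wrapper feature selection (illustrative only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by a classifier-independent statistic
# (mutual information) and keep the top 10.
mi = mutual_info_classif(X, y, random_state=0)
filter_subset = np.argsort(mi)[::-1][:10]

# Wrapper: score candidate subsets with the classifier itself; here we
# simply compare the filter subset against the full feature set.
clf = KNeighborsClassifier()
full_acc = cross_val_score(clf, X, y, cv=5).mean()
subset_acc = cross_val_score(clf, X[:, filter_subset], y, cv=5).mean()
print(f"all features: {full_acc:.3f} | top-10 by MI: {subset_acc:.3f}")
```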

Swarm intelligence is a system property in which simple agents, interacting locally with their environment (the search space), produce coherent collective behavior. It also provides a foundation for investigating collective problem solving without centralized control. Ants, for example, can discover the shortest path between a food source and their colony and respond to changes in the environment. Swarm intelligence techniques, inspired by the properties of real ant colonies, are used to tackle discrete optimization problems and, to a lesser extent, challenging combinatorial problems. Because no heuristic approach can guarantee the ideal minimal subset every time, this class of methods is well suited to feature selection; the underlying assumption is that the swarm will locate the best feature combination in the search space. A subset of features, or a single feature retrieved by the search mechanism, is used as input to the feature selection procedure in order to assess its influence relative to the other subsets of features. The most common procedures for determining the initial subset of features are sequential forward selection, sequential backward selection (elimination), and random selection. All feature selection procedures begin with either all of the features or an empty set; when optimization techniques are used, the initial set of features is chosen at random.
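As an illustration of one of these search strategies, the sketch below shows a generic sequential forward selection loop. The `evaluate` callable is a hypothetical placeholder for whatever subset measure a given method uses (classifier accuracy, dependency degree, etc.) and is not taken from the paper.

```python
# Sketch: sequential forward selection under an arbitrary subset evaluator.
def sequential_forward_selection(n_features, evaluate, max_size=None):
    """Greedily grow a feature subset, one feature per iteration."""
    selected, best_score = [], float("-inf")
    max_size = max_size or n_features
    while len(selected) < max_size:
        candidates = [f for f in range(n_features) if f not in selected]
        # Try adding each remaining feature and keep the best single addition.
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, best_f = max(scored)
        if score <= best_score:  # no improvement: stop early
            break
        selected.append(best_f)
        best_score = score
    return selected, best_score
```

Sequential backward elimination follows the same pattern in reverse, starting from the full feature set and removing one feature per iteration; random selection simply draws the initial subset at random, as optimization-based methods do.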

The evaluation of the selected subset decides whether features are added or removed in subsequent iterations. Rough set theory (Pawlak, 1982) has been used successfully as a selection method to find data dependencies and reduce the number of attributes in a dataset. Hill-climbing algorithms have historically been employed to select the smallest reduct set. This research analyzes how the ant lion optimization technique, when combined with rough sets, aids in the selection of the smallest reduct set.
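For context, the sketch below shows the rough set dependency degree, gamma_B(D) = |POS_B(D)| / |U|, together with a greedy hill-climbing reduct search in the QuickReduct style. This is a generic illustration of the rough set machinery mentioned above, not the paper's ALO-based method, although the same dependency measure could serve as the fitness evaluated by an optimizer.

```python
# Sketch: rough set dependency degree and a greedy (hill-climbing) reduct
# search; `data` is a list of attribute-value rows, `decisions` the class labels.
from collections import defaultdict

def dependency(data, decisions, attrs):
    """gamma_B(D) = |POS_B(D)| / |U| for the attribute subset B (attrs)."""
    if not attrs:
        return 0.0
    blocks = defaultdict(set)
    for i, row in enumerate(data):
        blocks[tuple(row[a] for a in attrs)].add(i)
    # An equivalence class lies in the positive region if all of its
    # objects share the same decision value.
    pos = sum(len(b) for b in blocks.values()
              if len({decisions[i] for i in b}) == 1)
    return pos / len(data)

def quick_reduct(data, decisions):
    n_attrs = len(data[0])
    full_gamma = dependency(data, decisions, range(n_attrs))
    reduct, gamma = [], 0.0
    while gamma < full_gamma:
        # Greedily add the attribute that increases the dependency the most.
        gamma, best = max((dependency(data, decisions, reduct + [a]), a)
                          for a in range(n_attrs) if a not in reduct)
        reduct.append(best)
    return reduct
```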
