Using Penguins Search Optimization Algorithm for Best Features Selection for Biomedical Data Classification

Noria Bidi (Djillali Liabès University, Sidi Bel Abbès, Algeria) and Zakaria Elberrichi (Djillali Liabès University, Sidi Bel Abbès, Algeria)
DOI: 10.4018/IJOCI.2017100103

Abstract

Feature selection is essential for improving classification effectiveness. This paper presents a new adaptive algorithm called FS-PeSOA (feature selection penguins search optimization algorithm), a meta-heuristic feature selection method based on the “Penguins Search Optimization Algorithm” (PeSOA). It is combined with different classifiers to find the feature subset that achieves the highest classification accuracy. To explore the candidate feature subsets, the bio-inspired PeSOA approach generates trial feature subsets during the search and estimates the fitness value of each one using three classifiers in turn: Naive Bayes (NB), k-Nearest Neighbors (KNN) and Support Vector Machines (SVMs). The proposed approach was evaluated on six well-known benchmark datasets (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, Dermatology, Colon Tumor and Prostate Cancer). The authors report that FS-PeSOA achieves the highest classification accuracy on these datasets.
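The wrapper idea in the abstract, scoring a trial feature subset by the accuracy of a classifier trained on it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name loo_1nn_accuracy, the toy data, the use of leave-one-out 1-nearest-neighbor accuracy as the fitness value, and the exhaustive enumeration of feature masks, which stands in for the PeSOA search itself.

```python
from itertools import product

def loo_1nn_accuracy(X, y, mask):
    """Hypothetical fitness: leave-one-out accuracy of a 1-nearest-neighbor
    classifier that only sees the features where mask[j] == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:  # an empty subset cannot classify anything
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:  # leave the query sample out
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -4.0], [0.2, 9.0],
     [1.0, 4.8], [1.1, -4.2], [0.9, 9.1]]
y = [0, 0, 0, 1, 1, 1]

# Exhaustive enumeration replaces the metaheuristic search here; it is only
# feasible because the toy problem has two features (four possible masks).
scores = {mask: loo_1nn_accuracy(X, y, mask)
          for mask in product([0, 1], repeat=2)}
best_mask = max(scores, key=scores.get)
print(best_mask, scores[best_mask])
```

On this toy data the mask that keeps only the informative feature scores best, which is the behavior a wrapper method relies on. A metaheuristic such as PeSOA becomes relevant when the number of features makes enumeration impossible, as with the 2000-feature and 12600-feature datasets used in the paper.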
Article Preview

Sensitive data, such as patients’ records and body images containing tumor- and surgery-related information, should not be in the public domain. Such data should remain within the hospital and not be placed in any public cloud. Hence, the design and implementation of private clouds is essential for biomedical scientists to generate, process, update, archive and store their data (Chang & Wills, 2016). Six benchmark datasets are used in this paper. The Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass and Dermatology datasets were obtained from the UCI machine learning repository (UCI), while the colon cancer and prostate cancer datasets were taken from the Kent Ridge Biomedical Data Repository. The main characteristics of these datasets are summarized in Table 1.

Table 1. The characteristics of the used datasets

Dataset                 | Features | Instances | Classes | Missing values
Wisconsin Breast Cancer | 32       | 569       | 2       | No
Pima Diabetes           | 8        | 768       | 2       | Yes
Mammographic Mass       | 5        | 961       | 2       | Yes
Dermatology             | 33       | 366       | 6       | Yes
Colon Tumor             | 2000     | 62        | 2       | No
Prostate Cancer         | 12600    | 21        | 2       | No
