Performance Analysis of Nature-Inspired Algorithms-Based Bayesian Prediction Models for Medical Data Sets

Amit Kumar, Bikash Kanti Sarkar
DOI: 10.4018/978-1-7998-8048-6.ch044

Abstract

Medical data prediction has become an important classification problem because of the domain specificity, volume, and class-imbalanced nature of medical data. In this chapter, four well-known nature-inspired algorithms, namely genetic algorithms (GA), genetic programming (GP), particle swarm optimization (PSO), and ant colony optimization (ACO), are used for feature selection in order to enhance the classification performance of a Bayesian classifier on medical data. Naïve Bayes is the most widely used Bayesian classifier in automatic medical diagnostic tools. In total, 12 real-world medical data sets are selected from the University of California, Irvine (UCI) repository for the experiments. The experimental results demonstrate that nature-inspired Bayesian models play an effective role in medical data prediction.
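
The abstract describes a wrapper approach: a nature-inspired search proposes feature subsets and a naïve Bayes classifier scores them. The preview gives no details of the encoding, operators, fitness function, or parameters, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes bit-string feature masks, a Gaussian naïve Bayes (i.e., continuous features), a fitness equal to leave-one-out accuracy minus a small size penalty `alpha`, and a GA with binary tournament selection, one-point crossover, bit-flip mutation, and elitism. All function names, parameter values, and the toy data are hypothetical. Only the GA is shown; GP, PSO, or ACO could in principle replace the search loop while reusing the same fitness function.

```python
import math
import random

def fit_gaussian_nb(X, y, features):
    """Estimate log priors and per-class Gaussian mean/variance for the chosen features."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            # Small smoothing term keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((j, mean, var))
        model[c] = (math.log(len(rows) / len(X)), stats)
    return model

def predict(model, x):
    """Pick the class maximizing log prior + summed Gaussian log-likelihoods (naive independence)."""
    def log_posterior(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mean) ** 2 / (2 * var)
            for j, mean, var in stats)
    return max(model, key=log_posterior)

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of naive Bayes restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

def ga_feature_selection(X, y, pop_size=10, generations=20, alpha=0.01, seed=0):
    """Hypothetical GA wrapper over bit-string feature masks."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        # Accuracy minus a small penalty per selected feature; empty masks score 0.
        key = tuple(mask)
        if key not in cache:
            features = [j for j, bit in enumerate(mask) if bit]
            cache[key] = (loo_accuracy(X, y, features) - alpha * len(features) / n
                          if features else 0.0)
        return cache[key]

    def tournament(pop):
        # Binary tournament: the fitter of two randomly drawn masks wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with rate 1/n per gene.
            child = [1 - bit if rng.random() < 1.0 / n else bit for bit in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical toy data: feature 0 separates the classes; features 1 and 2
# have identical distributions in both classes, so they carry no class signal.
X = [[0.0, 1, 5], [0.5, 2, 6], [1.0, 3, 7], [1.5, 4, 8],
     [10.0, 1, 5], [10.5, 2, 6], [11.0, 3, 7], [11.5, 4, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
mask, score = ga_feature_selection(X, y)
print(mask, round(score, 4))
```

Leave-one-out accuracy is used here instead of training accuracy so that the fitness does not simply reward subsets the classifier has memorized; on real data sets, k-fold cross-validation would be a cheaper alternative.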

Introduction

Medical data prediction is one of the most challenging tasks in data mining. At present, data mining in the medical domain contributes greatly to disease diagnosis and provides domain users (i.e., medical practitioners) with valuable, previously unavailable knowledge that can improve diagnosis and treatment procedures for various diseases. A number of tools have been proposed to assist medical practitioners in their clinical decisions, and these tools are now widely used in clinical diagnosis, prediction, and risk forecasting for different diseases. Although several clinical models have been introduced, each suffers from one or more of the deficiencies listed below.

  • No generalized model has been designed that achieves good, or even average, prediction accuracy across all medical data sets; instead, each model is well-suited to a specific data set. Representative studies include (Chen, & Tan, 2012; Kensaku, Caitlin, Houlihan, Andrew, & David, 2005; Narasingarao, Manda, Sridhar, Madhu, & Rao, 2009; Ye, Yang, Geng, Zhou, & Chen, 2002; Srimani, & Koti, 2014; Komorowski, & Ohrn, 1999; Shanker, 1996; Lekkas, & Mikhailov, 2010; Aslam, Zhu, & Nandi, 2013; Temurtas, Yumusak, & Temurtas, 2009).

  • Most present diagnostic methods are black-box models; that is, they lack explanatory power in terms of understandable rules (Kensaku et al., 2005; Narasingarao et al., 2009; Azar, & El-Metwally, 2013; Hall, & Frank, 2008). Consequently, these models cannot provide physicians with the reasons underlying a diagnosis, so further insight into such algorithms is needed.

  • In general, each existing system is deficient in handling the high dimensionality, inconsistency, and vagueness (uncertainty) of clinical data.

  • Most existing approaches have difficulty generating the accurate rules that are highly desired by clinical decision support systems (CDSS).

  • The models generally depend on the assumptions of the underlying statistical techniques.

Clearly, constructing a suitable, generalized, and accurate disease prediction model (i.e., a model with highly accurate rules) is a complex and challenging task. The presence of missing values is another serious problem in natural-domain data sets. That is why the present study prioritizes medical domain research.

In any classification problem, data sets usually consist of a large number of features. Medical data sets likewise contain many features, but not all of them necessarily contribute to classification performance; irrelevant and redundant features may even degrade it. Using fewer features also reduces the construction time of a learning model. Further, from a diagnostic point of view, relying on a small number of highly informative features greatly assists medical professionals. A good feature selection scheme is therefore essential.
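
The trade-off described above, between discarding irrelevant features and preserving accuracy, can be made concrete with a penalized wrapper score. The sketch below is an illustration under stated assumptions, not the chapter's method: it assumes a Gaussian naïve Bayes, leave-one-out accuracy, and a hypothetical penalty weight `alpha`, and it scores every non-empty feature subset exhaustively. The function names and toy data are hypothetical. Exhaustive search needs 2^n - 1 evaluations for n features, which is one common reason heuristic search methods are preferred on higher-dimensional data.

```python
import itertools
import math

def loo_nb_accuracy(X, y, features):
    """Leave-one-out accuracy of a Gaussian naive Bayes that uses only `features`."""
    hits = 0
    for i in range(len(X)):
        train = [(x, c) for k, (x, c) in enumerate(zip(X, y)) if k != i]
        best_class, best_score = None, -math.inf
        for c in sorted({label for _, label in train}):
            rows = [x for x, label in train if label == c]
            score = math.log(len(rows) / len(train))  # log prior
            for j in features:
                vals = [r[j] for r in rows]
                mean = sum(vals) / len(vals)
                # Small smoothing term keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
                score += -0.5 * math.log(2 * math.pi * var) - (X[i][j] - mean) ** 2 / (2 * var)
            if score > best_score:
                best_class, best_score = c, score
        hits += best_class == y[i]
    return hits / len(X)

def exhaustive_selection(X, y, alpha=0.01):
    """Score every non-empty subset as accuracy minus alpha * (fraction of features used)."""
    n = len(X[0])
    scores = {}
    for k in range(1, n + 1):
        for subset in itertools.combinations(range(n), k):
            scores[subset] = loo_nb_accuracy(X, y, subset) - alpha * k / n
    return max(scores, key=scores.get), scores

# Hypothetical toy data: feature 0 is informative, feature 1 is irrelevant
# (the same values appear in both classes).
X = [[0.0, 1], [0.5, 2], [1.0, 3], [1.5, 4],
     [10.0, 1], [10.5, 2], [11.0, 3], [11.5, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
best, scores = exhaustive_selection(X, y)
print(best)  # → (0,)
```

Here the subset `(0, 1)` also reaches perfect leave-one-out accuracy, so it is the size penalty that breaks the tie in favor of the smaller subset `(0,)`.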
