A Hybrid Predictive Model Integrating C4.5 and Decision Table Classifiers for Medical Data Sets

Amit Kumar, Bikash Kanti Sarkar
DOI: 10.4018/978-1-7998-9023-2.ch016


This article describes how data mining has recently come into wide use for extracting meaningful patterns from medical data sets, and how these patterns are then applied to clinical diagnosis. Accurate, precise and reliable classification models significantly assist medical practitioners in improving the diagnosis, prognosis and treatment of individual diseases. Although numerous intelligent models have been proposed for this purpose, they still have several drawbacks, such as disease specificity, class imbalance, conflicting data and inadequate handling of the dimensionality of patient data. The present study attempts to design a hybrid prediction model for medical data sets by combining a decision tree based classifier (mainly C4.5) with a decision table based classifier (DT). The experimental results support these claims.
Chapter Preview


Designing automated intelligent models that learn from data is a growing need, because the amount of data stored in databases is increasing rapidly while the number of human data analysts grows at a much slower rate. Machine learning, which underpins data mining, is well suited to designing such models. It can discover insightful, interesting and novel patterns that are descriptive, understandable and predictive from large amounts of data. In particular, it is an important phase of knowledge discovery from databases. A number of machine learning and knowledge discovery techniques have been developed for inducing rules, and they are used in various disciplines. Some of the widely used techniques are decision trees, neural networks, rough sets, decision tables, RIPPER (Repeated Incremental Pruning to Produce Error Reduction) and naïve Bayes. Each of these has its own merits and demerits, and no single method is well suited to all data sets.

At present, data mining techniques are being used for clinical diagnosis. Medical data contain a huge volume of information, often in an unstructured format. Further, they are by nature highly imbalanced, conflicting, incomplete and vague. Consequently, it is a complex task for a physician to make a diagnostic decision on the basis of only a patient's current data (without referring to previous decisions for similar symptoms). An automated system is therefore an essential aid for this purpose.

A number of automated clinical decision support systems (CDSS) have been modelled to assist physicians in making decisions. A few are cited here. A systematic review of the effects of clinical decision support systems on practitioner performance and patient outcomes was published (Garg, Adhikari, McDonald, Rosas-Arellano, Devereaux, & Beyene, 2005). A few years later, a systematic review and meta-analysis was published on the effectiveness of CDSS linked to electronic health records (Moja, Kwag, Lytras, Bertizzolo, Brandt, Pecoraro, Rigon, Vaona, Ruggiero, Mangia, Iorio, Kunnamo, & Bonovas, 2014). A survey of modelling paradigms for medical diagnostic decision support systems, together with future directions, was presented (Wagholikar, Sundararajan, & Deshpande, 2012). A plenary talk was delivered on the role of machine learning in clinical decision support (Syeda-Mahmood, 2015). A clinical decision support system based on a multilayer perceptron neural network was developed to assess the well-being of diabetes patients (Narasingarao, Manda, Sridhar, Madhu, & Rao, 2009). To improve the prediction rate of diabetes diagnosis, a case based approach using fuzzy logic and neural networks was investigated (Thirugnanam, Kumar, Srivatsan, & Nerlesh, 2012). A study on the prediction of type 2 diabetes was carried out based on several element levels in blood and chemometrics (Chen & Tan, 2012). Biomedical ontologies contain a vast amount of clinical knowledge, but building and maintaining them from natural language texts is a challenging task. For this purpose, a methodology was proposed as a general solution to minimize expert participation during the ontology enrichment process (Medina-Moreira, Luna-Aveiga, Apolinario-Arzube, Salas-Zarate & Valencia-Garcia, 2017). More recently, an intelligent content based dermoscopic image retrieval system was developed for melanoma diagnosis (Belattar, Mostefai & Draa, 2017).
Hence, these systems are widely used for the diagnosis, prediction, classification and risk forecasting of various diseases on the basis of electronic medical record (EMR) data. Although several clinical models have been introduced, each of them suffers from one or more of the deficiencies identified below.
