Application of Machine Learning Algorithms to a Well Defined Clinical Problem: Liver Disease

Sakshi Takkar (Lovely Professional University, Phagwara, India), Aman Singh (Department of Computer Science and Engineering, Lovely Professional University, Phagwara, India) and Babita Pandey (Department of Computer Applications, Lovely Professional University, Phagwara, India)
Copyright: © 2017 | Pages: 23
DOI: 10.4018/IJEHMC.2017100103

Abstract

Liver diseases represent a major health burden worldwide. Machine learning (ML) algorithms have been used extensively to diagnose liver disease. This study has four aims: to apply various individual and integrated ML algorithms to distinct liver disease datasets and evaluate their diagnostic performance; to integrate a dimensionality reduction method with the ML algorithms and analyze the variation in results; to find the best classification model; and to analyze the merits and demerits of these algorithms. KNN and PCA-KNN emerged as the top individual and integrated models, respectively. The study also concluded that no single algorithm gives the best results for all types of datasets, and that integrated models do not always perform better than individual ones. No algorithm is perfect: an algorithm's performance depends on the type and structure of the dataset, its number of observations, its dimensions and the decision boundary.

Introduction

The liver is the largest internal organ of the body. It plays a significant role in the transfer of blood throughout the body, and it regulates the levels of most chemicals in the blood. It helps metabolize alcohol and drugs and destroys toxic substances. The liver can be infected by parasites and viruses, which cause inflammation and diminish its function (Pandey & Singh, 2014). It can maintain its normal function even when part of it is damaged. However, it is important to diagnose liver disease early, as early diagnosis can increase the patient's survival rate. Diagnosis requires expert physicians and various examination tests, but even these cannot guarantee a correct diagnosis.

Computer-aided diagnosis is needed to predict liver disease accurately, and it also helps in dealing with large and cumbersome data. Research interest is growing in ML and knowledge discovery as ways to extract knowledge from large volumes of data. Data stored in databases contains valuable hidden knowledge that helps to enhance decision making. Supervised classification, in which a set of labeled training examples is known in advance, is one of the main methods for extracting knowledge from databases (Dankerl et al., 2013; Kumar, Moni, & Rajeesh, 2013). Classification is a two-phase process. In the training phase, a classifier algorithm uses the training dataset to train the classifier. In the testing phase, the classifier is applied to the samples of the test set to analyze its performance. Prediction accuracy is a criterion for evaluating a classifier: classification accuracy is the percentage of instances that are correctly classified. Classification algorithms include SVM, discriminant analysis and nearest neighbor algorithms, among others.
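The two phases and the accuracy criterion can be illustrated with a minimal sketch. It assumes a k-nearest-neighbor classifier written with the Python standard library only; the function names (`knn_predict`, `accuracy`) and the toy records are hypothetical and are not taken from the study's datasets or implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # "Training" for KNN is simply storing the labeled examples;
    # here they are ranked by Euclidean distance to the query.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test instances whose predicted label equals the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Made-up toy records with two features each; label 1 = "disease", 0 = "healthy".
train_X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]

# Held-out test set, never used when building the classifier.
test_X = [(0.9, 1.1), (3.0, 3.0), (1.0, 1.0), (2.9, 3.1)]
test_y = [0, 1, 0, 1]

acc = accuracy(train_X, train_y, test_X, test_y, k=3)
print(f"classification accuracy: {acc:.2%}")
```

Keeping the test records separate from the training records is what makes the reported accuracy an estimate of performance on unseen patients rather than a measure of memorization.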

These classification algorithms are applied to medical datasets both small and large. Learning from scanty datasets is an arduous task. Some datasets contain very many attributes, and selecting an adequate subset of attributes or features is a significant question. There are two dimension reduction techniques for obtaining an effective subset. Feature selection reduces the dimensions by selecting relevant features from the existing ones. Feature extraction designs a new, reduced set of features based on some transformation function (Guyon & Elisseeff, 2006; Jenke, Peer, & Buss, 2014). These techniques may be supervised or unsupervised, depending on whether they use the output information. One of the most extensively used feature extraction methods is Principal Component Analysis (PCA). PCA is unsupervised, as it does not use the output information. It decreases the number of features for effective data representation by discarding the linear combinations that have small variances and retaining only those that have large variances. The method orthogonally transforms the original n coordinates into a new set of n coordinates called principal components (Bro & Smilde, 2014). As a result of the transformation, the first principal component has the greatest possible variance, and each subsequent component has the greatest possible variance subject to being orthogonal to the preceding components.
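The PCA procedure described above can be sketched as follows. This is one possible illustration, not the study's implementation: it assumes the covariance matrix is decomposed by power iteration with deflation, uses only the Python standard library, and runs on made-up two-feature data. All function names are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector gives the eigenvalue (the variance).
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


def pca(rows, n_components):
    """Return (variances, components) of the leading principal components."""
    cov = covariance(mean_center(rows))
    d = len(cov)
    variances, components = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        variances.append(lam)
        components.append(v)
        # Deflation: remove the direction just found so the next pass
        # converges to the next-largest-variance orthogonal direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return variances, components


# Made-up toy data: two strongly correlated features.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

variances, components = pca(data, n_components=2)
explained = variances[0] / sum(variances)

# Dimensionality reduction: keep only the large-variance component.
scores = [sum(x * c for x, c in zip(row, components[0])) for row in mean_center(data)]
print(f"share of variance kept by the first component: {explained:.3f}")
```

Because the two toy features are strongly correlated, nearly all of the variance lies along the first component, so projecting onto it alone reduces two features to one with little loss of information.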

In most previous studies, different algorithms have been applied to liver datasets to find the best algorithm for accurate diagnosis of disease. The aim of this study is to apply different classification algorithms to various liver datasets to determine the applicability of the algorithms, to integrate the PCA approach in order to analyze the variation in results, and to find the best of the proposed methods. The remainder of this paper is structured as follows. Section 2 reviews previous studies on liver disease diagnosis using classification algorithms. Section 3 describes the detailed procedure of all algorithms. Section 4 presents the results and discussion for three liver datasets, and Section 5 concludes the paper.
