Heart Disease Diagnosis: A Machine Learning Approach

Siddhartha Kumar Arjaria (Rajkiya Engineering College Banda, India) and Abhishek Singh Rathore (Independent Researcher, India)
Copyright: © 2019 | Pages: 21
DOI: 10.4018/978-1-5225-7796-6.ch008

Abstract

In the modern era of information technology, machine learning algorithms are used in many domains to improve the quality of decision making. Disease diagnosis is one of the applications where these approaches have been applied successfully to assist doctors. Correct and timely diagnosis is the primary requirement for effective treatment, and heart disease is today one of the leading causes of death. This chapter deals with the application of different machine learning algorithms to effective heart disease diagnosis. Diagnosis with machine learning algorithms involves three major steps: data preprocessing, feature selection, and classification. The chapter presents an experimental study of the performance of SVM, ANN, logistic regression, random forest, KNN, AdaBoost, Naive Bayes, decision tree, SGD, and CN2 rule inducer approaches.
Chapter Preview

Introduction

In recent years, the medical field has shown tremendous improvement in the diagnosis and treatment of diseases. Timely and accurate diagnosis of any disease is essential for effective treatment. Traditionally, the most common way to diagnose a disease is for a doctor to physically examine the patient's signs and symptoms. A wrong diagnosis leads to high treatment costs and puts the patient's life at risk. To improve the accuracy of disease diagnosis, machine learning techniques are now being used in the medical field. The main data mining task involved in disease diagnosis is classification, and researchers have proposed many algorithms for it. Heart disease is currently considered one of the most fatal diseases in both men and women: in 2012, the World Health Organization (WHO) reported that heart disease accounted for more than 31% of global deaths. In the present work we explore various machine learning methods used for heart disease diagnosis, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest, Logistic Regression, K-Nearest Neighbor (KNN), AdaBoost, Naive Bayes, Decision Tree, Stochastic Gradient Descent (SGD), and the CN2 rule inducer.

Heart

The heart is considered the most important organ of the body, and a healthy heart is essential for a healthy life. The heart has four chambers, which are separated by valves and divided into two halves (Figure 1; Human Heart Diagram, n.d.). Each half has two chambers: an atrium and a ventricle. The atria collect blood, and the ventricles pump blood out of the heart. The right half of the heart pumps oxygen-poor blood to the lungs, where the blood picks up more oxygen. This oxygen-rich blood then returns from the lungs to the left atrium and passes into the left ventricle, which pumps it to the organs of the body. The proper functioning of the heart is therefore vital to a healthy body.

Figure 1. Human heart diagram (Human Heart Diagram, n.d.)

Lifestyle, along with other factors, can cause the heart to malfunction and thus lead to heart (cardiac) disease. Heart attack, also called cardiac attack, is the term most commonly associated with heart disease. A heart attack can have many causes, but high-risk people share a common set of factors. These factors are summarized in Table 1.

Table 1. Common factors showing a high risk of heart attack

S.No | Attribute Name | Type | Role | Values
1 | Age (in years) | Numeric | Feature | -
2 | Sex | Categorical | Feature | {Male; Female}
3 | CP (chest pain type) | Categorical | Feature | {Asymptomatic; Atypical angina; Non-anginal; Typical angina}
4 | Rest SBP (resting blood pressure, mm Hg) | Numeric | Feature | -
5 | Chol (serum cholesterol, mg/dl) | Numeric | Feature | -
6 | Fasting blood sugar > 120 mg/dl | Categorical | Feature | {True; False}
7 | Rest ECG (resting electrocardiographic results) | Categorical | Feature | {Normal; ST-T wave abnormality; Left ventricular hypertrophy}
8 | Maximum heart rate achieved | Numeric | Feature | -
9 | Exang (exercise-induced angina) | Categorical | Feature | {1 = Yes; 0 = No}
10 | ST by exercise (ST depression induced by exercise relative to rest) | Numeric | Feature | -
11 | Slope (slope of the peak exercise ST segment) | Categorical | Feature | {Up sloping; Flat; Down sloping}
12 | Number of major vessels colored by fluoroscopy | Numeric | Feature | -
13 | Thal | Categorical | Feature | {Normal; Fixed defect; Reversible defect}
14 | Diameter narrowing | Categorical | Target | {0; 1}
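
As a minimal sketch of the preprocessing step, the feature attributes of Table 1 can be turned into a numeric vector by keeping numeric attributes as they are and one-hot encoding each categorical attribute (one 0/1 indicator per category). The attribute keys (`age`, `cp`, `thal`, and so on), the `encode` helper, and the patient values are hypothetical illustrations, not the dataset's actual column names or records; the Statlog file itself stores categorical attributes as integer codes, which would first be mapped to these categories.

```python
# Hypothetical schema mirroring the feature rows of Table 1:
# attribute key -> list of categories, or None for a numeric attribute.
SCHEMA = {
    "age": None,
    "sex": ["Male", "Female"],
    "cp": ["Asymptomatic", "Atypical angina", "Non-anginal", "Typical angina"],
    "rest_sbp": None,
    "chol": None,
    "fbs_gt_120": [True, False],
    "rest_ecg": ["Normal", "ST-T wave abnormality", "Left ventricular hypertrophy"],
    "max_hr": None,
    "exang": [1, 0],
    "st_by_exercise": None,
    "slope": ["Up sloping", "Flat", "Down sloping"],
    "vessels": None,
    "thal": ["Normal", "Fixed defect", "Reversible defect"],
}


def encode(record):
    """Turn one patient record (a dict) into a flat numeric feature vector."""
    vector = []
    for name, categories in SCHEMA.items():
        value = record[name]
        if categories is None:
            vector.append(float(value))  # numeric attribute: keep the raw value
        else:
            if value not in categories:
                raise ValueError(f"unexpected value {value!r} for {name}")
            # categorical attribute: one indicator per category (one-hot)
            vector.extend(1.0 if value == c else 0.0 for c in categories)
    return vector


# Hypothetical patient record (illustrative values only).
patient = {
    "age": 63, "sex": "Male", "cp": "Typical angina", "rest_sbp": 145,
    "chol": 233, "fbs_gt_120": True, "rest_ecg": "Left ventricular hypertrophy",
    "max_hr": 150, "exang": 0, "st_by_exercise": 2.3, "slope": "Down sloping",
    "vessels": 0, "thal": "Fixed defect",
}
features = encode(patient)
print(len(features))  # 6 numeric + 19 one-hot indicators -> 25
```

Numeric attributes are left unscaled here; many of the classifiers compared in this chapter (SVM, KNN, ANN) usually benefit from standardizing them afterwards.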

The objective of the chapter is to study the performance of the SVM, ANN, logistic regression, random forest, KNN, AdaBoost, Naive Bayes, decision tree, SGD, and CN2 rule inducer approaches. The experiments have been performed on the publicly available benchmark UCI Statlog (Heart) dataset (Dua & Karra Taniskidou, 2017). The remainder of the chapter is organized as follows. Section 2 reviews related work in this area. Section 3 summarizes the widely used classification techniques, and Section 4 reports their performance. Finally, recommendations for future work and conclusions are provided.
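
The three-step procedure (preprocessing, feature selection, classification) applied to each compared classifier can be sketched with scikit-learn, assuming every classifier is wrapped in a pipeline of standardization, univariate feature selection with `SelectKBest`, and the classifier itself, and is scored by stratified 10-fold cross-validation. This is not the authors' setup: the data is a synthetic stand-in from `make_classification` with the same shape as Statlog (Heart) (270 records, 13 attributes), and `k=8` and all hyperparameters are arbitrary assumptions. scikit-learn has no CN2 rule inducer, so only nine of the ten approaches appear.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (NOT the Statlog data): 270 records, 13 features, 2 classes.
X, y = make_classification(n_samples=270, n_features=13, n_informative=6, random_state=0)

CLASSIFIERS = {
    "SVM": SVC(),
    "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SGD": SGDClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, clf in CLASSIFIERS.items():
    # preprocessing -> feature selection -> classification, refit inside every fold
    model = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=8), clf)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores.mean()

# Report mean cross-validated accuracy, best first.
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
```

To run it on the real data, `X` and `y` would be replaced by the 13 Statlog attributes and class labels. Fitting the scaler and the feature selector inside the pipeline keeps each test fold out of both steps, so the reported accuracy is not inflated by information leaking from the test data.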

Key Terms in this Chapter

Deep Learning: A class of machine learning algorithms that can be supervised, unsupervised, or semi-supervised. It uses multiple layers of processing units for feature extraction and transformation, with each layer taking the output of the previous layer as its input.
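
To make "multiple layers of processing units" concrete, the following sketch passes one input vector through a small stack of fully connected layers, each transforming the previous layer's output. The layer sizes, ReLU activations, sigmoid output, and random initialization are illustrative assumptions; training (for example by backpropagation) is omitted, so the output probability means nothing until the weights are learned.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer of processing units: each unit computes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in) and zero biases for one layer."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use ReLU; the single output unit uses a sigmoid."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))  # each hidden layer re-represents the previous one
    weights, biases = layers[-1]
    z = dense(h, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-z))  # probability of the positive class


rng = random.Random(0)
sizes = [13, 16, 8, 1]  # 13 inputs (as in Table 1), two hidden layers, one output unit
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
p = forward([0.5] * 13, layers)
print(p)
```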

Accuracy: In classification, one of the metrics used to evaluate a classification model, defined as: Accuracy = (Number of correct predictions) / (Total number of predictions).
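
A direct translation of the accuracy formula into code, using hypothetical labels (1 = heart disease present, 0 = absent):

```python
def accuracy(y_true, y_pred):
    """Accuracy = number of correct predictions / total number of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical true and predicted labels for 8 patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 6 correct out of 8 -> 0.75
```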

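F1-Measure: Accuracy does not work well for uneven class distributions, so the F1-measure is used as an alternative evaluation metric. It is the harmonic mean of precision and recall, F1 = 2 × (Precision × Recall) / (Precision + Recall), and it works better than accuracy when false positives and false negatives differ in number or cost.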
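
The following sketch illustrates the point on a hypothetical imbalanced sample of 20 patients, 2 of whom have heart disease. A model that never predicts disease and a model that catches one of the two patients both reach 0.9 accuracy, but their F1-measures for the disease class are 0.0 and 0.5. The helper names are illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0 (precision/recall are 0 or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced sample: 2 patients with disease (1), 18 without (0).
y_true = [1, 1] + [0] * 18
always_healthy = [0] * 20       # never predicts disease
one_hit = [1, 0, 1] + [0] * 17  # 1 true positive, 1 false negative, 1 false positive

print(accuracy(y_true, always_healthy), f1_score(y_true, always_healthy))  # 0.9 0.0
print(accuracy(y_true, one_hit), f1_score(y_true, one_hit))                # 0.9 0.5
```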

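Particle Swarm Optimization: An optimization method that iteratively tries to improve a population (swarm) of candidate solutions (particles) with respect to a given measure of quality. Each particle moves through the search space guided by its own best-known position and the best position found so far by the whole swarm.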
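
A minimal global-best PSO sketch that minimizes a simple test function (the sum of squares). The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the iteration count are common textbook-style choices assumed here, not values from the chapter; in a feature-selection setting the objective could instead be a measure such as a classifier's validation error.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]  # each particle's best-known position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm's best-known position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def sphere(x):
    """Test objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_f)
```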

Classification: The task of assigning data to one of a predefined number of classes. It is a supervised approach: labeled (tagged) data is used to build a classification model, which is then used to classify unseen data.
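
The supervised workflow can be illustrated with k-nearest neighbors, one of the methods studied in this chapter: the tagged training examples form the model, and an unseen record is labeled by majority vote among its k closest training records. The two features and all values are hypothetical toy data; in practice the features would be standardized first, because unscaled attributes with larger ranges dominate the Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest labeled training examples."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical tagged data: (age, maximum heart rate) -> 1 = disease, 0 = no disease
train_X = [(63, 108), (67, 129), (58, 120), (41, 172), (44, 173), (39, 182)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (60, 115)))  # -> 1
print(knn_predict(train_X, train_y, (42, 170)))  # -> 0
```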

Deep Survival Analysis: In the context of electronic health records, a hierarchical generative approach for analyzing patients' survival time.

Sphygmogram Signals: Recordings of the arterial pulse waveform, used to measure blood pressure.
