Abstract
In the modern era of information technology, machine learning algorithms are used in many domains to improve the quality of decision making. Disease diagnosis is one application where these approaches have been applied successfully to assist doctors. Correct and timely diagnosis is the primary requirement for effective treatment. Today, heart disease is one of the leading causes of death. This chapter deals with the application of different machine learning algorithms to effective heart disease diagnosis. Diagnosis through machine learning algorithms involves three major steps: data preprocessing, feature selection, and classification. The chapter presents an experimental study of the performance of the SVM, ANN, logistic regression, random forest, KNN, AdaBoost, Naive Bayes, decision tree, SGD, and CN2 rule inducer approaches.
Introduction
In recent times the medical field has shown tremendous improvement in the diagnosis and treatment of diseases. Timely and accurate diagnosis of any disease supports effective treatment. Traditionally, physical examination of the patient's signs and symptoms by a doctor has been the most common way to diagnose a disease. A wrong diagnosis leads to high treatment costs and risk to life. To improve the accuracy of disease diagnosis, machine learning techniques are now being used in the medical field. The main data mining task involved in disease diagnosis is classification, for which researchers have proposed many algorithms. Heart disease is presently considered one of the most fatal diseases in both men and women. In 2012, the World Health Organization (WHO) reported that heart disease was responsible for more than 31% of global deaths. In the present work we explore various machine learning methods used for heart disease diagnosis, including Artificial Neural Network, Support Vector Machine, Random Forest, Logistic Regression, K-Nearest Neighbor (KNN), AdaBoost, Naive Bayes, Decision Tree, SGD, and the CN2 rule inducer.
Heart
The heart is considered to be the most important organ of the body. A healthy heart is important for a healthy life. The heart has four chambers that are separated by valves and divided into two halves (Figure 1, Human Heart Diagram, n.d.). Each half has two chambers: one is known as the atrium and the other as the ventricle. The role of the atria is to collect blood, and the role of the ventricles is to push blood out of the heart. The right half of the heart pushes oxygen-poor blood to the lungs, where the blood cells take up oxygen. From the lungs, this newly oxygen-rich blood moves into the left atrium and then the left ventricle. The left ventricle pushes this oxygen-rich blood to the different organs of the body. Proper functioning of the heart is therefore essential for a healthy body.
Lifestyle, along with other factors, can cause the heart to malfunction and thus lead to heart (cardiac) disease. Heart attack, stroke, and cardiac attack are the most commonly used terms for heart disease. There are many causes of heart attack, but high-risk people share a common set of factors. These factors are summarized in Table 1.
Table 1. Common factors showing a high risk of heart attack
S.No | Attribute Name | Type | Role | Values |
1 | Age (in years) | Numeric | Feature | |
2 | Sex | Categorical | Feature | {Male; Female} |
3 | CP (Chest Pain) | Categorical | Feature | {Asymptomatic; Atypical angina; Non-anginal; Typical angina} |
4 | Rest SBP (resting blood pressure) | Numeric | Feature | |
5 | Chol (Cholesterol) | Numeric | Feature | |
6 | Fasting Blood Sugar>120 | Categorical | Feature | {True; False} |
7 | Rest ECG | Categorical | Feature | {Normal; ST-T wave abnormality; Left ventricular hypertrophy} |
8 | Maximum heart rate | Numeric | Feature | |
9 | Exang (Exercise induced angina) | Categorical | Feature | {1 = Yes; 0 = No} |
10 | ST by exercise | Numeric | Feature | |
11 | Slope (the slope of ST segment) | Categorical | Feature | {Up sloping; Flat; Down sloping} |
12 | Major vessels colored | Numeric | Feature | |
13 | Thal | Categorical | Feature | {Normal; Fixed defect; Reversible defect} |
14 | Diameter narrowing | Categorical | Target | {0;1} |
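Because Table 1 mixes numeric and categorical attributes, most classifiers need the categorical ones converted to numbers first. The following is a minimal sketch of one common option, one-hot encoding, written in plain Python. The dictionary keys (`sex`, `cp`, `rest_sbp`, and so on) are hypothetical names chosen for this illustration, the helper `encode_record` is not part of any library, and the example record contains made-up values rather than a row from the dataset.

```python
# Hypothetical schema derived from Table 1; key names are illustrative only.
NUMERIC = ["age", "rest_sbp", "chol", "max_hr", "st_by_exercise", "major_vessels"]
CATEGORICAL = {
    "sex": ["Male", "Female"],
    "cp": ["Asymptomatic", "Atypical angina", "Non-anginal", "Typical angina"],
    "fbs_gt_120": ["True", "False"],
    "rest_ecg": ["Normal", "ST-T wave abnormality", "Left ventricular hypertrophy"],
    "exang": ["Yes", "No"],
    "slope": ["Up sloping", "Flat", "Down sloping"],
    "thal": ["Normal", "Fixed defect", "Reversible defect"],
}


def encode_record(record):
    """Turn one patient record (a dict) into a flat list of floats."""
    # Numeric attributes are passed through unchanged.
    vector = [float(record[name]) for name in NUMERIC]
    for name, levels in CATEGORICAL.items():
        value = record[name]
        if value not in levels:
            raise ValueError(f"unexpected value {value!r} for {name}")
        # One-hot: 1.0 at the position of the observed level, 0.0 elsewhere.
        vector.extend(1.0 if value == level else 0.0 for level in levels)
    return vector


# Made-up example values, not taken from the dataset.
example = {
    "age": 54, "rest_sbp": 130, "chol": 245, "max_hr": 150,
    "st_by_exercise": 1.2, "major_vessels": 0,
    "sex": "Female", "cp": "Non-anginal", "fbs_gt_120": "False",
    "rest_ecg": "Normal", "exang": "No", "slope": "Flat", "thal": "Normal",
}
features = encode_record(example)
print(len(features), features)
```

The target attribute (row 14) is deliberately left out of the feature vector, since it is the value the classifier is meant to predict.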
The objective of the chapter is to study the performance of the SVM, ANN, Logistic Regression, Random Forest, KNN, AdaBoost, Naive Bayes, Decision Tree, SGD, and CN2 rule inducer approaches. The experiments were performed on the publicly available benchmark UCI Statlog (Heart) dataset (Dua & Karra Taniskidou, 2017). The remainder of the chapter is organized as follows. Section 2 reviews the work done in this area. Section 3 summarizes the widely used classification techniques, and Section 4 reports the performance of those algorithms. Finally, future recommendations and the conclusion are provided.
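The three steps named in the abstract (data preprocessing, feature selection, and classification) can be sketched end to end in plain Python. This is only an illustration under stated assumptions, not the chapter's experimental setup: min-max scaling, a Pearson-correlation filter, and a 3-nearest-neighbor vote are stand-ins picked for brevity, every function name is hypothetical, and the rows are made-up numbers rather than records from the Statlog (Heart) dataset.

```python
import math


def fit_min_max(rows):
    """Preprocessing: learn per-column minimum and range from training rows."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid division by zero
    return lows, spans


def apply_min_max(row, lows, spans):
    """Rescale one row using statistics learned on the training rows."""
    return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def select_features(rows, labels, k):
    """Feature selection: indices of the k columns most correlated with the label."""
    scores = [abs_correlation(col, labels) for col in zip(*rows)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_rows, train_labels, query, k=3):
    """Classification: majority vote among the k nearest training rows (labels 0/1)."""
    neighbours = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    positive_votes = sum(label for _, label in neighbours)
    return 1 if 2 * positive_votes > k else 0


# Made-up rows: [age, cholesterol, unrelated noise]; label 1 = disease present.
train_rows = [
    [45, 200, 3], [50, 210, 7], [39, 190, 5], [41, 180, 1],
    [63, 290, 4], [67, 300, 6], [58, 280, 2], [61, 310, 8],
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_rows = [[42, 195, 8], [65, 295, 1]]

# Step 1: scale with training statistics only, then apply them to the test rows.
lows, spans = fit_min_max(train_rows)
scaled_train = [apply_min_max(row, lows, spans) for row in train_rows]
scaled_test = [apply_min_max(row, lows, spans) for row in test_rows]

# Step 2: keep the two columns most correlated with the label.
selected = select_features(scaled_train, train_labels, k=2)
train_sel = [[row[i] for i in selected] for row in scaled_train]
test_sel = [[row[i] for i in selected] for row in scaled_test]

# Step 3: classify each held-out row.
predictions = [knn_predict(train_sel, train_labels, query) for query in test_sel]
print("selected columns:", selected)
print("predictions:", predictions)
```

Fitting the scaler and the feature filter on the training rows only, and then applying them to the held-out rows, keeps information about the test data from leaking into the model, which matters when the reported performance figures are meant to be comparable across algorithms.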
Key Terms in this Chapter
Deep Learning: It is a class of machine learning algorithms that can be supervised, unsupervised, or semi-supervised. It uses multiple layers of processing units for feature extraction and transformation.
Accuracy: In classification, it is one of the metrics used to evaluate a classification model and is defined as: Accuracy = (Number of correct predictions) / (Total number of predictions).
F1-Measure: Accuracy does not work well for uneven class distributions, so the F1-measure is used as another evaluation metric. It is the harmonic mean of precision and recall, and it works better even if the numbers of false positives and false negatives differ.
Particle Swarm Optimization: It is an optimization method that iteratively tries to improve a candidate solution with respect to a given measure of quality.
Classification: It is the task of assigning data to a predefined number of classes. It is a supervised approach: labeled data is used to create a classification model, which is then used to classify unseen data.
Deep Survival Analysis: In the context of electronic health records, it is a hierarchical generative approach to analyzing the survival time of a patient.
Sphygmogram Signals: Used to measure blood pressure.
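The Accuracy and F1-Measure entries above can be made concrete with a short Python sketch. The function names are hypothetical helpers written for this illustration, and the label lists are made-up values chosen to show an uneven class distribution; they are not results from the chapter.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Accuracy = (number of correct predictions) / (total number of predictions)."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 8 healthy patients (0) and 2 with heart disease (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_healthy = [0] * 10                      # never predicts the disease
one_hit = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]       # finds one of the two cases

for name, y_pred in [("always_healthy", always_healthy), ("one_hit", one_hit)]:
    print(name, accuracy(y_true, y_pred), f1_measure(y_true, y_pred))
```

Both predictors reach the same accuracy on these labels, yet the one that never flags the disease gets an F1-measure of zero, which is why accuracy alone can mislead when the classes are uneven.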