I. Introduction
In recent years, hospitals and health care institutions have generated a large volume of medical data through the extensive use of digital technologies. Big data analytics methods can extract a great deal of useful information from this voluminous data. Javad Hassannataj Joloudari et al. (2020) observed that data science has grown significantly by drawing on big data for smart diagnosis, disease prevention, and policy-making in the medical sector. Raghupathi et al. (2010) reported that predictive models built on this data help clinics diagnose disease early, reduce cost, and improve treatment and the overall clinical experience of patients.
Cardiovascular disease (CVD) is the collective term for any form of heart-related disease (Ahmad, G, Wang et al., 2019). It includes high blood pressure, coronary artery disease, peripheral artery disease, cerebrovascular disease, and others. Coronary artery disease (CAD) is the condition in which the arteries carrying blood to the heart muscle narrow because of plaque build-up. The World Health Organization (WHO) regards CAD as one of the leading causes of death worldwide. A 2015 survey reported that about 110 million people were affected by CAD. According to the World Health Organization (2017), cardiovascular diseases caused an estimated 17.9 million deaths in 2016, about 31% of all deaths worldwide. Early identification of CAD risk allows the recommended treatment protocol to begin sooner and greatly improves the patient's speed of recovery.
Heart-related diseases are mostly identified through electrocardiogram (ECG) tests. Medical experts can readily identify irregularities in the heart from an ECG (Acharya U et al., 2014). In some rare cases, however, the ECG does not capture the exact severity of CAD. Another popular way of identifying heart disease is the angiogram, but it is an invasive method and is also expensive. The high cost of an angiogram makes it less affordable for economically weaker sections of the population. To make diagnosis widely applicable and economically affordable, a new diagnosis model that is less complex, low in effort, and accurate should be built with the help of ongoing technological advances.
Machine learning (ML) based predictive systems are being developed by technology companies (Indo-Asian News Service, 2018; Vincent, 2018) and by academic institutions together with their partner hospitals. The most popular classification techniques are Naive Bayes, Decision Tree, Simple Logistic Regression, Support Vector Machine, and Artificial Neural Networks (ANNs). A growing number of classification models for CAD diagnosis have been built with these techniques. However, the vast majority of them were developed on data sets from the UCI repository. The UCI heart disease data set (Andras Janosi et al., 2015) contains 14 variables, of which 13 are independent variables and 1 is the dependent variable.
(i) Logistic Regression: Logistic regression (LR) is the most straightforward of these classifiers. It computes a probability value between 0 and 1 for the given input. If the probability value is 0.5 or more, the input is classified as class 1; otherwise it is classified as class 0. The sigmoid function is used to compute the probability value between 0 and 1. LR uses the logit, also called the score, as a probabilistic means of determining the class of a new input.
$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{1}$$

Equation (1) gives the sigmoid function used to compute the probability value between 0 and 1, where z is the input to the sigmoid function. If the output of the sigmoid function is below 0.5, the input is assigned to class 0; if the output is 0.5 or above, it is assigned to class 1.
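The decision rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sigmoid`, `score`, and `classify`, as well as the weights, bias, and feature values in the usage lines, are hypothetical and chosen only for illustration. The sketch assumes the score z is a weighted sum of the input features plus a bias, which is the usual form in logistic regression.

```python
import math


def sigmoid(z: float) -> float:
    """Sigmoid of Equation (1): maps a real-valued score z into [0, 1]."""
    # Branch on the sign of z so math.exp never receives a large
    # positive argument, which would raise OverflowError.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(features, weights, bias):
    """Linear score z = w . x + b (assumed form of the logistic regression input)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(z: float, threshold: float = 0.5) -> int:
    """Assign class 1 when the probability is 0.5 or more, otherwise class 0."""
    return 1 if sigmoid(z) >= threshold else 0


# Hypothetical weights and feature values, for illustration only.
weights = [0.8, -0.4]
bias = -0.1
patient = [1.5, 0.5]

z = score(patient, weights, bias)
print(f"z = {z:.3f}, probability = {sigmoid(z):.3f}, class = {classify(z)}")
```

Because the sigmoid equals exactly 0.5 at z = 0, thresholding the probability at 0.5 is equivalent to checking whether the score z is non-negative.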