A Comprehensive Performance Analysis of Various Classifier Models for Coronary Artery Disease Prediction

Baranidharan Balakrishnan, Vinoth Kumar C. N. S.
DOI: 10.4018/IJCINI.20211001.oa36

Abstract

Cardiovascular diseases (CVD) are the leading cause of death worldwide. Early diagnosis of the disease reduces the mortality rate. Machine learning (ML) algorithms are giving promising results in disease diagnosis and are now widely accepted by medical experts as clinical decision support systems. In this work, the most popular ML models are investigated and compared with one another for heart disease prediction based on various metrics. Base classifiers such as Support Vector Machine (SVM), Logistic Regression, Naive Bayes, Decision Tree, and K Nearest Neighbour are used for predicting heart disease. Bagging and boosting techniques are then applied over these individual classifiers to improve the performance of the system. With the Cleveland and Statlog datasets, Naive Bayes as the individual classifier gives the maximum accuracy of 85.13% and 84.81% respectively. The bagging technique improves the accuracy of the decision tree, which is identified as a weak classifier, by 7%, a significant improvement in identifying CVD.

I. Introduction

In recent years, hospitals and health care institutions have been generating a large volume of medical data due to the extensive use of digital technologies. Big data analytics methods can extract a great deal of useful information from this voluminous data. Javad Hassannataj Joloudari et al. (2020) observed that data science has grown significantly by bringing big data within reach for smart diagnosis, disease prevention, and policy-making in the medical sector. Raghupathi et al. (2010) reported that predictive models built on this data help clinics diagnose disease early, reduce cost, and improve treatment and the overall clinical experience of patients.

Cardiovascular Disease (CVD) is the collective term for any form of heart-related disease (Ahmad, G, Wang et al., 2019). It includes high blood pressure, coronary artery disease, peripheral artery disease, cerebrovascular disease, etc. Coronary Artery Disease (CAD) is the condition in which the arteries carrying blood to the heart muscle narrow due to plaque build-up. The World Health Organization (WHO) regards CAD as a major cause of death worldwide. A 2015 survey article reports that about 110 million people were affected by CAD. Cardiovascular diseases accounted for an estimated 17.9 million deaths in 2016, about 31% of all deaths worldwide (World Health Organization, 2017). Early diagnosis of CAD risk allows the recommended treatment protocol to begin sooner and greatly improves the recovery of patients.

Heart-related diseases are mostly identified through Electrocardiogram (ECG) tests. Medical experts can easily identify irregularities in the heart using ECG (Acharya U et al., 2014). In some rare cases, however, the ECG does not capture the exact severity of CAD. Another popular way of identifying heart disease is the angiogram, but it is an invasive and costly procedure. Its high cost makes it less affordable for economically weaker sections of the population. To make diagnosis widely applicable and economically affordable, a new diagnosis model that is less complex, low in effort, and accurate should be built with the help of ongoing technological advances.

ML-based predictive systems are being developed by technology companies (Indo-Asian News Service, 2018; Vincent, 2018) and by academic institutions together with their partner hospitals. The most popular classification techniques used are Naive Bayes, Decision Tree, Simple Logistic Regression, Support Vector Machine, and Artificial Neural Networks (ANNs). A growing number of classification models for CAD diagnosis have been built using these techniques. However, the vast majority of them rely on data sets from the UCI repository. The UCI heart disease data sets (Andras Janosi et al., 2015) contain 14 attributes, of which 13 are independent variables and 1 is the dependent variable.

(i) Logistic Regression: Logistic regression (LR) is the simplest of the classifiers considered and computes a probability value between 0 and 1 for the given input. If the probability value is 0.5 or more, the input is classified as class 1; otherwise it is assigned to class 0. The sigmoid function is used to compute the probability value between 0 and 1. LR uses the logit, also called the score, as a probabilistic basis for determining the class of a new input.

$$\sigma(z) = \frac{1}{1 + e^{-z}} \qquad (1)$$

Equation (1) depicts the sigmoid function used for computing the probability value between 0 and 1, where z represents the input to the sigmoid function. If the result of the sigmoid function is below 0.5, the given input is assigned class 0; if the probability output is between 0.5 and 1, class 1 is assigned.
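
The decision rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sigmoid`, `score`, and `classify`, as well as the weights, bias, and feature values in the usage note, are hypothetical and chosen only for illustration. The sketch assumes that the input z is a linear score (bias plus a weighted sum of the features), as is usual for logistic regression.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equivalent form e^z / (1 + e^z),
    # so that math.exp never receives a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights, bias, features):
    """Linear score z = bias + sum(w_i * x_i); assumed form of the sigmoid input."""
    return bias + sum(w * x for w, x in zip(weights, features))


def classify(z: float, threshold: float = 0.5) -> int:
    """Assign class 1 if the probability is at least the threshold, else class 0."""
    return 1 if sigmoid(z) >= threshold else 0
```

For example, with the hypothetical weights `[0.8, -0.5]`, bias `0.1`, and features `[1.0, 2.0]`, the score is 0.1 + 0.8 - 1.0 = -0.1, the sigmoid output is slightly below 0.5, and `classify` assigns class 0. A score of exactly 0 gives a probability of 0.5 and is assigned class 1, matching the "0.5 or more" rule stated earlier.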
