1. Introduction
Heart disease is a set of illnesses that affect the circulatory system, which comprises the heart and blood vessels. Cardiology deals in particular with the most common condition, the “heart attack”, and the factors leading to it. Cardiovascular disease covers a wide range of pathologies that affect the heart and blood vessels and can cause severe disability and death. The narrowing of the coronary arteries is known as coronary artery disease (CAD); it reduces the supply of blood and oxygen to the heart, which can lead to myocardial infarction (heart attack) (Medline, 2013).
Globally, CAD is the leading cause of death, especially in developing countries, where advanced medical technologies are not yet effective enough to deal with the disease. For example, an estimated 45% of deaths in Algeria are due to CAD (Kheireddine, 2012). The risk of cardiovascular disease is related to various factors, including environmental, psychological, genetic and demographic variables as well as health services. In general, CAD requires surgical treatment, which is very costly and not affordable for much of the population. In addition, diagnosing CAD is challenging for physicians, particularly when there are no symptoms, and it requires a great deal of information from the patient. Prevention systems that predict the risk factors for these diseases, so that preventive measures can be taken, are therefore vital.
With the increasing complexity of recent years, much of the information in the medical field is now stored in databases. These data are used mostly for managing and analysing the patient population. They are also frequently used for research, evaluation, planning and other purposes by various users who analyse and predict the health status of patients.
Medical diagnosis systems are mainly based on Machine Learning (ML) algorithms. These systems are trained to learn the decision characteristics of a physician for a specific disease, and they can then be used to support physicians in diagnosing future patients with the same disease (Abbasi, 2006; Volkmar, 2000). Unfortunately, there is no common model that can be adjusted to diagnose all kinds of diseases (Miller, 1994). Some datasets are described by a large number of features, which can even exceed the number of samples. This problem, known as “the curse of dimensionality”, is challenging for numerous ML applications in decision support systems. It increases the risk of including uncorrelated or redundant features and can lower classification accuracy (Gansterer, 2008; Polat, 2006; Xie & Wang, 2011).
Hence, selecting relevant features and eliminating irrelevant ones is an important step in designing effective decision support systems. For that reason, the main objective of this paper is to offer an FS approach that reduces the number of features of the CAD dataset and achieves higher classification accuracy. This approach consists of a two-step process. In the first step, a subset of features is produced: the dimension of the database is reduced using the GA running in parallel with BN. In the second step, once the new feature subset is obtained, a BN classifier is used to measure the accuracy of the feature model. For validation, we use a 10-fold cross-validation strategy, and the proposed approach is then compared with other ML algorithms, including SVM, MLP and C4.5. Additionally, the designed algorithm is compared with other FS algorithms.
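To make the wrapper idea concrete, the following is a minimal sketch of a GA searching over feature subsets, with each subset scored by the 10-fold cross-validation accuracy of a classifier. It is an illustration only and not the implementation evaluated in this paper: a categorical naive Bayes classifier stands in for the BN classifier, the data are synthetic rather than the CAD dataset, and all function names and GA settings (`ga_select`, `cv_accuracy`, `nb_fit_predict`, `make_toy_data`, population size, mutation rate) are hypothetical choices made for the example.

```python
import math
import random
from collections import Counter, defaultdict

def nb_fit_predict(train_X, train_y, test_X, cols):
    """Categorical naive Bayes with Laplace smoothing, using only columns `cols`.
    Assumption: stands in for the BN classifier described in the text."""
    classes = Counter(train_y)
    counts = defaultdict(Counter)   # (class, column) -> counts of each value
    values = defaultdict(set)       # column -> distinct values seen in training
    for row, label in zip(train_X, train_y):
        for c in cols:
            counts[(label, c)][row[c]] += 1
            values[c].add(row[c])
    preds = []
    for row in test_X:
        def log_post(label):
            lp = math.log(classes[label] / len(train_y))
            for c in cols:
                num = counts[(label, c)][row[c]] + 1
                den = classes[label] + len(values[c]) + 1  # +1 keeps mass for unseen values
                lp += math.log(num / den)
            return lp
        preds.append(max(classes, key=log_post))
    return preds

def cv_accuracy(X, y, mask, k=10):
    """Pooled k-fold cross-validation accuracy on the features selected by `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0                  # an empty feature subset is never useful
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        preds = nb_fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], cols)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)

def ga_select(X, y, n_features, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """GA wrapper: individuals are 0/1 masks, fitness is cross-validated accuracy."""
    rng = random.Random(seed)
    fitness = lambda m: cv_accuracy(X, y, m)
    # Include the full feature set so the result is never worse than using all features.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]              # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_toy_data(n=120, n_features=6, seed=1):
    """Synthetic data (not the CAD dataset): only features 0 and 1 determine the label."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 2) for _ in range(n_features)] for _ in range(n)]
    y = [int(row[0] + row[1] >= 2) for row in X]
    return X, y

X, y = make_toy_data()
mask, acc = ga_select(X, y, n_features=6)
print("selected features:", [i for i, bit in enumerate(mask) if bit],
      "accuracy:", round(acc, 3))
```

Because the best individuals are carried over unchanged in every generation and the full feature set is part of the initial population, the accuracy returned by this sketch cannot fall below the accuracy obtained with all features. A real study would replace the stand-in classifier and the toy data with the actual BN classifier and the CAD records.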
The rest of the paper is organized as follows. The next section presents a literature survey of the CAD problem. Section 3 describes CAD, followed by a general introduction to FS strategies. Section 4 presents the ML methods used in our study to evaluate the accuracy of the feature model obtained by the GA wrapper BN algorithm, and then describes the proposed approach. Section 5 discusses the experimental results, and Section 6 presents the conclusion.