A Comprehensive Survey of Data Mining Techniques in Disease Prediction

Durgadevi Mullaivanan, Kalpana R.
DOI: 10.4018/978-1-7998-2566-1.ch002


Data mining has recently become very popular, and numerous research works have applied data mining techniques in the healthcare sector. Healthcare transactions generate a massive amount of data that is too voluminous and complex to be processed easily. Data mining techniques have therefore been employed, as they provide a practical methodology for transforming this data into useful knowledge for decision making. Prediction and classification are two forms of data analysis. However, it is still difficult to survey the complete literature in the healthcare domain. This chapter reviews research in the healthcare sector that uses different data mining methodologies for the prediction and classification of diverse diseases. A detailed comparison of the reviewed methods is also provided for a better understanding of the existing models. An extensive experimental study is also performed to analyze the performance of data mining algorithms.

1. Introduction

Data mining is the process of discovering knowledge in massive databases to uncover the trends and patterns that exist in the data. The vast amount of data present in the information industry has to be converted in order to extract useful information from it (Neesha Jothi et al., 2015). Data mining also involves several other processes, such as cleaning, integration, transformation, mining, evaluation, and presentation. The different stages of data mining are shown in Fig. 1. Data mining patterns fall into two categories: descriptive, and classification and prediction (G. E. Vlahos et al., 2004). The descriptive function deals with the general properties of data, whereas classification and prediction deal with the concepts of data. The descriptive function involves summarization and mapping of data, commonly known as data characterization and data discrimination. Predictive analytics combines historical data, machine learning, and artificial intelligence. It analyzes the current state of the data to determine future outcomes, and it has become increasingly prevalent in areas such as finance, marketing, healthcare, and social networking. Implementing predictive analytics is a complex process that comes with many challenges. The ultimate aim of predictive analytics in the digital world is to obtain better revenue and profit at reduced cost and risk. Every prediction problem must come with a “remedy.” Predictive analytics can be applied to any type of unknown data in the past, present, or future (Chandamona and Ponperisasmy, 2016).
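The stages listed above (cleaning, integration, transformation, mining, evaluation, and presentation) can be sketched as a chain of small functions. This is only a minimal illustration under stated assumptions: the patient records, the field names, the function names, and the glucose cutoff of 126 are hypothetical and chosen for the example; they are not taken from the chapter and are not clinical guidance.

```python
from collections import Counter

# Hypothetical toy records from two sources; all values are illustrative.
clinic_a = [
    {"id": 1, "age": 54, "glucose": 148, "diabetic": True},
    {"id": 2, "age": 31, "glucose": None, "diabetic": False},  # missing value
    {"id": 3, "age": 45, "glucose": 95, "diabetic": False},
]
clinic_b = [
    {"id": 4, "age": 62, "glucose": 171, "diabetic": True},
    {"id": 5, "age": 29, "glucose": 88, "diabetic": False},
]


def clean(records):
    """Cleaning: drop records that contain a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Integration: merge several sources into one list of records."""
    return [r for source in sources for r in source]


def transform(records, cutoff=126):
    """Transformation: discretize glucose into a band (cutoff is an assumption)."""
    return [
        {**r, "glucose_band": "high" if r["glucose"] >= cutoff else "normal"}
        for r in records
    ]


def mine(records):
    """Mining: count how often each glucose band co-occurs with each outcome."""
    return Counter((r["glucose_band"], r["diabetic"]) for r in records)


def evaluate(pattern_counts):
    """Evaluation: fraction of high-glucose records that are diabetic."""
    positives = pattern_counts[("high", True)]
    total = positives + pattern_counts[("high", False)]
    return positives / total if total else 0.0


def present(confidence):
    """Presentation: render the discovered pattern as readable text."""
    return f"high glucose -> diabetic (confidence {confidence:.0%})"


records = transform(integrate(clean(clinic_a), clean(clinic_b)))
patterns = mine(records)
confidence = evaluate(patterns)
summary = present(confidence)
print(summary)
```

Each function stands in for one stage, so a stage can be replaced (for example, imputing missing values instead of dropping records) without touching the others.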

In the healthcare sector, data mining is becoming increasingly familiar. Numerous factors have motivated the use of data mining in healthcare (Salim A. Dewani and Zaipuna O. Yonah, 2017). The existence of medical insurance abuse and fraud has led many healthcare insurers to use data mining approaches to reduce their losses by identifying and tracking offenders. Fraud detection is also an application of data mining in the commercial world, for instance in identifying fraudulent credit card transactions. The massive amount of data generated by the healthcare sector is too complex and voluminous to be processed and analyzed using conventional approaches.

Figure 1. Data mining stages


Data mining enhances decision making by exploring patterns and trends in large quantities of complex data. Such analysis has become very important because financial pressure has increased the need for healthcare organizations to make decisions based on the analysis of medical as well as financial data. Healthcare organizations that make use of data mining techniques are better positioned to meet their long-term requirements (I. Witten et al., 2011). Data mining techniques support a number of healthcare applications. In general, these can be grouped into the evaluation of treatment effectiveness, healthcare management, customer relationship management, and the detection of abuse and fraud. Data mining applications are used to validate the effectiveness of medical treatments. By comparing symptoms, causes, and courses of treatment, data mining techniques offer an analysis of patient status. Numerous dedicated medical data mining applications, such as predictive analysis and DNA microarray analysis, have also been developed. The primary use of predictive analytics in medical decision support systems is to predict the percentage risk of developing certain diseases, such as diabetes and heart disease, or other complications.
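As a rough illustration of predicting disease risk as a percentage, the sketch below fits a small logistic regression by batch gradient descent and reports the predicted probability scaled to 0-100. Logistic regression is chosen here only as a simple, familiar example; the chapter does not prescribe it. The training data, the two features (age in decades and body mass index divided by 10), and the function names are all hypothetical.

```python
import math

# Hypothetical training data: (age in decades, BMI / 10) -> 1 if the
# disease developed, 0 otherwise. Values are made up for illustration.
patients = [
    ((3.0, 2.2), 0), ((3.5, 2.4), 0), ((4.0, 2.3), 0), ((4.5, 2.9), 0),
    ((5.0, 3.1), 1), ((5.5, 2.8), 1), ((6.0, 3.3), 1), ((6.5, 3.0), 1),
]


def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Linear score: bias plus the weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train(data, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in data:
            error = sigmoid(score(weights, bias, features)) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        # Average the gradient over the data set before taking a step.
        weights = [w - learning_rate * g / len(data)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


def risk_percent(model, features):
    """Predicted probability of disease, expressed as a percentage."""
    weights, bias = model
    return 100.0 * sigmoid(score(weights, bias, features))


model = train(patients)
older_risk = risk_percent(model, (6.5, 3.0))
younger_risk = risk_percent(model, (3.0, 2.2))
print(f"older patient: {older_risk:.1f}%, younger patient: {younger_risk:.1f}%")
```

With so few made-up records the numbers themselves carry no meaning; the point is only that a classifier's probability output can be read directly as a risk percentage.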

In a fast-growing world, the incidence of many diseases tends to increase drastically, and data mining applications are useful in addressing this. Since numerous disease prediction models exist, it is difficult to find a thorough review in the area of healthcare. This chapter reviews current research in the healthcare sector that uses different data mining methodologies for the prediction and classification of diverse diseases. In addition, a detailed comparison of the reviewed methods is provided for a better understanding of the existing models. An experimental study is also carried out to analyze the performance of data mining algorithms.

The rest of the chapter is organized as follows. Section 2 provides an overview of the data mining process. Section 3 reviews the existing data mining techniques in detail. Section 4 validates the reviewed methods, and Section 5 concludes the survey.
