Predictive Analysis of Diabetes Using Machine Learning Algorithms

DOI: 10.4018/978-1-6684-4580-8.ch018

Abstract

Diabetes is a harmful disease that occurs when the blood glucose level is too high. It leads to numerous complications in humans: congestive heart failure, stroke, kidney and eye problems, dental issues, nerve damage, and foot problems. With recent developments in machine learning, it has become easier to analyze patient data and predict whether a person is diabetic. This research focuses on applying several machine learning prediction algorithms: k-nearest neighbor, logistic regression, SVM (support vector machine), Gaussian naive Bayes, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. Among these algorithms, XGBoost performed best, achieving an accuracy of 90%, with an F1 score and Jaccard score of 91% and 86%, respectively. The primary goal of this research is to apply numerous machine learning algorithms to diabetic datasets, analyze their results, and select the one that performs best.

Introduction

Diabetes is a chronic disease that occurs when the quantity of glucose in the blood is too high. Blood glucose, which is obtained from the food we consume, is the body's main source of energy. A lack of insulin, a hormone produced by the pancreas for the breakdown of glucose, causes glucose levels in the body to rise. There are numerous causes of diabetes, for example, genetic mutations, hormonal disease, consumption of medicines, pregnancy, age, BMI, being overweight, alteration of the pancreas, and other diseases that can harm the pancreas. The most common types of diabetes are type 1, type 2, and gestational diabetes, and each has its own symptoms and causes. The major symptoms include increased thirst, urination, and hunger, blurred vision, weight loss, and sores that do not heal. Diabetes can cause heart attack, stroke, heart failure, and coma, all of which can lead to death. Statistics for 2021 show that nearly 537 million adults aged 20 to 79 years are living with this disease. The trend indicates that the total number of people with diabetes is likely to rise to 643 million by 2030 and 783 million by 2045.

Machine learning (ML) is a subcategory of AI in which algorithms process large amounts of data to discover patterns and perform tasks autonomously, without being told exactly how to address each challenge. In the last few years, the wide availability of powerful hardware and cloud computing has resulted in broader adoption of ML in many areas of human life, for instance, social networking, commute estimation, email intelligence, personal smart assistants, banking, personal finance, medical diagnosis, and healthcare.
Machine learning plays a vital role in healthcare, for example, in accurately detecting disease at an earlier stage, discovering and developing new drugs, analyzing huge amounts of data in a short time and suggesting outcomes, medical image diagnosis, outbreak prediction, and smart health records. Machine learning and AI-based technologies are also being used to monitor and predict epidemics around the globe. With the constant rise in data generated by modern technology in the healthcare sector, it is important to use machine learning, which can put this data to work to predict or diagnose a disease at an earlier stage. This paper describes the use of different machine learning algorithms on the “Pima Indians diabetes dataset” to predict diabetes based on the patient's symptoms. The dependent variable is the outcome, which indicates whether a particular patient has diabetes or not. In this comparative analysis, we apply various machine learning algorithms to the Pima Indians diabetes dataset to see which model performs best at predicting whether patients in the dataset have diabetes. The aim of this work is to compare the performance of the machine learning algorithms used to predict diabetes. The algorithms applied to the dataset are k-nearest neighbour, logistic regression, SVM (Support Vector Machine), Gaussian naive Bayes, decision tree, multilayer perceptron, random forest, XGBoost, and AdaBoost. According to the obtained results, XGBoost performs better than the other algorithms, achieving an accuracy score of 91% (Althar, R. R et al., 2022).
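The comparison described above ranks models by evaluation scores such as accuracy, and the abstract also reports an F1 score and a Jaccard score. As a minimal sketch of how these three scores can be derived from the counts of a binary confusion matrix, consider the following. The function names and the toy label lists are illustrative assumptions and are not taken from the chapter or from the Pima Indians diabetes dataset.

```python
# Illustrative sketch: binary classification scores from confusion counts.
# All names and the toy labels below are hypothetical, not the chapter's code or data.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive class label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # diabetic patient correctly flagged
        elif pred == positive:
            fp += 1  # non-diabetic patient wrongly flagged
        elif truth == positive:
            fn += 1  # diabetic patient missed
        else:
            tn += 1  # non-diabetic patient correctly cleared
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def jaccard_score(tp, fp, fn):
    """Overlap of predicted and actual positives: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0

if __name__ == "__main__":
    # Toy outcome labels (1 = diabetic, 0 = not diabetic), made up for illustration.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f"accuracy={accuracy(tp, fp, fn, tn):.2f}")  # accuracy=0.75
    print(f"f1={f1_score(tp, fp, fn):.2f}")            # f1=0.75
    print(f"jaccard={jaccard_score(tp, fp, fn):.2f}")  # jaccard=0.60
```

Accuracy counts every correct prediction, whereas the F1 and Jaccard scores ignore true negatives, so they can be more informative when diabetic cases are the minority class in a dataset.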

The rest of the paper is arranged as follows: the second section presents the literature review, the third section describes the proposed methodology, the fourth section presents the flowcharts of the model, the fifth section discusses the algorithms and major findings, and the sixth section concludes the research work.
