A Comprehensive Approach for Using Hybrid Ensemble Methods for Diabetes Detection

Md Sakir Ahmed, Abhijit Bora
Copyright: © 2024 |Pages: 15
DOI: 10.4018/979-8-3693-2260-4.ch001

Abstract

This study examines hybrid models and their application to the detection of diabetes. It covers several machine learning algorithms, namely Decision Trees, Random Forests, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Gaussian Naive Bayes, the Adaptive Boosting Classifier, and Extreme Gradient Boosting, and uses a Stacking Classifier to build the hybrid model. The traditional approach is compared in depth with the hybrid approach. The study also discusses data augmentation, hyperparameter tuning, and cross-validation as applied during the training of the various models.

1. Introduction

1.1 Background of the problem

With the increase in the consumption of processed foods, there has also been an increase in the number of cases of diabetes. The number of cases has increased linearly, affecting a broad spectrum of the global population, from children to adults to senior citizens. Studies correlate this increase with the increased consumption of processed foods; however, this is not the only factor. Lack of physical activity, alcohol consumption, smoking, and improper sleep schedules also contribute to the rise in the number of cases. Traditional approaches to diabetes detection include urine tests, random blood sugar tests, assessment of clinical symptoms, and risk assessments. With the advent of technology, however, emphasis needs to be placed on finding newer methods to identify and assess the various risk factors and to detect diabetes early. This may in turn reduce the number of cases occurring annually, enabling a healthier life for the global population.

1.2 Proposed solution

This study gives a brief introduction to hybrid ensemble learning models and an overview of their possible implications for early diagnosis and for the identification of risk factors. Several machine learning algorithms, such as Decision Trees, Random Forest, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Gaussian Naive Bayes, and the Adaptive Boosting Classifier, can be used to diagnose diabetes as well as other diseases quite accurately. The accuracy can be increased further by combining multiple models with a stacking classifier. These stacked models are known as hybrid models. They provide better accuracy than a single classifier because of their robustness to noise in the data, better hyperparameter tuning, enhanced adaptability, and improved generalization.
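
The stacking idea described above can be sketched with scikit-learn's StackingClassifier. This is only an illustrative sketch, not the chapter's actual pipeline: the synthetic data from make_classification stands in for a diabetes dataset, and the choice of base learners, the Logistic Regression meta-learner, and all parameter values are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data (8 features, binary label) stands in for a
# real diabetes dataset, which is not part of this preview.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: each one is trained on the original features.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    # KNN is distance-based, so it is paired with feature scaling.
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("gnb", GaussianNB()),
]

# Meta-learner: trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

train_acc = stack.score(X_train, y_train)
test_acc = stack.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

The meta-learner sees only the out-of-fold predictions of the base learners, which is what lets a stacked model weigh each base model by how reliable it is on unseen data. Whether stacking actually beats the best single classifier depends on the dataset and should be checked on held-out data.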

In this study, several traditional machine learning algorithms were trained with hyperparameter tuning to find the most suitable parameters for the given data. The best-performing models were then stacked together with the Extreme Gradient Boosting Classifier, and the training and testing accuracy was calculated. The observations are discussed in detail in Sections 3 and 4.
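
The workflow described above (tune each model, keep the best performers, stack them with a boosting model) might look roughly like the following sketch. Several parts are assumptions made for illustration: the data is synthetic, the parameter grids are arbitrary, "best-performing" is taken to mean the top two models by cross-validated accuracy, and scikit-learn's GradientBoostingClassifier is swapped in for Extreme Gradient Boosting so that the example needs no extra package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic data stands in for the diabetes dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Hypothetical candidate models and small illustrative parameter grids.
candidates = {
    "dt": (DecisionTreeClassifier(random_state=1), {"max_depth": [3, 5, None]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 9]}),
    "lr": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

# Step 1: tune each model with 5-fold cross-validated grid search.
tuned = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    tuned[name] = (search.best_score_, search.best_estimator_)

# Step 2: keep the two models with the highest cross-validated accuracy.
ranked = sorted(tuned.items(), key=lambda item: item[1][0], reverse=True)
estimators = [(name, est) for name, (_, est) in ranked[:2]]

# Step 3: add a gradient boosting model (stand-in for XGBoost) and stack.
estimators.append(("gb", GradientBoostingClassifier(random_state=1)))
stack = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

train_acc = stack.score(X_train, y_train)
test_acc = stack.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If the xgboost package is installed, its scikit-learn compatible XGBClassifier could take the place of the stand-in. Comparing training and testing accuracy, as the study does, gives a quick signal of overfitting: a large gap between the two suggests the stacked model is memorizing the training data.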

Mercaldo et al. (2017) used the HoeffdingTree algorithm to detect diabetes and reported 77% accuracy and 77.5% recall. Al-Zebari and Sengur (2019) implemented various machine learning algorithms and found that Logistic Regression yielded the best result, with an accuracy of 77.9%, while the Coarse Gaussian SVM technique yielded the lowest accuracy among all the algorithms, 65.5%. Islam et al. (2020) implemented several machine learning models and found that the Random Forest classifier gave a very high accuracy of 99.35% for the prediction of diabetes. Islam and Khanam (2021) observed that the Gaussian Naive Bayes classifier yielded an accuracy of 79.87% for the prediction of diabetes. Pankaj et al. (2021) developed a web app for the diagnosis of diabetes that uses a questionnaire rather than a medical test and applies machine learning algorithms to predict whether a person has diabetes. Farajollahi et al. (2021) implemented various other machine learning algorithms and found that the Adaptive Boosting Classifier yielded the highest accuracy, 81%. Mangal and Jain (2022) found that Random Forest yielded an accuracy of 99% for the detection of diabetes. Liu et al. (2022) observed that the Extreme Gradient Boosting classifier yielded the best accuracy, 75%, among several algorithms and proposed that it can be used to screen individuals at high risk of Type 2 diabetes at an early stage. Charitha et al. (2022) proposed that machine learning algorithms can be used to predict Type 2 diabetes and observed that the Light Gradient Boosting Machine yielded the highest accuracy, 91.47%. Bhat et al. (2022) found that Random Forest gives an accuracy of 97.75% in the detection of Diabetes Mellitus.
Gowthami et al. (2023) implemented Logistic Regression, K-Nearest Neighbors, Decision Trees, Random Forest, and Support Vector Machines and found that the Random Forest Classifier yielded the highest accuracy, 98%, for Type 2 Diabetes Mellitus. Rathi and Madeira (2023) applied KNN to the detection of Diabetes Mellitus and examined its implications.
