Different Approaches to Reducing Bias in Classification of Medical Data by Ensemble Learning Methods

Adem Doganer
Copyright: © 2021 | Pages: 16
DOI: 10.4018/IJBDAH.20210701.oa2

Abstract

In this study, different models were created to reduce bias using ensemble learning methods. Reducing the bias error improves classification performance. To increase classification performance, the most appropriate ensemble learning method and the ideal sample size were investigated, and the bias values and learning performances of different ensemble learning methods were compared. The AdaBoost ensemble learning method yielded the lowest bias value at a sample size of n = 250, while the Stacking ensemble learning method yielded the lowest bias values at sample sizes of n = 500, 750, 1000, 2000, 4000, 6000, 8000, 10000, and 20000. When learning performances were compared, the AdaBoost ensemble learning method with the RBF classifier achieved the best performance at n = 250 (ACC = 0.956, AUC = 0.987), and the AdaBoost ensemble learning method with the REPTree classifier achieved the best performance at n = 20000 (ACC = 0.990, AUC = 0.999). In conclusion, for the reduction of bias, methods based on stacking displayed higher performance than the other methods.

Introduction

Machine learning methods have been widely used in the field of data mining in recent years. Machine learning algorithms, which build on the theoretical foundations of statistics and computer science, can provide high performance in data extraction, estimation, and classification. With the development of technology in the field of health, the volume of data has increased rapidly, and traditional statistical methods have become insufficient for data extraction and classification. Machine learning methods are a powerful alternative to traditional methods because they both save time and provide high performance. They form the basis of many artificial intelligence applications and are used in many medical tasks such as diagnosis, early diagnosis, and pattern recognition.

Although machine learning methods are widely used and can learn readily from data, they do not provide high classification performance under all conditions. In some cases, a model learns well from the training data set but performs poorly on the test data set. There are different reasons for this. When a model achieves high accuracy on the training data set but low accuracy on the test data set, the main cause is overfitting. Overfitting is an error caused by the model memorizing the training data instead of learning its underlying pattern. A model that memorizes the data during the training phase achieves high accuracy there, but performs poorly on different data during the testing phase because it has not learned the pattern. The error that causes overfitting is described as variance in machine learning.

Variance, however, is not the only source of error. In other cases, the model fails to achieve high accuracy even on the training data set, and a model that performs poorly in the training phase will also perform poorly in the testing phase. The failure of the model to achieve the desired accuracy on the training data set is described as underfitting, which occurs when the bias error is high. Bias is an error caused by the model's inability to learn the pattern in the training data set. Bias can arise for different reasons; one of them is that the correct model has not been selected for the training data set. Some models are not sufficient to classify the data set and learn its pattern; these are weak classifiers. Strong classifiers are therefore used to reduce bias. However, while strong classifiers provide high performance on training data sets, they may perform worse on test data sets, which again leads to overfitting. Another way to reduce bias is to increase model complexity. Increasing model complexity is important for reducing bias on the training data set, but because it also increases variance, it can result in poor performance on the test data set.
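As a rough illustration of these ideas, the following is a minimal sketch, not the authors' experimental setup (which used classifiers such as RBF and REPTree). It compares a deliberately weak base classifier with AdaBoost and Stacking ensembles on a synthetic data set using scikit-learn; the sample size, features, and model parameters are assumed for illustration only. A low training accuracy signals underfitting (high bias), while a large gap between training and test accuracy signals overfitting (high variance).

```python
# Minimal sketch (assumed setup, not the study's Weka configuration):
# compare a weak classifier with AdaBoost and Stacking ensembles to see
# how ensembling affects the bias (underfitting) and variance (overfitting) errors.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic binary data set; size and class structure are illustrative only.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    # A depth-1 tree ("stump") is a weak classifier: high bias, prone to underfitting.
    "weak stump": DecisionTreeClassifier(max_depth=1, random_state=42),
    # AdaBoost combines many weak learners sequentially, reducing the training bias.
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=42),
    # Stacking blends heterogeneous base learners through a meta-classifier.
    "Stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc_train = accuracy_score(y_tr, model.predict(X_tr))  # low value -> underfitting (bias)
    acc_test = accuracy_score(y_te, model.predict(X_te))   # big train-test gap -> overfitting (variance)
    auc_test = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:10s}  train ACC={acc_train:.3f}  test ACC={acc_test:.3f}  test AUC={auc_test:.3f}")
```

In a run of this kind, the weak stump typically shows similar, low accuracy on both sets (bias-dominated error), whereas the ensembles raise both training and test accuracy, which is the effect on bias that the study quantifies across different sample sizes.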
