A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem

Debashree Devi, Suyel Namasudra, Seifedine Kadry
Copyright © 2020 | Pages: 27
DOI: 10.4018/IJDWM.2020070104

Abstract

Class imbalance is a well-investigated problem concerning the performance degradation of standard learning models caused by an uneven distribution of classes in a dataspace. Cluster-based undersampling is a popular solution in this domain, which eliminates majority class instances from a definite number of clusters to balance the training data. However, distance-based elimination of instances is often affected by the underlying data distribution. Recently, ensemble learning techniques have emerged as an effective solution owing to their weighted learning of rare instances. In this article, a boosting-aided adaptive cluster-based undersampling technique is proposed to facilitate the elimination of learning-insignificant majority class instances from the clusters, detected through the AdaBoost ensemble learning model. The proposed work is validated against seven existing cluster-based undersampling techniques on six binary datasets and three classification models. The experimental results establish the effectiveness of the proposed technique over the existing methods.
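
To make the central idea concrete, the sketch below illustrates one plausible reading of a boosting-aided cluster-based undersampling step: cluster the majority class, score each majority instance with an AdaBoost ensemble, and discard the instances the ensemble already classifies with high confidence (treated here as "learning-insignificant"). This is a minimal sketch assuming scikit-learn, not the authors' exact algorithm; the function name boosted_cluster_undersample and the parameters n_clusters and keep_ratio are hypothetical choices for illustration.

    # Minimal sketch (NOT the authors' exact algorithm): cluster the majority
    # class, then use AdaBoost confidence to decide which instances to keep.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import AdaBoostClassifier

    def boosted_cluster_undersample(X, y, majority_label,
                                    n_clusters=5, keep_ratio=0.5):
        maj_idx = np.where(y == majority_label)[0]
        min_idx = np.where(y != majority_label)[0]

        # Cluster only the majority class instances.
        clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X[maj_idx])

        # AdaBoost trained on the full data supplies a per-instance
        # confidence score for the majority class.
        booster = AdaBoostClassifier(n_estimators=50).fit(X, y)
        maj_col = list(booster.classes_).index(majority_label)
        conf = booster.predict_proba(X[maj_idx])[:, maj_col]

        kept = []
        for c in range(n_clusters):
            members = np.where(clusters == c)[0]
            # Keep the least confidently classified (hardest, most
            # informative) instances per cluster; drop the easy ones.
            order = np.argsort(conf[members])
            n_keep = max(1, int(keep_ratio * len(members)))
            kept.extend(maj_idx[members[order[:n_keep]]])

        keep_idx = np.concatenate([np.array(kept, dtype=int), min_idx])
        return X[keep_idx], y[keep_idx]

The design intuition is that majority instances the booster handles confidently lie far from the decision boundary and contribute little to learning, so removing them balances the data while preserving the boundary region.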
Article Preview

1. Introduction

The issue of class imbalance arises in real-life datasets with uneven data scattering, formally known as imbalanced data. The phenomenon in which instances of one class are heavily outnumbered by instances of the remaining classes is termed the class imbalance problem (Das et al., 2013). In practice, the class with lower coverage often turns out to be the significant one, associated with a higher misclassification cost (Lopez et al., 2013; Elkan, 2001). Conventional learning models, when applied to an imbalanced data space, tend to be biased towards the over-represented class, which in turn degrades the performance of the learning model and increases the misclassification of minority class instances. This bias can incur a high cost when it is crucial to correctly classify the minority class instances (Oh, 2011; Kang & Cho, 2006).

The solutions proposed for the class imbalance problem can be categorized into two major groups: (a) data-level solutions, formally known as data sampling, which modify the data distribution to yield a revised training set with a balanced class distribution, and (b) algorithmic-level solutions, which modify the classifier itself to improve its accuracy (Das et al., 2013; Lopez et al., 2013; Phua et al., 2004). Data-level solutions either undersample (eliminate majority class instances) or oversample (add duplicate or synthetic minority class instances), and each variant has its own significant drawbacks, as illustrated in the sketch below. Examples of data-level solutions include condensed nearest neighbor (CNN) (Hart, 1968), one-sided selection (OSS) (Kubat & Matwin, 1997), Tomek-link (Tomek, 1976), cluster-based undersampling, inverse random undersampling, the synthetic minority oversampling technique (SMOTE) (García & Herrera, 2009), Borderline-SMOTE (Agrawal et al., 2015), ADASYN (Guo et al., 2008), Safe-SMOTE (Ling & Li, 1998), etc. Examples of algorithmic-level solutions include cost-sensitive learning (CSL) (Turney, 2000; Nguyen et al., 2010), the improved weighted extreme learning machine (IW-ELM) (Lu et al., 2019), RUSBoost (Seiffert et al., 2010), SMOTEBoost (Rahman & Davis, 2013), etc.
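
For illustration, the following snippet shows both data-level strategies using the imbalanced-learn library (an assumed toolkit choice; the article does not prescribe one): SMOTE oversamples the minority class with synthetic points, while RandomUnderSampler eliminates majority class instances at random.

    # Illustrative data-level resampling with imbalanced-learn (assumed
    # installed as the `imbalanced-learn` package).
    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    # A synthetic 9:1 imbalanced binary dataset.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)
    print(Counter(y))                     # roughly {0: 900, 1: 100}

    # Oversampling: synthesize new minority instances until balanced.
    X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
    print(Counter(y_over))                # classes balanced by synthesis

    # Undersampling: randomly drop majority instances until balanced.
    X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y_under))               # classes balanced by elimination

The trade-off hinted at above is visible here: oversampling risks duplicating or fabricating noise around minority points, while undersampling discards potentially informative majority instances, which is exactly the loss the proposed cluster-based, boosting-aided selection aims to mitigate.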
