1. Introduction
The phenomenon of data disparity, or class imbalance, denotes the condition where the number of samples in one class greatly exceeds that in another. It is a special case of the classification problem in which the class distribution is far from uniform. An imbalanced data set has primarily two groups: the majority (negative) group and the minority (positive) group, with the majority group containing far more samples than the minority group. While raw data is becoming ever easier to obtain, most of it has an imbalanced distribution, where a few classes are numerous and others have only small representations. This is referred to as the “class imbalance” issue in the data mining world and is implicit in nearly all collected data sets (Chawla et al., 2004). For example, in clinical diagnostic results most people are healthy, and only a comparatively small proportion are unhealthy. In classification tasks, data sets are usually categorized by class number as binary-class or multi-class data sets (Wu et al., 2015; Kaushik et al., 2019). This paper addresses both categories.
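The degree of imbalance in a labelled data set is often summarised by the ratio of majority to minority samples, which the short sketch below illustrates; the label vector here is purely hypothetical, not taken from any data set used in this paper:

```python
from collections import Counter

# Hypothetical label vector: 95 negative (majority) vs. 5 positive (minority)
labels = ["negative"] * 95 + ["positive"] * 5

counts = Counter(labels)
imbalance_ratio = counts["negative"] / counts["positive"]
print(counts)
print(imbalance_ratio)  # 19.0, i.e. 19 majority samples per minority sample
```

A ratio this large is exactly the regime in which standard classifiers tend to be biased toward the majority class.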
In classification tasks, data imbalance causes unexpected errors and can even have severe implications for data analysis: the majority class biases classification algorithms toward the skewed distribution of class instances. Properly resolving data imbalance has therefore become a critical need in data science. Data imbalance has significant implications in multiple domains. One is fraud prevention in the credit card sector (Clifton et al., 2004), where fraudulent transactions are very rare in proportion to the thousands of daily sales. Medical diagnosis pays another serious cost of data imbalance (Ginsburg et al., 2013): identifying unusual conditions is much more complicated when a patient may not show any of the symptoms present in the larger population. On manufacturing lines such as the Boeing assembly line (Riddle et al., 1991), defective products occur at a very low rate, and many processes are carried out by automatic or semi-automated cells based on supervised learning; these rare faulty cases must still be taken into account, since even a single defective product can lead to a catastrophic result. Imbalanced data likewise poses a problem for data security in the cloud environment, where a hierarchical identity-based cryptography mechanism is used to protect data (Kaushik et al., 2019). Because such problems arise, a method that can analyze data imbalance and resolve it into a suitable solution must be built.
While data imbalance has proven to be a significant issue, standard classification algorithms do not handle it well. Many classifiers are based on the premise that the classes are balanced and uniformly distributed (Razzak et al., 2020). Numerous attempts have been made to address this problem within well-known classification algorithms: sampling methods and cost-sensitive methods, for example, are widely used with SVMs, neural networks, and other classifiers to tackle class imbalance from a multidisciplinary perspective, but they are still not up to the mark. To the best of our knowledge, however, very little research has been performed on this matter in the area of deep learning. Many existing deep learning algorithms do not address data imbalance; as a consequence, they can perform well on balanced data sets, while their success on large imbalanced data sets is not guaranteed (Wang et al., 2016; Usama et al., 2020). Researchers therefore still find various opportunities to work in this area with deep learning classifiers. To deal with the class imbalance problem, the key objectives of our paper are as follows:
- First, we introduce a new hybrid method that combines under-sampling and deep-learning techniques to resolve the class imbalance problem.
- Second, we preprocess the data with label encoding and remove redundant samples from the main data set using the edited nearest-neighbour under-sampling algorithm. We then apply SMOTE, a widely used over-sampling technique, to balance the data.
- Third, we apply several deep learning classifiers, such as bidirectional LSTM, stacked LSTM, and convolutional LSTM, to classify the samples.
- Finally, we average the predictions of all classifiers in the method and obtain the final result by soft voting.
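The resampling and voting steps above can be sketched in plain NumPy. This is a minimal illustration, not the paper's exact procedure: `enn_filter`, `smote`, and `soft_vote` are hypothetical helper names, the nearest-neighbour search is brute force, and a practical pipeline would typically rely on the `imbalanced-learn` implementations of ENN and SMOTE instead.

```python
import numpy as np

def enn_filter(X, y, k=3):
    """Edited nearest-neighbour: drop samples whose label disagrees with
    the majority label of their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                        # exclude the point itself
        neighbours = np.argsort(dist)[:k]
        votes = np.bincount(y[neighbours], minlength=int(y.max()) + 1)
        if votes.argmax() == y[i]:
            keep.append(i)
    return X[keep], y[keep]

def smote(X_min, n_new, k=3, seed=0):
    """SMOTE: synthesise minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = int(rng.integers(len(X_min)))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        dist[i] = np.inf
        j = int(rng.choice(np.argsort(dist)[:k]))
        lam = rng.random()                      # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

def soft_vote(probabilities):
    """Average class-probability matrices from several classifiers,
    then pick the class with the highest mean probability."""
    return np.mean(probabilities, axis=0).argmax(axis=1)

# Toy demo: a mislabelled point sitting inside the majority cluster
# is removed by ENN.
X = np.array([[0, 0], [0.1, 0], [0, 0.1],
              [5, 5], [5.1, 5], [5, 5.1],
              [0.05, 0.05]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 1])
X_clean, y_clean = enn_filter(X, y)             # the last point is filtered out
```

In the full method, the cleaned and SMOTE-balanced data would be fed to the LSTM-based classifiers, and `soft_vote` would combine their predicted probability matrices into the final labels.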