Effective Bankruptcy Prediction Models for North American Companies

Rachel Cardarelli, Son Nguyen, Rick Gorvett, John Quinn
Copyright: © 2023 | Pages: 15
DOI: 10.4018/978-1-7998-9220-5.ch108

Abstract

Bankruptcy prediction is a widely researched topic. However, few studies address the class imbalance problem. This article proposes a new technique that applies a bagging undersampling procedure to balance the data and compares it to random undersampling and five oversampling techniques. The performance of each technique is evaluated by the balanced accuracy, sensitivity, and specificity of a random forest. The results show that models trained after applying the oversampling techniques are prone to overfitting, while the model trained after applying the proposed method had the highest balanced accuracy without overfitting.
Chapter Preview

Background

This review of the bankruptcy prediction literature discusses commonly used predictive models, feature selection techniques, model tuning, ensemble learning, and other distinctive topics studied in the literature. The review concludes with an overview of methods used to handle the imbalance problem.

Key Terms in this Chapter

Undersampling: The process of randomly eliminating observations in the majority class until the majority class has the same number of observations as the minority class.
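
As an illustration of the definition above, the following is a minimal sketch of random undersampling in plain Python. It is not the chapter's proposed bagging procedure; the function name random_undersample, the toy data, and the fixed seed are assumptions made for this example only.

```python
import random

def random_undersample(rows, labels, seed=0):
    """Randomly drop observations until every class matches the smallest class.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    rng = random.Random(seed)
    # Group observation indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # The minority class sets the target size for every class.
    target = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        # Sampling without replacement keeps all minority rows
        # and a random subset of the majority rows.
        kept.extend(rng.sample(idx, target))
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data (assumed): 8 non-bankrupt firms (0) and 2 bankrupt firms (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
```

After the call, both classes in y_bal have two observations each, and every minority-class row is retained.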

Classification Random Forest: A classification model that predicts using the majority vote of multiple classification trees. Each split of a tree in a random forest is determined by considering only a pre-selected number of variables in the dataset.

Balanced Accuracy: The average of the true positive rate and the true negative rate of a predictive model. Balanced accuracy is often used as a performance measure for a predictive model trained on an imbalanced dataset.
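
The definition above can be sketched in a few lines of Python. The function name balanced_accuracy, the choice of 1 as the positive (bankrupt) class, and the toy labels are assumptions for illustration; the sketch also assumes both classes appear in the true labels.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Average of the true positive rate and the true negative rate.

    Hypothetical helper; assumes y_true contains both classes.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2

# Toy example (assumed): 8 non-bankrupt firms (0) and 2 bankrupt firms (1),
# scored by a model that always predicts "non-bankrupt".
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, which shows why balanced accuracy is preferred on imbalanced data: a model that never identifies a bankruptcy scores no better than chance.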

Imbalanced Classification: The problem of classifying a dataset into different categories where the distribution of sizes of the categories is not uniform, i.e., one category may have too many observations while other categories may have too few observations.

Oversampling: The process of randomly adding duplicate observations in the minority class until the minority class has the same number of observations as the majority class.
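
A minimal sketch of random oversampling as defined above, in plain Python. It shows only the simple duplication variant, not the five oversampling techniques compared in the chapter; the function name random_oversample, the toy data, and the fixed seed are assumptions for illustration.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Randomly duplicate observations until every class matches the largest class.

    Hypothetical helper for illustration; not taken from the chapter.
    """
    rng = random.Random(seed)
    # Group observation indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # The majority class sets the target size for every class.
    target = max(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(idx)  # keep every original observation
        # Draw duplicates with replacement to fill the gap (none if already at target).
        kept.extend(rng.choices(idx, k=target - len(idx)))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data (assumed): 8 non-bankrupt firms (0) and 2 bankrupt firms (1).
X_toy = [[i] for i in range(10)]
y_toy = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X_toy, y_toy)
```

Because the added rows are exact copies, a model can memorize them, which is one plausible reason duplication-based balancing is associated with overfitting.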
