A Novel Ensemble Learning Model Combined XGBoost With Deep Neural Network for Credit Scoring

Xiaowei He, Siqi Li, Xin Tian He, Wenqiang Wang, Xiang Zhang, Bin Wang
Copyright: © 2022 | Pages: 18
DOI: 10.4018/JITR.299924

Abstract

Credit scoring, which aims to identify potential loan defaulters, plays an important role in the financial industry. To further improve the accuracy and efficiency of classification, this paper develops an ensemble model that combines extreme gradient boosting (XGBoost) and a deep neural network (DNN). In this method, the training set is first divided into different subsets by bagging (bootstrap) sampling. Then, a DNN is trained on each subset as a feature extractor, and the extracted features are used as the input of XGBoost to construct a base classifier. Finally, the prediction is the average of the outputs of the different base classifiers. For training and validation, three credit datasets from the UCI Machine Learning Repository are used to evaluate the proposed model. The results show that the model is superior, with a significant improvement in performance.
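The pipeline described in the abstract (bagging, a per-subset feature extractor, a per-subset base classifier, then averaging) can be sketched in plain Python as below. This is a minimal structural sketch under stated assumptions, not the authors' implementation: the hypothetical `make_random_relu_extractor` is an untrained random ReLU layer standing in for a trained DNN, and the hypothetical `fit_logistic` is a logistic regression standing in for XGBoost. Only the bagging and averaging steps follow the description directly, and the one-feature toy data are invented for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bootstrap_subsets(n_samples, n_subsets, rng):
    # Bagging: each subset holds n_samples indices drawn with replacement.
    return [[rng.randrange(n_samples) for _ in range(n_samples)]
            for _ in range(n_subsets)]


def make_random_relu_extractor(n_in, n_hidden, rng):
    # HYPOTHETICAL stand-in for a trained DNN feature extractor:
    # one fixed, untrained random layer followed by ReLU.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]

    def extract(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return extract


def fit_logistic(features, labels, lr=0.1, epochs=200):
    # HYPOTHETICAL stand-in for XGBoost: logistic regression fitted by
    # full-batch gradient descent on the extracted features.
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for j in range(dim):
                grad_w[j] += err * f[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(f):
        return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
    return predict


def train_ensemble(X, y, n_subsets=5, n_hidden=8, seed=0):
    # One (extractor, base classifier) pair per bootstrap subset.
    rng = random.Random(seed)
    members = []
    for idx in bootstrap_subsets(len(X), n_subsets, rng):
        extract = make_random_relu_extractor(len(X[0]), n_hidden, rng)
        feats = [extract(X[i]) for i in idx]
        members.append((extract, fit_logistic(feats, [y[i] for i in idx])))
    return members


def predict_proba(members, x):
    # Final output: the plain average of the base classifiers' outputs.
    return sum(clf(extract(x)) for extract, clf in members) / len(members)


# Invented toy data: one feature; label 1 = "bad credit", 0 = "good credit".
X_toy = [[k / 5] for k in range(1, 11)] + [[-k / 5] for k in range(1, 11)]
y_toy = [1] * 10 + [0] * 10

model = train_ensemble(X_toy, y_toy)
print(round(predict_proba(model, [1.0]), 3), round(predict_proba(model, [-1.0]), 3))
```

In a real setting the two stand-ins would be replaced by a DNN trained on each subset (keeping a hidden layer as the feature extractor) and an XGBoost model fitted on its outputs; the bootstrap sampling and the averaging in `predict_proba` would stay the same.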

Introduction

Credit risk has always been one of the most important issues faced by financial institutions (Lai, Yu, Wang, & Zhou, 2006; Lai, Yu, Zhou, & Wang, 2006; Yu, Wang, & Lai, 2008). With changing attitudes toward mass consumption and the development of the financial industry, the credit business has grown rapidly, and financial institutions face increasingly severe challenges. Credit scoring plays an important role in this process: it models the potential risk of loan applicants and classifies them as “good credit” or “bad credit”, which makes it a binary classification task (He, Zhang, & Zhang, 2018; Xia, Liu, Li, & Liu, 2017). For banks, financial institutions, and Internet finance companies, the cost of misclassifying a “bad credit” applicant as “good credit” is much higher than that of misclassifying a “good credit” applicant as “bad credit” (Qian, Liang, Li, Feng, & Shi, 2014). Therefore, building a robust and reliable credit scoring model has attracted growing attention from both academia and industry.
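To make the cost asymmetry concrete, the sketch below scores two hypothetical classifiers that each make one error on the same ten applicants. The 5:1 cost ratio, the label convention (1 = bad credit), and the predictions are illustrative assumptions, not values from this paper. It also computes the rejection threshold implied by the costs: rejecting an applicant costs (1 - p) * cost_good_as_bad in expectation and accepting costs p * cost_bad_as_good, so rejection is cheaper when the predicted probability of bad credit p exceeds cost_good_as_bad / (cost_bad_as_good + cost_good_as_bad).

```python
def misclassification_cost(y_true, y_pred, cost_bad_as_good=5.0, cost_good_as_bad=1.0):
    # Labels: 1 = bad credit, 0 = good credit. The 5:1 costs are assumed.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:      # bad applicant accepted as good
            total += cost_bad_as_good
        elif t == 0 and p == 1:    # good applicant rejected as bad
            total += cost_good_as_bad
    return total


def reject_threshold(cost_bad_as_good=5.0, cost_good_as_bad=1.0):
    # Reject when P(bad) exceeds this value; this minimizes expected cost.
    return cost_good_as_bad / (cost_bad_as_good + cost_good_as_bad)


y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred_a = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one bad applicant
pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # rejects one good applicant

print(misclassification_cost(y_true, pred_a))  # 5.0
print(misclassification_cost(y_true, pred_b))  # 1.0
print(reject_threshold())                      # 0.16666666666666666
```

Both classifiers are 90% accurate, yet classifier A costs five times as much under the assumed costs, which is why accuracy alone can be misleading for credit scoring.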

There are two mainstream families of classification techniques for credit risk assessment: statistical analysis and machine learning (He et al., 2018; Saberi et al., 2013). In statistical analysis, linear discriminant analysis (LDA) and logistic regression (LR) are the two most commonly used approaches (Eisenbeis, 1978; Henley & Edward, 1995). However, both LDA and LR have difficulty modeling complex financial systems because they rely on idealized statistical assumptions (for example, LDA assumes normally distributed classes with a shared covariance matrix, and LR assumes that the log-odds are linear in the features). Machine learning techniques are also widely used in credit scoring, including k-nearest neighbor (KNN) (W. E. Henley & Hand, 1996), support vector machine (SVM) (Huang, Chen, Hsu, Chen, & Wu, 2004), decision tree (DT) (Xiu, Weiyun, Jianyong, Bing, & Wenhuang, 2004), mathematical programming (Peng, Kou, Shi, & Chen, 2008; Shi, Peng, Xu, & Tang, 2002), and the multilayer perceptron (MLP) with a single hidden layer (Alejo, García, Marqués, Sánchez, & Antonio-Velázquez, 2013). Beyond single classifiers, research has also shown that ensemble classification is an effective way to improve the accuracy and stability of a single classifier for credit scoring (Ko, Sabourin, & Britto, 2008; Tsymbal, Pechenizkiy, & Cunningham, 2005).

Ensemble learning integrates several classifiers derived from different algorithms, features, or training subsets to predict the class labels of unseen samples. Ensemble classification exploits the diversity among its member classifiers to offset the weaknesses of any single one. Moreover, it has been shown theoretically and experimentally that ensemble-based classification performs better than a single classifier for credit scoring (Nanni & Lumini, 2009; Xia et al., 2017; Xiao, Xiao, & Wang, 2016). In recent years, deep neural networks (DNNs) have also been widely applied to classification problems. Their deep architecture strengthens feature extraction and lets the hidden layers capture more information, which is why DNNs tend to outperform shallow architectures in credit risk assessment. To the best of our knowledge, however, few studies have applied DNNs to credit risk assessment.
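A simple way to see why diversity helps is the textbook majority-vote calculation below. It is an idealized illustration, not part of the proposed model: it assumes the members make independent errors and are each correct with the same probability p, and it uses majority voting rather than the probability averaging used in this paper. Real classifiers trained on overlapping data are correlated, so actual gains are smaller.

```python
from math import comb


def majority_vote_accuracy(p, n):
    # Probability that more than half of n independent classifiers
    # (each correct with probability p) are correct; n is assumed odd.
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7 this prints accuracies of 0.7, 0.784, and 0.837 for one, three, and five members. With p below 0.5 the same formula shows the vote getting worse as members are added, which is why ensemble members need to be better than chance.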
