Introduction
Credit risk has always been one of the most important issues faced by financial institutions (Lai, Yu, Wang, & Zhou, 2006; Lai, Yu, Zhou, & Wang, 2006; Yu, Wang, & Lai, 2008). As mass consumption habits have changed and the financial industry has developed, the credit business has grown rapidly, and financial institutions face increasingly severe challenges. Credit scoring plays an important role in this process. It models the potential risk of loan applicants and classifies them as “good credit” or “bad credit”, which makes it a binary classification problem (He, Zhang, & Zhang, 2018; Xia, Liu, Li, & Liu, 2017). For banks, financial institutions, and Internet finance companies, the cost of misclassifying “bad credit applicants” as “good credit applicants” is much higher than that of misclassifying “good credit applicants” as “bad credit applicants” (Qian, Liang, Li, Feng, & Shi, 2014). Therefore, how to build a robust and reliable credit scoring model is receiving growing attention from both academia and industry.
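The asymmetry between the two error types can be made concrete with a small cost calculation. The sketch below is illustrative only: the label convention (1 for "bad credit", 0 for "good credit"), the function name, and the 5:1 cost ratio are assumptions chosen for the example, not values taken from the cited studies.

```python
# Assumed label convention: 1 = "bad credit", 0 = "good credit".
# The default costs (5.0 and 1.0) are hypothetical, for illustration only.
def misclassification_cost(y_true, y_pred,
                           cost_bad_as_good=5.0,
                           cost_good_as_bad=1.0):
    """Sum the cost of all errors under asymmetric error costs."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            # A bad applicant was accepted: the expensive error.
            total += cost_bad_as_good
        elif truth == 0 and pred == 1:
            # A good applicant was rejected: the cheaper error.
            total += cost_good_as_bad
    return total


y_true = [1, 1, 0, 0, 0, 1]
model_a = [0, 1, 0, 0, 0, 1]  # accepts one bad applicant
model_b = [1, 1, 1, 0, 0, 1]  # rejects one good applicant

cost_a = misclassification_cost(y_true, model_a)
cost_b = misclassification_cost(y_true, model_b)
print(cost_a, cost_b)
```

Both hypothetical models make exactly one error on six applicants, so their accuracy is identical, yet under the assumed costs the first is five times as expensive as the second. This is why accuracy alone can be a misleading criterion for credit scoring models.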
There are two mainstream families of classification techniques for credit risk assessment, namely statistical analysis and machine learning (He et al., 2018; Saberi et al., 2013). In statistical analysis, linear discriminant analysis (LDA) and logistic regression (LR) are the two most commonly used approaches (Eisenbeis, 1978; Henley & Edward, 1995). However, both LDA and LR have difficulty modeling complex financial systems because they rely on idealized statistical assumptions. Machine learning techniques are also widely used in credit scoring, including k-nearest neighbor (KNN) (Henley & Hand, 1996), support vector machine (SVM) (Huang, Chen, Hsu, Chen, & Wu, 2004), decision tree (DT) (Xiu, Weiyun, Jianyong, Bing, & Wenhuang, 2004), mathematical programming (Peng, Kou, Shi, & Chen, 2008; Shi, Peng, Xu, & Tang, 2002), and multi-layer perceptron (MLP) with a single hidden layer (Alejo, García, Marqués, Sánchez, & Antonio-Velázquez, 2013). Beyond single classifiers, research has also shown that ensemble classification tends to be an effective way to improve on the accuracy and stability of a single classifier for credit scoring (Ko, Sabourin, & Britto, 2008; Tsymbal, Pechenizkiy, & Cunningham, 2005).
Ensemble learning is a method that integrates several classifiers, derived from different algorithms, features, and training subsets, to predict the class label of unknown samples. Ensemble classification can take advantage of the diversity of its classifiers to offset the weaknesses of any single one. Moreover, it has been shown theoretically and experimentally that classification based on ensemble learning performs better than a single classifier in credit scoring (Nanni & Lumini, 2009; Xia et al., 2017; Xiao, Xiao, & Wang, 2016). In recent years, deep neural networks (DNNs) have also been widely applied to classification problems. A deep architecture improves feature extraction and captures more information in its hidden layers, which is why it performs better than shallow architectures in credit risk assessment. To the best of our knowledge, however, few studies have applied DNNs to credit risk assessment.
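The ensemble idea described above can be illustrated with simple majority voting, one common way of combining base classifiers. This is a minimal sketch under stated assumptions, not the method of any cited study: the three threshold rules, the feature pair (income, debt ratio), and the toy applicants are all hypothetical stand-ins for trained classifiers and real data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most base classifiers agree on.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical base classifiers on (income, debt_ratio) in toy units.
# Label convention (assumed): 0 = "good credit", 1 = "bad credit".
def rule_income(x):
    return 0 if x[0] >= 40 else 1


def rule_debt(x):
    return 0 if x[1] <= 0.4 else 1


def rule_combined(x):
    return 0 if x[0] - 100 * x[1] >= 0 else 1


applicants = [(55, 0.2), (30, 0.6), (45, 0.5), (35, 0.3)]
base_classifiers = [rule_income, rule_debt, rule_combined]

# One list of predicted labels per base classifier.
predictions = [[clf(x) for x in applicants] for clf in base_classifiers]
ensemble = majority_vote(predictions)
print(ensemble)
```

For the third and fourth applicants the base rules disagree, and the vote settles the label. Using an odd number of base classifiers avoids ties in the binary case.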