Spectral Clustering and Cost-Sensitive Deep Neural Network-Based Undersampling Approach for P2P Lending Data

Pankaj Kumar Jadwal, Sonal Jain, Basant Agarwal
DOI: 10.4018/IJITWE.2020100103

Abstract

Peer-to-peer (P2P) lending is a new generation of loan disbursement in which lenders and borrowers communicate through online services. Loans issued through P2P lending platforms are generally unsecured, owing to the presence of borrowers with low credit scores. The Lendingclub dataset, consisting of quantitative and qualitative information on borrowers from 2007 to 2011, is used for this research. The dataset is imbalanced, and machine learning models trained on it are biased towards major class samples. Such a model performs well on the major class (safe borrowers), with high precision, but performs poorly on the minor class (defaulted borrowers), with low recall. To deal with this issue, a novel undersampling algorithm based on the combination of spectral clustering and a cost-sensitive deep neural network (SCCSDNN) is proposed. Experimental results show that the proposed technique outperforms state-of-the-art undersampling, oversampling, and ensemble resampling techniques.

1. Introduction

Peer-to-peer lending, also known as crowd lending, is a model in which lenders and borrowers communicate directly, without financial bodies acting as middlemen (Ma and Wang 2016). The social lending market is growing exponentially due to fast loan disbursement and less paperwork compared to the traditional loan system (Guo et al., 2016). Borrowers with good credit can take loans at lower interest rates, and lenders can also make more profit (Malekipirbazari and Aksakalli, 2015). Borrowers with average or lower credit scores can communicate with lenders who are ready to provide them loans at high interest rates (Serrano-Cinca and Gutiérrez-Nieto 2016).

There are several security concerns associated with the P2P (peer-to-peer) lending market. Because no intermediate firm checks the authenticity of borrowers and lenders, it can be dangerous for lenders as well as for borrowers (Malekipirbazari and Aksakalli, 2015). Loans disbursed through P2P lending platforms carry a risk of default and repayment delay. Borrowers and lenders communicate on a common platform at various risk levels. Higher risk offers better returns but also raises the likelihood of default. Therefore, precise prediction of the credibility of borrowers is a crucial issue in social lending. Statistical and machine learning models can be used to predict probable defaulters. Because they are capable of generating better results, machine learning models have already outperformed statistical models. More accurate and specialized machine learning models are required to deal with such issues in P2P lending.

Several issues, such as long processing times, limited lending funds, and lack of legal status in some jurisdictions, are associated with the P2P lending market. In the conventional process of loan disbursement, banks communicate directly with borrowers and assess their credibility, whereas in the P2P lending market, lenders and borrowers communicate directly with each other on the social lending platform. P2P lending datasets are inherently imbalanced: the ratio of safe borrowers to defaulters is very high. Machine learning models trained on such datasets provide good accuracy, but that accuracy is biased towards major class samples (safe borrowers), and the models assign the wrong class to most of the minor class samples. Therefore, traditional machine learning models for risk assessment will not provide useful results when predicting potential defaulters among borrowers.

In imbalanced datasets, the ratio of the number of samples in the major class to the number in the minor class is significantly high. Machine learning models built on such datasets are biased towards the majority class. As a result, the precision on the major class is high while the recall on the minor class is low. Such models will not be able to predict defaulters well (Lin et al., 2017; Xia & Liu 2017; Yijing et al. 2016). Although machine learning models trained on an imbalanced dataset provide promising overall predictive accuracy, the minor class is trained on far fewer samples than the major class, so the ratio of predictive accuracy on minor class samples to that on major class samples is not close to one.
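The effect described above can be illustrated with a small worked example. The sketch below uses invented numbers (90 safe borrowers, 10 defaulters, and a hypothetical biased model that catches only 2 of the 10 defaulters); it is not drawn from the Lendingclub data and serves only to show how overall accuracy and major class precision can look strong while minor class recall stays low.

```python
def precision(y_true, y_pred, cls):
    """Fraction of samples predicted as `cls` that truly belong to `cls`."""
    predicted = sum(1 for p in y_pred if p == cls)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return correct / predicted if predicted else 0.0


def recall(y_true, y_pred, cls):
    """Fraction of samples truly in `cls` that were predicted as `cls`."""
    actual = sum(1 for t in y_true if t == cls)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    return correct / actual if actual else 0.0


# Hypothetical labels: 0 = safe borrower (major class), 1 = defaulter (minor class).
y_true = [0] * 90 + [1] * 10
# Hypothetical biased model: every safe borrower is labelled correctly,
# but only 2 of the 10 defaulters are detected.
y_pred = [0] * 90 + [1] * 2 + [0] * 8

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
major_precision = precision(y_true, y_pred, cls=0)
minor_recall = recall(y_true, y_pred, cls=1)

print(f"accuracy        = {accuracy:.3f}")
print(f"major precision = {major_precision:.3f}")
print(f"minor recall    = {minor_recall:.3f}")
```

In this toy setting the model is right on 92 of 100 samples, yet it misses 8 of the 10 defaulters, which is the failure mode that matters most to a lender.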

Therefore, to build machine learning models capable of providing high precision on major class samples and high recall on minor class samples, a novel undersampling approach, SCCSDNN (Spectral Clustering and Cost-Sensitive Deep Neural Network-based Undersampling), is proposed. SCCSDNN combines spectral clustering with a cost-sensitive deep neural network. Spectral clustering is applied to the major class samples to obtain K clusters. Each of the K clusters is then concatenated with the minor class samples, yielding K different datasets. Finally, a cost-sensitive deep neural network model is built on each of the K datasets, and the dataset with the highest precision on the major class and the highest recall on the minor class is chosen as the undersampled dataset.
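The steps above can be sketched in a few lines of Python with scikit-learn. This is a minimal illustration under several assumptions, not the authors' implementation: the function name `sccsdnn_style_undersample` is hypothetical; a class-weighted logistic regression is swapped in for the cost-sensitive deep neural network to keep the example short; each candidate dataset is scored on its own held-out split; and the two selection criteria are combined by simple addition, since the text does not say how ties between them are resolved. The data are synthetic.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split


def sccsdnn_style_undersample(X_major, X_minor, k=3, seed=0):
    """Return (X, y, score) for the best of k candidate datasets, or None."""
    # Step 1: split the major class into k clusters with spectral clustering.
    labels = SpectralClustering(n_clusters=k, random_state=seed).fit_predict(X_major)

    best = None
    for c in range(k):
        cluster = X_major[labels == c]
        if len(cluster) < 5:  # too small to split into train/test parts
            continue
        # Step 2: candidate dataset = one major class cluster + all minor samples.
        X = np.vstack([cluster, X_minor])
        y = np.concatenate([np.zeros(len(cluster), dtype=int),
                            np.ones(len(X_minor), dtype=int)])
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        # Step 3: cost-sensitive model. ASSUMPTION: class-weighted logistic
        # regression stands in for the cost-sensitive deep neural network.
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
        pred = clf.fit(X_tr, y_tr).predict(X_te)
        major_precision = precision_score(y_te, pred, pos_label=0, zero_division=0)
        minor_recall = recall_score(y_te, pred, pos_label=1, zero_division=0)
        # ASSUMPTION: the two criteria are combined by summing them.
        score = major_precision + minor_recall
        if best is None or score > best[2]:
            best = (X, y, score)
    return best


# Synthetic data: 300 "safe" samples in three groups, 30 "defaulter" samples.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_major = np.vstack([rng.normal(c, 0.7, size=(100, 2)) for c in centers])
X_minor = rng.normal([2.5, 2.5], 0.7, size=(30, 2))

X_best, y_best, score = sccsdnn_style_undersample(X_major, X_minor, k=3)
print(X_best.shape, int(y_best.sum()), round(score, 3))
```

The returned dataset keeps every minor class sample and only one cluster of major class samples, so the class ratio is much closer to one than in the original data.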

The remainder of the paper is structured as follows. Section 2 discusses credit risk in social lending along with several resampling algorithms proposed to deal with class imbalance. Section 3 describes the quantitative and qualitative attributes of the Lendingclub dataset. Section 4 presents the proposed undersampling algorithm (SCCSDNN). Section 5 reports the experimental results along with the evaluation parameters. The final section concludes the paper and outlines future directions.
