1. Introduction
Fraud is intentional deception, by implicit or explicit tricks, to obtain financial gain or cause loss (Kou et al., 2019). It violates public law: swindlers attempt to obtain illegal benefits or inflict irreversible losses (Carcillo et al., 2018; Khanuja & Adane, 2018). Fraudulent activities cost victims and financial institutions a significant amount of money. According to statistics from the Internet Crime Complaint Center, reported fraud has surged over the last decade (Hou et al., 2020).
Industry and research institutions have invested heavily in developing effective methods to combat the problem using emerging machine learning, deep learning, big data, and computational intelligence technologies (Cai & Zhang, 2020; Chua & Storey, 2016; Oreski & Oreski, 2014). These efforts have produced many approaches that can intelligently differentiate legitimate transactions from fraudulent ones. However, whatever method is applied, some common problems persist and often reduce performance and efficiency. One of the most common lies in the training data: past transactions have an imbalanced class distribution, which leads to overfitting and inferior performance of the implemented classifiers (Altinbas, 2020). The imbalance arises because relatively fewer fraudulent samples than legitimate ones are available, and it prevents the design of a dependable assessment model (Khemakhem, Said, & Boujelbene, 2018). Moreover, data heterogeneity and overlap aggravate the problem (Arora & Kaur, 2019). Computational complexity is another challenge for effectively identifying anomalies (Coser, Maer-Matei, & Albu, 2019; Xu et al., 2020; Ye et al., 2018). Together, these problems significantly reduce the efficacy of fraud recognition techniques, which may then produce a large number of incorrect classifications.
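The effect of class imbalance on evaluation can be illustrated with a minimal sketch. The class counts (990 legitimate, 10 fraudulent) and the function names are hypothetical and chosen only for illustration; they are not taken from the cited studies. The sketch shows that a degenerate classifier that labels every transaction as legitimate still reaches high accuracy while detecting no fraud at all.

```python
# Illustrative only: the class counts below are hypothetical, not from the paper.

def accuracy(y_true, y_pred):
    """Fraction of transactions labeled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fraud_recall(y_true, y_pred):
    """Fraction of actual fraud cases (label 1) that were flagged as fraud."""
    fraud_total = sum(1 for t in y_true if t == 1)
    fraud_caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fraud_caught / fraud_total if fraud_total else 0.0


# Hypothetical imbalanced sample: 990 legitimate (0) and 10 fraudulent (1).
y_true = [0] * 990 + [1] * 10

# A degenerate "classifier" that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

majority_accuracy = accuracy(y_true, y_pred)
majority_recall = fraud_recall(y_true, y_pred)

print(f"accuracy = {majority_accuracy:.3f}")   # accuracy = 0.990
print(f"fraud recall = {majority_recall:.3f}") # fraud recall = 0.000
```

Under these assumed counts, accuracy alone would suggest a strong model, which is one reason imbalanced data makes a dependable assessment model hard to design.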
In recent years, most studies on credit risk assessment models for financial institutions have focused on handling imbalanced data or enhancing classification accuracy through multistage modeling and deep learning. Although these methods can boost accuracy to some extent, the following research gaps remain. First, time responsiveness is low, because models with higher classification accuracy tend to have higher model complexity. Second, existing methods lack transparency and interpretability and offer insufficient analysis of behavior features (Laughlin, Sankaranarayanan, & El-Khatib, 2020). Therefore, to address these gaps and improve efficiency and interpretability, this paper studies the following research questions:
- (1) How to build an efficient and interpretable fraud detection model based on the characteristics of the financial domain?
- (2) How to obtain knowledge about the risks associated with credit assessment? And what are the implications for financial institutions?