Credit Risk Models for Financial Fraud Detection: A New Outlier Feature Analysis Method of XGBoost With SMOTE

Huosong Xia, Wuyue An, Zuopeng (Justin) Zhang

Source Title: Journal of Database Management (JDM) 34(1)

DOI: 10.4018/JDM.321739

Abstract

Outlier detection is currently applied in many fields, and existing research focuses on handling imbalanced data or enhancing classification accuracy. In finance, fraud detection places higher demands on real-time performance and interpretability. This paper develops a credit risk model for financial fraud detection based on extreme gradient boosting trees (XGBoost), with SMOTE adopted to deal with imbalanced data. AUC is the assessment indicator, and running time is taken as a reference for comparison with other frequently used classification algorithms. The results indicate that the proposed method outperforms the compared algorithms. In addition, XGBoost produces a ranking of the features that most influence the classification results, which makes the model's evaluation results interpretable. Together, these findings suggest that the proposed model is more practical for credit risk assessment: it offers faster response times, reduced costs, and better interpretability.
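The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, pure-Python version of the general SMOTE idea, in which each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the parameters `k` and `seed`, and the sample points are all assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not the paper's code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Interpolate: gap in [0, 1) places the new point between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Made-up minority-class (fraud) samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies between two real minority samples, the oversampled class stays inside the region already occupied by the minority data rather than duplicating existing records.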

1. Introduction

Fraud is intentional deception to obtain financial gain or cause loss by implicit or explicit tricks (Kou et al., 2019). Fraud violates public laws: swindlers attempt to obtain illegal benefits or inflict irreversible losses (Carcillo et al., 2018; Khanuja & Adane, 2018). Fraudulent activities cost victims and financial institutions a significant amount of money. According to statistics from the Internet Crime Complaint Center, reported fraud activities have surged substantially in the last decade (Hou et al., 2020).

Industries and research institutions have invested heavily in developing effective methods to combat the problem with emerging machine learning, deep learning, big data, and computational intelligence technologies (Cai & Zhang, 2020; Chua & Storey, 2016; Oreski & Oreski, 2014). These efforts have produced many approaches that can intelligently differentiate legitimate transactions from fraudulent ones. However, whatever methods are applied, some common problems persist and often reduce their performance and efficiency. For instance, one of the most common problems is that the training data of past transactions have an imbalanced class distribution, which leads to overfitting and inferior performance of the implemented classifiers (Altinbas, 2020). These problems occur because far fewer fraudulent samples are available than legitimate ones. This imbalance hinders the design of a dependable assessment model (Khemakhem, Said, & Boujelbene, 2018). Moreover, data heterogeneity and overlap are additional issues that aggravate the problem (Arora & Kaur, 2019). Computational complexity is another challenge for effectively identifying anomalies (Coser, Maer-Matei, & Albu, 2019; Xu et al., 2020; Ye et al., 2018). These problems significantly reduce the efficacy of fraud recognition techniques, which may then produce a large number of incorrect classifications.
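The effect of class imbalance on evaluation can be made concrete with a small sketch. The class counts (98 legitimate, 2 fraudulent) and the scores below are invented for illustration and do not come from the paper; the helper `auc` is a hypothetical pure-Python implementation of the standard pairwise definition of AUC, namely the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented imbalanced data set: 98 legitimate (0) and 2 fraudulent (1) samples.
labels = [0] * 98 + [1] * 2

# A trivial model that scores every transaction as legitimate.
always_legit = [0.0] * 100
predicted = [1 if s >= 0.5 else 0 for s in always_legit]
accuracy = sum(1 for y, p in zip(labels, predicted) if y == p) / len(labels)
auc_trivial = auc(labels, always_legit)

# A small scored example in which three of the four pairs are ordered correctly.
auc_small = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this sketch the trivial model reaches an accuracy of 0.98 without detecting any fraud, while its AUC is 0.5, the value of a random ranking. This is one reason a ranking-based indicator such as AUC is commonly preferred over accuracy on imbalanced data.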

In recent years, most studies on credit risk assessment models for financial institutions have focused on handling imbalanced data or enhancing classification accuracy through multistage modeling and deep learning. Although these methods can improve accuracy to some extent, the following research gaps remain. First, response times are slow, because models with higher classification accuracy tend to be more complex. Second, existing methods lack transparency and interpretability and offer insufficient analysis of behavior features (Laughlin, Sankaranarayanan, & El-Khatib, 2020). Therefore, to address these gaps and improve efficiency and interpretability, we study the following research questions in this paper:

(1) How to build an efficient and interpretable fraud detection model based on the characteristics of the financial domain?
(2) How to obtain knowledge about the risks associated with credit assessment? And what are the implications for financial institutions?
