Evaluation of Data Imbalance Algorithms on the Prediction of Credit Card Fraud

Godlove Otoo, Justice Kwame Appati, Winfred Yaokumah, Michael Agbo Tettey Soli, Stephane Jnr Nwolley, Julius Yaw Ludu
Copyright: © 2021 | Pages: 26
DOI: 10.4018/IJIIT.289967

Abstract

Credit card fraud has risen steadily since the introduction of card payment systems. To curb this menace, computational methods have been proposed. Unfortunately, the data available for such studies is highly skewed, which results in the data imbalance problem. In this study, the authors investigate the performance of selected data imbalance algorithms employed in the prediction of credit card fraud. A dataset from Kaggle containing 284,315 genuine transactions and 492 fraudulent transactions was used for the evaluation. The machine learning algorithms deployed for the study are logistic regression, naïve Bayes, and the k-nearest neighbour algorithm, with F1 score and precision-recall area under the curve (PR AUC) as the metrics. Numerical assessment of the performance of the adopted algorithms gave rates of 82.5% and 81%, respectively, using the neighbourhood cleaning rule for undersampling.
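The degree of skew in the dataset, and the reason F1 score and PR AUC are preferred over plain accuracy, can be illustrated with a short calculation. The sketch below uses only the transaction counts quoted in the abstract. The "label everything genuine" baseline and the toy scores passed to `average_precision` are hypothetical, and average precision is used here as one common estimate of PR AUC, which is not necessarily how the authors computed it.

```python
# Transaction counts quoted in the abstract.
GENUINE = 284_315
FRAUD = 492


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1); each value is 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true, scores):
    """Average precision: mean of the precision at the rank of each positive.

    A common estimate of PR AUC. Tied scores are not treated specially.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos if total_pos else 0.0


# Share of fraudulent transactions in the whole dataset.
fraud_share = FRAUD / (GENUINE + FRAUD)

# Hypothetical baseline: predict "genuine" for every transaction.
baseline_accuracy = GENUINE / (GENUINE + FRAUD)
_, _, baseline_f1 = precision_recall_f1(tp=0, fp=0, fn=FRAUD)

print(f"fraud share:       {fraud_share:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"baseline F1:       {baseline_f1}")

# Toy example (assumed labels and scores, 1 = fraud).
print(average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.1]))
```

The baseline never detects a single fraudulent transaction, yet its accuracy is very high because fraud makes up well under 1% of the records. Its F1 score is zero, which is why metrics that focus on the minority class are the more informative choice here.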

Introduction

Credit card fraud is a significant problem for banks worldwide, resulting in a massive loss of money. Technically, it is defined as the deliberate act of obtaining illegal benefits from users. Scammers today have taken advantage of the adoption of credit card systems as a primary online and offline mode of payment (Gamini, 2021). According to The Nilson Report, credit card fraud losses reached $21 billion globally in 2015 and were expected to hit $31 billion by 2020; between 2012 and 2016, at least 46% of the victims were from America (Taha & Malebary, 2020). In 2016, China also experienced a 3.8% increase in registered credit card fraud compared to the previous year. Also, according to the USFTC, identity fraud increased by 21% in 2008 after remaining stable for the preceding few years (Rtayli & Enneya, 2020). Despite these challenges, credit card purchases have become commonplace in recent years, and the fraud has been attributed to the increased use of credit cards over the internet in e-banking systems (Mittal & Tyagi, 2019). These threats have called for banks and card-related businesses to regularly step up their operations to identify credit card fraud and curb the menace (Wu, Xu, & Li, 2019; Kumari & Mishra, 2019). Though computational approaches are leveraged for the identification of fraud, these systems are data-dependent, and unfortunately, the data used for such tasks are significantly hampered by the data imbalance challenge (Darwish, 2020).

Many banks around the world today are concerned with protecting card payments and the general public's interest in making them (Gómez, Arévalo, Paredes, & Nin, 2018), as paying by card is considerably more convenient (Khatri, Arora, & Agrawal, 2020). In many advanced countries, credit cards are one of the most common payment methods for online transactions. They have made online purchases more accessible and useful as advanced technologies such as the Internet of Things, mobile computing, and the Internet have evolved. However, this has also provided new opportunities for fraudsters and a unique challenge for implementers and innovators (Jain, Tiwari, Dubey, & Jain, 2019). To combat this issue, financial institutions use various fraud prevention models (Save, Tiwarekar, N., & Mahyavanshi, 2017), but fraudsters adapt and, given enough time, devise new ways to breach these protective models. Fraudsters today can be a very imaginative, intelligent, and fast-moving group of people. Despite the best efforts of financial firms, law enforcement authorities, and the government, credit card fraud continues to grow.

Scientifically, credit card fraud detection systems are a critical mechanism for preventing fraud incidents. The mechanism is usually divided into two categories: anomaly detection and classifier-based detection. On the one hand, anomaly detection is concerned with determining the distance between data points in space. It operates by measuring the distance between an incoming transaction and the cardholder's profile and filtering out any transaction that is inconsistent with that profile. On the other hand, the classifier-based detection approach employs machine learning techniques to train a supervised binary classifier on pre-screened sample datasets (Zheng, Yan, Gou, & Wang, 2020).
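The anomaly detection idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method taken from the cited work: the cardholder profile is assumed to be the per-feature mean and standard deviation of past transactions, the distance is a standardised Euclidean distance, and the function names, the two features (amount, hour of day), and the threshold of 3.0 are all hypothetical.

```python
import math
from statistics import mean, pstdev


def build_profile(history):
    """Summarise past transactions as per-feature (means, standard deviations)."""
    columns = list(zip(*history))
    means = [mean(col) for col in columns]
    # Replace a zero spread with 1.0 so the scaling below never divides by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def distance_to_profile(transaction, profile):
    """Standardised Euclidean distance between a transaction and the profile."""
    means, stds = profile
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(transaction, means, stds))
    )


def is_anomalous(transaction, profile, threshold=3.0):
    """Flag a transaction that lies further from the profile than the threshold."""
    return distance_to_profile(transaction, profile) > threshold


# Hypothetical cardholder history: (amount, hour of day).
history = [(25.0, 12), (40.0, 13), (30.0, 11), (35.0, 14), (20.0, 12)]
profile = build_profile(history)

typical = (32.0, 13)   # close to past behaviour
unusual = (900.0, 3)   # large amount at an unusual hour

print(is_anomalous(typical, profile))
print(is_anomalous(unusual, profile))
```

Unlike the classifier-based approach, this sketch needs no labelled fraud examples, only the cardholder's own history, which is one reason distance-based filters are attractive when fraudulent records are scarce.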

The second approach is closely related to data mining techniques, and it is one of the most well-known tools for detecting credit card fraud. Given adequate data, the method groups card transactions into two categories: legitimate (genuine) and fraudulent. This decision is notoriously challenging because field datasets are inherently imbalanced and distorted (Nadim, Sayem, Mutsuddy, & Chowdhury, 2019). The type of sampling method used, the variables chosen, and the detection technique(s) used all have a significant impact on the efficiency of the credit card fraud detection system (Awoyemi, Adetunmbi, & Oluwadare, 2017; Kim et al., 2019; Rushin, Stancil, Sun, Adams, & Beling, 2017).
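To make the role of the sampling method concrete, the sketch below shows random undersampling of the majority class, one of the simplest ways to rebalance a training set. It is offered only as an illustration: it is not the neighbourhood cleaning rule evaluated in the study, and the function name, the `ratio` parameter, and the convention that label 1 marks the minority (fraud) class are assumptions made for this example.

```python
import random


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of the majority rows.

    ratio is the desired (majority count / minority count) after resampling.
    Assumes label 1 is the minority class (fraud) and label 0 the majority.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]

    # Never ask for more majority rows than actually exist.
    keep_n = min(len(majority), int(ratio * len(minority)))
    kept = minority + rng.sample(majority, keep_n)
    rng.shuffle(kept)  # avoid leaving the classes in contiguous blocks

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 95 genuine rows and 5 fraudulent rows (hypothetical).
rows = list(range(100))
labels = [0] * 95 + [1] * 5

X_res, y_res = random_undersample(rows, labels, ratio=2.0)
print(len(y_res), sum(y_res))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution and metrics such as F1 and PR AUC reflect real operating conditions.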
