Predictive Modelling for Financial Fraud Detection Using Data Analytics: A Gradient-Boosting Decision Tree

Ntebogang Dinah Moroke, Katleho Makatjane
DOI: 10.4018/978-1-7998-9430-8.ch002

Abstract

Financial fraud remains one of the most discussed topics in the literature. The financial scandals of Enron, WorldCom, Qwest, Global Crossing, and Tyco resulted in losses of approximately 460 billion dollars. The detection of financial fraud has therefore become a critical task for financial practitioners. Three factors determine the likelihood of fraud: pressure, opportunity, and rationalization. At the core of these factors are people's beliefs and behaviour. Because fraudsters' incentives and techniques are unpredictable and uncertain, fraud detection requires a skill set that encompasses both diligence and judgment. Big data technologies have become ubiquitous over the last decade and continue to have a huge impact on a wide variety of industries.
Chapter Preview

Literature Review

A thorough understanding of fraud identification technologies is useful for tackling credit and debit card fraud. Beigi et al. (2020) proposed a combined method for credit card fraud detection that draws on both data mining and statistical techniques, using feature selection, re-sampling, and cost-sensitive learning. In the first step, useful features are identified using a genetic algorithm. Next, the optimal re-sampling strategy is determined using design of experiments (DOE) and response surface methodology. Finally, the cost-sensitive C4.5 algorithm is used as the base learner in the adaptive boosting (AdaBoost) algorithm. Using a real-time dataset, the authors showed that the proposed method significantly reduces misclassification costs, by at least 14%, compared with a decision tree, naïve Bayes, Bayesian network, neural network, and artificial immune system.
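The final step of that pipeline, cost-sensitive boosting, can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the method of Beigi et al. (2020): it uses depth-1 decision stumps in place of C4.5, makes AdaBoost cost-sensitive simply by initialising the sample weights in proportion to a hypothetical misclassification cost (`COST_FN`, `COST_FP`), and runs on synthetic data. The genetic-algorithm feature selection and the DOE-based re-sampling steps are not shown.

```python
import math
import random

COST_FN = 10.0  # hypothetical cost of missing a fraudulent transaction
COST_FP = 1.0   # hypothetical cost of flagging a legitimate transaction


def make_data(n_legit=180, n_fraud=20, seed=0):
    """Synthetic [amount, hour] rows; label +1 = fraud, -1 = legitimate."""
    rng = random.Random(seed)
    X = [[rng.gauss(50, 15), rng.uniform(0, 24)] for _ in range(n_legit)]
    X += [[rng.gauss(300, 40), rng.uniform(0, 24)] for _ in range(n_fraud)]
    y = [-1] * n_legit + [1] * n_fraud
    return X, y


def stump_predict(row, j, t, sign):
    """Depth-1 rule: predict `sign` when feature j exceeds threshold t."""
    return sign if row[j] > t else -sign


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def fit_adaboost(X, y, rounds=5):
    """AdaBoost whose initial weights are proportional to misclassification cost."""
    costs = [COST_FN if yi == 1 else COST_FP for yi in y]
    total = sum(costs)
    w = [c / total for c in costs]
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight the samples this stump got wrong, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, row):
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in model)
    return 1 if score > 0 else -1


def total_cost(y_true, y_pred):
    """Sum the cost of every missed fraud and every false alarm."""
    cost = 0.0
    for yt, yp in zip(y_true, y_pred):
        if yt == 1 and yp == -1:
            cost += COST_FN
        elif yt == -1 and yp == 1:
            cost += COST_FP
    return cost


X, y = make_data()
model = fit_adaboost(X, y)
pred = [predict(model, row) for row in X]
model_cost = total_cost(y, pred)
baseline_cost = total_cost(y, [-1] * len(y))  # "never flag anything" baseline
print("boosted cost:", model_cost, "baseline cost:", baseline_cost)
```

Comparing `model_cost` with `baseline_cost` mirrors the kind of misclassification-cost comparison reported in the study, although the numbers here come from toy data and carry no empirical weight.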

Key Terms in this Chapter

Supervised Learning: A machine learning method that learns a mapping from inputs to outputs based on example input-output pairs.

Cross-Validation: A re-sampling technique that uses different portions of a dataset to train and test a model over successive iterations.

Deep Learning: A branch of machine learning and artificial intelligence that uses multi-layer neural networks to extract knowledge from image or quantitative data.

Big Data: Data that arrives in high volume, at high speed, and in a variety of formats, and that can be stored in databases.

Credit Card Fraud: The intentional use of a credit card that has been revoked, cancelled, or reported lost or stolen to illegally obtain money or anything of value.

Financial Fraud: The unauthorised taking of money from financial institutions such as banks.

Gradient Boosting Decision Tree: An ensemble machine learning technique for regression and classification that builds a prediction model as an ensemble of weak prediction models, typically decision trees.

Data Mining: The process of extracting anomalies, patterns, and relationships from large datasets to predict outcomes.
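Two of the terms above, gradient boosting decision tree and cross-validation, can be illustrated together. The sketch below is a simplified teaching example and not the chapter's model: it boosts depth-1 regression trees on the gradient of the log-loss, takes a plain gradient step rather than the leaf-wise Newton step used by production libraries, and scores the result with 5-fold cross-validation. The data, the learning rate, and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n_legit=90, n_fraud=30, seed=1):
    """Synthetic [amount, hour] rows; label 1 = fraud, 0 = legitimate."""
    rng = random.Random(seed)
    X = [[rng.gauss(50, 15), rng.uniform(0, 24)] for _ in range(n_legit)]
    X += [[rng.gauss(300, 40), rng.uniform(0, 24)] for _ in range(n_fraud)]
    y = [0] * n_legit + [1] * n_fraud
    return X, y


def fit_regression_stump(X, r):
    """Depth-1 regression tree: the split minimising squared error on r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((ri - lm) ** 2 for ri in left)
                   + sum((ri - rm) ** 2 for ri in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_gbdt(X, y, rounds=10, lr=0.5):
    """Each round fits a stump to the residuals y - p (negative log-loss gradient)."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # start from the log-odds of the base rate
    F = [f0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lm, rm = fit_regression_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        F = [fi + lr * (lm if row[j] <= t else rm) for row, fi in zip(X, F)]
    return f0, lr, stumps


def predict(model, row):
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if row[j] <= t else rm)
                     for j, t, lm, rm in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


def k_fold_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat k times."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_gbdt([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return scores


X, y = make_data()
scores = k_fold_accuracy(X, y)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", scores, "mean:", mean_accuracy)
```

Each fold's accuracy is measured only on rows the model did not see during training, which is the point of cross-validation. On real, heavily imbalanced fraud data, accuracy alone would be a poor metric and a cost-based or recall-based score would usually be preferred.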
