New Perspectives of Pattern Recognition for Automatic Credit Card Fraud Detection


Addisson Salazar (Universitat Politècnica de València, Spain), Gonzalo Safont (Universitat Politècnica de València, Spain), Alberto Rodriguez (Universidad Miguel Hernández de Elche, Spain) and Luis Vergara (Universitat Politècnica de València, Spain)
Copyright: © 2018 | Pages: 14
DOI: 10.4018/978-1-5225-2255-3.ch428

Abstract

Automatic credit card fraud detection (ACCFD) is a challenging problem that has been studied increasingly, given the growing potential of new technologies to emulate legitimate operations. A solution has to cope with fraud behavior that changes over time, detect fraud in data with a very small fraud/legitimate ratio, and meet the operational requirement of a very low false alarm rate in real-time processing. In this chapter, the main issues related to the ACCFD problem and the proposed solutions are discussed from theoretical and practical standpoints. Detection performance is analyzed jointly from the perspectives of receiver operating characteristic (ROC) curves and business key performance indicators. A new conceptual framework for ACCFD that incorporates decision fusion and surrogate data is outlined, including a case study with different proportions of real and surrogate data. In addition, the sensitivity of the methods to different fraud/legitimate ratios is tested. Finally, theoretical and practical conclusions are drawn and several open lines of research are proposed.
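As a rough illustration of why a very small fraud/legitimate ratio forces a very low false alarm rate, Bayes' rule gives the fraction of raised alarms that are real frauds (the precision) from the probability of detection, the probability of false alarm, and the fraud prior. The minimal Python sketch below assumes a fixed detector; the function name precision_at and the values Pd = 0.9 and Pfa = 1% are hypothetical and are not taken from the chapter's experiments.

```python
def precision_at(pd: float, pfa: float, fraud_ratio: float) -> float:
    """Fraction of raised alarms that are true frauds (Bayes' rule).

    pd:          probability of detection (true positive rate)
    pfa:         probability of false alarm (false positive rate)
    fraud_ratio: prior probability that a transaction is fraudulent
    """
    true_alarms = pd * fraud_ratio
    false_alarms = pfa * (1.0 - fraud_ratio)
    return true_alarms / (true_alarms + false_alarms)


# The same hypothetical detector (Pd = 0.9, Pfa = 1%) under different fraud prevalences.
for ratio in (0.1, 0.01, 0.001):
    print(f"fraud ratio {ratio}: precision = {precision_at(0.9, 0.01, ratio):.3f}")
```

With these illustrative values the precision drops from about 0.91 at a 10% fraud ratio to about 0.48 at 1% and about 0.08 at 0.1%, although the detector itself is unchanged.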

Background

Cyber-security and privacy have become very important subjects of research in recent years. This research spans many different fields, such as security in the physical layer of wireless communications (Poor, 2012), database security (Sankar, Rajagopalan, & Poor, 2013), distributed systems (Pawar, El Rouayheb, & Ramchandran, 2011), and biometrics (Lifeng, Ho, & Poor, 2011). One activity in which security and privacy mechanisms are critical is e-commerce with credit cards. This application involves a massive volume of online transactions that are continuously exposed to fraud. Fraud detection in credit card transactions is a critical problem that affects large financial companies and causes annual losses of billions of dollars (Bhattacharyya, Jha, Tharakunnel, & Westland, 2011).

Basically, two strategies can be adopted. The first defines the problem as one-class classification: the largest data population (the legitimate transactions) is characterized, and all data with different characteristics are considered outliers (Hodge & Austin, 2004; Tax & Duin, 2001). The second defines the problem as two-class classification, characterizing both legitimate and fraudulent transaction data. We have concentrated on the latter approach, which takes full advantage of the available labeled data.
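For contrast with the two-class approach adopted here, the one-class strategy can be sketched as follows. This minimal Python/NumPy example assumes a single multivariate Gaussian model of the legitimate transactions and uses the squared Mahalanobis distance as the outlier score. The Gaussian model, the 99th-percentile threshold, the helper names (fit_one_class, outlier_score), and the synthetic data are illustrative assumptions, not the methods of the cited works.

```python
import numpy as np


def fit_one_class(legit: np.ndarray):
    """Model legitimate transactions only: estimate mean and inverse covariance."""
    mean = legit.mean(axis=0)
    cov = np.cov(legit, rowvar=False)
    return mean, np.linalg.inv(cov)


def outlier_score(x: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of x to the legitimate model."""
    d = x - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)


rng = np.random.default_rng(0)
legit = rng.normal(0.0, 1.0, size=(5000, 3))   # synthetic stand-in for legitimate data
mean, cov_inv = fit_one_class(legit)

# Threshold taken from the legitimate data itself (99th percentile of its scores),
# so roughly 1% of legitimate transactions would be flagged as outliers.
threshold = np.quantile(outlier_score(legit, mean, cov_inv), 0.99)

new = np.array([[0.1, -0.2, 0.3],               # close to the legitimate population
                [6.0, 6.0, 6.0]])               # far from it
print(outlier_score(new, mean, cov_inv) > threshold)
```

Note that no fraud examples are used at any point; this is exactly what distinguishes the one-class strategy from the two-class one.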

There is an extensive literature that reviews, classifies, and compares the large number of ACCFD methods developed during the last two decades (e.g., Danenas, 2015). However, only a few of these references come from the field of signal processing. The particular characteristics of ACCFD make it a challenging problem for signal processing algorithms (Salazar, Safont, Soriano, & Vergara, 2012). The optimum design of the algorithms depends on the detection models employed to estimate the multidimensional joint distribution of the random variables underlying the data.
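In the two-class setting, one simple way to turn estimated joint distributions into a detector is the log-likelihood ratio between the fraud and legitimate class densities. The sketch below assumes, purely for illustration, that each class follows a multivariate Gaussian distribution estimated from labeled synthetic data; real transaction data are generally not Gaussian, and the helper names (fit_gaussian, log_pdf, llr_score) are hypothetical.

```python
import numpy as np


def fit_gaussian(x: np.ndarray):
    """Estimate mean, inverse covariance and log-determinant of one class."""
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return mean, np.linalg.inv(cov), logdet


def log_pdf(x: np.ndarray, mean, cov_inv, logdet) -> np.ndarray:
    """Multivariate Gaussian log-density evaluated at each row of x."""
    d = x - mean
    maha = np.einsum("ij,jk,ik->i", d, cov_inv, d)
    dim = x.shape[1]
    return -0.5 * (maha + logdet + dim * np.log(2.0 * np.pi))


def llr_score(x, legit_model, fraud_model) -> np.ndarray:
    """Log-likelihood ratio: positive values favour the fraud class."""
    return log_pdf(x, *fraud_model) - log_pdf(x, *legit_model)


rng = np.random.default_rng(1)
legit = rng.normal(0.0, 1.0, size=(5000, 2))   # synthetic legitimate class
fraud = rng.normal(3.0, 1.5, size=(100, 2))    # synthetic, much rarer fraud class
legit_model, fraud_model = fit_gaussian(legit), fit_gaussian(fraud)

scores = llr_score(np.array([[0.0, 0.0], [3.0, 3.0]]), legit_model, fraud_model)
print(scores)
```

Comparing the score with a threshold gives the binary decision; moving that threshold trades probability of detection against probability of false alarm.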

Figure 1 shows an outline of the proposed signal processing procedure. The multivariate surrogate data are obtained following the methods explained in Salazar, Safont, and Vergara (2014), "Surrogate techniques for testing fraud detection algorithms in credit card operations." The pre-processing step applies principal component analysis (PCA) to reduce the dimensionality of the data while preserving 95% of the data variance.
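The two steps named in the paragraph above (generating surrogate data and reducing dimensionality with PCA while keeping 95% of the variance) can be sketched in Python with NumPy. The surrogate generator here is a deliberately simple stand-in that only matches the mean and covariance of the real data with a multivariate Gaussian; it is not the surrogate technique of Salazar, Safont, and Vergara (2014). All function names, the 50/50 mixing proportion, and the synthetic "real" data are hypothetical.

```python
import numpy as np


def gaussian_surrogate(real: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n synthetic rows with the sample mean and covariance of `real`.

    Simplified stand-in: only first- and second-order statistics are preserved;
    non-Gaussian marginals and nonlinear dependence are not reproduced.
    """
    mean = real.mean(axis=0)
    chol = np.linalg.cholesky(np.cov(real, rowvar=False))   # cov = chol @ chol.T
    white = rng.standard_normal(size=(n, real.shape[1]))
    return mean + white @ chol.T


def pca_fit(x: np.ndarray, variance_kept: float = 0.95):
    """Mean and principal axes retaining at least `variance_kept` of the total variance."""
    mean = x.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, variance_kept)) + 1        # smallest k reaching target
    return mean, eigvecs[:, :k]


def pca_transform(x: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Project rows of x onto the retained principal axes."""
    return (x - mean) @ axes


rng = np.random.default_rng(0)
# Hypothetical "real" data: 6 correlated features driven by 2 latent factors plus noise.
latent = rng.normal(size=(1000, 2)) * [5.0, 3.0]
real = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(1000, 6))

# Mix real and surrogate rows (here 50/50), then reduce dimensionality with PCA.
surrogate = gaussian_surrogate(real, 1000, rng)
data = np.vstack([real, surrogate])
mean, axes = pca_fit(data, 0.95)
reduced = pca_transform(data, mean, axes)
print(data.shape, "->", reduced.shape)
```

Varying the number of surrogate rows passed to np.vstack is one way to reproduce different proportions of real and surrogate data. scikit-learn's PCA class also accepts a fractional n_components (for example 0.95) for a similar purpose.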

Figure 1. Outline of the signal processing procedure

Key Terms in this Chapter

Receiver Operating Characteristic (ROC) Curve: A graphical tool for evaluating a detection process that plots, in a coordinate plane, the probability of detection against the probability of false alarm at different operating points, with both probabilities ranging from 0 to 1.

Pattern Recognition: Automatic process of extracting, representing, and splitting conspicuous characteristics from a dataset to produce several subsets that can be associated with concepts normally accepted by humans.

Surrogate Data: Computationally generated data whose statistical features are similar to those of real data.

Fusion of Scores: A process for efficiently combining a set of scores or probabilistic grades assigned by a set of detectors/classifiers. How the performance of the combination is evaluated depends on the application objectives.

Classifier: A computational method that can be trained with labeled data to predict the labels of unlabeled data. If there are only two labels (also called classes), the method is called a “detector”.
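Three of the key terms above (ROC curve, fusion of scores, and classifier/detector) fit together in a short evaluation loop: scores from several detectors are normalized and combined, the ROC curve of each score is traced by sweeping a threshold, and a single operating threshold is then chosen to respect a target false alarm rate. The pure-Python sketch below uses z-score normalization with equal-weight averaging as the fusion rule and an empirical quantile threshold. These choices, the helper names, and the toy scores and labels are hypothetical illustrations, not the fusion method proposed in the chapter.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Rescale one detector's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / (sigma or 1.0) for s in scores]   # a constant detector maps to zeros


def fuse(score_lists, weights=None):
    """Fusion of scores: weighted average of z-normalised detector outputs."""
    weights = weights or [1.0 / len(score_lists)] * len(score_lists)
    normalised = [zscore(s) for s in score_lists]
    return [sum(w * col[i] for w, col in zip(weights, normalised))
            for i in range(len(score_lists[0]))]


def roc_points(scores, labels):
    """(Pfa, Pd) pairs obtained by sweeping the threshold over every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                                  # threshold above every score
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def threshold_for_pfa(legit_scores, target_pfa):
    """Threshold with at most target_pfa of the legitimate scores strictly above it."""
    ranked = sorted(legit_scores, reverse=True)
    allowed = int(target_pfa * len(ranked))                # max number of legitimate alarms
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def detect(score, threshold):
    """Detector = score + threshold: 1 raises a fraud alarm, 0 does not."""
    return int(score > threshold)


# Toy data: 1 = fraud, 0 = legitimate; two hypothetical detectors on different scales.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
det_a = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
det_b = [30.0, 55.0, 10.0, 70.0, 35.0, 5.0, 15.0, 20.0]
fused = fuse([det_a, det_b])

for name, s in (("A", det_a), ("B", det_b), ("fused", fused)):
    print(name, round(auc(roc_points(s, labels)), 3))

thr = threshold_for_pfa([s for s, y in zip(fused, labels) if y == 0], target_pfa=0.01)
alarms = [detect(s, thr) for s in fused]
print("alarms:", alarms)
```

In this toy case each detector misranks a different legitimate transaction, so each reaches an AUC of 14/15, while the fused score separates all frauds from all legitimate transactions (AUC of 1). Because the threshold is set at the highest fused legitimate score and an alarm requires a strictly greater score, no legitimate transaction raises an alarm here.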
