Two-Stage Automobile Insurance Fraud Detection by Using Optimized Fuzzy C-Means Clustering and Supervised Learning

Sharmila Subudhi, Suvasini Panigrahi
Copyright: © 2020 |Pages: 20
DOI: 10.4018/IJISP.2020070102

Abstract

A novel two-stage automobile insurance fraud detection system is proposed that first extracts a test set from the original imbalanced insurance dataset. Fuzzy c-means clustering optimized by a genetic algorithm is then applied to the remaining data to undersample the majority class by eliminating its outliers. Thereafter, the detection of fraudulent claims occurs in two stages. In the first stage, each insurance record is passed to the clustering module, which identifies the claim as genuine, malicious, or suspicious. The genuine and malicious samples are removed, and only the suspicious instances are scrutinized further in the second stage by four trained supervised classifiers, each applied individually for final decision making: Decision Tree, Support Vector Machine, Group Method of Data Handling and Multi-Layer Perceptron. Extensive experiments and a comparative analysis with another recent approach on a real-world automobile insurance dataset demonstrate the effectiveness of the proposed system.

1. Introduction

In automobile insurance, the insurance service provider is obliged to give financial support to the customer (the insured) in the event of theft of or damage to the vehicle. Automobile insurance fraud takes place when the insured files a claim for losses from a staged accident, or for casualties that occurred in the past, in order to gain financially (Ngai et al., 2011). Apart from the fraudster, third parties such as garage mechanics, police officers and insurance agents may also be involved in this type of fraud (Šubelj et al., 2011). According to a report published by Verisk Insurance Solutions, automobile insurance companies lose $29 billion annually to fraudulent incidents (Lekas, 2017). These statistics make the severity of the problem evident; it must be handled firmly to reduce the losses caused by such fraudulent attempts.

Several issues complicate the identification of fraudulent automobile insurance claims (Abdallah et al., 2016). First, improper data representation of an insurance case makes fraud detection extremely difficult (Šubelj et al., 2011). Second, forged claims make up only a small fraction of all claims, which leads to an imbalanced class distribution in the dataset and makes detection even more challenging (Jensen, 1997). Hence, accurate classification of these malicious instances is essential for any Automobile Insurance Fraud Detection System (AIFDS). Furthermore, a fraud detection system (FDS) that performs iterative calculations to discriminate fraudulent instances may require high computation time (Panigrahi et al., 2013). Therefore, there is a need for a robust AIFDS that can separate malicious insurance claims from genuine ones both accurately and quickly.

This paper proposes a novel hybrid AIFDS that first applies Fuzzy C-Means (FCM) clustering as an undersampling method to remove noisy points from the majority class of the original imbalanced dataset and so generate a reduced, balanced dataset. A Genetic Algorithm (GA) is then employed to optimize the generated fuzzy cluster centers. A new insurance claim is assigned to one of three categories (genuine, malicious, or suspicious) based on its distance from the optimized cluster centers. A claim labeled genuine is passed for payment, and a claim labeled malicious is blocked. To further verify the suspicious claims, the proposed AIFDS independently employs four trained supervised classifier models, namely Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT) and Group Method of Data Handling (GMDH), and selects the best-performing classifier among them. All the classifiers are trained on the balanced dataset prior to classification.
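The first-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain fuzzy c-means without the GA optimization step, synthetic two-dimensional toy data in place of insurance records, and a hypothetical distance rule for the three-way decision. The function names `fuzzy_c_means` and `triage` and the thresholds `lower` and `upper` are assumptions introduced here for illustration only.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means (no GA step). Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m                               # fuzzified membership weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def triage(x, centers, lower, upper):
    """Hypothetical three-way rule based on distance to the nearest center.

    Assumption: the centers describe normal behaviour, so a claim close to
    one is treated as genuine, a claim far from all of them as malicious,
    and anything in between as suspicious.
    """
    d_min = np.linalg.norm(centers - x, axis=1).min()
    if d_min <= lower:
        return "genuine"
    if d_min >= upper:
        return "malicious"
    return "suspicious"

# Synthetic toy data: two tight blobs standing in for the balanced dataset.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(5.0, 0.3, (100, 2))])
centers, U = fuzzy_c_means(X)

# Three made-up claims: near a center, between centers, far from both.
labels = [triage(np.array(p), centers, lower=1.0, upper=4.0)
          for p in ([0.0, 0.0], [2.5, 2.5], [20.0, 20.0])]
print(labels)
```

In a pipeline of the kind described, only the claims labeled suspicious would be forwarded to the second-stage classifier. The thresholds here are arbitrary values chosen for the toy data, not values from the paper.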

The rest of the paper is organized as follows: Section 2 briefly reviews the related research carried out in this field, and Section 3 covers the background of the techniques used in the current work. Section 4 focuses on the proposed AIFDS. Section 5 presents the experiments and a comparative performance analysis that demonstrate the effectiveness of the proposed approach. Finally, Section 6 concludes the paper with a brief summary of the contributions made.
