Detection of Automobile Insurance Fraud Using Feature Selection and Data Mining Techniques

Sharmila Subudhi (Veer Surendra Sai University of Technology, Burla, India) and Suvasini Panigrahi (Veer Surendra Sai University of Technology, Burla, India)
Copyright: © 2018 | Pages: 20
DOI: 10.4018/IJRSDA.2018070101

Abstract

This article presents a novel approach to detecting fraud in automobile insurance claims by applying several data mining techniques. First, the most relevant attributes are chosen from the original dataset using an evolutionary-algorithm-based feature selection method. A test set is then extracted from the dataset restricted to the selected attributes, and the remaining data are undersampled using the Possibilistic Fuzzy C-Means (PFCM) clustering technique. Next, 10-fold cross-validation is applied to the balanced dataset to train and validate a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the best-performing model is used to classify the test set. The efficacy of the proposed system is illustrated through several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach demonstrates the superiority of the proposed system.

1. Introduction

An automobile insurance policy is a legally binding contract between an insurance company (the insurer) and the owner of a vehicle (the insured) that provides financial support in the event of vehicle theft or damage. Automobile insurance fraud describes a situation in which the insured obtains financial gain by submitting forged documents to the company, for example by showing damage to the vehicle from staged (fake) accidents or by making monetary claims for past losses (Ngai et al., 2011). Such fraud can be carried out by many kinds of people, including drivers, chiropractors, garage mechanics, lawyers, police officers and insurance workers (Šubelj et al., 2011). Automobile insurance fraud ranges from filing a false insurance claim (the easier route) to more deceitful schemes such as fabricating an accident or an auto theft (Šubelj et al., 2011; Abdallah et al., 2016).

A 2012 study by the Insurance Research Council (IRC) estimated that auto insurance fraud in the United States amounted to between $5.6 billion and $7.7 billion in excess payments on injury claims (“Fraud adds up to”, 2015). The report also found evidence of fraud in 21% of injury claims and 18% of personal injury protection claims. A report published in the United States in 2013 shows that, of around 70,000 vehicle theft cases, some thefts were planned by the owners themselves (Tidball, 2015). An investigation led by the Insurance Fraud Bureau of Australia in 2013 reported that illegitimate claims in Australia had fetched more than $2 billion, exceeding the previous year's figure (“Australia: Insurance fraud costs”, 2016).

Another probe, carried out by the Association of British Insurers (ABI), indicates a rising tendency to file forged insurance applications: the number in 2014 was 18% higher than in 2013 (“Cutting corners to get”, 2015). These statistics demonstrate the seriousness of the issue, which must be handled firmly to reduce the losses caused by such deceitful actions.

Abdallah et al. (2016) have pointed out several issues that arise when identifying fraud cases in auto insurance. First, the inappropriate representation of claim data makes the detection of fraudulent activities extremely difficult (Šubelj et al., 2011). Second, illegitimate claims account for only a small portion of all accident claims, so detection becomes more challenging owing to the skewed class distribution (Jensen, 1997). In fraud detection, the genuine cases usually form the majority class and the forged instances form the minority class. Most standard machine learning algorithms are biased towards the majority class and tend to ignore the minority class when classifying an imbalanced dataset. This leads to high accuracy on majority-class points but poor performance on minority-class instances (Zhou and Liu, 2006), and hence to an ineffective classification model (Chawla, 2003). According to Japkowicz (2003), the presence of small clusters that cannot be classified accurately greatly affects the performance of a classifier on an imbalanced dataset. Hence, these clusters need to be addressed effectively in order to reduce the data imbalance problem.
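The effect of a skewed class distribution on plain accuracy can be illustrated with a small worked example. The sketch below is only an illustration: the class counts (940 genuine, 60 fraudulent) are hypothetical and are not taken from the dataset used in the article, and the function names are invented for this example. It scores a trivial classifier that labels every claim as genuine.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on fraud
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on genuine
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical claim set: 0 = genuine (majority), 1 = fraudulent (minority).
y_true = [0] * 940 + [1] * 60
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

report = summarize(y_true, y_pred)
print(report["accuracy"])     # 0.94
print(report["sensitivity"])  # 0.0
print(report["g_mean"])       # 0.0
```

Although the accuracy is 94%, not a single fraudulent claim is detected, which is why measures that account for both classes (such as sensitivity, specificity or their geometric mean) are more informative on imbalanced data.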

Moreover, when machine learning techniques are applied to a dataset for classification purposes, their success depends on the representation of the data attributes, since these affect the efficiency of a classifier (Xue et al., 2013). Selecting the most valuable features is therefore important, because irrelevant attributes can mislead a classifier by interfering with the suitable features. The basic idea behind feature selection is to choose a subset of attributes without modifying their original semantics, thereby improving classifier performance (Saeys et al., 2007). A variety of feature selection techniques have been used to choose the best attributes. However, most of them suffer from issues such as high memory usage and high computational cost (Saeys et al., 2007). To handle this problem efficiently, various evolutionary-algorithm-based feature selection techniques have been proposed.
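As a rough illustration of evolutionary feature selection in general, the sketch below runs a plain genetic algorithm over binary feature masks. It is not the method used in the article, which this preview does not specify. The toy data, the nearest-centroid scorer, the per-feature penalty and all function names and parameter values are assumptions made for this example, and fitness is measured on the training data itself for brevity.

```python
import random


def make_toy_claims(n=200, seed=0):
    """Hypothetical data: features 0 and 1 carry signal, 2 to 5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0
        row = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dists = {
            lab: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
            for lab, cen in centroids.items()
        }
        correct += min(dists, key=dists.get) == t
    return correct / len(y)


def evolve_feature_mask(X, y, generations=20, pop_size=12,
                        penalty=0.01, seed=1):
    """Genetic search over 0/1 masks; returns (mask, fitness, history)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        # Reward accuracy, lightly penalize every selected feature.
        return centroid_accuracy(X, y, mask) - penalty * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional single-bit mutation
                flip = rng.randrange(n_features)
                child[flip] = 1 - child[flip]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history


X, y = make_toy_claims()
best_mask, best_score, history = evolve_feature_mask(X, y)
print("selected feature indices:", [j for j, b in enumerate(best_mask) if b])
print("fitness:", round(best_score, 3))
```

Because the best mask of each generation is copied unchanged into the next, the best fitness never decreases across generations. In practice the fitness would be estimated on held-out data or by cross-validation rather than on the training set.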
