1. Introduction
In automobile insurance, the insurance service provider undertakes to compensate the customer (the insured) financially in the event of theft of, or damage to, the vehicle. Automobile insurance fraud takes place when the insured files a claim for losses from a staged accident, or for damage that occurred in the past, in order to obtain monetary gain (Ngai et al., 2011). Apart from the fraudster, various third parties, such as garage mechanics, police officers and insurance agents, are also involved in this type of fraud (Šubelj et al., 2011). According to a report published by Verisk Insurance Solutions, automobile insurance companies lose $29 billion annually to fraudulent incidents (Lekas, 2017). These statistics make the severity of the problem evident, and it must be handled rigorously to reduce the losses caused by such anomalous attempts.
Several issues complicate the identification of fraudulent automobile insurance claims (Abdallah et al., 2016). First, improper data representation of an insurance case makes fraud detection extremely difficult (Šubelj et al., 2011). Second, forged claims make up only a small fraction of all claims, which leads to an imbalanced class distribution in the dataset and makes detection even more challenging (Jensen, 1997). Hence, accurate classification of these malicious instances is essential for any Automobile Insurance Fraud Detection System (AIFDS). Furthermore, a fraud detection system (FDS) that performs iterative calculations to discriminate fraudulent instances may require high computation time (Panigrahi et al., 2013). Therefore, there is a need for a robust AIFDS that can efficiently separate malicious insurance claims from genuine ones in less time.
This paper proposes a novel hybrid AIFDS that first applies Fuzzy C-Means (FCM) clustering as an undersampling method to remove noisy points from the majority class samples of the original imbalanced dataset and generate a reduced, balanced dataset. A Genetic Algorithm (GA) is then employed to optimize the generated fuzzy cluster centers. A new insurance claim is assigned to one of three categories (genuine, malicious, or suspicious) based on its distance measure computed from the optimized cluster centers. A claim labeled as genuine is passed for payment, and a claim labeled as malicious is blocked. In addition, the proposed AIFDS independently applies four trained supervised classifier models, namely Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT) and Group Method of Data Handling (GMDH), to further verify the suspicious claims, and selects the best-performing classifier among them. All the classifiers are trained on the balanced dataset prior to classification.
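To make the three-way labeling step concrete, the following is a minimal illustrative sketch and not the implementation used in this work. It assumes a Euclidean distance to the nearest cluster center and two hypothetical thresholds, `lower` and `upper`. It also assumes that claims close to a center are treated as genuine and claims far from every center as malicious. The actual distance measure, thresholds and decision rule are not specified in this section, and the names `nearest_center_distance` and `triage_claim` are introduced here purely for illustration.

```python
import math


def nearest_center_distance(claim, centers):
    """Euclidean distance from a claim's feature vector to its closest center."""
    return min(math.dist(claim, center) for center in centers)


def triage_claim(claim, centers, lower, upper):
    """Label a claim as genuine, suspicious or malicious.

    ASSUMPTION: `lower` and `upper` are hypothetical thresholds, and small
    distances are taken to indicate genuine behavior. The paper's own rule
    may differ.
    """
    distance = nearest_center_distance(claim, centers)
    if distance <= lower:
        return "genuine"      # passed for payment
    if distance >= upper:
        return "malicious"    # blocked
    return "suspicious"       # forwarded to a supervised classifier


# Toy two-feature example with made-up cluster centers and claims.
centers = [(0.0, 0.0), (10.0, 10.0)]
claims = [(0.5, 0.0), (3.0, 0.0), (5.0, 5.0)]
labels = [triage_claim(c, centers, lower=1.0, upper=5.0) for c in claims]
print(labels)
```

In this sketch only the claims that fall between the two thresholds would be handed to the trained SVM, MLP, DT or GMDH model for further verification.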
The rest of the paper is organized as follows: Section 2 briefly reviews related research in this field, and Section 3 provides background on the techniques used in the current work. Section 4 presents the proposed AIFDS. Section 5 covers the experiments and a comparative performance analysis that demonstrates the effectiveness of the proposed approach. Finally, Section 6 concludes the paper with a brief summary of the contributions made.