Identifying Fraudulent Behaviors in Healthcare Claims Using Random Forest Classifier With SMOTEchnique

Naga Jyothi P., Rajya Lakshmi D., Rama Rao K. V. S. N.
Copyright: © 2020 | Pages: 18
DOI: 10.4018/IJeC.2020100103

Abstract

Detecting fraudulent and abusive cases in healthcare is one of the most challenging problems for data mining studies. Existing studies lack real data for analysis and address only a partial version of the problem, covering a specific actor, healthcare service, or disease. In this article, the proposed strategy identifies fraudulent behaviors in Medicare claims data using several predictors as model inputs. The methodology involves a preprocessing phase and a model development phase. In the preprocessing phase, features are mined by estimating their feature importance scores, and instances are labeled by applying classification rules to the whole dataset, which yields a transformed dataset. In the development phase, a random forest (RF) classifier with SMOTE is applied to the training and testing data. Specifically, SMOTE is adapted to balance the data, sort misclassified instances, and find the interesting instances. The results show that the proposed RF with SMOTE model improves classifier performance compared with the RF method alone.
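
As an illustration of the balancing step, the following is a minimal sketch of the general SMOTE idea (synthetic minority samples created by interpolating between a minority sample and one of its nearest minority neighbours). It is not the authors' implementation: the function names `smote` and `balance`, the toy feature vectors, and the choice of `k` are assumptions made for this example, and libraries such as imbalanced-learn provide production implementations.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Requires at least two minority samples. Names and defaults are
    illustrative assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(y) - len(minority)
    deficit = max(majority_count - len(minority), 0)
    new_points = smote(minority, deficit, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (made up): six "legitimate" claims (0) and three "fraudulent" (1)
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.0),
     (5.0, 5.0), (5.5, 4.8), (4.9, 5.3)]
y = [0] * 6 + [1] * 3

X_bal, y_bal = balance(X, y, k=2)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a pipeline like the one the abstract describes, only the training split would normally be balanced this way before fitting the classifier, so that the test data keeps its original class distribution.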

1. Introduction

There is a demand for more cost-effective healthcare programs as the population rises all over the world. The Medicare program implicitly supports the needs of elderly people. The number of individuals enrolled for medical services is growing every day. Consequently, the tremendous volume of money in the healthcare industry is increasing the number of cases and the risk of fraudulent activities.

In Medicare systems, three main parties could commit fraud: healthcare providers, beneficiaries (patients), and insurance carriers. Fraud is defined as the intentional deception of falsifying records to obtain an unauthorized benefit from an insurance company or person. Fraudulent behavior is recognized through various aspects of the dataset, which are configured with physician, provider, patient, and procedure codes (specialty description). Fraud is initiated through the billing services of insurers, for example by duplicating bills, which may be done by patients and providers alike. If the provider participates in the Medicaid service, the claim is directed for payment of the reimbursed expenses. If the provider is not part of the Medicaid service, the state monitors and processes the Medicaid reimbursement by checking the payment history and codes and by reviewing the legitimacy of the claims through an audit mechanism. This mechanism cannot establish the trustworthiness of the claimed services or whether the diagnosis is correct for a patient; it was not designed to detect these types of fraud schemes. Human resources are scarce for auditing individual claims on a daily and monthly basis and for following the business rules that emphasize the analysis of claims. In most circumstances, audit teams also charge large amounts of money. In such cases, the system automates the payments and verifies reliability against the constraints posed by the business companies according to their requirements.

Han et al. (2000) showed that data mining can address these issues, which have been studied by many researchers. In the fields of data mining and statistics, fraud is defined as an outlier, and outlier detection is one of the data mining tasks. An outlier is an observation whose value deviates from the normal and stands out as suspicious. Outlier analysis primarily works to recognize and reveal systematic faults in the data. It is gaining importance in a wide range of application domains such as tax, credit cards, insurance, cyber security, military, and healthcare. The strategy of outlier detection is to first define the normal region (behavior) covering every possibility and all factors, which is tricky.
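
To make the notion of "deviating from the normal" concrete, here is a minimal sketch that flags values far from the mean using a z-score. The z-score rule, the threshold, the function name `zscore_outliers`, and the claim amounts are all assumptions chosen for illustration; the article does not state that this rule is used.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The "normal region" is assumed here to be mean +/- threshold * stdev,
    which is only one of many possible definitions.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Hypothetical claim amounts: eight similar claims and one unusually large one
claims = [120.0, 135.0, 110.0, 128.0, 140.0, 125.0, 118.0, 132.0, 2500.0]
flagged = zscore_outliers(claims, threshold=2.0)
print(flagged)
```

A single extreme value inflates both the mean and the standard deviation, which is one reason a fixed threshold is fragile and why defining the normal region is described above as tricky.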

The margin between normal and outlier data is very uncertain because there is only a slight difference between the two, as described by Varun Chandola et al. (2007). Because limitations and requirements vary, the designs of outlier detection models differ across applications and are specific to the domain. Tan et al. (2005) listed the factors to consider for outlier detection: the nature of the data, the application domain, and its knowledge discipline. An outlier detection technique finds abnormal patterns in the input data. Each input data instance has many attributes, which can be of different types, such as categorical or binary, and the data may be multivariate or univariate (multiple or single data types). Feature selection for an outlier detection technique chooses the best features from the input so that the algorithm gives the best results. Apart from the nature of the data, most predictive models use labeled data for training, in which the labels generally mark an instance as normal or outlier (Mitchell, 1997).
