Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques

T. Maruthi Padmaja (University of Hyderabad (UoH), India), Raju S. Bapi (University of Hyderabad (UoH), India) and P. Radha Krishna (SET Labs, Infosys Technologies Ltd, India)
DOI: 10.4018/978-1-61350-056-9.ch005

Abstract

Predicting minority class sequence patterns from noisy and unbalanced sequential datasets is a challenging task. To address this problem, we propose a new approach that combines extreme outlier elimination with a hybrid sampling technique. We use the k Reverse Nearest Neighbors (kRNNs) concept as a data cleaning method for eliminating extreme outliers in minority regions. The hybrid sampling technique, which combines SMOTE to oversample the minority class sequences with random undersampling of the majority class sequences, is used to improve minority class prediction. The method was evaluated in terms of minority class precision, recall and F-measure on Hill-Valley, a synthetically generated, highly overlapped sequential dataset. We conducted the experiments with a k-Nearest Neighbor classifier and compared the performance of our approach against a simple hybrid sampling technique. Results indicate that our approach does not sacrifice one class in favor of the other, but achieves high prediction performance for both the fraud and non-fraud classes.
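To make the cleaning step concrete, here is a minimal Python sketch of kRNN-based extreme outlier elimination. It assumes that each sequence is a fixed-length numeric vector (as in Hill-Valley), that distances are Euclidean, that kRNN counts are computed within the minority class only, and that a minority sequence counts as an extreme outlier when its kRNN count falls below a hypothetical `min_count` threshold (default 1, i.e. no other minority sequence lists it among its k nearest neighbors). These are illustrative assumptions, not the chapter's exact criterion, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def k_nearest(points: List[Vector], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean distance),
    excluding i itself; ties are broken by index so results are deterministic."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]


def reverse_nn_counts(points: List[Vector], k: int) -> List[int]:
    """kRNN count of each point: how many other points list it
    among their own k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def remove_extreme_outliers(minority: List[Vector], k: int,
                            min_count: int = 1) -> List[Vector]:
    """Hypothetical cleaning rule: keep only minority sequences whose kRNN
    count (computed within the minority class) is at least min_count."""
    counts = reverse_nn_counts(minority, k)
    return [p for p, c in zip(minority, counts) if c >= min_count]


if __name__ == "__main__":
    # Four tightly grouped minority points plus one isolated point.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(remove_extreme_outliers(pts, k=2))
    # prints [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
```

An isolated minority sequence is never "claimed" as a neighbor by other minority sequences, so its kRNN count is zero and it is dropped, while sequences in dense minority regions are kept. The brute-force neighbor search costs O(n² log n), which is fine for a sketch; a spatial index would be preferable on large datasets.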

Background

This section describes the background of the methods underlying the proposed extreme outlier elimination and hybrid sampling approach.
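As background for the hybrid sampling step, the following Python sketch combines SMOTE-style oversampling of the minority class with random undersampling of the majority class. SMOTE creates each synthetic sample at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. Assumptions: sequences are fixed-length numeric vectors, both classes are resampled to the same hypothetical `target` size, and, for brevity, each synthetic sample starts from a randomly chosen minority sample rather than cycling through every minority sample as the original SMOTE formulation does. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = Tuple[float, ...]


def smote(minority: List[Vector], n_synthetic: int, k: int,
          rng: random.Random) -> List[Vector]:
    """SMOTE-style oversampling: interpolate between a minority sample and
    one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))       # base sample (random: a simplification)
        x = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]  # one of the k nearest neighbours
        gap = rng.random()                     # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def random_undersample(majority: List[Vector], n_keep: int,
                       rng: random.Random) -> List[Vector]:
    """Keep a random subset of n_keep majority samples (without replacement)."""
    return rng.sample(majority, n_keep)


def hybrid_sample(minority: List[Vector], majority: List[Vector], target: int,
                  k: int = 5, seed: int = 0) -> Tuple[List[Vector], List[Vector]]:
    """Oversample the minority class up to `target` and undersample the
    majority class down to `target` (hypothetical balancing rule)."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote(minority, n_new, k, rng)
    new_majority = random_undersample(majority, min(target, len(majority)), rng)
    return new_minority, new_majority
```

Because synthetic samples are interpolated only between minority neighbors, they stay inside the region spanned by those neighbors. This also suggests why cleaning first can help: an extreme outlier used as a seed or neighbor would spawn synthetic sequences deep in majority territory.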
