Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques

T. Maruthi Padmaja, Raju S. Bapi, P. Radha Krishna
ISBN13: 9781613500569 | ISBN10: 1613500564 | EISBN13: 9781613500576
DOI: 10.4018/978-1-61350-056-9.ch005
Cite Chapter

MLA

Padmaja, T.Maruthi, et al. "Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques." Pattern Discovery Using Sequence Data Mining: Applications and Studies, edited by Pradeep Kumar, et al., IGI Global, 2012, pp. 83-93. https://doi.org/10.4018/978-1-61350-056-9.ch005

APA

Padmaja, T., Bapi, R. S., & Krishna, P. R. (2012). Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques. In P. Kumar, P. Krishna, & S. Raju (Eds.), Pattern Discovery Using Sequence Data Mining: Applications and Studies (pp. 83-93). IGI Global. https://doi.org/10.4018/978-1-61350-056-9.ch005

Chicago

Padmaja, T.Maruthi, Raju S. Bapi, and P. Radha Krishna. "Unbalanced Sequential Data Classification using Extreme Outlier Elimination and Sampling Techniques." In Pattern Discovery Using Sequence Data Mining: Applications and Studies, edited by Pradeep Kumar, P. Radha Krishna, and S. Bapi Raju, 83-93. Hershey, PA: IGI Global, 2012. https://doi.org/10.4018/978-1-61350-056-9.ch005

Abstract

Predicting minority class sequence patterns from noisy and unbalanced sequential datasets is a challenging task. To address this problem, we propose a new approach that combines extreme outlier elimination with a hybrid sampling technique. We use the k Reverse Nearest Neighbors (kRNNs) concept as a data-cleaning method to eliminate extreme outliers in minority regions. The hybrid sampling technique, which combines SMOTE to oversample the minority class sequences with random undersampling of the majority class sequences, is then used to improve minority class prediction. The method was evaluated in terms of minority class precision, recall, and F-measure on Hill-Valley, a synthetically simulated, highly overlapped sequential dataset. We conducted the experiments with the k-Nearest Neighbour (k-NN) classifier and compared the performance of our approach against the hybrid sampling technique alone. Results indicate that our approach does not sacrifice one class in favor of the other, but achieves high prediction performance for both the fraud and non-fraud classes.
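The abstract does not state the exact kRNN elimination rule, so the sketch below assumes one: compute every sample's k nearest neighbours over the whole dataset, collect each sample's reverse neighbours (the samples that list it among their k nearest), and drop a minority sample when none of its reverse neighbours is a minority sample. That criterion and the function names are hypothetical, not the authors' published procedure; sequences are treated as fixed-length numeric vectors compared by Euclidean distance.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def k_nearest(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[i] (Euclidean), excluding i."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: math.dist(data[i], data[j]))
    return others[:k]


def reverse_neighbours(data: Sequence[Point], k: int) -> List[Set[int]]:
    """rnn[j] holds every index i whose k nearest neighbours include j."""
    rnn: List[Set[int]] = [set() for _ in data]
    for i in range(len(data)):
        for j in k_nearest(data, i, k):
            rnn[j].add(i)
    return rnn


def eliminate_extreme_outliers(data, labels, minority_label, k=3):
    """Drop minority samples that no minority sample counts among its
    k nearest neighbours (hypothetical kRNN-based criterion)."""
    rnn = reverse_neighbours(data, k)
    kept_x, kept_y = [], []
    for i, (x, y) in enumerate(zip(data, labels)):
        isolated = y == minority_label and not any(
            labels[r] == minority_label for r in rnn[i]
        )
        if not isolated:
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


# Toy data: a minority cluster, a majority cluster, and one minority
# point sitting deep inside the majority region.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
majority = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
outlier = (10.5, 10.5)
X = minority + majority + [outlier]
y = [1] * 4 + [0] * 4 + [1]
clean_X, clean_y = eliminate_extreme_outliers(X, y, minority_label=1, k=2)
print(len(clean_X))  # 8: only the isolated minority point is removed
```

Under this rule, the stray minority point is referenced only by majority samples, so it is removed, while the clustered minority samples reference each other and survive.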
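A minimal sketch of the hybrid sampling step follows, assuming both classes are resampled to a common size given by a hypothetical `target` parameter. SMOTE creates each synthetic minority sample on the segment between a minority sample and one of its k nearest minority neighbours; random undersampling keeps a random subset of the majority class. Cycling through the minority samples, rather than applying a fixed per-sample oversampling rate, is a simplification of standard SMOTE, and the function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int, rng: random.Random) -> List[Point]:
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    synthetic: List[Point] = []
    for s in range(n_new):
        i = s % len(minority)  # cycle through the minority samples
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def hybrid_sample(minority, majority, target, k=5, seed=0):
    """Oversample the minority class with SMOTE and randomly undersample the
    majority class so each holds `target` samples (hypothetical setting)."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote(minority, n_new, k, rng)
    # Random undersampling: keep a random subset without replacement.
    new_majority = rng.sample(list(majority), min(target, len(majority)))
    return new_minority, new_majority


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(float(i), 5.0) for i in range(20)]
bal_min, bal_maj = hybrid_sample(minority, majority, target=10, k=2)
print(len(bal_min), len(bal_maj))  # 10 10
```

Because every synthetic point is an interpolation between two real minority samples, oversampling stays inside the minority region, which is why removing extreme minority outliers beforehand matters: an outlier would otherwise seed synthetic points inside the majority region.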
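The evaluation pairs a k-NN classifier with precision, recall, and F-measure computed for the minority class only. The sketch below assumes plain majority voting with Euclidean distance; `knn_predict` and `minority_scores` are illustrative names, and the toy data is made up for demonstration, not taken from the Hill-Valley experiments.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def minority_scores(y_true, y_pred, minority_label):
    """Precision, recall and F-measure for the minority class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == minority_label and p == minority_label for t, p in pairs)
    fp = sum(t != minority_label and p == minority_label for t, p in pairs)
    fn = sum(t == minority_label and p != minority_label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_measure


train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [1, 1, 0, 0, 0]
test_x = [(0.2, 0.2), (5.5, 5.5), (0.5, 0.8), (4.0, 4.0)]
test_y = [1, 0, 0, 1]
preds = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
print(preds)  # [1, 0, 1, 0]
print(minority_scores(test_y, preds, minority_label=1))  # (0.5, 0.5, 0.5)
```

Reporting these three figures for the minority class, rather than overall accuracy, exposes a classifier that simply predicts the majority class, since such a classifier scores zero minority recall.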
