Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R

Son Nguyen, Alan Olinsky, John Quinn, Phyllis Schumacher
Copyright: © 2018 | Volume: 1 | Issue: 2 | Pages: 26
ISSN: 2572-4908 | EISSN: 2572-4894 | EISBN13: 9781522528678 | DOI: 10.4018/IJFC.2018070103
Cite Article

MLA

Nguyen, Son, et al. "Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R." IJFC, vol. 1, no. 2, 2018, pp. 83-108. http://doi.org/10.4018/IJFC.2018070103

APA

Nguyen, S., Olinsky, A., Quinn, J., & Schumacher, P. (2018). Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R. International Journal of Fog Computing (IJFC), 1(2), 83-108. http://doi.org/10.4018/IJFC.2018070103

Chicago

Nguyen, Son, et al. "Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R," International Journal of Fog Computing (IJFC) 1, no. 2 (2018): 83-108. http://doi.org/10.4018/IJFC.2018070103

Abstract

A variety of predictive models can handle binary targets, ranging from traditional logistic regression to modern neural networks. However, when the target variable represents a rare event, these models may not be appropriate because they assume that the distribution of the target variable is balanced. This article studies the impact of several resampling methods on conventional predictive models. The resampling techniques are oversampling of the rare events, undersampling of the common events, and the synthetic minority over-sampling technique (SMOTE). Decision trees, logistic regression, and rule induction are then applied to the resampled data in SAS Enterprise Miner (EM). The data set consists of home mortgage applications and includes a target variable whose rare event occurs at a rate of 0.8%. The authors varied the percentage of the rare event from the original 0.8% up to 50% and monitored the performance of the three predictive models to see which worked best.
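
The three resampling ideas named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SAS Enterprise Miner or R workflow; the function names, the toy data, and the neighbour count k are assumptions chosen for illustration. The sketch shows how a rare-event rate of 0.8% can be moved toward a chosen target rate by undersampling or oversampling, and how SMOTE creates synthetic rare rows by interpolating between a rare row and one of its nearest rare neighbours.

```python
import random


def minority_fraction(labels):
    """Share of records whose label is 1 (the rare event)."""
    return sum(labels) / len(labels)


def undersample(rows, labels, target, seed=0):
    """Randomly drop common-event rows until the rare event makes up `target`."""
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(labels) if y == 1]
    common = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_rare / (n_rare + keep) = target for the number of common rows to keep.
    keep = min(len(common), round(len(rare) * (1 - target) / target))
    idx = rare + rng.sample(common, keep)
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def oversample(rows, labels, target, seed=0):
    """Randomly duplicate rare-event rows until they make up `target`."""
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(labels) if y == 1]
    common = [i for i, y in enumerate(labels) if y == 0]
    # Solve m / (m + n_common) = target for the total number m of rare rows.
    total_rare = max(len(rare), round(len(common) * target / (1 - target)))
    extra = [rng.choice(rare) for _ in range(total_rare - len(rare))]
    idx = rare + common + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def smote(rare_rows, n_new, k=5, seed=0):
    """Create `n_new` synthetic rare rows (needs at least two rare rows).

    Each synthetic row lies on the segment between a randomly chosen rare row
    and one of its k nearest rare neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rare_rows)
        others = [r for r in rare_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 10,000 rows with two numeric inputs, 0.8% rare events.
    rng = random.Random(42)
    rows = [[rng.random(), rng.random()] for _ in range(10_000)]
    labels = [1] * 80 + [0] * 9_920

    print("original rare-event rate:", minority_fraction(labels))
    for target in (0.1, 0.3, 0.5):
        _, y_under = undersample(rows, labels, target)
        _, y_over = oversample(rows, labels, target)
        print(target, len(y_under), minority_fraction(y_under),
              len(y_over), minority_fraction(y_over))

    rare_rows = [r for r, y in zip(rows, labels) if y == 1]
    print("synthetic rows created:", len(smote(rare_rows, n_new=100)))
```

Undersampling shrinks the data set and discards common-event records, while oversampling and SMOTE enlarge it. In practice, resampling would normally be applied only to the training partition so that model performance is still assessed at the original rare-event rate.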
