Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R

Son Nguyen, Alan Olinsky, John Quinn, Phyllis Schumacher

Source Title: International Journal of Fog Computing (IJFC), 1(2)

ISSN: 2572-4908 | EISSN: 2572-4894 | EISBN13: 9781522528678 | DOI: 10.4018/IJFC.2018070103

MLA

Nguyen, Son, et al. "Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R." IJFC, vol. 1, no. 2, 2018, pp. 83-108. http://doi.org/10.4018/IJFC.2018070103

APA

Nguyen, S., Olinsky, A., Quinn, J., & Schumacher, P. (2018). Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R. International Journal of Fog Computing (IJFC), 1(2), 83-108. http://doi.org/10.4018/IJFC.2018070103

Chicago

Nguyen, Son, et al. "Predictive Modeling for Imbalanced Big Data in SAS Enterprise Miner and R." International Journal of Fog Computing (IJFC) 1, no. 2 (2018): 83-108. http://doi.org/10.4018/IJFC.2018070103

Abstract

A variety of predictive models can handle binary targets, ranging from traditional logistic regression to modern neural networks. However, when the target variable represents a rare event, these models may not be appropriate because they assume that the distribution of the target variable is balanced. This article studies the impact of multiple resampling methods on conventional predictive models. The resampling techniques are oversampling of the rare events, undersampling of the common events, and the synthetic minority over-sampling technique (SMOTE). Decision tree, logistic regression, and rule induction models are applied to the resampled data using SAS Enterprise Miner (EM) software. The data set consists of home mortgage applications and includes a target variable whose rare event occurs at a rate of 0.8%. The authors varied the percentage of the rare event from the original 0.8% up to 50% and monitored the performance of the three predictive models to see which one worked best.
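
The idea of raising the rare-event percentage by resampling can be illustrated with a minimal Python sketch. This is not the authors' SAS Enterprise Miner or R workflow: the function names, the toy data set of 1,000 rows with 8 rare events (0.8%), and the use of simple random resampling are all assumptions made for illustration. SMOTE, which creates synthetic minority rows by interpolating between neighbors, is not covered here.

```python
import random


def is_rare(row):
    """Hypothetical row layout: (row_id, label), where label 1 marks the rare event."""
    return row[1] == 1


def rare_rate(rows, is_rare):
    """Fraction of rows that belong to the rare class."""
    return sum(1 for r in rows if is_rare(r)) / len(rows)


def oversample_to_rate(rows, is_rare, target_rate, seed=0):
    """Random oversampling: duplicate rare rows until they form about target_rate of the data."""
    rng = random.Random(seed)
    rare = [r for r in rows if is_rare(r)]
    common = [r for r in rows if not is_rare(r)]
    if not rare:
        raise ValueError("no rare rows to oversample")
    # Solve n_rare / (n_rare + n_common) = target_rate for n_rare.
    n_rare = round(target_rate * len(common) / (1 - target_rate))
    extra = max(0, n_rare - len(rare))
    # Sampling with replacement, so the same rare row can be copied many times.
    return common + rare + rng.choices(rare, k=extra)


def undersample_to_rate(rows, is_rare, target_rate, seed=0):
    """Random undersampling: drop common rows until rare rows form about target_rate of the data."""
    rng = random.Random(seed)
    rare = [r for r in rows if is_rare(r)]
    common = [r for r in rows if not is_rare(r)]
    # Solve n_rare / (n_rare + n_common) = target_rate for n_common.
    n_common = round(len(rare) * (1 - target_rate) / target_rate)
    n_common = min(n_common, len(common))
    # Sampling without replacement, so each kept common row appears once.
    return rare + rng.sample(common, n_common)


# Toy data (an assumption, not the mortgage data): 1,000 rows, 8 of them rare (0.8%).
data = [(i, 1 if i < 8 else 0) for i in range(1000)]

over = oversample_to_rate(data, is_rare, target_rate=0.5)
under = undersample_to_rate(data, is_rare, target_rate=0.5)

print(len(over), rare_rate(over, is_rare))
print(len(under), rare_rate(under, is_rare))
```

Both calls reach a 50% rare-event rate, but by opposite routes: oversampling grows the data by repeating the few rare rows, while undersampling shrinks it by discarding most of the common rows. Intermediate percentages between 0.8% and 50% can be explored by changing `target_rate`.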
