Bagging Probit Models for Unbalanced Classification


Hualin Wang (AllianceData, USA) and Xiaogang Su (University of Central Florida, USA)
DOI: 10.4018/978-1-60566-717-1.ch017


This chapter presents an award-winning algorithm for the PAKDD 2007 data mining competition, in which the goal was to help a financial company predict how likely its credit card customers are to take up a home loan. The available data are very limited and characterized by a very low take-up rate. To tackle this unbalanced classification problem, the authors apply a bagging algorithm based on ensembles of probit models. One integral element of the algorithm is a special way of conducting the resampling when forming bootstrap samples, for which a brief justification is provided. The method offers a feasible and robust way to solve this difficult yet very common business problem.
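As a rough illustration of what "bagging probit models" means mechanically, here is a minimal pure-Python sketch under stated assumptions: each bag is an ordinary bootstrap sample (n cases drawn with replacement), each probit model is fit by plain gradient ascent on its log-likelihood, and the ensemble prediction is the average of the bags' predicted probabilities. It does not reproduce the chapter's special resampling scheme; that scheme would replace the line that draws `idx`. All function names (`fit_probit`, `bagged_probit`, `predict_bagged`) are hypothetical.

```python
import math
import random

def norm_cdf(z):
    # Standard normal CDF, Phi(z), via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # Standard normal density, phi(z).
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def linear_predictor(beta, x):
    # beta[0] is the intercept; beta[1:] pairs with the inputs in x.
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_probit(X, y, lr=0.1, n_iter=300):
    """Hypothetical helper: fit P(y=1|x) = Phi(beta0 + x.beta) by gradient
    ascent on the average log-likelihood (not a production MLE solver)."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            eta = linear_predictor(beta, xi)
            p = min(max(norm_cdf(eta), 1e-10), 1.0 - 1e-10)  # keep away from 0 and 1
            # d(log-likelihood)/d(eta) = phi(eta) * (y - p) / (p * (1 - p))
            w = norm_pdf(eta) * (yi - p) / (p * (1.0 - p))
            grad[0] += w
            for j, v in enumerate(xi):
                grad[j + 1] += w * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def bagged_probit(X, y, n_bags=10, seed=0, **fit_kw):
    """Fit one probit model per ordinary bootstrap sample (plain bagging)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # n cases drawn with replacement
        models.append(fit_probit([X[i] for i in idx], [y[i] for i in idx], **fit_kw))
    return models

def predict_bagged(models, x):
    # Ensemble estimate: average of the bags' predicted probabilities.
    return sum(norm_cdf(linear_predictor(b, x)) for b in models) / len(models)

if __name__ == "__main__":
    xs = [i / 10 for i in range(-20, 21)]
    X = [[x] for x in xs]
    # Toy labels: 1 for x > 0.8, with every 7th label flipped so the classes overlap.
    y = [1 if (x > 0.8) != (i % 7 == 0) else 0 for i, x in enumerate(xs)]
    models = bagged_probit(X, y, n_bags=5, seed=1)
    print(predict_bagged(models, [-2.0]), predict_bagged(models, [2.0]))
```

Averaging probabilities rather than coefficients is one common way to combine bagged classifiers; it keeps each bag's fit independent, which is what makes bagging easy to parallelize.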

Bagging With Weighted Resampling

The common approach to unbalanced classification is to modify the class weights, borrowing the idea from retrospective designs (see, e.g., Agresti, 1990). This amounts to either decreasing the weight of the majority class by under-sampling or increasing the weight of the minority class by over-sampling. However, deciding how much to adjust the weights remains largely an art. In the following, we present our procedure with its justification and compare it with some alternative approaches.
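The two weight adjustments just described can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming 0-1 labels with Class 1 as the minority; the helper names and the `neg_per_pos` parameter (desired number of majority cases per minority case) are hypothetical and are not the chapter's own procedure.

```python
import random

def split_by_class(y):
    # Indices of minority (y == 1) and majority (y == 0) cases.
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    return pos, neg

def undersample_majority(X, y, neg_per_pos=1.0, seed=0):
    """Keep every minority case and a random subset (without replacement)
    of majority cases, about neg_per_pos majority cases per minority case."""
    rng = random.Random(seed)
    pos, neg = split_by_class(y)
    k = min(len(neg), round(neg_per_pos * len(pos)))
    idx = pos + rng.sample(neg, k)
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

def oversample_minority(X, y, neg_per_pos=1.0, seed=0):
    """Keep every case and add minority cases drawn with replacement until
    there are about neg_per_pos majority cases per minority case."""
    rng = random.Random(seed)
    pos, neg = split_by_class(y)
    target = round(len(neg) / neg_per_pos)
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    idx = list(range(len(y))) + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

if __name__ == "__main__":
    X = [[float(i)] for i in range(100)]
    y = [1 if i < 5 else 0 for i in range(100)]  # 5% minority class
    Xu, yu = undersample_majority(X, y, neg_per_pos=3.0)
    Xo, yo = oversample_minority(X, y)
    print(len(yu), sum(yu), len(yo), sum(yo))
```

Keeping a fraction q of the majority cases is, in expectation, the same as giving each majority case weight q, and duplicating minority cases plays the mirror role; the open choice of `neg_per_pos` is exactly the weight-adjustment decision the text calls an art.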

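To proceed, we first introduce some notation to set up the problem. Let

$$\mathcal{L} = \{(y_i, \mathbf{x}_i) : i = 1, \ldots, n\}$$

denote the training sample, where $y_i$ is the $i$-th binary 0-1 outcome, with Class 1 severely underrepresented, and $\mathbf{x}_i$ is the associated input vector. Let $\mathcal{T} = \{\mathbf{x}_j : j = n + 1, \ldots, n + m\}$ denote the test sample, which contains the input information only.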

Let $F$ denote the distribution underlying the data. What is being modeled is the conditional probability that $y$ equals 1 given $\mathbf{x}$, i.e.,

$$\pi(\mathbf{x}) = \Pr_F(y = 1 \mid \mathbf{x}).$$
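To make the modeled conditional probability concrete, the sketch below assumes the standard probit specification on which the chapter's ensembles are built, $\Pr(y = 1 \mid \mathbf{x}) = \Phi(\beta_0 + \mathbf{x}^\top\boldsymbol{\beta})$ with $\Phi$ the standard normal CDF, and evaluates it together with the log-likelihood of a 0-1 training sample. The optional case weights are included only to show where class weights like those discussed above would enter; the names `probit_prob` and `probit_loglik` are hypothetical.

```python
import math

def norm_cdf(z):
    # Standard normal CDF, Phi(z), via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(beta, x):
    """Assumed probit model: P(y = 1 | x) = Phi(beta[0] + sum_j beta[j+1] * x[j])."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return norm_cdf(eta)

def probit_loglik(beta, X, y, weights=None):
    """Optionally case-weighted log-likelihood of a 0-1 training sample.
    Raises ValueError if a fitted probability is exactly 0 or 1 (log of 0)."""
    if weights is None:
        weights = [1.0] * len(y)
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        p = probit_prob(beta, xi)
        total += wi * (math.log(p) if yi == 1 else math.log(1.0 - p))
    return total

if __name__ == "__main__":
    beta = [-1.0, 0.5]  # illustrative coefficients, not estimates from the chapter
    print(probit_prob(beta, [2.0]))  # Phi(-1 + 0.5 * 2) = Phi(0)
    print(probit_loglik(beta, [[0.0], [4.0]], [0, 1]))
```

Giving the minority cases a weight greater than 1 in `probit_loglik` mimics over-sampling them, while a weight below 1 on majority cases mimics under-sampling.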
