Bagging Probit Models for Unbalanced Classification

Hualin Wang, Xiaogang Su
DOI: 10.4018/978-1-60566-717-1.ch017

Abstract

This chapter presents an award-winning algorithm from the PAKDD 2007 data mining competition, in which the goal was to help a financial company predict the likelihood that its credit card customers would take up a home loan. The available data are very limited and characterized by a very low buying rate. To tackle this unbalanced classification problem, the authors apply a bagging algorithm based on probit model ensembles. One integral element of the algorithm is a special way of resampling when forming the bootstrap samples. A brief justification is provided. The method offers a feasible and robust way to solve this difficult yet very common business problem.

Bagging With Weighted Resampling

The common approach to unbalanced classification is to modify the class weights, borrowing the idea from retrospective designs (see, e.g., Agresti, 1990). This amounts to either decreasing the weight of the majority class by under-sampling or increasing the weight of the minority class by over-sampling. How to adjust the weights, however, is quite an art. In the following, we present our procedure with justification and compare it with some alternative approaches.
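
The weight-adjustment idea described above can be illustrated with a small resampling routine. The following is a minimal sketch, not the authors' own resampling scheme, which this preview does not spell out. It assumes that each class is given a target share of the bootstrap sample and that this share is split evenly among the members of that class; the names `weighted_bootstrap` and `minority_share` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


def weighted_bootstrap(labels, minority_share=0.5, size=None, rng=None):
    """Draw bootstrap indices (with replacement) so that Class 1 makes up
    about `minority_share` of the resample in expectation.

    Illustrative assumption: a generic class-weighted bootstrap,
    not the specific procedure used in the chapter.
    """
    rng = rng or random.Random()
    n = len(labels)
    size = size or n
    n1 = sum(labels)   # number of minority (Class 1) cases
    n0 = n - n1        # number of majority (Class 0) cases
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must be present")
    # Each class receives its target share of the total sampling weight,
    # divided evenly among the observations in that class.
    w1 = minority_share / n1
    w0 = (1.0 - minority_share) / n0
    weights = [w1 if y == 1 else w0 for y in labels]
    return rng.choices(range(n), weights=weights, k=size)


# Toy data with a 2% positive rate (hypothetical, for illustration only).
labels = [1] * 20 + [0] * 980
idx = weighted_bootstrap(labels, minority_share=0.5, rng=random.Random(0))
counts = Counter(labels[i] for i in idx)
print(counts)  # class counts in the resample
```

Setting `minority_share` above the observed rate over-samples the minority class and under-samples the majority class at the same time. In a bagging procedure, one such resample would be drawn for each base model, and the fitted models' predictions would then be averaged.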

To proceed, we first introduce some notation to set up the problem. Let

$\mathcal{L} = \{(y_i, \mathbf{x}_i) : i = 1, \ldots, n\}$

denote the training sample, where $y_i$ is the $i$-th binary 0-1 outcome, with Class 1 severely underrepresented, and $\mathbf{x}_i$ is the associated input vector. Let $\mathcal{T}$ denote the test sample, which contains the input information only.

Let $F$ denote the distribution underlying the data. What is being modeled is the conditional probability that $y$ equals 1 given $\mathbf{x}$, i.e., $\Pr(y = 1 \mid \mathbf{x})$.
