Class Prediction in Test Sets with Shifted Distributions

Óscar Pérez, Manuel Sánchez-Montañés

Source Title: Encyclopedia of Artificial Intelligence

ISBN13: 9781599048499|ISBN10: 1599048493|EISBN13: 9781599048505

DOI: 10.4018/978-1-59904-849-9.ch044

MLA

Pérez, Óscar, and Manuel Sánchez-Montañés. "Class Prediction in Test Sets with Shifted Distributions." Encyclopedia of Artificial Intelligence, edited by Juan Ramón Rabuñal Dopico, et al., IGI Global, 2009, pp. 282-288. https://doi.org/10.4018/978-1-59904-849-9.ch044

APA

Pérez, Ó. & Sánchez-Montañés, M. (2009). Class Prediction in Test Sets with Shifted Distributions. In J. Rabuñal Dopico, J. Dorado, & A. Pazos (Eds.), Encyclopedia of Artificial Intelligence (pp. 282-288). IGI Global. https://doi.org/10.4018/978-1-59904-849-9.ch044

Chicago

Pérez, Óscar, and Manuel Sánchez-Montañés. "Class Prediction in Test Sets with Shifted Distributions." In Encyclopedia of Artificial Intelligence, edited by Juan Ramón Rabuñal Dopico, Julian Dorado, and Alejandro Pazos, 282-288. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-59904-849-9.ch044

Abstract

Machine learning has provided powerful algorithms that automatically generate predictive models from experience. One specific technique is supervised learning, where the machine is trained to predict a desired output for each input pattern x. This chapter focuses on classification, that is, supervised learning in which the output to predict is a class label; an example is predicting whether a hospital patient will develop cancer. In this example, the class label c is a variable with two possible values, “cancer” or “no cancer”, and the input pattern x is a vector containing patient data (e.g. age, gender, diet and smoking habits). To construct a predictive model, supervised learning methods require a set of examples x_i together with their respective labels c_i. This dataset is called the “training set”. The constructed model is then used to predict the labels of a set of new cases x_j, called the “test set”. In the cancer prediction example, this is the phase in which the model is used to predict cancer in new patients.

One common assumption in supervised learning algorithms is that the statistical structure of the training and test datasets is the same (Hastie, Tibshirani & Friedman, 2001). That is, the test set is assumed to have the same attribute distribution p(x) and the same class distribution p(c|x) as the training set. However, this is usually not the case in real applications, for several reasons. For instance, in many problems the training dataset is obtained in a specific manner that differs from the way the test dataset will be generated later. Moreover, the nature of the problem may evolve over time. These phenomena cause p^Tr(x, c) ≠ p^Test(x, c), which can degrade the performance of the model constructed in training.
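As a minimal illustration of this effect (not taken from the chapter), the Python sketch below assumes a one-dimensional problem with two unit-variance Gaussian classes. It fits a nearest-mean threshold on the training set, then evaluates it on a test set drawn from the same distribution and on one whose inputs are shifted. The distributions, the shift of 1.5, and all function names are assumptions chosen for illustration.

```python
import random
from statistics import mean


def sample(n, shift, rng):
    """Draw n labelled points: class c in {0, 1}, x ~ N(2*c + shift, 1).

    `shift` is a hypothetical offset applied to both classes to mimic a
    test set whose joint distribution differs from the training set.
    """
    data = []
    for _ in range(n):
        c = rng.randint(0, 1)
        x = rng.gauss(2.0 * c + shift, 1.0)
        data.append((x, c))
    return data


def fit_threshold(train):
    """Nearest-mean classifier in 1-D: threshold midway between class means."""
    m0 = mean(x for x, c in train if c == 0)
    m1 = mean(x for x, c in train if c == 1)
    return (m0 + m1) / 2.0


def accuracy(threshold, data):
    """Fraction of points for which 'x > threshold' agrees with 'c == 1'."""
    return mean(1.0 if (x > threshold) == (c == 1) else 0.0 for x, c in data)


rng = random.Random(0)
train = sample(2000, 0.0, rng)
test_same = sample(2000, 0.0, rng)      # same distribution as training
test_shifted = sample(2000, 1.5, rng)   # shifted inputs (assumed shift)

threshold = fit_threshold(train)
acc_same = accuracy(threshold, test_same)
acc_shifted = accuracy(threshold, test_shifted)

print(f"threshold learned in training: {threshold:.2f}")
print(f"accuracy, unshifted test set:  {acc_same:.3f}")
print(f"accuracy, shifted test set:    {acc_shifted:.3f}")
```

In this toy setting the threshold learned in training no longer sits between the two classes of the shifted test set, so most class-0 test points fall on the wrong side of it and accuracy drops.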

Here we present a new algorithm that re-estimates a model constructed in training using the unlabelled test patterns. We show the convergence properties of the algorithm and illustrate its performance on an artificial problem. Finally, we demonstrate its strengths in a heart disease diagnosis problem where the training set is taken from a different hospital than the test set.
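The abstract does not specify the algorithm itself. Purely to illustrate the general idea of re-estimating a trained model from unlabelled test patterns, the sketch below assumes a one-dimensional, two-class Gaussian model with unit variance and runs expectation-maximisation (EM) updates on the test inputs, starting from the training estimates. This is a generic EM mixture update, not the authors' method; the data, the shift of 1.5, and every function name are hypothetical.

```python
import math
import random


def sample(n, shift, rng):
    """Toy data (assumed): class c in {0, 1}, x ~ N(2*c + shift, 1)."""
    data = []
    for _ in range(n):
        c = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * c + shift, 1.0), c))
    return data


def class_means(data):
    """Per-class sample means estimated from labelled data."""
    sums, counts = [0.0, 0.0], [0, 0]
    for x, c in data:
        sums[c] += x
        counts[c] += 1
    return [sums[0] / counts[0], sums[1] / counts[1]]


def posterior_class1(x, means, priors):
    """p(c=1 | x) under unit-variance Gaussian class-conditionals."""
    w0 = priors[0] * math.exp(-0.5 * (x - means[0]) ** 2)
    w1 = priors[1] * math.exp(-0.5 * (x - means[1]) ** 2)
    return w1 / (w0 + w1)


def em_adapt(xs, means, priors, n_iter=200):
    """Re-estimate class means and priors from unlabelled inputs xs via EM."""
    means, priors = list(means), list(priors)
    n = len(xs)
    for _ in range(n_iter):
        # E-step: soft class assignments under the current model.
        r1 = [posterior_class1(x, means, priors) for x in xs]
        n1 = sum(r1)
        n0 = n - n1
        # M-step: responsibility-weighted means and class proportions.
        means = [sum((1.0 - r) * x for r, x in zip(r1, xs)) / n0,
                 sum(r * x for r, x in zip(r1, xs)) / n1]
        priors = [n0 / n, n1 / n]
    return means, priors


def accuracy(data, means, priors):
    """Fraction of points whose most probable class matches the label."""
    hits = sum((posterior_class1(x, means, priors) > 0.5) == (c == 1)
               for x, c in data)
    return hits / len(data)


rng = random.Random(0)
train = sample(2000, 0.0, rng)
test = sample(2000, 1.5, rng)            # shifted test set (assumed shift)

means_tr = class_means(train)
priors_tr = [0.5, 0.5]
acc_before = accuracy(test, means_tr, priors_tr)

test_inputs = [x for x, _ in test]       # test labels are NOT used for adaptation
means_ad, priors_ad = em_adapt(test_inputs, means_tr, priors_tr)
acc_after = accuracy(test, means_ad, priors_ad)

print(f"accuracy before adaptation: {acc_before:.3f}")
print(f"accuracy after adaptation:  {acc_after:.3f}")
```

The test labels are used only to score the two models, never to adapt them. Starting EM from the training estimates keeps the correspondence between mixture components and class labels, which unlabelled data alone cannot determine.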
