Individual Prediction Reliability Estimates in Classification and Regression

Darko Pevec, Zoran Bosnic, Igor Kononenko
DOI: 10.4018/978-1-4666-1806-0.ch003

Abstract

Current machine learning algorithms perform well in many problem domains, but in risk-sensitive decision making (for example, in medicine and finance) experts do not rely on common evaluation methods that provide overall assessments of models, because such techniques give no information about single predictions. This chapter summarizes the research areas that have motivated the development of various approaches to individual prediction reliability. Based on these motivations, the authors describe six approaches to reliability estimation: inverse transduction, local sensitivity analysis, bagging variance, local cross-validation, local error modelling, and density-based estimation. Empirical evaluation on benchmark datasets provides promising results, especially for use with decision and regression trees. The testing results also reveal that the reliability estimators exhibit different performance levels when used with different models and in different domains. The authors show the usefulness of individual prediction reliability estimates in attempts to predict breast cancer recurrence. In this context, estimating the reliability of individual predictions is of crucial importance for physicians seeking to validate predictions derived using classification and regression models.
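To make one of the six listed estimators concrete, the bagging-variance idea can be sketched as follows: train several models on bootstrap resamples of the learning set and take the variance of their predictions for a given query as its reliability estimate (higher variance suggesting lower reliability). This is a minimal illustration only, using a toy 1-nearest-neighbour regressor rather than the models evaluated in the chapter; the function names and data are hypothetical.

```python
import random
import statistics

def one_nn_predict(train, x):
    """Predict with a 1-nearest-neighbour regressor over (x, y) pairs."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagging_variance(train, x, n_bags=50, seed=0):
    """Bagging-variance-style reliability estimate: the variance of
    predictions made by models trained on bootstrap resamples of the
    learning set.  Higher variance suggests a less reliable prediction."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_bags):
        resample = [rng.choice(train) for _ in train]
        preds.append(one_nn_predict(resample, x))
    return statistics.pvariance(preds)

train = [(0.0, 0.1), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (10.0, 30.0)]
# A query inside the smooth, well-covered region of the data yields
# lower prediction variance than one near the isolated point at x = 10.
reliable = bagging_variance(train, 1.5)
unreliable = bagging_variance(train, 9.0)
```

Because the estimate is just a variance of predictions, it works with any base model, which is exactly the model-independence the chapter emphasizes, but its values live on an arbitrary scale rather than being probabilities.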
Chapter Preview

Motivations From The Field Of Model Analysis

A useful criterion for differentiating between the various approaches is whether they target a specific predictive model or are model-independent. Much research develops methods specific to neural networks, but here we focus on model-independent (i.e., "black box") approaches.

Model-independent approaches are based on exploiting general supervised learning framework parameters such as learning sets and attributes. These approaches include observing how a particular learning example locally influences a model, conducting local error modelling, and utilizing other properties of the input domain, such as its density distribution. They are defined independently of any predictive model formalization. This ensures greater generalisability, because it offers users more freedom to choose the predictive model that best suits the problem. However, the reliability estimates based on these approaches are usually not probabilistically interpretable: they can take values from an arbitrary interval of numbers and are therefore harder to evaluate analytically.
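As a minimal sketch of the density-based idea mentioned above, one common proxy for local density is the average distance from the query to its k nearest training points: queries in sparse regions of the input domain tend to receive less reliable predictions from any model. The function name, data, and choice of k here are illustrative assumptions, not the chapter's exact estimator.

```python
import math

def knn_distance(train_x, query, k=3):
    """Density-based reliability proxy: the average Euclidean distance
    from the query to its k nearest training points.  A large value
    means the query lies in a sparse region of the input domain, where
    predictions tend to be less reliable."""
    dists = sorted(math.dist(x, query) for x in train_x)
    return sum(dists[:k]) / k

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (4.0, 4.0)]
dense_query = (0.15, 0.15)   # inside the cluster of training points
sparse_query = (3.0, 1.0)    # far from most of the training data
```

Note that this estimate never consults the predictive model at all, which is what makes it model-independent, and also why its values are distances on an arbitrary scale rather than probabilities.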

In the following, we summarize work in three related research fields that has motivated the development of the model-independent approaches. Fields that deal with perturbed data and with the use of unlabelled examples in supervised learning are generally concerned with accuracy performance and with evaluating the whole predictive model. Both of these fields exploit variations of the original learning set to improve general model accuracy. Some of these methods also focus on weighting and analysing the role of individual examples in learning set variation. One way to further apply this approach is to use transduction and sensitivity analysis as a general framework, as indicated in the last portion of this section.
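The transduction-and-sensitivity-analysis framework can be sketched in a few lines: label the query with its own prediction shifted by some epsilon, add this perturbed example to the learning set, retrain, and measure how far the new predictions move. A wide spread indicates an unstable, less reliable prediction. The toy k-nearest-neighbour regressor, function names, and data below are illustrative assumptions, not the chapter's exact estimator.

```python
def knn_predict(train, x, k=3):
    """Mean of the k nearest neighbours' target values."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def sensitivity_spread(train, x, eps=1.0, k=3):
    """Sensitivity-analysis sketch: label the query with its own
    prediction shifted by +/- eps, add the perturbed example to the
    learning set, retrain, and measure how far the prediction moves."""
    y0 = knn_predict(train, x, k)
    y_hi = knn_predict(train + [(x, y0 + eps)], x, k)
    y_lo = knn_predict(train + [(x, y0 - eps)], x, k)
    return y_hi - y_lo

train = [(0.0, 0.1), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 4.1)]
spread = sensitivity_spread(train, 2.5)
```

The spread measures how strongly a single perturbed example can sway the model locally, which is exactly the "role of individual examples in learning set variation" discussed above.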
