This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnostic tests and its associated misclassification errors can have significant financial or human costs, including the use of unnecessary resource and patient safety issues. This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification and its application to medicine. Different types of costs are also present, with an emphasis on diagnostic tests and misclassification costs. In addition, an overview of research in the area of cost-sensitive learning is given, including current methodological approaches. Finally, current methods for the cost-sensitive evaluation of classifiers are discussed.
Classification is one of the main tasks in knowledge discovery and data mining (Mitchell, 1997). It has been object of study in areas as machine learning, statistics and neural networks. There are many approaches for classification, including decision trees, Bayesian classifiers, neural classifiers, discriminant analysis, support vector machines, and rule induction, among many others.
The goal of classification is to correctly assign examples to one of a finite number of classes. In classification problems the performance of classifiers is usually measured using an error rate. The error rate is the proportion of errors detected in all instances and is an indicator of the global classifier performance. A large number of classification algorithms assume that the errors have the same cost and, because of that, are normally designed to minimize the number of errors (the zero-one loss). In these cases, the error rate is equivalent to assigning the same cost to all classification errors. For instance, in the case of a binary classification, false positives and false negatives would have equal cost. Nevertheless, in many situations, each type error may have a different associated cost.
In fact, in the majority of daily situations, decisions have distinct costs, and a bad decision may have serious consequences. It is therefore important to take into account the different costs associated to decisions, i.e., classification costs.
In this context, we may designate cost-sensitive classification when costs are ignored during the learning phase of a classifier and are only used when predicting new cases. In the other hand, we may call cost-sensitive learning when costs are considered during the learning phase and ignored, or not, when predicting new cases. In general, the cost-sensitive learning is a better option, with better results, as it considers costs during the process of generation of a new classifier. Cost-sensitive learning is the sub-area of Machine Learning concerned with situations of non-uniformity in costs.