Cost-Sensitive Learning

Victor S. Sheng (New York University, USA) and Charles X. Ling (The University of Western Ontario, Canada)
Copyright: © 2009 | Pages: 7
DOI: 10.4018/978-1-60566-010-3.ch054

Classification is the most important task in inductive learning and machine learning. A classifier can be trained from a set of training examples with class labels, and can then be used to predict the class labels of new examples. The class label is usually discrete and finite. Many effective classification algorithms have been developed, such as naïve Bayes, decision trees, and neural networks. However, most classification algorithms seek to minimize the error rate: the percentage of incorrect class-label predictions. They ignore the differences between types of misclassification errors; in particular, they implicitly assume that all misclassification errors are equally costly. In many real-world applications this assumption does not hold, and the costs of different misclassification errors can differ greatly. For example, in the medical diagnosis of a certain cancer, if the cancer is regarded as the positive class and non-cancer (healthy) as negative, then missing a cancer (the patient is actually positive but is classified as negative, a "false negative") is much more serious, and thus more expensive, than a false-positive error: the patient could lose his or her life because of the delay in correct diagnosis and treatment. Similarly, if carrying a bomb is positive, then it is much more expensive to miss a terrorist who carries a bomb onto a flight than to search an innocent person.
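To make the asymmetry concrete, the following sketch (with purely hypothetical cost values for the cancer example) compares two classifiers that commit the same number of errors, and thus have the same error rate, but incur very different total misclassification costs:

```python
# Hypothetical cost matrix for the cancer-diagnosis example:
# a false negative (missed cancer) is assumed far more costly
# than a false positive (flagging a healthy patient).
FN_COST = 100.0
FP_COST = 1.0

def total_cost(fn, fp, fn_cost=FN_COST, fp_cost=FP_COST):
    """Total misclassification cost given false-negative and
    false-positive counts."""
    return fn * fn_cost + fp * fp_cost

# Two classifiers, each making 10 errors on the same 100 patients
# (identical error rate), but distributing the errors differently:
cost_a = total_cost(fn=8, fp=2)   # mostly missed cancers -> 802.0
cost_b = total_cost(fn=2, fp=8)   # mostly false alarms   -> 208.0
print(cost_a, cost_b)
```

An error-rate-minimizing learner treats the two classifiers as equally good, while a cost-sensitive one clearly prefers the second.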
Chapter Preview


Cost-sensitive learning takes costs, such as the misclassification cost, into consideration. It is one of the most active and important research areas in machine learning, and it plays an important role in real-world data mining applications. A comprehensive survey (Turney, 2000) lists a large variety of costs in data mining and machine learning, including misclassification costs, data acquisition costs (instance costs and attribute costs), active learning costs, computation cost, human-computer interaction cost, and so on. The misclassification cost is singled out as the most important, and it has been the most extensively studied in recent years (e.g., Domingos, 1999; Elkan, 2001; Zadrozny & Elkan, 2001; Zadrozny et al., 2003; Ting, 1998; Drummond & Holte, 2000, 2003; Turney, 1995; Ling et al., 2004, 2006b; Chai et al., 2004; Sheng & Ling, 2006).

Broadly speaking, cost-sensitive learning falls into two categories. The first is to design classifiers that are cost-sensitive in themselves; we call these direct methods. Examples of direct cost-sensitive learning are ICET (Turney, 1995) and cost-sensitive decision trees (Drummond & Holte, 2000, 2003; Ling et al., 2004, 2006b). The other category is to design a "wrapper" that converts any existing cost-insensitive (or cost-blind) classifier into a cost-sensitive one. The wrapper method is also called the cost-sensitive meta-learning method, and it can be further categorized into thresholding and sampling. Below is a hierarchy of cost-sensitive learning with some typical methods. This chapter focuses on cost-sensitive meta-learning that considers the misclassification cost only.

Cost-sensitive learning:

  • Direct methods

    o ICET (Turney, 1995)

    o Cost-sensitive decision trees (Drummond & Holte, 2003; Ling et al., 2004, 2006b)

  • Meta-learning

    o Thresholding

      - MetaCost (Domingos, 1999)

      - CostSensitiveClassifier (CSC in short) (Witten & Frank, 2005)

      - Cost-sensitive naïve Bayes (Chai et al., 2004)

      - Empirical Threshold Adjusting (ETA in short) (Sheng & Ling, 2006)

    o Sampling

      - Costing (Zadrozny et al., 2003)

      - Weighting (Ting, 1998)
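As a minimal sketch of the thresholding idea, assuming a binary problem and a cost-blind classifier that outputs reasonably calibrated probability estimates, a decision becomes cost-sensitive by predicting positive whenever doing so has the lower expected cost. This yields the well-known threshold p* = FP / (FP + FN) derived by Elkan (2001):

```python
def optimal_threshold(fp_cost, fn_cost):
    """Probability threshold that minimizes expected cost (Elkan, 2001):
    predict positive when P(positive | x) >= fp_cost / (fp_cost + fn_cost)."""
    return fp_cost / (fp_cost + fn_cost)

def cost_sensitive_predict(p_positive, fp_cost, fn_cost):
    """Wrap a cost-blind probability estimate into a cost-sensitive
    decision: 1 = positive, 0 = negative."""
    return 1 if p_positive >= optimal_threshold(fp_cost, fn_cost) else 0

# With false negatives 100x more costly than false positives, even a
# 5% estimated cancer probability exceeds the threshold (~0.0099),
# so the example is classified as positive; with equal costs the
# threshold is 0.5 and the same example is classified as negative.
print(cost_sensitive_predict(0.05, fp_cost=1.0, fn_cost=100.0))
print(cost_sensitive_predict(0.05, fp_cost=1.0, fn_cost=1.0))
```

Methods such as CSC and ETA differ mainly in how the threshold is obtained (analytically from the cost matrix versus tuned empirically on training data), but both follow this pattern.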

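The sampling branch can be sketched in the same spirit. Assuming per-example misclassification costs are known, Costing-style rejection sampling (Zadrozny et al., 2003) keeps each training example with probability proportional to its cost, so that any cost-blind learner trained on the resampled data approximately minimizes expected cost; the data below are hypothetical:

```python
import random

def rejection_sample(examples, max_cost, rng=None):
    """Costing-style rejection sampling (Zadrozny et al., 2003):
    keep each (features, label, cost) example with probability
    cost / max_cost, returning (features, label) pairs whose
    distribution reflects the misclassification costs."""
    rng = rng or random.Random(0)
    return [(x, y) for (x, y, cost) in examples
            if rng.random() < cost / max_cost]

# Hypothetical data: 5 high-cost positives, 495 low-cost negatives.
data = ([(("patient", i), 1, 100.0) for i in range(5)] +
        [(("patient", i), 0, 1.0) for i in range(5, 500)])
sample = rejection_sample(data, max_cost=100.0)
# High-cost positives are always kept (100/100 acceptance), while
# each negative survives with probability 0.01, so the resampled
# set is dominated by the expensive class.
```

Weighting (Ting, 1998) achieves a similar effect without discarding data, by passing the per-example costs as instance weights to a learner that supports weighted training.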