A Comparison of Revision Schemes for Cleaning Labeling Noise

Chuck P. Lam (Lama Solutions LLC., USA) and David G. Stork (Ricoh Innovations, Inc., USA)
Copyright: © 2008 | Pages: 13
DOI: 10.4018/978-1-59904-528-3.ch013

Abstract

Data quality is an important factor in building effective classifiers, and one way to improve it is to clean labeling noise. Label cleaning can be divided into two stages: the first identifies samples with suspicious labels, and the second processes those samples using some revision scheme. This chapter examines three such revision schemes: (1) removing the suspicious samples, (2) automatically replacing each suspicious label with the label the machine believes to be correct, and (3) escalating the suspicious samples to a human supervisor for relabeling. Experimental and theoretical analyses show that only escalation is effective when the original labeling noise is very large or very small. Furthermore, for a wide range of situations, removal is better than automatic replacement.
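The two-stage pipeline and the three revision schemes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how suspicious samples are identified, so the sketch assumes a leave-one-out 1-nearest-neighbor detector that flags a sample when its label disagrees with the label of its nearest other sample. All function names (`loo_predict`, `flag_suspicious`, `remove_suspicious`, `replace_suspicious`, `escalate_suspicious`) and the `ask_human` callback that stands in for the human supervisor are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Point, b: Point) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loo_predict(xs: Sequence[Point], ys: Sequence[int]) -> List[int]:
    # Stage 1 (ASSUMED detector, not from the chapter): predict each sample's
    # label from its nearest *other* sample (leave-one-out 1-NN).
    if len(xs) < 2:
        raise ValueError("need at least two samples")
    preds = []
    for i, x in enumerate(xs):
        others = (j for j in range(len(xs)) if j != i)
        nearest = min(others, key=lambda j: _sq_dist(x, xs[j]))
        preds.append(ys[nearest])
    return preds


def flag_suspicious(ys: Sequence[int], preds: Sequence[int]) -> List[bool]:
    # A label is suspicious when the machine's prediction disagrees with it.
    return [y != p for y, p in zip(ys, preds)]


def remove_suspicious(xs, ys, suspicious):
    # Scheme 1: drop every flagged sample from the training set.
    keep = [i for i, s in enumerate(suspicious) if not s]
    return [xs[i] for i in keep], [ys[i] for i in keep]


def replace_suspicious(xs, ys, suspicious, preds):
    # Scheme 2: overwrite each flagged label with the machine's prediction.
    return list(xs), [p if s else y for y, s, p in zip(ys, suspicious, preds)]


def escalate_suspicious(xs, ys, suspicious, ask_human: Callable[[int], int]):
    # Scheme 3: send each flagged sample (by index) to a human for relabeling;
    # `ask_human` is a HYPOTHETICAL callback standing in for the supervisor.
    return list(xs), [ask_human(i) if s else y
                      for i, (y, s) in enumerate(zip(ys, suspicious))]


if __name__ == "__main__":
    xs = [(0.0,), (1.0,), (2.5,), (10.0,), (11.0,), (12.0,)]
    ys = [0, 0, 1, 1, 1, 1]           # sample 2 carries a noisy label
    true_labels = [0, 0, 0, 1, 1, 1]  # known only to the "human"
    preds = loo_predict(xs, ys)
    flags = flag_suspicious(ys, preds)
    print(flags)                                        # [False, False, True, False, False, False]
    print(remove_suspicious(xs, ys, flags)[1])          # [0, 0, 1, 1, 1]
    print(replace_suspicious(xs, ys, flags, preds)[1])  # [0, 0, 0, 1, 1, 1]
    print(escalate_suspicious(xs, ys, flags, lambda i: true_labels[i])[1])  # [0, 0, 0, 1, 1, 1]
```

In this toy run replacement and escalation agree because the detector happened to flag a genuinely mislabeled sample. The schemes diverge when the detector flags a clean sample: replacement writes in a wrong label, removal only discards a correct sample, and escalation recovers the true label at the cost of a human query.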
