Supervised Learning in Absence of Accurate Class Labels: A Multi-Instance Learning Approach

Ramasubramanian Sundararajan (GE Global Research, India), Hima Patel (GE Global Research, India) and Manisha Srivastava (GE Global Research, India)
Copyright: © 2017 | Pages: 14
DOI: 10.4018/978-1-5225-2498-4.ch010


Traditionally, supervised learning algorithms are built using labeled training data. Accurate labels are essential to guide the classifier towards an optimal separation between the classes. However, there are several real-world scenarios where class labels at the instance level are unavailable, imprecise, or difficult to obtain, or where the problem is naturally posed as one of classifying groups of instances. To tackle these challenges, we draw your attention towards Multi-Instance Learning (MIL) algorithms, where labels are available at a bag level rather than at an instance level. In this chapter, we motivate the need for MIL algorithms and describe an ensemble-based method in which the members of the ensemble are lazy learning classifiers using the Citation Nearest Neighbour method. Diversity among the ensemble members is achieved by optimizing their parameters using a multi-objective optimization method, with the objectives being to maximize positive class accuracy and minimize the false positive rate. We demonstrate results of the methodology on the standard Musk 1 dataset.
Chapter Preview

1. Introduction

Supervised learning algorithms usually take a set of input samples and the labels associated with them. The goal of building a classifier is then to find a suitable decision boundary that predicts correct labels on test (unseen) data. A lot of research has been carried out to build robust supervised learning algorithms that can handle challenges such as nonlinear separations and class imbalance.

However, the implicit assumption is that a set of labels exists for the training data. Obtaining such labels may be expensive or impractical in the real world. In this chapter, we would like to draw your attention towards a set of algorithms where labels are not available at an instance level but rather at a coarser, “bag” level. A bag is simply a collection of instances, or individual data points. A bag is labeled positive if it contains at least one positive instance (which may or may not be specifically identified), and negative otherwise. This class of problems is known as multi-instance learning (MIL) problems.
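The bag-labeling rule described above can be illustrated with a minimal Python sketch. It is only an illustration of the rule, not the chapter's method: the function name bag_label and the toy bags are hypothetical, and in a real MIL problem the instance-level labels would typically be hidden from the learner, with only the bag labels observed.

```python
from typing import Dict, List, Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Return the bag label under the standard MIL assumption:
    positive if at least one instance is positive, negative otherwise."""
    return any(instance_labels)


# Toy bags with (normally unobserved) instance labels; names are hypothetical.
bags: Dict[str, List[bool]] = {
    "bag_a": [False, False, True],   # contains one positive instance
    "bag_b": [False, False],         # contains no positive instance
}

# Only these bag-level labels would be available to a MIL learner.
bag_labels = {name: bag_label(labels) for name, labels in bags.items()}
```

Here bag_a is labeled positive because of its single positive instance, while bag_b is labeled negative; the learner is not told which instance in bag_a is responsible.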

This setting is applicable in a number of problems where traditional two-class classifiers may face one or more of the following difficulties:
