Region-Based Graph Learning towards Large Scale Image Annotation

Bao Bing-Kun (Institute of Automation, Chinese Academy of Sciences, China) and Yan Shuicheng (National University of Singapore, Singapore)
DOI: 10.4018/978-1-4666-1891-6.ch013


Graph-based learning is a useful approach for modeling data in image annotation problems. In this chapter, the authors show how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level can greatly improve image annotation performance compared with analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are linked into a large k-nearest-neighbor graph using an efficient Locality Sensitive Hashing (LSH) method. Finally, a sparse, region-aware image-based graph is fed into the multi-label extension of the entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps naturally yield the capability to handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.
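The graph construction step can be illustrated with a minimal sketch of an LSH-based approximate k-nearest-neighbor graph. This is not the authors' implementation: the abstract does not say which LSH family is used, so random-hyperplane hashing (suited to cosine similarity) is assumed here, and the names `lsh_knn_graph`, `cosine_distance`, `n_bits`, and `n_tables` are hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0.0 else 1.0


def lsh_knn_graph(features, k=2, n_bits=4, n_tables=8, seed=0):
    """Approximate k-NN graph over region features (assumed hashing scheme).

    features: list of equal-length float vectors, one per region.
    Returns {region index: [up to k neighbor indices, nearest first]}.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    candidates = defaultdict(set)

    for _ in range(n_tables):
        # One hash table = n_bits random hyperplanes; the sign pattern is the key.
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, vec in enumerate(features):
            key = tuple(
                sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
            )
            buckets[key].append(idx)
        # Only regions that share a bucket are compared later.
        for members in buckets.values():
            for i in members:
                candidates[i].update(j for j in members if j != i)

    graph = {}
    for i in range(len(features)):
        ranked = sorted(
            candidates[i], key=lambda j: cosine_distance(features[i], features[j])
        )
        graph[i] = ranked[:k]
    return graph
```

The point of the design is that exact distances are computed only among regions that collide in at least one hash table, which avoids the all-pairs comparison that becomes impractical at region scale. A region may receive fewer than k neighbors if it collides with too few others.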
Chapter Preview


With the fast-growing number of images on photo-sharing websites such as Flickr and Picasa, there is an urgent need for scalable multi-label propagation algorithms for image indexing, management, and search. Because most uploaded images lack user annotations, one crucial task is to annotate them automatically to facilitate subsequent image search. Automatic annotation of images in a large-scale dataset at the semantic level is challenging, mainly because of four difficulties:

1) How to explore the relationships between regions and labels. Generally, labels relate to local image regions rather than to the whole image. However, most existing algorithms associate labels with the whole image under the assumption that image similarity and label similarity are consistent. Our first task is to associate labels with specific image regions, which is believed to yield more accurate annotation.

2) How to reveal the co-occurrence among labels. Identifying labels that have a high probability of appearing together, e.g. “cloud” and “sky,” improves image annotation performance. This has been discussed in recent years (Qi, 2007; Chen, 2008; Liu, 2006), but not for large-scale datasets because of the extra computation required. Our second task is to reveal label co-occurrence in large-scale datasets without incurring additional computation.

3) How to explore the relationships among images. To explore such relationships, especially semantic similarity, a kernel function is usually used to construct the similarity matrix, which encodes the underlying dependence structure between images. However, it is impractical to evaluate the kernel function over all image pairs in a large-scale dataset. Our third task is to explore image relationships efficiently in large-scale datasets.

4) How to propagate labels from labeled images to unlabeled ones. In a large-scale dataset, it is intractable to obtain full labels for all images, but it is still feasible to label a small subset of images, often regarded as “seed images,” and to propagate the labels from these seed images to unlabeled ones through a semi-supervised learning algorithm. Our fourth task is to select an efficient semi-supervised learning algorithm among existing ones for label propagation.

To address these four difficulties, we propose a framework for region-aware and scalable multi-label propagation in this chapter.
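The idea of propagating labels from seed images over a graph can be sketched as follows. This is a deliberately simple neighbor-averaging scheme, swapped in for illustration only; it is not the entropic graph regularized algorithm of Subramanya & Bilmes (2009) that the chapter adopts. The function name `propagate_labels` and its parameters are hypothetical.

```python
def propagate_labels(neighbors, seed_labels, n_labels, n_iters=20):
    """Spread multi-label scores from seed nodes to unlabeled nodes.

    neighbors:   {node: [neighbor nodes]}; every neighbor must also be a key.
    seed_labels: {node: [score per label]} for the labeled seed images.
    Returns {node: [score per label]}.
    """
    scores = {n: list(seed_labels.get(n, [0.0] * n_labels)) for n in neighbors}
    for _ in range(n_iters):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                # Seed images keep their given labels (clamping).
                updated[node] = list(seed_labels[node])
            elif nbrs:
                # Unlabeled node: average the neighbors' current scores per label.
                updated[node] = [
                    sum(scores[m][c] for m in nbrs) / len(nbrs)
                    for c in range(n_labels)
                ]
            else:
                updated[node] = scores[node]
        scores = updated
    return scores


# Labels: index 0 = "sky", index 1 = "cloud". Nodes 0 and 2 are seeds.
chain = {0: [1], 1: [0, 2], 2: [1]}
seeds = {0: [1.0, 1.0], 2: [1.0, 0.0]}
result = propagate_labels(chain, seeds, n_labels=2)
```

In this toy chain, the unlabeled middle node receives a full score for "sky" (both seed neighbors carry it) and half a score for "cloud" (only one does), which shows how several labels can be propagated in one pass over the same graph.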

Most existing annotation algorithms are based on the assumption that image similarity and label similarity are consistent. This assumption ignores the fact that each label often characterizes only a local semantic region within an image, while image similarity is generally calculated over the whole image, as illustrated in Figure 1. One reasonable solution is to represent the image with semantic region-based features. Xu et al. (2004), Chen et al. (2006), and Zhou et al. (2007) proposed to regard each image as a bag of multiple manually segmented regions and predicted the label of each region with a multi-class bag classifier. In practice, manual segmentation is very time-consuming, while automatic image segmentation algorithms without human interaction are still far from satisfactory. To address this, Gu et al. (2009) proposed to process each image into a robust bag of overlaid regions, segmented at different scales, to explore rich cues. The Bag-of-Regions (BOR) representation not only captures most semantic regions but also suits most existing segmentation algorithms. Therefore, in our framework, we follow the approach of Gu et al. (2009) to represent every image as a BOR and extract a feature for every region in order to explore the relationships between semantic regions and labels.
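A Bag-of-Regions can be pictured as the pooled regions from several segmentations of the same image. The sketch below assumes the segmentations are already available as integer label maps and uses mean gray value as a placeholder region feature; both choices, and the name `bag_of_regions`, are illustrative assumptions and not the features or segmentation method used by Gu et al. (2009) or in the chapter.

```python
from collections import defaultdict


def bag_of_regions(image, segmentations):
    """Pool regions from several segmentations of one image.

    image:         2D list of gray values.
    segmentations: list of 2D integer label maps with the same shape as image,
                   e.g. one per segmentation scale (assumed to be precomputed).
    Returns a list of region records; regions from different scales may overlap.
    """
    bag = []
    for scale, label_map in enumerate(segmentations):
        pixels = defaultdict(list)
        for r, row in enumerate(label_map):
            for c, region_id in enumerate(row):
                pixels[region_id].append(image[r][c])
        for region_id, values in sorted(pixels.items()):
            bag.append({
                "scale": scale,
                "region": region_id,
                "size": len(values),
                # Placeholder feature: mean intensity of the region.
                "feature": [sum(values) / len(values)],
            })
    return bag


image = [[0, 0, 10, 10],
         [0, 0, 10, 10]]
coarse = [[0, 0, 0, 0],
          [0, 0, 0, 0]]          # one region covering the whole image
fine = [[0, 0, 1, 1],
        [0, 0, 1, 1]]            # left and right halves as separate regions
bag = bag_of_regions(image, [coarse, fine])
```

Here the coarse segmentation contributes a single mixed region, while the fine one contributes two homogeneous regions. Keeping all of them means a later stage can still find a well-matched region even when one particular segmentation scale is poor.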

Figure 1. Exemplar images with multiple labels: four images from the Corel5k dataset (Yuan, 2007) and their segmented regions with multiple labels
