Semantic Classification and Annotation of Images


Yonghong Tian (Peking University, China), Shuqiang Jiang (Chinese Academy of Sciences, China), Tiejun Huang (Peking University, China) and Wen Gao (Peking University, China)
Copyright: © 2009 |Pages: 28
DOI: 10.4018/978-1-60566-188-9.ch015


With the rapid growth of image collections, content-based image retrieval (CBIR) has been an active area of research with notable recent progress. However, automatic image retrieval by semantics still remains a challenging problem. In this chapter, the authors will describe two promising techniques towards semantic image retrieval—semantic image classification and automatic image annotation. For each technique, four aspects are presented: task definition, image representation, computational models, and evaluation. Finally, they will give a brief discussion of their application in image retrieval.
Chapter Preview


With the advance of multimedia technology and the growth of image collections, content-based image retrieval (CBIR) was proposed to find images whose low-level visual features (e.g., color, texture, shape) are similar to those of a query image. However, retrieving images via low-level features has proven unsatisfactory, since such features cannot represent the high-level semantic content of images. To reduce the so-called semantic gap (Smeulders, Worring, Santini, Gupta, & Jain, 2000), a variety of techniques have been developed. In this chapter, we discuss promising techniques on two important aspects of CBIR — (a) semantic image classification, and (b) automatic image annotation. Each plays an important role in the broader semantic understanding of images.
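To make the low-level-feature paradigm concrete, the following sketch computes a quantized color histogram for each image and compares two images by histogram intersection. This is an illustrative toy (the bin count, the pixel-list representation, and the toy "images" are our assumptions, not taken from the chapter), but it shows why visually dissimilar images score low even when they are semantically related:

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels and count occurrences.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    Returns a normalized histogram: {bin triple: frequency}.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of two histograms."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Toy "images": a predominantly red one and a predominantly blue one,
# sharing only a few dark pixels.
red_img = [(200, 10, 10)] * 90 + [(30, 30, 30)] * 10
blue_img = [(10, 10, 200)] * 90 + [(30, 30, 30)] * 10

h_red = color_histogram(red_img)
h_blue = color_histogram(blue_img)
# An image matches itself perfectly; the red and blue images overlap
# only on their shared dark pixels (similarity 0.1).
```

In a real CBIR system the same comparison would run between the query image's histogram and every database image's histogram, ranking results by similarity; the semantic gap arises because a red sunset and a red car can score as near-identical under such measures.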

Generally speaking, the semantics of images can be categorized into four levels, from the lowest to the highest (Wang, Li, & Wiederhold, 2001):

  • semantic types (e.g., landscape photograph, clip art),

  • object composition (e.g., a bike and a car parked on a beach, a sunset scene),

  • abstract semantics (e.g., people fighting, happy person, objectionable photograph), and

  • detailed semantics (e.g., a detailed description for a given picture).

Song, Wang, and Zhang (2003) further group image semantics into a local semantic level and a thematic (or global semantic) level. To date, most work on image classification and annotation has addressed the first two levels. Good progress has been made at the first level, which corresponds closely to an image's physical attributes, such as indoor vs. outdoor. The second level is harder to extract than the first, and it is therefore the primary focus of our investigation of image classification and annotation in this chapter.

Semantic image classification aims to classify images into (semantically) meaningful categories. Such categorization can enhance image retrieval by permitting semantically adaptive search methods and by narrowing the search range in a database. Figure 1 shows several representative categories of images. Roughly speaking, two basic strategies can be found in the literature. The first extracts low-level visual features, such as color, texture, or shape, and then applies classifiers to these features (e.g., Szummer & Picard, 1998; Vailaya, Figueiredo, Jain, & Zhang, 2001; Wang, et al., 2001; Grauman & Darrell, 2005; Zhang, Berg, Maire, & Malik, 2006). The second strategy adds an intermediate representation between the low-level features and semantic concepts (e.g., Li & Perona, 2005; Wang, Zhang, & Li, 2006). In the context of Web images, classification algorithms can also exploit additional features, such as text extracted from the page containing an image (Cai, He, Li, Ma, & Wen, 2004) or the link structure between images or pages (Tian, Huang, & Gao, 2005).
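The first strategy above can be sketched with a minimal classifier. The snippet below uses a nearest-centroid rule over toy two-dimensional feature vectors; the feature choice (mean brightness, edge density), the category names, and the training data are all hypothetical stand-ins for the richer features and classifiers cited in the text:

```python
def nearest_centroid_classify(train, query):
    """Assign `query` to the category whose mean training feature vector
    is closest in Euclidean distance.

    `train` maps category name -> list of equal-length feature vectors.
    A deliberately simple stand-in for the SVMs and other classifiers
    used in the cited work.
    """
    def centroid(vecs):
        n = len(vecs)
        return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = {cat: centroid(vecs) for cat, vecs in train.items()}
    return min(centroids, key=lambda cat: dist2(centroids[cat], query))

# Hypothetical 2-D features per image: (mean brightness, edge density).
train = {
    "indoor":  [[0.30, 0.20], [0.35, 0.25], [0.28, 0.30]],
    "outdoor": [[0.80, 0.60], [0.75, 0.70], [0.85, 0.65]],
}
label = nearest_centroid_classify(train, [0.78, 0.62])  # an outdoor-like query
```

Once images are labeled this way, a retrieval system can restrict a query to the predicted category, which is precisely the search-narrowing benefit described above.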

Figure 1.

Examples of images for semantic image categorization.

