With the rapid growth of the Internet and multimedia systems, the use of visual information has increased enormously, making indexing and retrieval techniques increasingly important. Historically, images have usually been annotated manually with metadata such as captions or keywords (Chang & Hsu, 1992). Image retrieval is then performed by searching for images with similar keywords. However, the keywords used may differ from one person to another, and many different keywords can describe the same image. Consequently, retrieval results are often inconsistent and unreliable. Due to these limitations, there is growing interest in content-based image retrieval (CBIR). CBIR techniques extract meaningful information or features from an image so that images can be classified and retrieved automatically based on their contents. Existing image retrieval systems such as QBIC and Virage extract so-called low-level features such as color, texture, and shape from an image in the spatial domain for indexing. Low-level features sometimes fail to represent high-level image semantics, which are subjective and depend greatly on user preferences. To bridge this gap, a top-down retrieval approach involving high-level knowledge can complement the low-level features. This article deals with various aspects of CBIR, including bottom-up feature-based image retrieval in both the spatial and compressed domains, as well as top-down task-based image retrieval using prior knowledge.
Traditional text-based indexes for large image archives are time-consuming to create. A domain expert must examine each image and describe its content using several keywords. Language-based descriptions, however, can never capture the visual content sufficiently, because a description of the overall semantic content of an image does not enumerate all the objects and their properties. Manual text-based annotation generally suffers from two major drawbacks: (i) content mismatch and (ii) language mismatch. A content mismatch arises when the information that the domain expert ascertains from an image differs from the information that the user is interested in. When this occurs, little can be done to recover the missing annotations. A language mismatch, on the other hand, occurs when the user and the domain expert use different languages or phrases to describe the same scene. To circumvent language mismatch, a strictly controlled formal vocabulary or ontology is needed, but this complicates both the annotation and the query processes. In a text-based image query, when the user does not specify the right keywords or phrases, the desired images cannot be retrieved without visually examining the entire archive.
In view of the deficiencies of the text-based approach, considerable research effort has been devoted to CBIR over the past 15 years. CBIR generally involves applying computer vision techniques to search for particular images in large image databases. “Content-based” means that the search makes use of the contents of the images themselves, rather than relying on manually annotated text.
From a user perspective, CBIR should involve image semantics. An ideal CBIR system would perform semantic retrievals like “find pictures of dogs” or even “find pictures of George Bush.” However, this type of open-ended query is very difficult for computers to perform because, for example, a dog’s appearance can vary significantly between breeds. Current CBIR systems therefore generally make use of low-level features like texture, color, and shape. However, biologically inspired vision research generally suggests two processes in visual analysis: bottom-up image-based analysis and top-down task-related analysis (Navalpakkam & Itti, 2006). Bottom-up analysis consists of memoryless, stimulus-centric factors such as low-level image features. Top-down analysis uses prior domain knowledge to influence bottom-up analysis. An effective image retrieval system should therefore combine both low-level features and high-level knowledge so that images can be classified automatically according to their context and semantic meaning.
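As a minimal illustration of the bottom-up, low-level side of such a system, the sketch below indexes images by a coarse color histogram and ranks them against a query by histogram intersection. It is not the method of QBIC, Virage, or any other specific system. The function names, the choice of 4 bins per channel, and the representation of an image as a plain list of RGB tuples are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255 (assumed format)


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram to serve as the image signature."""
    if not pixels:
        raise ValueError("image has no pixels")
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 component to a bin index in 0..n-1.
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank_by_similarity(
    query: Sequence[Pixel],
    database: Dict[str, Sequence[Pixel]],
    bins_per_channel: int = 4,
) -> List[Tuple[str, float]]:
    """Rank database images from most to least similar to the query image."""
    q = color_histogram(query, bins_per_channel)
    scores = [
        (name, histogram_intersection(q, color_histogram(px, bins_per_channel)))
        for name, px in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A signature of this kind captures only the color distribution: two images with similar colors but unrelated subjects score as similar, which is the gap that top-down, knowledge-based analysis is meant to address.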
Key Terms in this Chapter
Content-Based Image Retrieval: This refers to an image retrieval scheme which searches and retrieves images by matching information that is extracted from the images themselves. The information can be color, texture, shape, and high-level features representing image semantics and structure.
Image Retrieval System: A computer system for users to search images stored in a database.
Image Signature: This is the same as feature descriptors used for image annotation and indexing.
Compressed Domain Feature Analysis: This refers to the process of image signature extraction performed in the transform domain. Image features are extracted by analyzing the transform coefficients of the image without incurring a full decompression.
Relevance Feedback: This provides an interactive way for humans to refine the retrieval results. Users can indicate to the image retrieval system whether the retrieved results are “relevant,” “irrelevant” or “neutral.” Retrieval results are then refined iteratively.
High Level Semantics: This refers to the image context as perceived by humans. It is generally subjective in nature and depends greatly on the user’s preferences.
Spatial Domain Feature Analysis: This refers to the process of image signature extraction performed in the spatial domain. Image features are extracted by analyzing the spatial domain image representation.
Feature Descriptors: A set of features that is used for image annotation and indexing. The features can be keywords, low-level features including color, texture, and shape, and high-level features describing image semantics and structure.
Bottom-Up Image Analysis: This refers to the use of low-level features, such as high luminance or color contrast, or an orientation that differs from the surroundings, to identify certain objects in an image.
Top-Down Image Analysis: This refers to the use of high-level semantics, such as the viewer’s expectations of objects and image context, to analyze and annotate an image.
Text-Based Image Retrieval: This refers to an image retrieval scheme which searches and retrieves images by using metadata such as keywords. In this scheme, all images are annotated with certain keywords. Searching is then performed by matching these keywords.
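The relevance feedback loop defined among the key terms above can be sketched as follows. The source does not specify an update rule, so this example assumes a Rocchio-style query refinement, in which the query signature moves toward images marked "relevant" and away from those marked "irrelevant", while "neutral" images are ignored. The function names and the weights alpha, beta, and gamma are illustrative assumptions, not values taken from the article.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; a zero vector if no examples were given."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def refine_query(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    irrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed value)
    beta: float = 0.75,   # pull toward relevant examples (assumed value)
    gamma: float = 0.25,  # push away from irrelevant examples (assumed value)
) -> Vector:
    """Rocchio-style update of a query signature from user feedback."""
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(irrelevant, dim)
    # Clamp at zero because signatures such as histograms are non-negative.
    return [max(0.0, alpha * q + beta * p - gamma * n)
            for q, p, n in zip(query, pos, neg)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             top_k: int = 3) -> List[str]:
    """Return the names of the top_k signatures closest to the query."""
    ranked = sorted(database, key=lambda name: squared_distance(query, database[name]))
    return ranked[:top_k]
```

In an interactive session, `retrieve` and `refine_query` would alternate: the system shows results, the user labels some of them, and the refined query drives the next round of retrieval.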