Multimedia data comprising images, audio and video is becoming increasingly common. The decreasing costs of consumer electronic devices such as digital cameras and digital camcorders, along with the ease of transfer provided by the Internet, have led to a phenomenal rise in the amount of multimedia data. Given that this trend is likely to accelerate, there is an urgent need for clear means of capturing, storing, indexing, retrieving, analyzing and summarizing such data. Image data is a very commonly used multimedia data type. Early image retrieval systems were based on manually annotated descriptions, an approach called text-based image retrieval (TBIR). TBIR was a great leap forward, but it has several inherent drawbacks. First, a textual description cannot capture the visual content of an image accurately, and in many circumstances textual annotations are not available. Second, different people may describe the content of an image in different ways, which limits the recall performance of text-based image retrieval systems. Third, some images contain something that no words can convey. To resolve these problems, content-based image retrieval (CBIR) has been an active and fast-developing research area since the early 1990s and has attracted significant research attention. CBIR aims to find images that are perceptually similar to a query based on the visual content of the images, without the help of annotations.
CBIR systems are designed to support image retrieval as well as the storage and processing activities related to image data management in multimedia information systems. CBIR systems are therefore key to implementing image data management. It should be noted, however, that unlike traditional text retrieval, a general CBIR framework contains several main components, such as feature extraction and representation, similarity measurement, databases of pre-analyzed image collections, and relevance feedback. It has been shown that artificial intelligence (AI) plays an important role in the feature extraction, similarity measurement, and relevance feedback of CBIR. CBIR using AI technology is emerging as a new discipline, which provides mechanisms for retrieving image data efficiently and naturally. Many researchers have been concentrating on CBIR using AI technology, and its research and development are receiving increasing attention. By means of AI technology, large volumes of image data can be retrieved effectively and naturally from image databases. Intelligent CBIR systems are thus built on AI and databases to support various kinds of problem solving and decision making. Intelligent CBIR is therefore a field that must be investigated by academic researchers together with developers from both the CBIR and AI fields.
The book focuses on the following issues of AI for CBIR: AI for feature extraction and representation; AI for distance measurement, image indexing and query; AI for relevance feedback; and intelligent CBIR systems and applications. It aims to provide a single account of technologies and practices in AI for CBIR. The objective of the book is to provide state-of-the-art information to academics, researchers and industry practitioners who are involved or interested in the study, use, design and development of advanced and emerging AI technologies for CBIR, with the ultimate aim of empowering individuals and organizations to build competencies for exploiting the opportunities of the knowledge society. This book presents the latest research and application results in AI for CBIR. The chapters have been contributed by different authors and provide possible solutions for different types of technological problems concerning AI for CBIR.
This book, which consists of sixteen chapters, is organized into four major sections. The first section discusses the issues of AI for feature extraction and representation in the first four chapters. The next four chapters, covering AI for distance measurement, image indexing and query, comprise the second section. The third section includes four chapters about AI for relevance feedback. The fourth section, containing the final four chapters, focuses on intelligent CBIR systems and applications.
First of all, we take a look at the issues of AI for the feature extraction and representation.
Danilo Avola, Fernando Ferri and Patrizia Grifoni introduce and discuss the different Artificial Intelligence (AI) approaches used to extract and represent the features of an image. In particular, the role of Genetic Algorithms (GAs) is highlighted. They start with a brief discussion of the feature extraction process, and then give a general description of some of the most interesting AI approaches and their application to image feature extraction problems. They give a more complete description of GAs. They then address the possibility of combining AI approaches (as used in hybrid systems) to solve more complex feature extraction problems. Finally, they present some of the most recent and powerful applications of AI-based image feature extraction.
Dany Gebara and Reda Alhajj present a novel approach for content-based image retrieval and demonstrate its applicability on non-texture images. The process starts by extracting a feature vector for each image using wavelets. Then the images (each represented by its feature vector) are classified into groups by a density-based clustering approach, namely OPTICS. This greatly improves querying by limiting the search space to a single cluster instead of the whole database. The cluster to be searched is determined by applying the same OPTICS clustering process to the query image; this identifies the cluster closest to the query image and limits the search to that cluster. The query image is not added to the cluster unless this is explicitly requested.
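The idea of restricting a query to a single cluster can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cluster labels are assumed to come from an earlier clustering step (such as OPTICS) and are simply given here, and the closest cluster is chosen by nearest centroid as a simplifying stand-in for re-running the clustering on the query image. The function names and the toy two-dimensional feature vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def search_in_nearest_cluster(query, features, labels, top_k=3):
    """Rank only the images in the cluster whose centroid is closest to the query.

    features: dict mapping image id -> feature vector
    labels:   dict mapping image id -> cluster label (assumed precomputed,
              e.g. by OPTICS; the clustering itself is not performed here)
    """
    clusters = {}
    for image_id, label in labels.items():
        clusters.setdefault(label, []).append(image_id)
    # Assumption: nearest centroid approximates "the closest cluster".
    nearest = min(
        clusters,
        key=lambda c: euclidean(query, centroid([features[i] for i in clusters[c]])),
    )
    # Only the members of that one cluster are compared with the query.
    ranked = sorted(clusters[nearest], key=lambda i: euclidean(query, features[i]))
    return nearest, ranked[:top_k]

# Toy data: five images in two clusters.
features = {
    "img_a": [0.10, 0.20], "img_b": [0.15, 0.25], "img_c": [0.12, 0.18],
    "img_d": [0.90, 0.80], "img_e": [0.85, 0.95],
}
labels = {"img_a": 0, "img_b": 0, "img_c": 0, "img_d": 1, "img_e": 1}

cluster, hits = search_in_nearest_cluster([0.88, 0.85], features, labels, top_k=2)
print(cluster, hits)
```

In this toy run only the two images of one cluster are compared with the query, which is the source of the reduced search cost described above.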
Texture feature extraction and description is an important research topic in content-based medical image retrieval. Gang Zhang et al. first propose a framework for a content-based medical image retrieval system. Then they review important texture feature extraction and description methods such as the co-occurrence matrix, perceptual texture features, and the Gabor wavelet. Moreover, they analyze each of the improved methods and demonstrate its application in content-based medical image retrieval.
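To make the co-occurrence matrix concrete, the following minimal Python sketch counts gray-level pairs at a fixed pixel offset and derives one common texture statistic, contrast. It is an illustration under stated assumptions rather than the method of the chapter: the matrix is non-symmetric and unnormalized, the offset is one pixel to the right, and the 4x4 test image with four gray levels is a toy example.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    glcm = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[image[r][c]][image[r2][c2]] += 1
    return glcm

def contrast(glcm):
    """Contrast: sum of (i - j)^2 weighted by the normalized pair counts."""
    total = sum(sum(row) for row in glcm)
    if total == 0:
        return 0.0
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(glcm)
        for j, count in enumerate(row)
    )

# Toy 4x4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
glcm = cooccurrence_matrix(image, levels=4)
for row in glcm:
    print(row)
print(contrast(glcm))
```

Smooth regions concentrate counts on the diagonal and give low contrast, while frequent jumps between distant gray levels raise it.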
Jafar M. Ali presents an application of rough sets to feature reduction, classification, and retrieval for image databases in the framework of content-based image retrieval systems. The suggested approach combines image texture features with color features to form a powerful discriminating feature vector for each image. Texture features are extracted, represented, and normalized in an attribute vector, followed by the generation of rough set dependency rules from the real-valued attribute vector. The rough set reduction technique is applied to find all reducts with the minimal subset of attributes associated with a class label for classification.
The next section takes a look at AI for the distance measurement and image indexing as well as query.
Many different solutions have been proposed to improve the performance of content-based image retrieval, but most of these works have focused on sub-parts of the retrieval problem, providing targeted solutions only for individual aspects (e.g., feature extraction, similarity measures, indexing). David García Pérez et al. first briefly review some of the main solutions in use for content-based image retrieval, highlighting some of the main issues. Then they propose an original approach for extracting relevant image objects and matching them for retrieval applications, and present a complete image retrieval system that uses this approach (including similarity measures and image indexing). In particular, image objects are represented by a two-dimensional deformable structure, referred to as an “active net”, which can adapt to relevant image regions according to chromatic and edge information. An extension of the active nets has been defined that permits the nets to break themselves, thus increasing their capability to adapt to objects with complex topological structure. The resulting representation allows a joint description of the color, shape and structural information of extracted objects. A similarity measure between active nets has also been defined and is used to combine the retrieval with an efficient indexing structure.
CBIR aims to find images that are perceptually similar to the query based on the visual content of the images, without the help of annotations. Current CBIR systems use global features (e.g., color, texture, shape) as image descriptors or use features extracted from segmented regions (called region-based descriptors). In the former case, descriptors are not discriminative enough at the object level and are sensitive to object occlusion or background clutter, and thus fail to give satisfactory results. In the latter case, the features are sensitive to the image segmentation, which is a difficult task in its own right. In addition, region-based descriptors are still not invariant to varying imaging conditions. Ming Zhang and Reda Alhajj look at CBIR from the object detection/recognition point of view and introduce the local feature-based image representation methods recently developed in the object detection/recognition area. These local descriptors are highly distinctive and robust to changes in imaging conditions. In addition to image representation, they also introduce the other two key issues of CBIR: similarity measurement for image descriptor comparison and the index structure for similarity search.
Chotirat “Ann” Ratanamahatana, Eamonn Keogh and Vit Niennattrakul demonstrate how multimedia data can be reduced to a more compact form, i.e., a time series representation, while preserving the features of interest, and can then be efficiently exploited in content-based image retrieval. They introduce a general framework that learns a distance measure with arbitrary constraints on the warping path of the Dynamic Time Warping calculation. They demonstrate the utility of their approach on both classification and query retrieval tasks for time series and other types of multimedia data, including images, video frames, and handwriting archives. In addition, they show that when the framework is incorporated into a relevance feedback system, query refinement can further improve precision and recall by a wide margin.
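The effect of constraining the warping path can be seen in a minimal Python sketch of Dynamic Time Warping. The sketch does not reproduce the authors' learned constraints: it uses a fixed-width band around the diagonal (commonly known as a Sakoe-Chiba band) as the simplest example of a constraint, and the `window` parameter and the two toy series are assumptions made for illustration.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    window limits how far the warping path may stray from the diagonal
    (a fixed band); None means the path is unconstrained.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide for the path to reach the end.
    window = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# The same peak, shifted by two time steps.
series_a = [0, 0, 0, 1, 2, 1, 0]
series_b = [0, 1, 2, 1, 0, 0, 0]

print(dtw_distance(series_a, series_b))            # unconstrained warping
print(dtw_distance(series_a, series_b, window=0))  # no warping allowed
print(dtw_distance(series_a, series_b, window=2))  # band of width 2
```

With no warping allowed the shifted peaks are compared point by point and appear dissimilar, while a band wide enough to cover the shift aligns them. Choosing the constraint well is therefore part of defining the distance measure.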
Traditional index structures are based on trees and use the k-Nearest Neighbors (k-NN) approach to search databases. Because of some disadvantages of this approach, the use of neighborhood graphs has been proposed. While this approach is interesting, it has some disadvantages of its own, mainly its complexity. Hakim Hacid and Abdelkader Djamel Zighed present a step in a long process of analyzing, structuring, and retrieving multimedia databases. They propose an effective method for locally updating neighborhood graphs, which constitute their multimedia index. They then exploit this structure, on the one hand, to make the retrieval process easy and effective using queries in image form and, on the other hand, to annotate images in order to describe their semantics.
The third section deals with the issues of AI for the relevance feedback.
Ruofei Zhang and Zhongfei (Mark) Zhang treat user relevance feedback in image retrieval as a standard two-class pattern classification problem, aiming to refine retrieval precision by learning from the user relevance feedback data. They investigate this problem by noting two important characteristics unique to it: small sample collections and asymmetric sample distributions between positive and negative samples. They develop a novel empirical Bayesian learning approach that solves this problem by explicitly exploiting these two characteristics, called BAyesian Learning in Asymmetric and Small (BALAS) sample collections. In BALAS, different learning strategies are used for the positive and negative sample collections, based on the two characteristics. By defining the relevancy confidence as the relevant posterior probability, they develop an integrated ranking scheme in BALAS that combines the subjective relevancy confidence and the objective similarity measure in a complementary way to capture the overall retrieval semantics.
An image is a symbolic representation; people interpret an image and associate semantics with it based on their subjective perceptions, which involve the user’s knowledge, cultural background, personal feelings and so on. Content-based image retrieval (CBIR) systems must be able to interact with users and discover the current user’s information needs. An interactive search paradigm that has been developed for image retrieval is machine learning with a user in the loop, guided by relevance feedback, which refers to the relevance of an individual image according to the current user’s subjective judgment. Relevance feedback serves as an information carrier that conveys the user's information needs and preferences to the retrieval system. Chia-Hung Wei and Chang-Tsun Li provide the fundamentals of CBIR systems and relevance feedback needed for understanding relevance feedback and incorporating it into CBIR systems. They also discuss several approaches to analyzing and learning from relevance feedback.
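One classic way to turn relevance feedback into an improved query is the Rocchio update from text retrieval, sketched below in Python. It is shown only as a familiar example of the general idea and is not claimed to be one of the approaches covered in the chapter. The weights `alpha`, `beta` and `gamma` are illustrative defaults, and the two-dimensional feature vectors are toy values.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward relevant examples and away from others.

    query:        feature vector of the current query
    relevant:     feature vectors the user marked as relevant
    non_relevant: feature vectors the user marked as not relevant
    """
    def mean(vectors):
        # An empty feedback set contributes nothing to the update.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    positive = mean(relevant)
    negative = mean(non_relevant)
    return [
        alpha * q + beta * p - gamma * n
        for q, p, n in zip(query, positive, negative)
    ]

# One feedback round: the user likes images strong in the first feature.
query = [0.5, 0.5]
relevant = [[1.0, 0.0], [1.0, 0.0]]
non_relevant = [[0.0, 1.0]]

refined = rocchio_update(query, relevant, non_relevant)
print(refined)
```

The refined vector is then used in place of the original query for the next retrieval round, so each round of user judgments shifts the search toward what the user considers relevant.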
Paweł Rotter and Andrzej M. J. Skulimowski describe two new approaches to content-based image retrieval (CBIR) based on preference information provided by the user interacting with an image search system. First, they present the existing methods of image retrieval with relevance feedback, which then serve as a reference for the new approaches. The first extension of the distance function-based CBIR approach makes it possible to apply this approach to complex objects. The new algorithm is based on an approximation of user preferences by a neural network. Further, they propose another approach to image retrieval, which uses reference sets to facilitate image comparisons. The proposed methods have been implemented and compared with each other and with the earlier approaches. Finally, they provide a real-life illustration of the proposed methods: an image-based hotel selection procedure.
Relevance feedback (RF) learning has been proposed as a technique aimed at reducing the semantic gap. By providing an image similarity measure under human perception, RF learning can be seen as a form of supervised learning that finds relations between high-level semantic interpretations and low-level visual properties. That is, the feedback obtained within a single query session is used to personalize the retrieval strategy and thus enhance retrieval performance. Iker Gondra presents an overview of CBIR and related work on RF learning. He also presents his own previous work on an RF learning-based probabilistic region relevance learning algorithm for automatically estimating the importance of each region in an image based on the user’s semantic intent.
The fourth section covers intelligent CBIR systems and applications.
Semantics-based retrieval is a trend in content-based multimedia retrieval. Typically, in multimedia databases, there are two kinds of clues for a query: perceptive features and semantic classes. Zhiping Shi et al. propose a framework for multimedia database organization and retrieval that integrates perceptive features and semantic classes. Within this framework, a semantics-supervised cluster-based index organization approach (SSCI for short) is developed: the entire data set is divided hierarchically into clusters until the objects within a cluster are not only close in the perceptive feature space but also within the same semantic class; an index entry is then built for each cluster. In particular, the perceptive feature vectors in a cluster are stored adjacently on disk. Furthermore, SSCI supports a relevance feedback approach in which users mark positive and negative examples with a cluster, rather than a single object, as the unit.
As distributed mammogram databases at hospitals and breast screening centers are connected together through PACS, a mammogram retrieval system is needed to help medical professionals locate the mammograms they need to aid medical diagnosis. Chia-Hung Wei, Chang-Tsun Li and Yue Li present a complete content-based mammogram retrieval system, which seeks images that are pathologically similar to a given example. In the mammogram retrieval system, the pathological characteristics defined in the Breast Imaging Reporting and Data System (BI-RADS) are used as criteria to measure the similarity of mammograms. A detailed description of those mammographic features is provided. Since the user’s subjective perception should be taken into account in the image retrieval task, a relevance feedback function is also developed to learn individual users’ knowledge and improve system performance.
Video surveillance automation is used in two key modes: watching for known threats in real time and searching for events of interest after the fact. Typically, real-time alerting is a localized function, e.g., an airport security center receives and reacts to a “perimeter breach alert,” while investigations often encompass a large number of geographically distributed cameras, as in the London bombing or Washington sniper incidents. Enabling effective event detection, query and retrieval of surveillance video for preemption and investigation involves indexing the video along multiple dimensions. Ying-li Tian et al. present a framework for event detection and surveillance search that includes video parsing, indexing, query and retrieval mechanisms. The chapter explores video parsing techniques that automatically extract index data from video, indexing that stores the data in relational tables, retrieval that uses SQL queries to retrieve events of interest, and the software architecture that integrates these technologies.
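The combination of relational indexing and SQL retrieval can be sketched with Python's built-in sqlite3 module. The table layout, column names, event types and timestamps below are all hypothetical and chosen only for illustration; they are not the schema used by the authors.

```python
import sqlite3

# Hypothetical index table: one row per event extracted by video parsing.
# Timestamps are ISO 8601 text, so text ordering matches time ordering.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        camera_id TEXT NOT NULL,
        event_type TEXT NOT NULL,
        start_time TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO events (camera_id, event_type, start_time) VALUES (?, ?, ?)",
    [
        ("cam_01", "perimeter_breach", "2008-03-01T22:15:00"),
        ("cam_02", "abandoned_object", "2008-03-01T23:40:00"),
        ("cam_07", "perimeter_breach", "2008-03-02T01:05:00"),
        ("cam_01", "perimeter_breach", "2008-03-03T09:30:00"),
    ],
)

def find_events(connection, event_type, since, until):
    """Return (camera_id, start_time) for matching events across all cameras."""
    cursor = connection.execute(
        "SELECT camera_id, start_time FROM events "
        "WHERE event_type = ? AND start_time BETWEEN ? AND ? "
        "ORDER BY start_time",
        (event_type, since, until),
    )
    return cursor.fetchall()

matches = find_events(
    conn, "perimeter_breach", "2008-03-01T00:00:00", "2008-03-02T23:59:59"
)
print(matches)
```

Because every camera writes to the same table, a single query of this kind spans all cameras, which is what an after-the-fact investigation over geographically distributed cameras requires.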
Min Chen and Shu-Ching Chen introduce an advanced content-based image retrieval (CBIR) system, MMIR, in which Markov model mediator (MMM) and multiple instance learning (MIL) techniques are integrated seamlessly and act coherently as a hierarchical learning engine to boost both retrieval accuracy and efficiency. The proposed MMIR system uses the MMM mechanism to direct the focus to image-level analysis, together with the MIL technique (with a neural network at its core) to capture and learn object-level semantic concepts in real time with some help from user feedback. In addition, from a long-term learning perspective, the user feedback logs are explored by MMM to speed up the learning process and to increase the retrieval accuracy for a query.