This chapter introduces an advanced content-based image retrieval (CBIR) system, MMIR, in which Markov model mediator (MMM) and multiple instance learning (MIL) techniques are integrated into a hierarchical learning engine that improves both retrieval accuracy and efficiency. It is well understood that the major bottleneck of CBIR systems is the large semantic gap between low-level image features and high-level semantic concepts. In addition, the perception subjectivity problem also challenges a CBIR system. To address these issues, the proposed MMIR system uses the MMM mechanism for image-level analysis, together with the MIL technique (with a neural network at its core) to capture and learn object-level semantic concepts in real time with the help of user feedback. In addition, from a long-term learning perspective, MMM explores the user feedback logs to speed up the learning process and to increase the retrieval accuracy for a query. Comparative studies on a large set of real-world images demonstrate the promising performance of the proposed MMIR system.
Content-based image retrieval (CBIR), which was proposed in the early 1990s, has attracted broad research interest from many computing communities over the past decade. Generally speaking, in a CBIR system, each image is first mapped to a point in a certain feature space, where the features can be categorized into color (Stehling, Nascimento, & Falcao, 2000), texture (Kaplan et al., 1998), shape (Zhang & Lu, 2002), and so forth. Next, given a query in the form of image examples, the system retrieves images according to their features (He, Li, Zhang, Tong, & Zhang, 2004). Though extensive research effort has been directed at this area, retrieving the desired images from large image repositories effectively and efficiently remains a major challenge and an open issue. In short, some of the major obstacles can be summarized as follows.
First, it is widely accepted that the major bottleneck of CBIR systems is the large semantic gap between the low-level image features and high-level semantic concepts, which prevents the systems from being applied to real applications (Hoi & Lyu, 2004).
Second, the perception subjectivity problem poses additional challenges for CBIR systems. In other words, when viewing the same image (e.g., Figure 1a), different users might be interested in either a certain object (e.g., the house, the tree, etc.) or the entire image (e.g., a landscape during the autumn season). In this case, Figure 1b, Figure 1c, or Figure 1d, respectively, might be considered the relevant image with regard to Figure 1a. In addition, even the same user can perceive the same image differently in different situations and for different purposes.
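The generic CBIR pipeline outlined earlier in this introduction (map each image to a point in a feature space, then retrieve the images whose features are closest to the query example) can be illustrated with a minimal sketch. The joint RGB histogram, the Euclidean distance, the toy images, and the names `color_histogram` and `retrieve` are all assumptions made for illustration; they are not the features or the ranking method used by MMIR.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Map an H x W x 3 uint8 RGB image to a normalized joint color histogram."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / hist.sum()

def retrieve(query_feature, database_features, top_k=2):
    """Return indices of the top_k database images closest to the query."""
    distances = np.linalg.norm(database_features - query_feature, axis=1)
    return np.argsort(distances)[:top_k]

rng = np.random.default_rng(0)

def toy_image(base_color, size=16):
    """Hypothetical stand-in for a real image: one dominant color plus noise."""
    noise = rng.integers(-20, 21, size=(size, size, 3))
    return np.clip(np.array(base_color) + noise, 0, 255).astype(np.uint8)

# Toy database: two reddish images (indices 0 and 3), one green, one blue.
database = [toy_image((200, 30, 30)), toy_image((30, 200, 30)),
            toy_image((30, 30, 200)), toy_image((210, 40, 40))]
features = np.stack([color_histogram(img) for img in database])

# Query by example with another reddish image.
query = toy_image((205, 35, 35))
ranking = retrieve(color_histogram(query), features, top_k=2)
print(ranking)
```

In this toy setting the two reddish images are ranked first because only low-level color is compared. The same mechanism also shows the semantic gap: two images of the same object in different colors would land far apart in this feature space.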
To address these challenges, a certain form of adaptive (i.e., data-driven) description is required to capture the salient meaning of each image. In addition, the system should be able to expedite navigation through a large image database with the help of users' relevance feedback. In other words, the search engine should be equipped with an inference engine that observes and learns from user interactions. To this end, we believe that there is both a need and an opportunity to systematically incorporate machine learning techniques into an integrated approach for content-based image retrieval. In this chapter, we introduce an advanced content-based image retrieval system called MMIR, in which Markov model mediator (MMM) and multiple instance learning (MIL) techniques are integrated into a hierarchical learning engine that improves both retrieval accuracy and efficiency.
Markov model mediator (MMM) is a statistical reasoning mechanism that adopts the mathematically sound Markov model and the concept of mediators. As presented in our earlier studies (Shyu, Chen, Chen, Zhang, & Shu, 2003; Shyu, Chen, & Rubin, 2004a), MMM is highly capable of exploring semantic concepts at the image level from the long-term learning perspective. In contrast, multiple instance learning (MIL), combined with the neural network (NN) technique, aims at learning the regions of interest in real time, based on the users' relevance feedback on the whole image. Integrating the essential functionalities of both MMM and MIL has the potential to yield a robust CBIR system, which is the aim of this study.
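To make the MIL idea concrete, the following minimal sketch treats each image as a bag of region feature vectors and learns from image-level labels only, which play the role of relevance feedback on the whole image. It assumes the standard MIL setting, in which a bag is positive if at least one of its instances is positive. The single logistic unit, the log-mean-exp pooling, the synthetic 2-D region features, and all names are assumptions chosen for brevity; this is not the neural network used in MMIR.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_bag(positive, n_regions=4):
    """One bag = one image; each row is a hypothetical 2-D region feature."""
    regions = rng.normal(0.0, 0.3, size=(n_regions, 2))   # background regions
    target = -1                                           # -1: no region of interest
    if positive:
        target = int(rng.integers(n_regions))
        regions[target] = rng.normal(3.0, 0.3, size=2)    # region the user cares about
    return regions, target

bags, labels, targets = [], [], []
for i in range(40):
    regions, target = make_bag(positive=(i % 2 == 0))
    bags.append(regions)
    labels.append(float(i % 2 == 0))   # image-level feedback only
    targets.append(target)             # kept for evaluation, never used in training

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_forward(regions, w, b):
    """Score every region, then pool the scores into one bag-level logit."""
    scores = regions @ w + b
    shifted = np.exp(scores - scores.max())
    weights = shifted / shifted.sum()                  # softmax over regions
    logit = scores.max() + np.log(shifted.mean())      # log-mean-exp pooling
    return scores, weights, logit

# Gradient descent on the bag-level cross-entropy loss.
w, b, lr = np.zeros(2), 0.0, 0.5
for epoch in range(200):
    grad_w, grad_b = np.zeros(2), 0.0
    for regions, y in zip(bags, labels):
        _, weights, logit = bag_forward(regions, w, b)
        err = sigmoid(logit) - y
        grad_w += err * (weights @ regions)   # d(logit)/dw is the softmax-weighted region
        grad_b += err
    w -= lr * grad_w / len(bags)
    b -= lr * grad_b / len(bags)

predictions = [bool(sigmoid(bag_forward(r, w, b)[2]) > 0.5) for r in bags]
roi = [int(np.argmax(bag_forward(r, w, b)[0])) for r in bags]
print(sum(p == bool(y) for p, y in zip(predictions, labels)), "of", len(bags), "bags correct")
```

Although no region is ever labeled, the highest-scoring region of each positive bag can be read off as the estimated region of interest (`roi` above). This is the sense in which whole-image feedback can drive object-level learning.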
The remainder of this chapter is organized as follows. The next section, Background and Related Work, gives a broad background introduction as well as the literature review. The system is detailed in the Hierarchical Learning Scheme section and the Experimental Results section, followed by a discussion of possible future trends in CBIR research in the Future Trends section. Finally, the chapter ends with the Conclusions section.