Multi-Modal Fusion Schemes for Image Retrieval Systems to Bridge the Semantic Gap

Nidhi Goel (University of Delhi, India) and Priti Sehgal (University of Delhi, India)
DOI: 10.4018/978-1-4666-9685-3.ch007

Image retrieval (IR) systems are used to search for images via diverse modes such as text, a sample image, or both. They suffer from the semantic gap: the mismatch between the user's requirement and the capabilities of the IR system. Image data is generally stored as statistics of pixel values, which have little to do with the semantic interpretation of the image. It is therefore necessary to understand the mapping between the two modalities, i.e., content and context. Research indicates that combining the two is a worthwhile approach to improving the quality of image search results. Hence, multimodal retrieval (MMR) is an expected way of searching that attracts substantial research attention. The main challenges include discriminative feature extraction and selection, redundancy identification and elimination, information-preserving fusion, and computational complexity. Based on these challenges, in this chapter the authors compare various MMR systems that have been used to improve retrieval results.
Chapter Preview


We are living in the age of information, where the amount of accessible data from science and culture is almost limitless. Multimedia is one of the most interesting and exciting aspects of this information era (Guan et al., 2010). As the name suggests, it represents a combination of information content from different media sources in various forms. Examples are audio, video, image, and text, each of which can be considered a modality in multimodal multimedia representation. Development in data storage media and acquisition techniques has led to the availability of a huge amount of multimedia information, from the medical domain to the web to personal data collections. However, finding an item of interest is increasingly difficult. In the area of search, the greatest societal impact has been in WWW image search engines and recommendation systems. Google, Yahoo!, and Bing are image search engines used by millions of people daily. Recommendation systems such as Amazon (Linden, Smith, & York, 2003) and Napster (Napster, 2001) recommend everything from books to clothing and movies to music based on priorities selected by the user. Another worthy example is Getty Images (Machin, 2004), where the user is assumed to be knowledgeable, which the image search engine reflects through multimodal interactive search by content, context, style, composition, and user feedback. All the systems mentioned above fall under one category: image retrieval systems.

So, what is an image retrieval system? An image retrieval (IR) system is a computerized system for browsing, searching, and retrieving images from a large database of digital images. Apart from personal albums (e.g., Flickr, Photobucket, Picasa Web Albums) and general-purpose image collections (e.g., Google Images and Yahoo! Images), IR systems are used in various applications such as face and fingerprint matching in biometrics, X-ray and tumor matching in medical applications, tattoo and scar matching in crime detection, scene matching in surveillance, satellite image matching in GIS and remote sensing, sketch matching in archeological, art, and fashion applications, disease detection in crops, food quality evaluation, defect detection for machines in industry, and so on. As processing has become increasingly powerful and memory has become cheaper, deploying large multimedia datasets for various applications has become easier and more efficient.

To date, there have been various research techniques for indexing and searching multimedia data (Datta, Joshi, Li, & Wang, 2008). The co-existence of multimodal information demands a retrieval system that can search across various multimedia objects (J. Liu, Xu, & Lu, 2010). Therefore, cross-media retrieval is an expected way of searching that has become increasingly important and attracts substantial research attention. It enables users to query in more than one mode. Typically, for image retrieval systems, text and content are the two most common modes of query. Over the decades, image retrieval has evolved from text-based (1980s) to content-based (1990s) to fuzzy image retrieval (2004) (Singhai & Shandilya, 2010).
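To make the idea of combining the text and content query modes concrete, the following is a minimal late-fusion sketch: each image gets a text (context) score and a visual (content) score, and a weighted sum ranks the collection. All function names, the tag/histogram representation, and the 0.5 weight are illustrative assumptions, not the specific fusion scheme of any system surveyed in this chapter.

```python
# Hypothetical late-fusion sketch for multimodal image retrieval.
# Names, data layout, and weights are illustrative assumptions.

def text_score(query_terms, image_tags):
    """Context modality: fraction of query terms present in the image's tags."""
    if not query_terms:
        return 0.0
    tags = set(image_tags)
    return sum(1 for t in query_terms if t in tags) / len(query_terms)

def content_score(query_hist, image_hist):
    """Content modality: histogram-intersection similarity of feature vectors."""
    return sum(min(q, i) for q, i in zip(query_hist, image_hist))

def fused_score(query_terms, query_hist, image, w_text=0.5):
    """Weighted late fusion of the two per-modality scores."""
    s_text = text_score(query_terms, image["tags"])
    s_content = content_score(query_hist, image["hist"])
    return w_text * s_text + (1 - w_text) * s_content

# Rank a toy two-image collection for the query "beach" plus a color histogram.
images = [
    {"id": "a", "tags": ["beach", "sunset"], "hist": [0.5, 0.3, 0.2]},
    {"id": "b", "tags": ["city", "night"],   "hist": [0.1, 0.2, 0.7]},
]
ranked = sorted(
    images,
    key=lambda im: fused_score(["beach"], [0.6, 0.2, 0.2], im),
    reverse=True,
)
print([im["id"] for im in ranked])  # image "a" matches on both modalities
```

In a real MMR system the per-modality scores would come from an inverted text index and a visual feature extractor, and the fusion weight would typically be learned or adapted per query rather than fixed.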
