Cross-Modal Semantic-Associative Labelling, Indexing and Retrieval of Multimodal Data

Meng Zhu (University of Reading, UK) and Atta Badii (University of Reading, UK)
DOI: 10.4018/978-1-60960-821-7.ch012


Digitalised multimedia information today is typically represented in different modalities and distributed through various channels. Making use of such a large volume of data depends heavily on effective and efficient cross-modal labelling, indexing and retrieval of multimodal information. In this chapter, we focus on combining the primary and collateral modalities of an information resource in an intelligent and effective way, in order to provide better multimodal information understanding, classification, labelling and retrieval. Image and text are the two modalities considered here. A novel framework for semantic-based, collaterally cued image labelling has been proposed and implemented, which aims to automatically assign linguistic keywords to regions of interest in an image. A visual vocabulary was constructed from manually labelled image segments. We use Euclidean distance and Gaussian distribution to map the low-level region-based image features to the high-level visual concepts defined in the visual vocabulary. Both collateral content and context knowledge were extracted from the collateral textual modality to bias the mapping process. A semantic-based high-level image feature vector model was constructed from the labelling results, and image retrieval using this feature vector model appears to outperform both content-based and text-based approaches in its capability to combine the perceptual and conceptual similarity of image content.
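
The mapping step described above can be illustrated with a minimal sketch. The abstract does not state how the Euclidean distance and the Gaussian distribution are combined, so everything here is an assumption made for illustration: each concept is summarised by the centroid of its manually labelled segments, the distance from a region to that centroid is passed through a Gaussian kernel whose width is the root-mean-square distance of the labelled segments, and a concept that also occurs in the collateral text has its score multiplied by a hypothetical `boost` factor. The function names, the toy feature vectors and the boost value are not taken from the chapter.

```python
import math


def build_vocabulary(labelled_segments):
    """Summarise each concept as (centroid, sigma).

    labelled_segments maps a concept name to a list of feature vectors
    taken from manually labelled image segments (toy data in this sketch).
    """
    vocab = {}
    for concept, feats in labelled_segments.items():
        n = len(feats)
        centroid = [sum(col) / n for col in zip(*feats)]
        # Assumed spread: RMS Euclidean distance of the segments to the centroid.
        rms = math.sqrt(sum(math.dist(f, centroid) ** 2 for f in feats) / n)
        vocab[concept] = (centroid, max(rms, 1e-6))  # guard against zero spread
    return vocab


def label_region(region, vocab, collateral_keywords=(), boost=2.0):
    """Return (best_concept, scores) for one region feature vector."""
    cues = {k.lower() for k in collateral_keywords}
    scores = {}
    for concept, (centroid, sigma) in vocab.items():
        d = math.dist(region, centroid)             # Euclidean distance
        score = math.exp(-0.5 * (d / sigma) ** 2)   # Gaussian kernel on distance
        if concept.lower() in cues:
            score *= boost                          # hypothetical collateral bias
        scores[concept] = score
    return max(scores, key=scores.get), scores


# Toy three-dimensional colour-like features, invented for this example.
segments = {
    "sky": [[0.1, 0.2, 0.9], [0.2, 0.3, 0.8], [0.15, 0.25, 0.85]],
    "grass": [[0.2, 0.8, 0.2], [0.3, 0.7, 0.1], [0.25, 0.75, 0.15]],
}
vocab = build_vocabulary(segments)
label, _ = label_region([0.2, 0.5, 0.5], vocab, collateral_keywords=["grass"])
```

In this sketch the query region lies midway between the two concept centroids, so the visual evidence alone is ambiguous and the collateral keyword decides the label. A real system would use richer region features and a principled weighting of the textual cues.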
Chapter Preview

Why Image And Text?

Digitised information nowadays is typically represented in multiple modalities and distributed through various information channels. Massive volumes of multimedia data are generated every day as a result of advances in digital media technologies. Efficient access to such a volume of multimedia content relies largely on effective and intelligent multimodal indexing and retrieval techniques. The notion of multimodality implies the use of at least two human sensory or perceptual experiences for receiving different representations of the same information (Anastopoulou et al., 2001). Depending on the information need, a distinction can always be made between the primary and collateral information modalities. For instance, the primary information modality of an image retrieval system is the image content, while all the metadata in other modalities that explicitly or implicitly relate to the image content, e.g. collateral texts such as file name, captions, title, URL etc., can be considered the collateral modality. Despite the rapid development of multi-sensory techniques, information in different modalities acquired through various sensors still needs to be intelligently fused and integrated, so as to transform the raw sensory data into a semantically meaningful form. This will facilitate the paradigm shift away from the old multimedia towards the new mulsemedia, i.e. multiple sensorial media.
