Multi-Modal Content Based Image Retrieval in Healthcare: Current Applications and Future Challenges

Jinman Kim (University of Sydney, Australia), Ashnil Kumar (University of Sydney, Australia), Tom Weidong Cai (University of Sydney, Australia) and David Dagan Feng (University of Sydney, Australia & Hong Kong Polytechnic University, Hong Kong)
Copyright: © 2011 | Pages: 16
DOI: 10.4018/978-1-60960-780-7.ch003


Multi-modal imaging requires innovations in algorithms and methodologies in all areas of CBIR, including feature extraction and representation, indexing, similarity measurement, grouping of similar retrieval results, and user interaction. In this chapter, we discuss the rise of multi-modal imaging in clinical practice. We summarize some of our pioneering CBIR achievements with these data, exemplified by the specific application domain of PET-CT. We also discuss the future challenges in this important emerging area.
Chapter Preview


Content-based image retrieval (CBIR) refers to the use of the visual attributes of images for searching an image database. In recent years, we have witnessed a rapid rise in CBIR research and in the development of CBIR-based clinical applications for medical image databases (Müller, 2004; Cai, 2007; Deserno, 2007; Long, 2009; Kim, 2009). Some well-known CBIR investigations include the retrieval of high-resolution lung computed tomography (CT) images, introduced by Shyu (1999); a study by El-Naqa (2004) on the retrieval of microcalcification types from mammography images; the retrieval of dynamic positron emission tomography (PET) images based on temporal attributes (Cai, 2000; Kim, 2006); and, more recently, a retrieval system for spine X-ray images using a partial shape matching approach (Xu, 2008).
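To make the definition above concrete, the following is a minimal sketch of query-by-example retrieval. It is not the method of any system cited here: the grey-level histogram feature, the L1 distance, and the names `intensity_histogram`, `l1_distance` and `retrieve` are all assumptions chosen for illustration, and images are represented as plain nested lists of grey levels.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: an image is a 2-D grid of grey levels in the range 0..255.
Image = Sequence[Sequence[int]]


def intensity_histogram(image: Image, bins: int = 4) -> List[float]:
    """Normalised grey-level histogram, used here as a simple global feature."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            # Map the grey level to a bin index; clamp to the last bin.
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("image must contain at least one pixel")
    return [count / total for count in counts]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query: Image, database: Dict[str, Image],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k database entries closest to the query image."""
    query_feature = intensity_histogram(query)
    scored = [
        (name, l1_distance(query_feature, intensity_histogram(image)))
        for name, image in database.items()
    ]
    # Smallest distance first, i.e. most similar image first.
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical toy database of 2x2 images.
database = {
    "dark": [[10, 20], [30, 40]],
    "bright": [[200, 220], [240, 250]],
    "mixed": [[10, 250], [20, 240]],
}
query = [[15, 25], [35, 45]]
print(retrieve(query, database, top_k=2))
```

With this toy data the dark image is ranked first and the mixed image second. A practical system would replace the histogram with richer features and the linear scan with an index, but the structure of extracting features, measuring similarity and ranking stays the same.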

The aforementioned CBIR systems were each designed for a single imaging modality, and were thus able to exploit domain-specific knowledge and image processing optimizations. Such approaches, however, may be of limited use when applied to other imaging modalities. Several CBIR studies are not bound to a single modality and aim to support a diverse range of medical images. For example, Lehmann (2005) presented an automatic categorization method for a wide variety of medical images that allowed for robust classification. Their results demonstrated that the categorization technique, which was based on global image textural features and scaling, was successful in classifying images according to anatomical region, imaging modality and orientation. The introduction of ImageCLEFmed, a medical section of the Cross Language Evaluation Forum (CLEF), has led to increasing interest in benchmarking automatic classification and information retrieval across diverse medical image modalities (Deselaers, 2009; Rahman, 2007). ImageCLEFmed has created a standard environment for the evaluation and improvement of medical CBIR on heterogeneous collections containing images as well as text information.

However, regardless of their ability to retrieve from multiple-modality databases, current retrieval technologies are inherently designed for single-modal images. These algorithms and systems are therefore limited when applied to multi-modal images, as they do not fully utilize the additional complementary information that may be derived from these images. In this chapter, we use the term multi-modal images to refer to two or more medical image modalities that are co-aligned with each other. The separate modalities may be co-aligned through sequential or simultaneous acquisition by a hybrid scanner, or via image processing (see “Multi-modal Biomedical Imaging” for more details). Significant clinical benefits have arisen from the use of multi-modality images, and this has led to their rapid acceptance in clinical practice (Schulthess, 2009; Townsend, 2004). For example, the recently invented hybrid scanner that combines PET and magnetic resonance imaging (MRI) in a single scan (Beyer, 2009) enables, for the first time, the visualization of functional abnormalities from PET (e.g. tumours) in relation to their co-aligned anatomical counterparts from MRI (soft and hard tissues). These multi-modal images introduce new challenges and opportunities for CBIR research and development.

Apart from medical imaging, there has been great interest in multi-modal retrieval in consumer, public safety and professional applications (Kankanhalli, 2008). In these multimedia information retrieval (MIR) approaches, a large array of modalities, such as video (e.g. surveillance), text, signals and sound (e.g. voice recognition), in addition to image modalities (e.g. satellite), are combined through information fusion, and the fused information is then used for retrieval. The most common approach to multi-modal information fusion is to combine the semantic information derived from text with the automatically extracted image features, so that the text complements and improves them. Such combinations have shown success in enhancing the image representation for retrieval (Zhang, 2005; Fu, 2008). In Kumar (2010), object detection in dynamic environments was proposed, in which several complementary modalities, such as visible-spectrum and thermal infrared video, are fused using evidence theory. Such multi-modal techniques share many complementary techniques with multi-modal medical CBIR, and their combination may accelerate breakthroughs in CBIR research.
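One simple way to picture the combination of text-derived and image-derived information is score-level (late) fusion, sketched below. This is an illustrative assumption and not the scheme used in the cited works: the function name `fuse_scores`, the weighted-sum rule, the `image_weight` parameter and the treatment of missing scores as 0.0 are all hypothetical choices.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    image_scores: Dict[str, float],
    text_scores: Dict[str, float],
    image_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank items by a weighted sum of image and text similarity scores.

    Both inputs map an item identifier to a similarity in [0, 1], where a
    higher value means more similar. An item missing from one modality is
    treated as having similarity 0.0 in that modality (an assumption).
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    text_weight = 1.0 - image_weight
    items = set(image_scores) | set(text_scores)
    fused = [
        (
            item,
            image_weight * image_scores.get(item, 0.0)
            + text_weight * text_scores.get(item, 0.0),
        )
        for item in items
    ]
    # Highest fused similarity first; ties are broken by identifier so the
    # ordering is deterministic.
    fused.sort(key=lambda pair: (-pair[1], pair[0]))
    return fused


# Hypothetical similarity scores for three database items.
image_scores = {"a": 0.9, "b": 0.4, "c": 0.1}
text_scores = {"a": 0.2, "b": 0.8}
for item, score in fuse_scores(image_scores, text_scores):
    print(item, round(score, 2))
```

With equal weights, item "b" overtakes item "a" because its strong text match outweighs its weaker image match. Setting `image_weight=1.0` reduces the ranking to image similarity alone, which shows what the text modality contributes.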
