Content-Based Image Retrieval for Digital Mammography

Issam El Naqa (Washington University School of Medicine, USA), Liyang Wei (Hologic, Inc, USA) and Yongyi Yang (Illinois Institute of Technology, USA)
DOI: 10.4018/978-1-61520-777-0.ch023


Content-based image retrieval (CBIR) is an emerging field for computerized detection and diagnosis of breast cancer lesions. The underlying principle of CBIR in mammography is to query mammogram databases for diagnostic information based on the content or extracted features of the images instead of their textual annotation. Potentially, this would provide the radiologist with archived examples that are similar to his/her current case. This chapter reviews recent advances in CBIR technology, discusses its expanding role in medical imaging and its particular application to mammography, provides working examples based on the authors’ experience in developing machine-learning methods for CBIR in mammography, and highlights the potential opportunities in this field for computer vision research and clinical decision-making.
Chapter Preview


Recent years have witnessed burgeoning interest in developing methods for automated image retrieval. This is driven largely by the rapid increase in the size of image collections in disciplines ranging from industrial and medical to military applications, and by the steady development of the Internet. There is an increasing demand to retrieve stored pictorial information from these database systems in an efficient manner. Traditionally, these images are retrieved based on some textual annotation. However, in many disciplines, such annotation neither adequately captures the information embedded in the images nor provides interactive image understanding for the user, for the following reasons (Niblack, et al., 1993; Smeulders & Worring, 2000; Tagare, Jaffe, & Duncan, 1997):

  • The search depends solely on the initially stored keywords, whose semantics are too imprecise to reflect the content of the image.

  • Visual properties such as certain textures or geometric shapes are often difficult to describe by text. In addition, spatial information contained in the image data may not be easily expressible in conventional language.

  • There is not yet a universally accepted vocabulary for describing image characteristics. In medical imaging, the lexicon of diagnostic inference is continuously evolving.

Image retrieval is an evolution of traditional information technology, extended to handle requests for visual media. The main objective of an image retrieval system is the effective “querying” of archived images that match the user’s request, where the key challenge is to develop algorithms for automated image-content recognition. This involves a great deal of image understanding and machine intelligence.

Content-based image retrieval (CBIR) has been developed as a visual-based approach to overcome some of the difficulties associated with the subjectivity of human perception and the imprecision of annotation. However, despite significant developments over the past decade in similarity measures, objective image interpretation, feature extraction, and semantic descriptors (Bustos, Keim, Saupe, & Schreck, 2007; Müller, Michoux, Bandon, & Geissbuhler, 2004), some fundamental difficulties remain in CBIR applications. First, similarity measures can vary with the different aspects of perceptual similarity between images; the selection of an appropriate similarity measure thus becomes problem-dependent. Second, the relation between low-level visual features and the high-level human interpretation of similarity is not well defined when comparing two images; it is thus not clear which features, or which combinations of them, are relevant to such a judgment (Bhanu, Peng, & Qing, 1998; El Naqa, Yang, Galatsanos, Nishikawa, & Wernick, 2004). Finally, while the user may understand more about the query, the database system can only guess (possibly through interactive learning) what the user is looking for during the retrieval process. This is an inherent challenge in information retrieval, where the correct answer may not always be clearly identified.
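
To make the first difficulty concrete, the following minimal sketch shows how two common distance measures can disagree about which archived image is "most similar" to a query. The two-dimensional feature vectors, the candidate names "A" and "B", and the helper functions are illustrative assumptions, not features or measures taken from the chapter.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance: sensitive to the magnitude of each feature."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """One minus cosine similarity: sensitive only to the direction of the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical feature vectors (for example, two texture descriptors per image).
query = [1.0, 1.0]
candidates = {
    "A": [4.0, 4.0],  # same feature proportions as the query, larger magnitude
    "B": [2.0, 0.5],  # closer in magnitude, different proportions
}

def nearest(distance):
    """Return the name of the candidate with the smallest distance to the query."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))

print(nearest(euclidean_distance))
print(nearest(cosine_distance))
```

Under these assumed values the Euclidean measure prefers candidate B while the cosine measure prefers candidate A, which is one way to see why the choice of measure depends on which aspect of similarity matters for the problem at hand.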

In Figure 1, we show a diagram illustrating a typical scenario of image retrieval from mammography databases, in which an archive is organized into mammogram images, which in turn are organized into indices (i.e., a data structure of selected image features) for rapid lookup. The user formulates his/her retrieval problem as an expression in the query language (e.g., by presenting the images of the current case as the query). The query is then translated into the language of indices and matched against those in the database, and the images with matching indices are retrieved.

Figure 1. Image retrieval framework for mammography
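
The retrieval scenario described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names (extract_features, build_index, retrieve), the toy 2x2 "images", the case identifiers, and the choice of mean and standard deviation of pixel intensity as features are all hypothetical stand-ins, not the features or matching method used by the authors.

```python
import math

def extract_features(image):
    """Hypothetical feature extractor: mean and standard deviation of intensity."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, math.sqrt(variance))

def build_index(archive):
    """Map each archived image ID to its feature vector (the 'index')."""
    return {image_id: extract_features(img) for image_id, img in archive.items()}

def retrieve(query_image, index, top_k=2):
    """Translate the query into features and return the IDs of the closest entries."""
    q = extract_features(query_image)
    ranked = sorted(index, key=lambda image_id: math.dist(q, index[image_id]))
    return ranked[:top_k]

# Toy archive of 2x2 grayscale "images" (assumed data, for illustration only).
archive = {
    "case_001": [[10, 12], [11, 13]],
    "case_002": [[200, 180], [190, 210]],
    "case_003": [[12, 14], [13, 15]],
}

index = build_index(archive)
results = retrieve([[10, 12], [11, 14]], index, top_k=2)
print(results)
```

The index is computed once for the archive, so that each query only requires extracting features from the query image and comparing them against stored vectors. A practical system would likely replace the linear scan with a dedicated search structure and the toy features with clinically meaningful descriptors.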
