Textual-Shape-Based Image Retrieval

DOI: 10.4018/978-1-5225-3796-0.ch006


In this chapter, a method that combines both text and image features is considered. The Fuzzy Object Shape (FOS) feature explained in Chapter 3 is combined with the textual information extraction discussed in Chapter 1. A clustering mechanism is formulated based on the image feature, the text feature, and the two combined. A retrieval example is presented to demonstrate the functionality, so that the reader can understand the use of combining textual keywords with FOS. The chapter consolidates the performance of the combined feature using Precision, Recall, and F1-score, and the performance is evaluated and compared with the well-known Google retrieval system.
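As a reminder of how the evaluation metrics used in this chapter are computed, the following is a minimal sketch; the function name and the example result sets are illustrative, not taken from the chapter.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute Precision, Recall, and F1-score for one query.

    retrieved: iterable of IDs returned by the retrieval system.
    relevant:  iterable of ground-truth relevant IDs.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)          # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

For example, if a system retrieves images {1, 2, 3, 4} and the relevant set is {2, 3, 5}, precision is 2/4 and recall is 2/3.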
Chapter Preview


It has been observed that low-level features can be efficiently used to perform retrieval in domain-specific applications. Color histograms like the Human Color Perception Histogram (HCPH) (Vadivel, Shamik & Majumdhar, 2008), (Deng, 2001) & (Gevers & Stokman, 2004) and color-texture features like the Integrated Color and Intensity Co-occurrence Matrix (ICICM) (Vadivel, Shamik & Majumdhar, 2007) and Fuzzy Object Shape (FOS) (Shanmugavadivu et al., 2015 & 2016) show high retrieval precision in such applications. However, in more generic applications, low-level features cannot always represent the semantic content of images effectively, and retrieval precision tends to drop. Keywords can be used to restrict the search space in such applications. For example, while searching for images on the World Wide Web (WWW), keywords may be quite helpful in filtering out web pages that are not relevant. In the early years, textual keywords used to be assigned to images on the web by domain experts, which made the annotations highly subjective. A survey of image retrieval systems for images available on the Internet may be found in TASI. Among these image search engines, Google (www.google.com) is simple to use; in addition, retrieval is quick, and the retrieved images are found to be relevant, without dead links or duplicates.

Text in natural images is an important source of information that can be utilized in many real-world applications. This motivates a new problem: distinguishing images that contain text from a large volume of natural images. To address this problem, a multi-scale spatial partition network has been proposed (Bai et al., 2017). The network classifies whether an image contains text by predicting text existence in all image blocks, which are spatial partitions of the input image at multiple scales. The whole image is classified as a text image as long as at least one block is predicted to contain text. The network classifies images very efficiently by predicting all blocks simultaneously in a single forward propagation.
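The block-level decision rule described above can be sketched as follows. This is a simplified illustration: `block_classifier` is a stand-in for the trained per-block text predictor, and in the actual network all blocks are scored in a single forward pass rather than one at a time.

```python
import numpy as np

def partition_blocks(image, scale):
    """Split an H x W image into a scale x scale grid of blocks."""
    h, w = image.shape[:2]
    bh, bw = h // scale, w // scale
    return [image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            for i in range(scale) for j in range(scale)]

def image_contains_text(image, block_classifier, scales=(1, 2, 4)):
    """Label the image as a text image if ANY block at ANY scale
    is predicted to contain text (the OR-aggregation rule)."""
    return any(block_classifier(block)
               for scale in scales
               for block in partition_blocks(image, scale))
```

Note how the multi-scale partition helps: a small text region that is diluted in the whole-image statistics can still dominate one small block at a finer scale.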

An automatic image–text alignment algorithm has been developed to achieve more effective indexing and retrieval of large-scale web images by aligning them with their most relevant auxiliary text terms or phrases (Zhou & Fan, 2015). Initially, a large number of cross-media web pages are crawled and segmented into a set of image–text pairs. Near-duplicate image clustering is then used to group the large-scale web images into clusters of near-duplicate images according to their visual similarities. Near-duplicate web images in the same cluster share similar semantics and are simultaneously associated with the same or a similar set of auxiliary text terms or phrases, which co-occur frequently in the relevant text blocks; performing near-duplicate image clustering therefore significantly reduces the uncertainty about the relatedness between the semantics of web images and their auxiliary text terms or phrases. Finally, a random walk is performed over a phrase correlation network to achieve more precise image–text alignment by refining the relevance scores between the web images and their auxiliary text terms or phrases.
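The final refinement step can be sketched as a random walk with restart over the phrase correlation network. This is a generic sketch under stated assumptions, not the authors' exact formulation: `W` is an assumed symmetric phrase-correlation matrix, `initial` the raw image-phrase relevance scores, and `alpha` a damping parameter chosen for illustration.

```python
import numpy as np

def refine_scores(W, initial, alpha=0.85, iters=50):
    """Refine image-phrase relevance scores by a random walk with
    restart over a phrase correlation network.

    W:       (n x n) non-negative phrase-correlation matrix.
    initial: (n,) raw relevance scores, used as the restart vector.
    """
    # Row-normalize correlations into transition probabilities;
    # the epsilon guards rows with no correlated phrases.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    s = initial.copy()
    for _ in range(iters):
        # Propagate scores along correlations, with restart to the
        # initial relevance so the walk stays anchored to the image.
        s = alpha * P.T @ s + (1 - alpha) * initial
    return s
```

The effect is that a phrase strongly correlated with initially relevant phrases gains score, even if its own raw relevance was low, which is how the walk sharpens the image–text alignment.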
