Encoding and Indexing Fuzzy Object Shape for Image Retrieval

Encoding and Indexing Fuzzy Object Shape for Image Retrieval

DOI: 10.4018/978-1-5225-3796-0.ch004

Abstract

The recent retrieval and indexing approaches suffer from the issues of curse of dimensionality, overlapping of vectors, need of extra parameters for clustering and not supporting incremental indexing. In this chapter, an indexing approach is introduced without enduring the above specified issues. The suggested indexing structure is dynamically rearranged based on the occurrences of pattern for merging the common patterns in low-level feature. Further, the feature is encoded using GR coding scheme and the feature database space required is reduced considerably. While there are many encoding scheme available, in this chapter GR coding is used for simplicity and its applicability. In addition, the compression ratio is discussed and numerical statistics is depicted. Overall, indexing scheme reduces the search space by following predetermined pattern for clustering. The encoding scheme reduces the size of the feature database and also achieves good precision of retrieval.
Chapter Preview
Top

Introduction

Indexing is very important for image retrieval applications. It is imperative that a suitable indexing mechanism is required for effectively indexing low-level feature. The indexed feature can be combined with textual feature for facilitating shape based retrieval. Broadly, there are two categories of indexing schemes namely vector quantization and multidimensional indexing. The multidimensional indexing is further divided into space-partitioning and data-partitioning methods. These methods represent the data in a hierarchical tree by splitting the data space progressively into smaller parts. The difference between these approaches is the way the data or space is divided. In space-partitioning, the KD-tree (K-Dimensional tree) splits the feature space into predefined hyper-planes, irrespective of the feature vector distribution. Such areas are disjoint and their combination covers the whole space. The main problem of KD-tree is that it is difficult to recognize the position of the feature vector using each level of the tree. Further, fixed partitioning of space can result to an empty or few populated cluster particularly in high dimensions. The R-tree, SS-tree and Sphere/Rectangle-tree are the most popular data partitioning approaches. The database items are distributed to partition the feature space. The R-tree represents each node in the tree as hyper-rectangle by partitioning the space. The space is again partitioned into smaller for representing the child nodes. The performance of R-tree is encouraging compared to KD-tree and it is found that the bounding rectangles are overlapping. It has outperformed SS-tree, where the minimum bounding spheres are used. However, the overlapping of higher dimensions is not fully rectified in the SS–tree also. The SR-tree is an enhancement of the R-tree and SS-tree, where the intersection of hyper-spheres and hyper-rectangles are performed to decide the shape of a partition. Due to the curse of dimensionality, the multidimensional indexing does not scale up well to high dimensional spaces.

In addition, the partitions increase exponentially with the dimension. However, these multidimensional indexing structures are mostly useful for medium dimensional feature spaces. The Principal Component Analysis (PCA) and Latent Semantic Indexing (LSI) have avoided the curse of dimensionality problem and reduced the dimension of feature vectors considerably for retrieval applications. The original feature vectors are considered as approximate of low dimensional transformed feature vectors. However, the most serious problem with these methods is that there is a loss of significant information due to reduction in the dimension of the feature vector. The distance computation is another multidimensional indexing based approach and it is observed that this approach is costly and CPU intensive, especially for high dimensional data spaces. Moreover, while a query point is located near a partition border of a nearest neighbor query algorithm, the performance degrades. This is due to the fact that it is necessary to take two decisions. For decreasing the retrieval time, the neighboring partitions are avoided or otherwise consider the neighboring partitions with high computational requirements.

Complete Chapter List

Search this Book:
Reset