Image Databases (IDBs) are a kind of spatial database in which a large number of images are stored and queried. This chapter reviews techniques for indexing an IDB so that several kinds of queries can be processed efficiently: retrieval based on features, content, or structure, processing of joins, and queries by example. The main indexing techniques used in IDBs are members of either the R-tree family (data-driven structures) or the quadtree family (space-driven structures). Although research on IDB indexing has been under way for several years, significant research challenges remain, and these are also discussed in this chapter. IDBs and their indexing structures bring together two different disciplines (databases and image processing), so interdisciplinary research efforts are required. Moreover, dealing with the semantic gap (successful integrated retrieval based on both low-level features and high-level semantic features) and querying across images and other kinds of spatial data are also significant future research directions.
Image Databases (IDBs) are a special kind of spatial database in which a large number of images are stored and queried. IDBs have a plethora of applications in modern life, for example in medical, multimedia, and educational settings. In the framework of Geographical Information Systems (GIS), digital images (raster data) may represent changes in cultivated areas, sunny areas, and the distinction between urban environments and the countryside.
Apart from the raster format, GIS data may be stored in vector format (points, line segments, polygons, etc.). Each of these data formats has certain advantages, which makes the choice between them a challenge. Raster data lead to faster computation for several operations (e.g., overlays) and are well suited for remote sensing. On the other hand, they have a fixed resolution, which limits the level of detail. In this article, we focus on raster data (image databases) and their indexing techniques.
Since the start of the 1980s, several structures for spatial objects have been proposed in the literature for the efficient storage and retrieval of image collections. Based on these structures, many kinds of useful queries on image data may be processed efficiently. These include:
Queries about the content of additional properties (descriptive information) that have been embedded for each image (e.g., which images have been used on the covers of children’s books?).
Queries about the characteristics/features of the images, like color, texture, shape, etc. (e.g., find the images that depict a vivid blue sky).
Queries for retrieving images with specified content (e.g., find the images that contain the sub-image of a specified chair).
Queries by example or sketch (e.g., a sample image is chosen or drawn by the user, and images similar to this sample are sought).
Structural queries (e.g., find the images that contain a number of specific objects in a specified arrangement).
Image joins (e.g., find the cultivation areas that reside in polluted atmosphere areas).
Queries that combine regional data and other sorts of spatial data (e.g., find the cities, represented by point data, that reside within 5 km of cotton cultivations).
Temporal queries on sequences of evolving images (e.g., find whether there has been an increase in the regions of wheat cultivation in this prefecture during the last two years).
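As a rough illustration of the image join and temporal queries listed above, the following sketch treats each thematic layer as a small binary raster and answers the join with a pixelwise overlay. The function names (`overlay_join`, `area`) and the sample rasters are hypothetical, and the sketch scans every pixel rather than using any index structure.

```python
def overlay_join(a, b):
    """Pixelwise AND of two equally sized binary rasters (lists of rows of 0/1)."""
    return [[pa & pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def area(mask):
    """Number of pixels set to 1 in a binary raster."""
    return sum(sum(row) for row in mask)


# Hypothetical 3x3 layers: 1 marks a cultivated (or polluted) pixel.
cultivation_last_year = [[1, 0, 0],
                         [1, 1, 0],
                         [0, 0, 0]]
cultivation_now = [[1, 1, 0],
                   [1, 1, 0],
                   [0, 0, 0]]
pollution = [[0, 1, 1],
             [0, 1, 1],
             [0, 0, 0]]

# Image join: cultivated pixels that also lie in a polluted area.
polluted_cultivation = overlay_join(cultivation_now, pollution)

# Temporal query: has the cultivated area grown between the two snapshots?
has_increased = area(cultivation_now) > area(cultivation_last_year)
```

An indexing structure aims to avoid this full scan of every pixel when the rasters are large.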
The importance of image indexing and querying techniques has led major Database Management System vendors to embed related extensions in the core engines of their products (e.g., DB2 has embedded QBIC technology (Flickner et al. 1995), and Oracle provides Content-Based Image Retrieval (CBIR) based on Virage (Annamalai et al. 2000)).
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. In a binary image, each pixel can be either black or white, while in a greyscale (color) image each pixel corresponds to a shade of grey (a color) from a set of permitted greyscale (color) values.
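To make the pixel representation concrete, the sketch below stores a binary image as a list of rows of 0/1 values and decomposes it into a region quadtree, the basic member of the space-driven quadtree family. It is a minimal sketch that assumes a square image whose side is a power of two; the names `build_quadtree` and `count_leaves` are hypothetical.

```python
def build_quadtree(img, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform block, else four subtrees [NW, NE, SW, SE].

    Assumes img is a square list of rows whose side is a power of two.
    """
    if size is None:
        size = len(img)
    first = img[y][x]
    # A block whose pixels all share one value becomes a single leaf.
    if all(img[r][c] == first
           for r in range(y, y + size)
           for c in range(x, x + size)):
        return first
    half = size // 2
    return [build_quadtree(img, x, y, half),                 # NW quadrant
            build_quadtree(img, x + half, y, half),          # NE quadrant
            build_quadtree(img, x, y + half, half),          # SW quadrant
            build_quadtree(img, x + half, y + half, half)]   # SE quadrant


def count_leaves(node):
    """Number of uniform blocks (leaves) in the quadtree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


# Hypothetical 4x4 binary image (1 = black, 0 = white).
binary_image = [[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 1]]

tree = build_quadtree(binary_image)
```

Here the 16 pixels collapse into 7 leaves, because three of the four quadrants are uniform and only the south-east quadrant has to be split further.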
Each image represents a scene containing objects and regions. An IDB is an organized collection of digital images aimed at the management of this collection and the efficient processing of queries on it. There are numerous publications in the literature related to the processing of queries on image features like color (e.g., distribution of colors, dominant colors, and color moments), texture (the pattern of the image surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity, density, frequency, etc.), and shape (the physical structure of objects, or the geometric shapes present in the image). In several of these publications (emerging from the image processing/computer vision community), the term indexing refers to the features corresponding to each image and to the algorithm used for computing the similarity between them (the algorithm often works by an exhaustive comparison with all the images present in the database). In this article, indexing is used in the database sense and corresponds to the access methods (data structures) used to speed up query processing.
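The feature-based sense of indexing described above can be sketched as follows: each image is reduced to a normalized grey-level histogram (a simple stand-in for a color distribution feature), and a query by example is answered by exhaustively comparing the query's histogram with that of every stored image. The function names, the choice of L1 distance, and the tiny four-level images are assumptions made for illustration only.

```python
from collections import Counter


def grey_histogram(image, levels=4):
    """Normalized histogram of a greyscale image with pixel values in [0, levels)."""
    counts = Counter(pixel for row in image for pixel in row)
    total = sum(counts.values())
    return [counts.get(value, 0) / total for value in range(levels)]


def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def exhaustive_search(query, database, k=1):
    """Names of the k stored images whose histograms are closest to the query's.

    Every stored image is compared with the query; no access method is used.
    """
    query_hist = grey_histogram(query)
    scored = sorted((l1_distance(query_hist, grey_histogram(img)), name)
                    for name, img in database.items())
    return [name for _, name in scored[:k]]


# Hypothetical 2x2 images with four grey levels (0 = darkest, 3 = brightest).
database = {
    "dark": [[0, 0], [0, 1]],
    "bright": [[3, 3], [2, 3]],
}
query = [[0, 0], [1, 1]]

best_match = exhaustive_search(query, database)
```

The cost of this scan grows linearly with the number of stored images, which is the motivation for the database-style access methods that this article calls indexing.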
Key Terms in this Chapter
Color Features of an Image: Characteristics of an image related to the presence of color information, like distribution of colors, dominant colors, or color moments.
Access Method or Index Structure: A technique of organizing data that allows the efficient retrieval of data according to a set of search criteria.
Structural Features of an Image: The arrangement of the objects depicted in the image.
Content-Based Image Retrieval: Searching for images in image databases according to their visual contents, like searching for images with specific color, texture, or shape properties, for images containing specific objects, or containing objects in a specified arrangement.
Semantic Features of an Image: The contents of an image according to human perception, like the objects present in the image or the concepts / situations related to the image.
Texture Features of an Image: The pattern(s) of the image’s surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity, density, and frequency.
Query Processing: Extracting information from a large amount of data without actually changing the underlying database where the data are organized.
Image Database: An organized collection of digital images aimed at the efficient management of this collection and the processing of queries on it.
Similarity of Images: The degree of likeness between images according to a number of features, like color, texture, shape, and semantic features.
Shape Features of an Image: The physical structure(s) of the objects, or the geometric shapes present in the image.