Computer vision or object recognition complements human or biological vision using techniques from machine learning, statistics, scene reconstruction, indexing, and event analysis. Object recognition is an active research area that implements artificial vision in software and hardware. Application examples include autonomous robots, surveillance, indexing of picture databases, and human-computer interaction. This visual aid benefits users, because humans remember information more accurately when it is presented visually than when it is presented in writing, in speech, or in kinesthetic form.

Linguistic indexing adds another dimension to computer vision by automatically assigning words or textual descriptions to images. This augments content-based image retrieval (CBIR), which extracts or searches for digital images in large databases. According to Li and Wang (2003), most existing CBIR projects are general-purpose image retrieval systems that search for images visually similar to a query sketch. Current CBIR systems cannot assign words to images automatically because of the inherent difficulty of recognizing numerous objects at once. This limitation is stimulating several research efforts that seek to assign text to images, thereby improving image retrieval in large databases. To enhance information processing with object recognition techniques, current research has focused on automatic linguistic indexing of digital images (ALIDI). ALIDI draws on a combination of mathematical, statistical, computational, and graphical expertise. Researchers have addressed various aspects of linguistic processing, such as CBIR (Ghosal, Ircing, & Khudanpur, 2005; Iqbal & Aggarwal, 2002; Wang, 2001), machine learning techniques (Iqbal & Aggarwal, 2002), digital libraries (Witten & Bainbridge, 2003), and statistical modeling (Li, Gray, & Olsen, 2000; Li & Wang, 2003).
A growing approach is the use of statistical models, as demonstrated by Li and Wang (2003). It entails building databases of images to be used for supervised learning. The trained system then recognizes and identifies new images within a statistical margin of error. This statistical modeling approach uses a hidden Markov model to extract representative information about each category of images analyzed. However, when computers are used to associate images with textual descriptions, some researchers rely solely on text-based approaches. This article focuses on the computational and graphical aspects of ALIDI in a system that offers Web-based access to enable wider usage (Ntoulas, Chao, & Cho, 2005). This system uses image composition (primary hue and saturation) in the linguistic indexing of digital images.
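The supervised workflow described above (train on labeled categories of images, then assign a word to a new image) can be illustrated with a small sketch. It deliberately swaps the hidden Markov model for a much simpler nearest-centroid classifier, so it is not Li and Wang's method or the system described in this article; it only shows how a hue/saturation composition feature could feed a train-then-annotate loop. The pixel-list image format, the 8 × 3 hue/saturation binning, and the names `hs_composition`, `train`, and `annotate` are all hypothetical.

```python
import colorsys
from collections import defaultdict

# Hypothetical image representation: a list of (r, g, b) pixels with 0-255 channels.
HUE_BINS, SAT_BINS = 8, 3  # assumed quantization, not taken from the article


def hs_composition(pixels):
    """Image composition as the proportion of pixels in each (hue, saturation) bin."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * (HUE_BINS * SAT_BINS)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # h, s in [0, 1]
        hue_bin = min(int(h * HUE_BINS), HUE_BINS - 1)
        sat_bin = min(int(s * SAT_BINS), SAT_BINS - 1)
        counts[hue_bin * SAT_BINS + sat_bin] += 1
    return [c / len(pixels) for c in counts]


def train(labeled_images):
    """Supervised step: average the composition vectors of the training images per word."""
    grouped = defaultdict(list)
    for word, pixels in labeled_images:
        grouped[word].append(hs_composition(pixels))
    return {word: [sum(col) / len(feats) for col in zip(*feats)]
            for word, feats in grouped.items()}


def annotate(model, pixels):
    """Label a new image with the word whose category centroid is nearest (squared Euclidean)."""
    feat = hs_composition(pixels)
    return min(model, key=lambda word: sum((m - f) ** 2 for m, f in zip(model[word], feat)))


red = [(250, 10, 10)] * 10
blue = [(10, 10, 250)] * 10
green = [(10, 240, 10)] * 10
model = train([("sunset", red), ("sky", blue), ("forest", green)])
print(annotate(model, [(240, 40, 20)] * 5))  # sunset
print(annotate(model, [(20, 40, 230)] * 5))  # sky
```

A real system would use many training images per word and richer features; the centroid classifier is chosen here only because it keeps the supervised step readable.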
Current image indexing systems are text-based, relying on content-relevant text placed in proximity to images. There is a need for Web-based automated linguistic indexing of digital images. This need is likely to accelerate the adoption of automated linguistic indexing of images in their native visual form, that is, the automatic assignment of textual descriptions to images (Forsyth & Ponce, 2002; Li, Gray, & Olsen, 2000; Li & Wang, 2003). ALIDI is currently an active research area in data mining, and its application is growing in such fields as consumer photo managers, medical imaging databases, and image search engines (Berman & Shapiro, 1997; Li & Wang, 2003; Tanev, Kouylekov, & Magnini, 2004; Zhang, Goldman, Yu, & Fritts, 2002).
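Text-based indexing of the kind described in the first sentence above can be as simple as harvesting words that appear near an image in a Web page. The sketch below is one hypothetical heuristic, not a description of any particular search engine: it indexes each `<img>` by its `alt` attribute plus the first run of text that follows it, using Python's standard `html.parser` module. The class name `ProximityTextIndexer` and the "first following text" rule are assumptions for illustration.

```python
from html.parser import HTMLParser


class ProximityTextIndexer(HTMLParser):
    """Hypothetical text-based indexer: keywords come from text near each <img>, not from pixels."""

    def __init__(self):
        super().__init__()
        self.index = {}       # image src -> list of lowercase keywords
        self._pending = None  # src of the most recent image still waiting for nearby text

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            src = attributes.get("src") or ""
            alt = attributes.get("alt") or ""  # attributes without a value map to None
            self.index[src] = alt.lower().split()
            self._pending = src

    def handle_data(self, data):
        text = data.strip()
        if self._pending is not None and text:
            # Assumed heuristic: the first non-blank text after the image acts as its caption.
            self.index[self._pending].extend(text.lower().split())
            self._pending = None


page = ('<p><img src="a.jpg" alt="Red sunset"> Sunset over the bay</p>'
        '<img src="b.png"/><span>Forest trail</span>')
indexer = ProximityTextIndexer()
indexer.feed(page)
indexer.close()
print(indexer.index)
# {'a.jpg': ['red', 'sunset', 'sunset', 'over', 'the', 'bay'], 'b.png': ['forest', 'trail']}
```

Nothing in this sketch inspects the pixels: if the nearby text is missing or unrelated, the image is indexed poorly, which is the gap automatic linguistic indexing aims to close.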
Key Terms in this Chapter
Linguistic Indexing: Assignment of textual descriptions or words to images as a way of identifying them.
Image Composition: The general makeup of an image, or the proportion of its elements, such as colors.
Machine Learning: An area of artificial intelligence concerned with algorithms that allow computers to learn from data and improve their performance on a task. It overlaps with data mining and statistics and has wide applications in areas such as object recognition, computer vision, robot locomotion, and bioinformatics.
Concept: A generalization or abstraction of a particular set of instances or a particular category of images at the data level.
Pattern Recognition: Part of machine learning (with supervised learning underpinnings) that classifies or extracts patterns from raw data (measurements or observations) relying on the features of the data.
Content-Based Image Retrieval (CBIR): This approach retrieves or searches digital images from large databases using the content of the images themselves or syntactical image features without human intervention. To aid image retrieval, techniques from statistics, pattern recognition, signal processing, and computer vision are commonly deployed. Other terms used interchangeably for CBIR are query by image content (QBIC) and content-based visual information retrieval (CBVIR).
Computer Vision or Object Recognition: A discipline concerned with the science and technology of artificial systems for extracting information from images and multidimensional data. Application areas include industrial robots, indexing databases of images and industrial inspection.
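To make the CBIR entry concrete, the sketch below performs a query by example: each image is summarized by a coarse color histogram, and database images are ranked by histogram intersection with the query. Histogram intersection is one common color-similarity measure, chosen here for brevity; the 64-bin RGB quantization, the pixel-list image format, and the names `color_histogram`, `histogram_intersection`, and `query_by_example` are assumptions for illustration, not part of the system described in this article.

```python
# Hypothetical representation: an image is a list of (r, g, b) pixels, channels 0-255.
BINS_PER_CHANNEL = 4  # assumed: 4 x 4 x 4 = 64 color bins


def color_histogram(pixels):
    """Normalized RGB histogram: fraction of pixels falling in each color bin."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * BINS_PER_CHANNEL ** 3
    step = 256 // BINS_PER_CHANNEL
    for r, g, b in pixels:
        idx = (r // step) * BINS_PER_CHANNEL ** 2 + (g // step) * BINS_PER_CHANNEL + b // step
        hist[idx] += 1
    return [c / len(pixels) for c in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def query_by_example(query_pixels, database, top_k=3):
    """Rank database images (name -> pixels) by color similarity to the query image."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(px)), name)
              for name, px in database.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _score, name in scored[:top_k]]


db = {
    "beach.jpg": [(250, 220, 150)] * 8 + [(30, 90, 220)] * 8,
    "forest.jpg": [(20, 140, 30)] * 16,
    "ocean.jpg": [(30, 90, 220)] * 14 + [(250, 220, 150)] * 2,
}
query = [(40, 100, 210)] * 10
print(query_by_example(query, db, top_k=2))  # ['ocean.jpg', 'beach.jpg']
```

Because ranking uses only pixel statistics, the query returns visually similar images but no words; attaching words to those images is the step that linguistic indexing adds.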