Discovering Semantics from Visual Information

Zhiyong Wang, Dagan Feng
Copyright: © 2012 | Pages: 29
DOI: 10.4018/978-1-60960-818-7.ch808


Visual information is now widely used in domains such as the web, education, health, and digital libraries, owing to advances in computing technologies. At the same time, users find it increasingly difficult to locate desired visual content such as images. Although traditional content-based retrieval (CBR) systems allow users to access visual information through query-by-example with low-level visual features (e.g., color, shape, and texture), the semantic gap is widely recognized as a hurdle to the practical adoption of CBR systems. The wealth of visual information now available (e.g., user-generated visual content) makes it possible to derive new knowledge at a large scale, which will significantly facilitate visual information management. Besides semantic concept detection, semantic relationships among concepts can also be explored in the visual domain, rather than only in the traditional textual domain. This chapter therefore provides an overview of the state of the art in discovering semantics in the visual domain from two aspects: semantic concept detection, and knowledge discovery from visual information at the semantic level. For the first aspect, various facets of visual information annotation are discussed, including content representation, machine-learning-based annotation methodologies, and widely used datasets. For the second aspect, a novel data-driven approach is introduced to discover semantic relevance among concepts in the visual domain. Future research topics are also outlined.
Chapter Preview

1. Introduction

In the last decades we have witnessed tremendous growth in visual information such as images and videos, owing to advances in computing technologies. It has never been easier to capture images with digital cameras or video with camcorders, and the resulting collections range from personal albums to professional archives such as news. The rapid development of the Web has further accelerated this process by allowing any user to publish or share visual content conveniently. About 3 billion images are hosted by Flickr¹, and about 12.7 billion videos were watched in a single month by American Internet users alone². Millions of images are uploaded every day to popular photo-sharing web sites like Flickr. Efficient access to such an enormous amount of visual information has therefore emerged as a challenging issue.

Since the 1980s, visual information retrieval has been a very active research topic, driven by two communities: database management and computer vision (Tamura & Yokoya, 1984) (Chang, Shi, & Yan, 1987) (Chang, Yan, Dimitroff, & Arndt, 1988). These two communities study image retrieval from different angles, the former text or alphanumeric based and the latter visual based. Unlike textual information, which can be characterized in terms of its semantic primitives (i.e., terms), visual information lacks such primitives, even with state-of-the-art computer vision techniques. Therefore, in order to leverage the success of relational databases, visual information is generally annotated manually with textual descriptions for retrieval purposes. However, manual annotation suffers from the following issues:

  • Manual annotation is very time consuming and labor intensive, so it does not scale to the dramatically growing volume of visual information.

  • Manual annotation depends on the subjective judgment of the annotators and does not generalize to all possible end users, especially users who speak different languages.

  • It is very difficult to describe image content with only a few keywords. As the saying goes, a picture is worth a thousand words. Some aspects of image content, such as texture, are even beyond words.
