Topological Semiotics of Visual Information

DOI: 10.4018/978-1-5225-2431-1.ch004

Abstract

This chapter explains how active vision and image understanding can be implemented with topological semiotic models, using cognitive architecture with perceptual mechanisms similar to human vision.

Introduction

Significant efforts have been made to convert image data into meaningful informational structures and to use context in the processing of visual information. For instance, Geographic Information Systems (GIS) can adequately address problems with geographic and satellite imagery, because geographic knowledge has been well formalized in the form of maps, and maps can be represented well in digital form.

In the field of multimedia, the MPEG-7 standard was an extensive industry effort to address these problems for generic images by converting them into XML structures. MPEG-7 provides a set of image primitives called Descriptors. An MPEG-7 Description Scheme specifies the structure and semantics of the relationships between image components, which may be either Descriptors or other Description Schemes. An MPEG-7 image description consists of a Description Scheme and a set of Descriptor Values.
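The nesting described above, where a Description Scheme contains Descriptors or further Description Schemes and a description pairs a scheme with Descriptor Values, can be sketched as a small recursive data structure. This is only an illustration under stated assumptions: the class names, the `descriptor_names` helper, and the descriptor names `texture`, `color`, and `shape` are hypothetical and are not taken from the MPEG-7 schema or its XML syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Descriptor:
    """An image primitive such as a color or texture feature (hypothetical model)."""
    name: str


@dataclass
class DescriptionScheme:
    """A named structure whose components are Descriptors or nested schemes."""
    name: str
    components: List[Union[Descriptor, "DescriptionScheme"]] = field(default_factory=list)


def descriptor_names(scheme: DescriptionScheme) -> List[str]:
    """Collect the names of all Descriptors reachable from a scheme, depth-first."""
    names = []
    for component in scheme.components:
        if isinstance(component, Descriptor):
            names.append(component.name)
        else:
            names.extend(descriptor_names(component))
    return names


@dataclass
class Description:
    """A Description Scheme together with a set of Descriptor Values."""
    scheme: DescriptionScheme
    values: Dict[str, object]  # descriptor name -> descriptor value

    def is_complete(self) -> bool:
        """True if every Descriptor in the scheme has a value."""
        return set(descriptor_names(self.scheme)) <= set(self.values)


# Illustrative use: an image scheme that nests a region scheme.
region = DescriptionScheme("region", [Descriptor("color"), Descriptor("shape")])
image = DescriptionScheme("image", [Descriptor("texture"), region])
description = Description(
    image,
    {"texture": [0.2, 0.5], "color": (255, 0, 0), "shape": "ellipse"},
)
```

The sketch only captures the containment relationship. It says nothing about how the values are extracted from an image, which is exactly the part the standard leaves open.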

MPEG-7 supports a range of abstraction levels, from low-level video features, such as an object’s shape, size, texture, color, movement, and position, to high-level semantic information. However, the MPEG-7 standard reflects the present state of image/video processing, and it only provides a set of predefined descriptors and schemas. MPEG-7 Visual Descriptors evolved from low-level image processing, which is well understood and formalized. Description Schemes, by contrast, relate to mid- and high-level image processing, which has not yet been well formalized.

Neither automatic or semi-automatic feature extraction nor schema-creation algorithms are within the scope of the MPEG-7 standard. Although most low-level features can be extracted automatically, high-level features and schemas usually need human supervision and annotation. MPEG-7 fixes only the description format, not the extraction and transformation methodologies. These are the areas that must be addressed.

The highest level of image description is the semantic one, and MPEG-7 standardizes information at this level. However, the problem of transforming primary image structures directly into semantic descriptions has not yet been solved, because processes at the intermediate levels are not well understood or formalized.

Although RDF is better than other schemas in its ability to specify relationships and graphs, the MPEG-7 Group decided to use the easily understandable and readable XML Schema Language as the MPEG-7 DDL. However, neither RDF nor XML Schema was designed to describe the complex dynamic hierarchical structures that constitute most real images.

MPEG-7 Visual Descriptors can be used for searching and filtering images and videos based on several visual features such as color, texture, object shape, object motion, and camera motion. This allows measuring the similarity between images and videos. Such a set of descriptors might be sufficient for the entire image.
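To make the idea of descriptor-based similarity concrete, the following sketch compares two images by a whole-image color feature. It does not implement any of the MPEG-7 color descriptors; it substitutes a plain quantized RGB histogram and histogram intersection as a generic stand-in. The function names, the bin count, and the representation of an image as a list of `(r, g, b)` tuples are all assumptions made for illustration.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram of quantized RGB colors (generic, not an MPEG-7 descriptor)."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel to a bin and flatten the three bins to one index.
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Two tiny synthetic "images": one all red, one half red and half blue.
red_image = [(255, 0, 0)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

same = histogram_intersection(color_histogram(red_image), color_histogram(red_image))
different = histogram_intersection(color_histogram(red_image), color_histogram(mixed_image))
print(same, different)  # prints "1.0 0.5"
```

A global feature like this discards where the colors occur, so two images with the same colors in different arrangements score as identical. That is the sense in which such a descriptor characterizes the image as a whole rather than its content.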

Approaches similar to MPEG-7 convert images into structured descriptions built from low-level image features and their combinations. They process image data top-down, bottom-up, or in both directions, and attach linguistic values to support semantic querying. Most of them try to convert the image into a structural description that can be compared against a similarly described collection of images stored in a database (see Figure 1).

Figure 1. MPEG-7 and multimedia applications

These approaches may work well for image and multimedia databases, because they allow structured collections of images to be created and queried on certain similarity criteria. They are not suitable, however, for systems that must operate in real time and in hostile environments, because they cannot provide the needed level of understanding of the environment.

It is well known that the expert systems of the late 1980s and early 1990s (see Figure 2) proved ineffective in most areas of potential application. They were based on semantic principles and processed data represented as language constructs. Semantic representation is good for acquiring knowledge from language, serving as a mediator between human experts and computers, but it does not work well for modeling intellectual processes. This limited the intellectual capabilities of such systems to what was typed in.
