Biologically-Inspired Models for Attentive Robot Vision: A Review

Amirhossein Jamalian (Technical University of Chemnitz, Germany) and Fred H. Hamker (Technical University of Chemnitz, Germany)
DOI: 10.4018/978-1-4666-8723-3.ch003


A rich stream of visual data enters the cameras of a typical artificial vision system (e.g., a robot). Because processing this volume of data in real time is almost impossible, a clever mechanism is required to filter out trivial visual data. Visual attention might be the solution. The idea is to control the information flow, and thus to improve vision, by focusing resources on specific aspects of the visual scene instead of the whole scene. However, does attention only speed up processing, or can an understanding of human visual attention provide additional guidance for robot vision research? In this chapter, some basic concepts of the primate visual system and of visual attention are first introduced. Afterward, a new taxonomy of biologically-inspired models of attention is given, with emphasis on those used in robotics applications (e.g., object detection and recognition). Finally, future research trends in the modelling of visual attention and its applications are highlighted.


Machine vision is used in many real-world applications such as surveillance systems, robotics, sports analysis and other new technologies. Processing all visual data is often unnecessary, and with current technology a deep analysis of the entire image is practically impossible in real time. Hence, a clever mechanism is required to select useful, relevant data while discarding the rest. When machine vision takes inspiration from biology, the corresponding mechanism in the primate brain, which determines which part of the sensory data is currently most relevant, is selective attention, or simply attention. By focusing resources on only some parts of the visual scene, attention controls the information flow and thereby improves vision. Vision modules first compute features of the scene (such as color and intensity) largely independently, but these features must later be integrated for further processing. Attention was first described as a spotlight (Posner et al., 1980): a spotlight favors a particular region of interest for further processing. In human vision, the highest resolution belongs to the center of the retina (the fovea). Hence, when a human looks at a certain object, it is like using a spotlight to highlight this object within a dark room (Shulman et al., 1979). One can get an impression of the whole visual scene by scanning it with saccades (quick eye movements), just as one can discover the contents of a dark room with a shifting spotlight.
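The spotlight metaphor can be made concrete with a small sketch. It assumes a Gaussian fall-off around the attended location, with a hypothetical width parameter `sigma` (Posner's spotlight is a descriptive metaphor, not this particular formula), and a grayscale image stored as a list of rows with non-negative intensities. The helper names `spotlight_weights`, `attend` and `scan` are illustrative only.

```python
import math


def spotlight_weights(height, width, center, sigma):
    """Gaussian 'spotlight' (an assumed shape): 1.0 at the attended location,
    decaying with squared distance from it."""
    cy, cx = center
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def attend(image, center, sigma):
    """Scale each pixel by its spotlight weight so unattended regions fade toward zero."""
    h, w = len(image), len(image[0])
    weights = spotlight_weights(h, w, center, sigma)
    return [[image[y][x] * weights[y][x] for x in range(w)] for y in range(h)]


def scan(image, fixations, sigma):
    """Shift the spotlight over a sequence of fixations (like saccades) and keep,
    per pixel, the strongest view obtained so far. Assumes non-negative intensities."""
    h, w = len(image), len(image[0])
    seen = [[0.0] * w for _ in range(h)]
    for center in fixations:
        view = attend(image, center, sigma)
        seen = [[max(seen[y][x], view[y][x]) for x in range(w)] for y in range(h)]
    return seen
```

Each additional fixation illuminates another part of the "dark room", so the accumulated view covers more of the scene while any single view stays local.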

Many computational models of visual attention have been proposed in the literature. Koch & Ullman (1985) proposed a winner-takes-all (WTA) strategy on a saliency map to determine a location of interest, a concept on which more sophisticated models have since been built (Itti et al., 1998; Frintrop, 2005). In general, this approach transforms the complex problem of scene understanding into a sequential analysis of image parts. Although such spatial selection is computationally efficient (Tsotsos, 1990; Ballard, 1991), other aspects of efficiency must be considered as well. For instance, it may not be ideal to scan many salient items before reaching the relevant one. Moreover, by restricting processing to salient items, other important parts of the scene might be missed. In such cases, the model should incorporate high-level signals (e.g., knowledge about a sought object) into the selection process. Furthermore, attention mechanisms should not only select points in space but also facilitate further processing of the scene. For example, a mechanism that only determines a few salient points in space would be of little use for high-level tasks such as object recognition. Thus, attention must additionally enhance the features relevant for object recognition and use them to indicate the salient regions (Hamker, 2005a).
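A minimal sketch of saliency-based selection, assuming feature maps are same-sized 2D lists of numbers. It combines the feature maps into a saliency map with optional top-down gains (a hypothetical stand-in for high-level signals; all gains equal to 1.0 give a purely bottom-up map), selects the most salient location by winner-takes-all, and then suppresses a disk of radius `ior_radius` around it (inhibition of return) so the next winner can be selected. The result is a sequential scan of image parts. This is a toy illustration of the idea, not the implementation of Koch & Ullman (1985) or Itti et al. (1998), which rely on neural WTA dynamics and multi-scale feature pyramids; all function names are illustrative.

```python
def combine_feature_maps(feature_maps, gains=None):
    """Weighted sum of same-sized feature maps. Gains model a top-down bias;
    omitted gains default to 1.0 (purely bottom-up)."""
    names = list(feature_maps)
    gains = gains or {}
    first = feature_maps[names[0]]
    h, w = len(first), len(first[0])
    return [[sum(gains.get(n, 1.0) * feature_maps[n][y][x] for n in names)
             for x in range(w)]
            for y in range(h)]


def winner_take_all(saliency):
    """Return (y, x) of the maximum, or None if every location is suppressed.
    Ties go to the first location in row-major order."""
    best, best_val = None, float("-inf")
    for y, row in enumerate(saliency):
        for x, v in enumerate(row):
            if v > best_val:
                best, best_val = (y, x), v
    return best


def scanpath(saliency, n_fixations, ior_radius):
    """Sequence of attended locations: WTA selection, then inhibition of return."""
    s = [row[:] for row in saliency]  # copy so the caller's map is untouched
    path = []
    for _ in range(n_fixations):
        winner = winner_take_all(s)
        if winner is None:  # nothing left to attend
            break
        y0, x0 = winner
        path.append(winner)
        # Inhibit the attended neighborhood so attention can shift elsewhere.
        for y in range(len(s)):
            for x in range(len(s[0])):
                if (y - y0) ** 2 + (x - x0) ** 2 <= ior_radius ** 2:
                    s[y][x] = float("-inf")
    return path
```

For instance, with a color map that peaks at 5 at (2, 2) and an intensity map that peaks at 4 at (0, 0), the bottom-up scanpath visits (2, 2) first; with gains of 0.5 for color and 2.0 for intensity, (0, 0) wins first. This illustrates how a top-down signal can reduce the number of salient items scanned before the relevant one.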

The majority of attention models in machine vision applications only specify regions of interest (RoI), either from bottom-up factors alone or by merging bottom-up and top-down factors. However, attention can be used to address the binding problem as well (Tsotsos, 2008). Binding refers to the problem of relating to each other features that are processed independently and represented in different maps (e.g., shape and location). Tsotsos (2008) described four binding processes (convergence binding, full recurrence binding, partial recurrence binding and iterative recurrence binding) that use attentive mechanisms to achieve recognition, and claimed that these four suffice to solve the binding problem in recognition, discrimination, localization, and identification tasks. Attention plays a role in each process; in convergence binding, for example, attention is used to search for a maximum response within a neural representation arranged in hierarchical layers. In a localization task (full recurrence binding), top-down stimulus segmentation and localization, as well as local maximum selection during the top-down traversal, have to be performed.
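Searching for a maximum response in a hierarchy and then localizing it by local maximum selection on a top-down traversal can be illustrated with a toy pyramid. This sketch assumes a square response map whose side is a power of two and non-overlapping 2x2 receptive fields combined by max pooling; it is a deliberate simplification for illustration, not Tsotsos's Selective Tuning model, and `build_pyramid` and `localize` are hypothetical names.

```python
def build_pyramid(responses):
    """Bottom-up pass: each unit takes the max over its 2x2 receptive field
    in the layer below, until a single top unit remains.
    Assumes a square map whose side is a power of two."""
    layers = [responses]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        n = len(prev) // 2
        layers.append([[max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                            prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
                        for x in range(n)]
                       for y in range(n)])
    return layers


def localize(layers):
    """Top-down pass: start at the single top unit and, at each lower layer,
    keep only the strongest unit inside the current winner's receptive field
    (local maximum selection). Returns the location in the input layer."""
    y, x = 0, 0  # the top layer has exactly one unit
    for layer in reversed(layers[:-1]):
        children = [(2 * y + dy, 2 * x + dx) for dy in (0, 1) for dx in (0, 1)]
        y, x = max(children, key=lambda p: layer[p[0]][p[1]])
    return y, x
```

For the map `[[1, 2, 0, 0], [0, 3, 0, 0], [0, 0, 0, 9], [0, 0, 1, 0]]`, the top unit responds with 9, and the top-down pass recovers its source at row 2, column 3. The top layer alone signals that a strong response exists but not where; the recurrent traversal supplies the location.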
