Selective visual attention is a remarkable capability of the primate visual system to restrict focus to a few interesting objects (or portions) in a scene. Primates are thus able to attend to the required visual content amid a myriad of other visual information. By reducing the computational load on the brain, it enables them to interact with the external environment in real time. This has inspired image processing and computer vision researchers to derive computational models of visual attention and to use them in a variety of real-life applications, mainly to speed up processing by reducing the computational burden that often characterizes image processing and vision tasks. This chapter discusses a wide variety of such applications of visual attention models in image processing, computer vision, and graphics.
Introduction
Primates have a remarkable capability for dealing with dynamic surroundings in real time. They receive a myriad of sensory information on a constant basis, and their course of action is based on what they sense. The load of processing this sensory information in real time is enormous, yet they handle it effectively. There is no doubt that the primate brain is the most efficient system known to date in terms of sheer processing power. Even so, the brain cannot handle all the sensory information arriving at a particular instant. It selects only a small part of this information and processes it deep inside the brain, where activities such as recognition and decision making take place. Most of the remaining information is discarded before it reaches the deeper brain. This psycho-neurological phenomenon is known as selective attention.
Like primates, computer vision systems also face the difficulty of handling a huge amount of sensory input (Tsotsos, 1990). To tackle this problem, computer vision researchers draw inspiration from the selective attention component of the primate brain to restrict computation to certain areas of the input. As a result, computational modeling of visual attention has been an active research problem for the last two decades. It requires a combined approach drawing on theories from psychology, the neurobiology of the human visual system, and other related fields. Psycho-visual experiments have provided some theoretical reasoning for the saliency of a location or an object. On the basis of these experiments, computer vision researchers try to fit various types of mathematical, statistical, or graph-based models to acquired eye-tracking data.
There are two types of attention mechanism: bottom-up and top-down. Bottom-up attention is driven purely by external stimuli. It involuntarily attracts our gaze to salient portions of a scene (Itti and Koch, 2001). It models the attractiveness of scene components at an early stage of vision, in the absence of semantic or context-dependent knowledge about the scene being viewed. It is primarily driven by the unusualness of a stimulus (in terms of one or more features) with respect to the surroundings of a location or an object. In other words, the bottom-up mechanism guides our vision towards distinguishable items in a scene. The top-down mechanism of attention, on the other hand, is driven by the demands of the task to be performed (Pelz and Canosa, 2001; Yarbus, 1967). This type of attention is controlled by semantic, context-dependent, or task-specific knowledge.
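The idea that bottom-up saliency reflects how unusual a stimulus is relative to its surroundings can be illustrated with a minimal sketch. The code below is not any published model; it assumes a single feature (grey-level intensity), a square neighbourhood, and an absolute-difference contrast measure. The function names `center_surround_saliency` and `most_salient_location` and the `radius` parameter are hypothetical and chosen only for illustration.

```python
def center_surround_saliency(image, radius=1):
    """Toy saliency map (illustrative assumption, not a published model):
    each pixel's saliency is the absolute difference between its intensity
    and the mean intensity of its surrounding neighbours."""
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbours inside the window, excluding the centre.
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if neighbours:  # a 1x1 image has no surround; leave saliency at 0
                surround_mean = sum(neighbours) / len(neighbours)
                saliency[r][c] = abs(image[r][c] - surround_mean)
    return saliency


def most_salient_location(saliency):
    """Return the (row, col) position with the highest saliency value."""
    positions = (
        (r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))
    )
    return max(positions, key=lambda rc: saliency[rc[0]][rc[1]])


# A uniform 5x5 patch with a single bright "odd-one-out" pixel at (2, 3).
image = [[10] * 5 for _ in range(5)]
image[2][3] = 200

saliency_map = center_surround_saliency(image)
focus = most_salient_location(saliency_map)
print(focus)
```

In this toy scene the odd-one-out pixel receives the largest contrast value, so it is selected as the focus of attention. Practical bottom-up models typically combine several features and scales rather than a single intensity contrast.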
In the context of computer vision tasks, selective attention to a few pertinent salient portions of a scene has various advantages. It reduces the computational burden by decreasing the amount of data to be processed. Tasks such as searching for a target object in a scene draw immense benefit from this attention-driven reduction of processing load. Moreover, suppression of irrelevant information ensures that only the relevant locations of the scene influence the outcome of the system. Tracking an object in a scene and navigating a pilotless vehicle with the help of an artificial vision system are examples of this category. In certain applications, scene contents are treated differently depending on their individual saliency. Visual attention guided compression is one such example: a higher compression ratio is applied to the less salient components of the image, whereas salient components are compressed only lightly. The underlying assumption behind this kind of compression technique is that distortions due to lossy compression will not be perceptible if they are restricted to the less salient portions of the image.
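The compression example can be sketched in a few lines. This is only a toy stand-in, under the assumption that a coarser uniform quantization step plays the role of a higher compression ratio; it is not the scheme of any particular codec. The names `quantize` and `saliency_guided_quantize`, the `threshold`, and the two step sizes are hypothetical, and the saliency values are made up for illustration.

```python
def quantize(value, step):
    """Uniform quantization: snap a non-negative integer to the nearest
    multiple of `step` (ties round up)."""
    return (value + step // 2) // step * step


def saliency_guided_quantize(image, saliency, threshold,
                             fine_step=2, coarse_step=32):
    """Quantize salient pixels finely and non-salient pixels coarsely.

    Illustrative assumption: a pixel counts as salient when its saliency
    value is at least `threshold`.
    """
    result = []
    for image_row, saliency_row in zip(image, saliency):
        result.append([
            quantize(pixel, fine_step if s >= threshold else coarse_step)
            for pixel, s in zip(image_row, saliency_row)
        ])
    return result


# Hypothetical 2x2 image and saliency map (values in [0, 1]).
image = [[103, 101],
         [57, 200]]
saliency = [[0.9, 0.1],
            [0.2, 0.8]]

compressed = saliency_guided_quantize(image, saliency, threshold=0.5)
print(compressed)  # [[104, 96], [64, 200]]
```

The salient pixels (103 and 200) change by at most 1 grey level, while the non-salient ones (101 and 57) absorb errors of 5 and 7, which matches the assumption that distortion is tolerable where attention is unlikely to land.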
Thus, a wide range of activities benefit from visual attention. This chapter first briefly reviews computational models of the two categories of attention, bottom-up and top-down. It then focuses on a wide range of applications of visual attention in image processing, computer vision, and graphics. These are discussed under the following categories:
- Image and video processing (intelligent capturing of photos, compression, retargeting, watermarking, image quality assessment, color modification to guide attention and many more)
- Computer vision (object detection and tracking, scene classification, scene understanding, etc.)
- Robotics (self-localization and mapping, environment understanding and navigation for humanoid robots, pilotless vehicle navigation through artificial vision system, etc.)
- Graphics (rendering and exploring virtual environments, improving 3D TV viewing experience, etc.)