Visual Attention Guided Object Detection and Tracking

Debi Prosad Dogra (Indian Institute of Technology Bhubaneswar, India)
DOI: 10.4018/978-1-4666-8723-3.ch004


Scene understanding and object recognition depend heavily on the success of visual attention guided salient region detection in images and videos. A summary of computer vision techniques that use visual attention models for video object recognition and tracking can therefore be helpful to researchers in the computer vision community. This chapter aims to present a philosophical overview of the possible applications of visual attention models in the context of object recognition and tracking. It begins with a brief introduction to various visual saliency models suitable for object recognition, followed by a discussion of possible applications of attention models to video object tracking. The chapter also provides a commentary on the existing techniques in this domain and discusses some of their possible extensions. Prospective readers should benefit, since the chapter guides them through the pros and cons of this topic.
Chapter Preview


The attention of a person toward a particular portion of an object in a stationary scene is guided by various features such as intensity, color, contrast, texture, size, and other salient characteristics of the scene (Sun, 2003). Regions that attract human attention are known as the salient regions of a given image or scene. Researchers have proposed several methods, based on psychological as well as statistical parameters, to localize such regions. These methods are widely used for basic as well as advanced image processing tasks, such as image segmentation, object recognition, content-based image retrieval, and pattern recognition.
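As a rough illustration of how an intensity-contrast cue can be turned into a saliency map, the sketch below scores each pixel of a grayscale image by its absolute difference from the mean intensity of the whole image. This is a deliberately simplified global-contrast measure chosen for illustration, not the model of Sun (2003) or any specific method from the chapter; the function names and the list-of-rows image format are assumptions.

```python
def global_contrast_saliency(image):
    """Return a saliency map in [0, 1] for a grayscale image (list of rows).

    Assumption: saliency is approximated by how far a pixel's intensity
    lies from the mean intensity of the whole image.
    """
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    contrast = [[abs(value - mean) for value in row] for row in image]
    peak = max(max(row) for row in contrast)
    if peak == 0:
        # A perfectly uniform image has no salient region.
        return [[0.0 for _ in row] for row in image]
    return [[c / peak for c in row] for row in contrast]


def most_salient_pixel(saliency):
    """Return the (row, column) position with the highest saliency score."""
    positions = (
        (r, c) for r, row in enumerate(saliency) for c in range(len(row))
    )
    return max(positions, key=lambda rc: saliency[rc[0]][rc[1]])


# Hypothetical 3x3 image: one bright pixel on a dark background.
image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
saliency = global_contrast_saliency(image)
print(most_salient_pixel(saliency))
```

Practical saliency models typically combine several cues (color, texture, orientation) at multiple scales; a single global intensity contrast is only the simplest possible starting point.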

Visual attention models find application in video processing too. For example, the detection and tracking of objects in videos is often aided by visual attention guided models. Object detection, which is one of the preliminary steps of several computer vision tasks, is often carried out by localizing salient regions in a given scene. Since the deformation of a shape can be better understood in the temporal domain, the localization of salient regions representing a particular shape is extended over a sequence of frames. Visual attention therefore becomes important in localizing the salient regions that are subsequently used for tracking. Similarly, various parameters related to the movement of an object can be better understood using visual attention based models. It is well known that human visual attention is often influenced by movement patterns. It is therefore necessary to give sufficient importance to this parameter in order to accurately track moving objects in videos where multiple objects interact with each other. This can be understood from the following hypothetical scenario. Assume a group of people moving freely in an environment, unaware that their movements are being closely monitored by observers. In such a scenario, an observer is expected to give equal importance to all the moving persons unless some unusual activity is noticed. Here, unusual movements can be of the following types: sudden quick movement, slow movement, unusual trajectory, and several other variations that are easily detectable by a human. This happens because humans are trained to recognize such events as unusual activities within their field of view. Similar reasoning supports the influence of visual attention in situations where humans pay more attention to a particular person or group of persons because of differences in appearance, e.g. height, clothing, or spatial location. Therefore, human attention models must not be ignored when designing object tracking algorithms that are expected to be both robust and accurate.
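The scenario above can be sketched in code: given the trajectories of several tracked persons, flag those whose average speed differs markedly from the typical speed of the group. This is only a minimal sketch of the "sudden quick movement" and "slow movement" cases; the use of the median as the typical speed, the `ratio` threshold, and all names are assumptions made for illustration, not a method described in the chapter.

```python
import math
import statistics


def mean_speed(track):
    """Average distance moved per frame along a track of (x, y) positions.

    The track must contain at least two positions.
    """
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


def flag_unusual_speeds(tracks, ratio=2.0):
    """Return the names of tracks that are much faster or slower than typical.

    Assumption: "typical" is the median of the mean speeds, and a track is
    unusual if it is more than `ratio` times faster or slower than that.
    """
    speeds = {name: mean_speed(track) for name, track in tracks.items()}
    typical = statistics.median(speeds.values())
    unusual = set()
    for name, speed in speeds.items():
        if speed > ratio * typical or speed < typical / ratio:
            unusual.add(name)
    return unusual


# Hypothetical tracks: two ordinary walkers, one runner, one near-stationary person.
tracks = {
    "walker_1": [(0, 0), (1, 0), (2, 0)],
    "walker_2": [(0, 0), (0, 1), (0, 2)],
    "runner": [(0, 0), (5, 0), (10, 0)],
    "loiterer": [(0, 0), (0.1, 0), (0.2, 0)],
}
print(sorted(flag_unusual_speeds(tracks)))
```

An attention-guided tracker could allocate more computation, or a stricter association step, to the flagged tracks. Unusual trajectories (as opposed to unusual speeds) would need a shape-based comparison that this sketch does not attempt.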

However, the requirements of object detection and tracking, and their applicability, vary widely from task to task. If the object to be tracked is known in advance, model-based trackers, which require an initial training phase, may be applied. In some applications, however, the object of interest is not known in advance. A user might, for example, point out an object to the system for various reasons. A long training phase is usually unacceptable in such applications, so online learning methods are often called for. In systems with a static camera, it is possible to apply methods such as background subtraction. If the interest lies, for example, in counting people or in other statistical investigations that do not require an immediate response, the data can be processed offline, which extends the range of applicable algorithms considerably. On the other hand, systems that operate on a mobile platform usually have to run in real time and have to deal with more difficult settings: the background changes, illumination conditions vary, and platforms are often equipped with low-resolution cameras. Such conditions require robust and flexible tracking mechanisms. In these areas, feature-based tracking approaches are mostly applied, which track an object based on simple features such as color cues or corners. In all of the above situations, however, visual attention guided techniques can play a crucial role.
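For the static-camera case mentioned above, background subtraction can be sketched as follows: maintain a slowly updated estimate of the background and mark as foreground every pixel that differs from it by more than a threshold. The running-average update, the `rate` and `threshold` values, and the function names are assumptions chosen for illustration; practical systems generally use more elaborate background models.

```python
def update_background(background, frame, rate=0.05):
    """Blend the new frame into the background estimate (running average)."""
    return [
        [(1 - rate) * b + rate * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Hypothetical 2x2 grayscale frames: one pixel becomes bright in the new frame.
background = [[10, 10], [10, 10]]
frame = [[10, 200], [10, 10]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
print(mask)
```

This approach relies on the camera being fixed. On a mobile platform the background estimate is invalidated by every camera motion, which is one reason feature-based or attention-guided tracking is preferred there.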
