Methods and Perspectives in Face Tracking Based on Human Perception


Vittoria Bruni (Sapienza University of Rome, Italy & National Research Council, Italy) and Domenico Vitulano (National Research Council, Italy)
DOI: 10.4018/978-1-4666-8789-9.ch024


This chapter analyses the role of early human vision in image and video processing, with particular reference to face perception, recognition, and tracking. To this end, it discusses the change of perspective in which image processing problems are approached with the decoder (the human eye) playing a central role. In particular, the chapter focuses on some important neurological results that have been successfully used in face detection and recognition, as well as on those that appear promising as new and powerful tools for face tracking, a topic that remains less investigated from this new standpoint.
Chapter Preview


The objective of target tracking is to estimate, from a sequence of images acquired by a video camera, the trajectory of an object as it moves around a scene. Efficiently tracking features in complex environments is a challenging task, especially in real-time applications such as video surveillance, traffic monitoring, motion-based recognition, monitoring systems, robotics, and computer vision in general. The recent spread of video cameras and high-powered computers has enabled the use of computer vision techniques in this field, as well as the development of new ones.

The whole tracking process consists of:

  • The detection of the target in the first frame of the analysed video,

  • Its representation and localization in subsequent frames, and

  • The interpretation of its trajectory for high-level processing such as recognition, warnings, etc.
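The first two steps of this process (detecting the target in the first frame, then localizing it in subsequent frames) can be illustrated with a deliberately simple sketch. It assumes that the target box in frame 0 is already provided by some detector, represents the target by its raw grey-level patch, and uses the sum of squared differences (SSD) within a small search window as the similarity measure. SSD is a classical, non-perceptual stand-in chosen only for brevity; it is not one of the features or distances discussed in this chapter, and the names `ssd`, `localize`, `track` and the `radius` parameter are hypothetical.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sum(d * d))


def localize(frame, template, prev_xy, radius):
    """Return the top-left (x, y) in `frame` whose patch best matches
    `template` (lowest SSD), searching within `radius` pixels of the
    previous location. Falls back to `prev_xy` if no window fits."""
    h, w = template.shape
    H, W = frame.shape
    px, py = prev_xy
    best_xy, best_cost = prev_xy, np.inf
    for y in range(max(0, py - radius), min(H - h, py + radius) + 1):
        for x in range(max(0, px - radius), min(W - w, px + radius) + 1):
            cost = ssd(frame[y:y + h, x:x + w], template)
            if cost < best_cost:
                best_cost, best_xy = cost, (x, y)
    return best_xy


def track(frames, init_box, radius=8):
    """Step 1: the target box (x, y, w, h) in frame 0 is assumed given.
    Step 2: localize the same appearance in every later frame."""
    x, y, w, h = init_box
    template = frames[0][y:y + h, x:x + w]  # fixed appearance model
    trajectory = [(x, y)]
    for frame in frames[1:]:
        trajectory.append(localize(frame, template, trajectory[-1], radius))
    return trajectory


# Toy sequence: a bright 5x5 square moving 2 pixels to the right per frame.
frames = []
for t in range(4):
    f = np.zeros((40, 40))
    f[10:15, 5 + 2 * t:10 + 2 * t] = 1.0
    frames.append(f)

print(track(frames, (5, 10, 5, 5)))  # [(5, 10), (7, 10), (9, 10), (11, 10)]
```

Replacing `ssd` with a different distance, for instance a perceptually motivated one, changes only the similarity measure; the exhaustive search around the previous location stays the same.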

We are interested in the first two steps, with particular regard to the second. Its key points are the representation of the target through efficient features that give a faithful and distinctive description of its appearance, and a proper similarity measure for finding the most probable target location in the analysed frame. The chapter focuses on a specific target category, namely human faces, and gives an overview of some novel and effective features for describing face appearance, as well as some distance measures useful for face tracking. The main theme of the study is the use of perception rules to address these specific problems, namely face representation and face tracking. This interest is mainly motivated by the great benefits that research over the last few decades has gained from applying the laws of visual perception to classical image and video processing problems. Human perception, indeed, has been advantageous in at least three ways:

  • It offered new solutions to some important problems, in some fields also improving the achievable results. Significant examples are novel image quality assessment metrics, which provide objective distance measures that correlate with the Human Visual System better than classical and widely used measures based on the Mean Square Error (MSE), as shown in Figure 1.

  • It made it possible to design fully automatic algorithms that do not require user intervention. This enabled the spread of many algorithms and applications, as well as the construction of tools usable by people who are not experts in computer science, so that computer-aided solutions and frameworks could be used in fields such as medicine, biology, cultural heritage, and telecommunications. As a further consequence, more time could be allocated to more complex operations, such as interpretation, making real-time operation possible.

  • It considerably reduced the computing time of several algorithms, since the same goals could be reached using less information. A representative example is fixation points, which seem to encode image information effectively using very few points: only a small number of features are then required to describe the most important (visual) image information.

Figure 1.

Original Einstein image (leftmost) corrupted by three different kinds of distortion (respectively, a change of luminance mean, JPEG compression, and blurring; images kindly provided by Prof. Zhou Wang, University of Waterloo, Canada). The MSE value is almost the same for the three corrupted images, whereas they look visually different. In contrast, the MSSIM values (mean structural similarity index (Wang, 2004)) agree better with the visual quality of the same images: the second image is more similar to the original than the last two, whose visual quality is clearly worse.
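The behaviour described in Figure 1 can be reproduced on a toy example. The sketch below computes the MSE and a simplified mean SSIM between a reference image and two distortions that have exactly the same MSE: a luminance (mean) shift and a ±10 checkerboard pattern. Several assumptions depart from Figure 1 and from Wang (2004): the image is a synthetic ramp rather than the Einstein image, the checkerboard stands in for the JPEG and blurring distortions, and local statistics are computed over a uniform 7x7 window instead of the Gaussian window of the original index; the constants K1 = 0.01, K2 = 0.03 and L = 255 follow the usual SSIM setting. The function names `mse` and `mssim` are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mse(x, y):
    """Mean square error between two images of equal shape."""
    d = x.astype(np.float64) - y.astype(np.float64)
    return float(np.mean(d * d))


def mssim(x, y, win=7, L=255.0, K1=0.01, K2=0.03):
    """Mean SSIM over all win x win windows. Simplification: a uniform
    window is used instead of the Gaussian window of Wang et al. (2004)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    xw = sliding_window_view(x.astype(np.float64), (win, win))
    yw = sliding_window_view(y.astype(np.float64), (win, win))
    mx, my = xw.mean(axis=(-2, -1)), yw.mean(axis=(-2, -1))   # local means
    vx, vy = xw.var(axis=(-2, -1)), yw.var(axis=(-2, -1))     # local variances
    cxy = ((xw - mx[..., None, None]) *
           (yw - my[..., None, None])).mean(axis=(-2, -1))     # local covariance
    ssim_map = ((2 * mx * my + C1) * (2 * cxy + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
    return float(ssim_map.mean())


# Smooth synthetic image in the 8-bit range (a diagonal ramp, values 50..176).
img = np.add.outer(np.arange(64), np.arange(64)) + 50.0
shifted = img + 10.0                              # luminance (mean) change
checker = np.where(np.indices(img.shape).sum(axis=0) % 2 == 0, 10.0, -10.0)
noisy = img + checker                             # +/-10 high-frequency pattern

for name, dist in [("mean shift", shifted), ("checkerboard", noisy)]:
    print(f"{name:12s} MSE={mse(img, dist):6.1f}  MSSIM={mssim(img, dist):.3f}")
```

Both distortions have an MSE of 100, yet the mean shift leaves the structure of every window intact and keeps MSSIM close to 1, whereas the checkerboard adds high-frequency structure and lowers MSSIM markedly, in the same spirit as the gap between the second image and the last two in Figure 1.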
