Human Motion Tracking in Video: A Practical Approach

Tony Tung (Kyoto University, Japan) and Takashi Matsuyama (Kyoto University, Japan)
Copyright © 2010 | Pages: 13
DOI: 10.4018/978-1-60566-900-7.ch001


This chapter presents a new formulation of the problem of human motion tracking in video. Tracking remains challenging when strong appearance changes occur, as in videos of humans in motion. Most trackers rely on a predefined template or on a training dataset to achieve detection and tracking, and therefore cannot efficiently track objects whose appearance is not known in advance. A solution is to use an online method that iteratively updates a subspace of reference target models. In addition, we propose to integrate color and motion cues in a particle filter framework to track human body parts. The algorithm alternates between two modes, detection and tracking. The detection steps involve trained classifiers that update the estimated positions of the tracking windows, whereas the tracking steps rely on an adaptive color-based particle filter coupled with optical flow estimations. The Earth Mover's Distance is used to compare color models in a global fashion, and constraints on flow features avoid drifting effects. The proposed method has proven effective at tracking body parts in motion and can cope with full appearance changes. Experiments were performed on challenging real-world videos with poorly textured models and non-linear motions.
Chapter Preview

1. Introduction

Human motion tracking is a common requirement for many real-world applications, such as video surveillance, games, and cultural and medical applications (e.g. for motion and behavior study). The literature provides successful algorithms to detect and track objects of a predefined class in image streams or videos. Simple objects can be detected and tracked using various image features such as color regions, edges, contours, or texture. On the other hand, complex objects such as human faces require more sophisticated features to handle the multiple possible instances of the object class. For this purpose, statistical methods are a good alternative. First, a statistical model (or classifier) learns different patterns related to the object of interest (e.g. different views of human faces), including good and bad samples. The system is then able to estimate whether or not a region contains an object of interest. This kind of approach has become very popular; for example, the face detector of (Viola, & Jones, 2001) is well known for its efficiency. The main drawback is the dependence on prior knowledge of the object class: as the system is trained on a finite dataset, detection is constrained by it. In fact, most tracking methods were not designed to keep track of an object whose appearance can change strongly. If there is no a priori knowledge of its multiple possible appearances, then detection fails and the track is lost. Hence, tracking a head that turns completely, or tracking a hand in action, remain challenging problems, as appearance changes occur quite frequently for human body parts in motion.
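To make the detection step concrete, the core of the Viola-Jones approach is the evaluation of Haar-like rectangle features in constant time via an integral image (summed-area table); a boosted cascade then combines many such features into a classifier. The sketch below (a minimal illustration, not the authors' implementation; all function names are ours) computes one two-rectangle feature from an integral image:

```python
import numpy as np

def integral_image(img):
    """Summed-area table of img, padded with a zero border so that
    ii[y, x] = sum of img[:y, :x]."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def box_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w box with top-left corner (y, x),
    computed in O(1) with four lookups into the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_vertical(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: top half minus bottom half.
    A large response indicates a horizontal intensity edge."""
    half = h // 2
    return box_sum(ii, y, x, half, w) - box_sum(ii, y + half, x, half, w)

# Toy image: bright top half (value 10) over a dark bottom half (value 0).
img = np.vstack([np.full((4, 8), 10.0), np.zeros((4, 8))])
ii = integral_image(img)
print(haar_two_rect_vertical(ii, 0, 0, 8, 8))  # 10*4*8 - 0 = 320.0
```

In the full detector, thousands of such features at varying positions and scales are thresholded and combined by AdaBoost into a cascade that rejects most non-face windows after only a few feature evaluations.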

We introduce a new formulation dedicated to the problem of appearance changes in object tracking in video. Our approach integrates color cues and motion cues to establish robust tracking. In addition, an online iterative process updates a subspace of reference templates so that the tracking system remains robust to occlusions. The method's workflow contains two modes, switching between detection and tracking. The detection steps involve trained classifiers that update the estimated positions of the tracking windows. In particular, we use the cascade of boosted classifiers of Haar-like features by (Viola, & Jones, 2001) to perform head detection. Other body parts can be detected using this technique with ad-hoc training samples, chosen by users at the initialization step, or deduced from prior knowledge of human shape features and constraints. The tracking steps rely on an adaptive color-based particle filter (Isard, & Blake, 1998) coupled with optical flow estimations (Lucas, & Kanade, 1981; Tomasi, & Kanade, 1991). The Earth Mover's Distance (Rubner, Tomasi, & Guibas, 1998) was chosen to compare color models due to its robustness to small color variations. Drift effects inherent to adaptive tracking methods are handled using optical flow estimations (motion features).
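In a color-based particle filter, each particle hypothesizes a tracking window, and its importance weight is derived from the distance between the window's color histogram and the reference model. The sketch below (a simplified illustration under our own assumptions, not the chapter's implementation; it uses the closed-form 1-D special case of the Earth Mover's Distance over unit-spaced bins, whereas the chapter compares full color models) shows how EMD-based likelihoods weight two candidate particles:

```python
import numpy as np

def emd_1d(p, q):
    """1-D Earth Mover's Distance between two normalized histograms with
    unit-spaced bins: the L1 distance between their cumulative sums.
    Unlike bin-to-bin measures, EMD stays small when mass shifts to a
    neighboring bin, making it robust to small color variations."""
    return np.abs(np.cumsum(p - q)).sum()

def color_likelihood(ref_hist, cand_hist, sigma=0.1):
    """Particle weight: Gaussian likelihood of the EMD between the
    reference color model and the candidate window's histogram."""
    d = emd_1d(ref_hist, cand_hist)
    return np.exp(-d**2 / (2 * sigma**2))

# Reference color histogram of the tracked region (normalized).
ref = np.array([0.1, 0.6, 0.2, 0.1, 0.0])

# Candidate histograms sampled at each particle's predicted window.
particles = [
    np.array([0.1, 0.55, 0.25, 0.1, 0.0]),  # close to the model
    np.array([0.0, 0.1, 0.1, 0.3, 0.5]),    # far from the model
]
weights = np.array([color_likelihood(ref, h) for h in particles])
weights /= weights.sum()  # normalized importance weights
print(weights)  # the first particle dominates
```

In the full tracker, these weights drive the resampling step of the particle filter, and optical flow estimates constrain the particles' motion model so that the adaptively updated color model does not drift onto the background.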

Our experiments show the accuracy and robustness of the proposed method on challenging video sequences of humans in motion. For example, videos of yoga performances (stretching exercises at various speeds) with poorly textured models and non-linear motions were used for testing (cf. Figure 1).

Figure 1.

Body part tracking with a color-based particle filter driven by optical flow. The proposed approach is robust to strong occlusions and full appearance changes. Detected regions are denoted by dark gray squares, and tracked regions by light gray squares.


The rest of the chapter is organized as follows. The next section reviews work related to the techniques presented in this chapter. Section 3 presents an overview of the algorithm (initialization step and workflow). Section 4 describes the tracking process based on our color-based particle filter driven by optical flow. Section 5 presents experimental results. Section 6 concludes with a discussion of our contributions.
