Human motion analysis (Moeslund et al., 2006; Wang et al., 2003) is currently one of the most active research areas in computer vision, owing both to the number of potential applications and to its inherent complexity. This high interest is driven by applications in many areas, such as surveillance, virtual reality, perceptual interfaces, control applications, and the analysis of human behavior. However, the field also presents a number of difficulties, such as ill-posed problems, which many researchers have investigated. Human motion analysis is generally composed of three major parts: human detection, human tracking, and behavior understanding.
8.1 Scale Adaptive Filters
Human detection is an essential task for many applications, such as human-robot interaction, video surveillance, human motion tracking, gesture recognition, and human behavior analysis. Among these applications, we are interested in human detection in the field of human-robot interaction (HRI). Because intelligent robots need to co-exist with humans in a human-friendly environment, it is essential for a robot to be aware of the humans around it.
Often, a single static camera is used for human detection because of its low cost and easy handling (Hussein et al., 2006; Ghidary et al., 2000; Zhou and Hoang, 2005). However, in the case of mobile robots, human detection is difficult because the robot (camera) and the human move around each other, the illumination conditions vary, and the background changes over time. In this situation, it is effective to use the depth cues from a stereo camera to detect humans (Xu and Fujimura, 2003; Beymer and Konolige, 1999; Salinas et al., 2005; Li et al., 2004a, 2004b). In this work, we also use stereo-based vision to detect humans.
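To make the depth cue concrete, a disparity map can be recovered from a rectified stereo pair by block matching along scanlines. The sketch below is a minimal illustration, not the method used by any of the cited systems; the function name, window size, and sum-of-absolute-differences (SAD) cost are our own illustrative choices.

```python
import numpy as np

def disparity_map(left, right, window=3, max_disp=16):
    """Naive SAD block matching on a rectified grayscale stereo pair.

    For each left-image pixel, finds the horizontal shift (disparity)
    whose right-image patch on the same scanline best matches the left
    patch. Disparity is inversely proportional to depth, so nearby
    humans produce large disparities.
    """
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.float64)
            best_cost, best_d = np.inf, 0
            for d in range(max_disp):
                cand = right[y-half:y+half+1,
                             x-d-half:x-d+half+1].astype(np.float64)
                cost = np.abs(patch - cand).sum()  # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Real systems use more robust correspondence methods, but the output has the same form: a per-pixel disparity image from which depth can be read off.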
There are a variety of human detection methods. Beymer and Konolige (1999) used background subtraction on the disparity map obtained from a stereo camera, followed by template matching in the disparity domain, to detect humans. Their method is not appropriate for a mobile stereo camera. Salinas et al. (2005) created a map of static background information and segmented the moving objects in the map, using face detection for human verification. However, they assumed that the humans were posed frontally. Li et al. (2004a, 2004b) designed the object-oriented scale-adaptive filter (OOSAF) and segmented human candidates by applying the OOSAF, whose filter parameter was changed in accordance with the distance between the camera and the human. They verified the human candidates using template matching of the human head-shoulder shape. Their approach showed a good human detection rate and was suitable for a mobile robot platform because it did not use background subtraction. However, its detection rate degraded when the humans did not face the camera frontally.
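The core idea behind a scale-adaptive filter can be sketched as follows: under a pinhole camera model, a human of physical width W at distance Z projects to roughly f·W/Z pixels, so the filter support is rescaled from the measured distance before it is convolved with the image or histogram. The helper below is only an illustration of that scaling rule; the function names and the 0.5 m shoulder width are our assumptions, not parameters from Li et al. (2004a, 2004b).

```python
import numpy as np

SHOULDER_WIDTH_M = 0.5  # assumed physical shoulder width (illustrative value)

def filter_width_px(focal_px, distance_m, width_m=SHOULDER_WIDTH_M):
    """Pinhole projection: pixel width of an object of width_m at distance_m."""
    return max(1, int(round(focal_px * width_m / distance_m)))

def box_filter(width_px):
    """Unit-area 1D box kernel; its support adapts to the target's distance."""
    return np.ones(width_px) / width_px

# Because the kernel support shrinks as the person moves away from the
# camera, a human-sized blob is matched equally well at any distance.
```

The same scaling applies to the 2D filters discussed below: only the kernel size changes with distance, not the matching procedure.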
We propose a pose-robust human detection method that operates on a sequence of stereo images in a cluttered environment in which the camera and the human move around and the illumination conditions change. It consists of two modules: human candidate detection and human verification. To detect humans, we apply multiple oriented 2D elliptical filters (MO2DEFs) of four specific orientations to the 2D spatial-depth histogram and segment the human candidates by thresholding the filtered histograms. After that, we aggregate the segmented human candidates to strengthen the evidence of human presence and determine the human pose by taking the orientation of the 2D elliptical filter whose convolution response is maximal among the MO2DEFs. Human verification is conducted by either detecting the face or matching head-shoulder shapes over the segmented human candidates of the selected orientation.
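The oriented-filtering step above can be sketched in a few lines: build elliptical kernels at the four orientations, convolve each with the 2D spatial-depth histogram, threshold the responses, and report the orientation whose response is maximal. This is a simplified illustration under our own assumptions (binary elliptical masks, zero-padded convolution, and the axis lengths and threshold chosen arbitrarily), not the exact MO2DEF design.

```python
import numpy as np

def elliptical_kernel(a, b, theta_deg, size):
    """Unit-sum binary mask of an ellipse with semi-axes a, b rotated by theta."""
    t = np.deg2rad(theta_deg)
    c = size // 2
    yy, xx = np.mgrid[-c:c + 1, -c:c + 1]
    xr = xx * np.cos(t) + yy * np.sin(t)    # rotate coordinates into the
    yr = -xx * np.sin(t) + yy * np.cos(t)   # ellipse's own frame
    k = ((xr / a) ** 2 + (yr / b) ** 2 <= 1.0).astype(np.float64)
    return k / k.sum()

def convolve2d_same(img, ker):
    """'Same'-size 2D correlation with zero padding (numpy only)."""
    kh, kw = ker.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (pad[i:i + kh, j:j + kw] * ker).sum()
    return out

def detect_pose(hist, orientations=(0, 45, 90, 135), a=6, b=3, thresh=0.5):
    """Filter the spatial-depth histogram at each orientation; return the
    candidate mask of the best orientation and that orientation itself."""
    responses = {th: convolve2d_same(hist, elliptical_kernel(a, b, th, 2 * a + 1))
                 for th in orientations}
    best = max(orientations, key=lambda th: responses[th].max())
    mask = responses[best] >= thresh * responses[best].max()
    return mask, best
```

In this sketch a person standing sideways yields an elongated blob in the histogram, and the rotated kernel that aligns with the blob produces the largest response, which is what allows the pose angle to be read off.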