Multi-Scale People Detection and Motion Analysis for Video Surveillance

YingLi Tian (The City College of City University of New York, USA), Rogerio Feris (IBM T.J. Watson Research Center, USA), Lisa Brown (IBM T.J. Watson Research Center, USA), Daniel Vaquero (University of California, USA), Yun Zhai (IBM T.J. Watson Research Center, USA) and Arun Hampapur (IBM T.J. Watson Research Center, USA)
Copyright: © 2010 |Pages: 26
DOI: 10.4018/978-1-60566-900-7.ch006


Visual processing of people, including detection, tracking, recognition, and behavior interpretation, is a key component of intelligent video surveillance systems. Computer vision algorithms with the capability of “looking at people” at multiple scales can be applied in different surveillance scenarios, such as far-field people detection for wide-area perimeter protection, mid-field people detection for retail/banking applications or parking lot monitoring, and near-field people/face detection for facility security and access. In this chapter, we address the people detection problem at different scales, as well as human tracking and motion analysis, for real video surveillance applications including people search, retail loss prevention, people counting, and display effectiveness.

1. Introduction

As the number of cameras deployed for surveillance increases, the challenge of effectively extracting useful information from the torrent of camera data becomes formidable. The inability of human vigilance to effectively monitor surveillance cameras is well recognized in the scientific community [Green 1999]. Additionally, the cost of employing security staff to monitor hundreds of cameras by manually watching videos is prohibitive.

Intelligent (smart) surveillance systems, which are now “watching the video” and providing alerts and content-based search capabilities, make the video monitoring and investigation process scalable and effective. The software algorithms that analyze the video and provide alerts are commonly referred to as video analytics. They are responsible for turning video cameras from mere data-gathering tools into smart surveillance systems for proactive security. Advances in computer vision, video analysis, pattern recognition, and multimedia indexing technologies have enabled smart surveillance systems over the past decade.

People detection, tracking, recognition, and behavior interpretation play very important roles in video surveillance. For different surveillance scenarios, different algorithms are employed to detect people at distinct scales, such as far-field people detection for wide-area perimeter protection, mid-field people detection for retail/banking applications or parking lot monitoring, and near-field people/face detection for facility security and access. People detection and tracking have been an active area of research. Approaches to people detection can be classified as either model-based or learning-based. The latter can use different kinds of features, such as edge templates [Gavrila 2000], Haar features [Viola et al. 2001, 2003], histogram-of-oriented-gradients descriptors [Dalal & Triggs 2005, Han et al. 2006], shapelet features [Sabzmeydani 2007], etc. To deal with occlusions, some approaches use part-based detectors [Wu & Nevatia 2005, Leibe 2005].
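To make the feature families above concrete, the following sketch computes a single-cell histogram-of-oriented-gradients vector in the spirit of Dalal & Triggs. This is an illustrative simplification, not the chapter's implementation: a full HOG descriptor also tiles the window into cells and applies block normalization, which are omitted here.

```python
import numpy as np

def hog_cell(patch, n_bins=9):
    """Single-cell HOG sketch: gradient orientation histogram,
    magnitude-weighted, over unsigned angles in [0, 180)."""
    patch = patch.astype(float)
    # Central-difference gradients (borders left at zero)
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    # Vote each pixel's magnitude into its orientation bin
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.array([mag[bins == b].sum() for b in range(n_bins)])
    # L2 normalization
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```

A horizontal intensity ramp, for example, produces a purely horizontal gradient, so all votes land in the first (0-degree) orientation bin.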

In our system, learning-based methods are employed to detect humans at different scales. For each person entering and leaving the field of view of a surveillance camera, our goal is to detect the person and to store in a database a key frame containing the image of the person, associated with a corresponding video. This allows the user to perform queries such as “Show me all people who entered the facility yesterday from 1pm to 5pm.” The retrieved key frames can then be used for recognition, either manually or by an automatic face recognition system (if the face image is available). To achieve this goal, we developed a novel face detector algorithm that uses local feature adaptation prior to Adaboost learning. Local features have been widely used in learning-based object detection systems. As noted by Munder and Gavrila [Munder & Gavrila 2006], they offer advantages over global features such as Principal Component Analysis [Zhang et al. 2004] or Fisher Discriminant Analysis [Wang & Ji 2005], which tend to smooth out important details.
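The boosting stage behind such detectors can be illustrated with a minimal AdaBoost trainer over single-feature threshold stumps. This sketch shows only the generic Adaboost reweighting loop; the chapter's detector additionally adapts local features before learning, which is not modeled here, and the stump search is brute-force for clarity.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=5):
    """Minimal AdaBoost sketch over threshold stumps.
    X: (n_samples, n_features); y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # sample weights
    stumps = []                       # (feature, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for f in range(d):            # brute-force weak-learner search
            for t in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, pred)
        err, f, t, pol, pred = best
        err = max(err, 1e-10)         # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)  # upweight misclassified samples
        w /= w.sum()
        stumps.append((f, t, pol, alpha))
    return stumps

def predict(stumps, X):
    """Sign of the alpha-weighted vote over all learned stumps."""
    score = np.zeros(len(X))
    for f, t, pol, alpha in stumps:
        score += alpha * np.where(pol * (X[:, f] - t) >= 0, 1, -1)
    return np.sign(score)
```

In a cascade detector, each stage is a boosted classifier of this form evaluated over local image features rather than raw scalars.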

In order to detect trajectory anomalies, our system tracks faces and people, analyzes the paths of tracked people, learns a set of repeated patterns that occur frequently, and detects when a person moves in a way inconsistent with these normal patterns. We implement two types of tracking methods: person-detection-based and moving-object-based. The person-detection-based tracking method is used to track faces and people in near-field scenarios. In far-field scenarios, the moving-object-based tracking method is employed because faces are too small to be accurately detected. The moving objects are first detected by an adaptive background subtraction method, and are then tracked by using a tracking method based on appearance. An object classifier further labels each tracked object as a car, person, group of people, animal, etc. To build the model of motion patterns, the trajectories of all tracks with a given start/end location labeling are resampled and clustered together. This gives an average or “prototypical” track along with standard deviations. Most tracks from a given entry location to a given exit will lie close to the prototypical track, with typical normal variation indicated by the length of the crossbars. Tracks that wander outside this normal area can be labeled as anomalous and may warrant further investigation. The principal components of the cluster indicate typical modes of variation or “eigentracks”, providing a more accurate model of normal vs. abnormal.
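The resample-cluster-threshold pipeline above can be sketched as follows for a single entry/exit cluster. This is an assumed simplification: tracks are resampled by arc length, the prototype is the pointwise mean, and a track is flagged when it strays more than k standard deviations from the prototype at any point. The eigentrack (principal component) refinement mentioned above is omitted, and the function names are illustrative, not the system's API.

```python
import numpy as np

def resample_track(track, n_points=20):
    """Resample a variable-length (x, y) track to n_points by arc length."""
    track = np.asarray(track, dtype=float)
    seg = np.linalg.norm(np.diff(track, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    targets = np.linspace(0.0, s[-1], n_points)
    x = np.interp(targets, s, track[:, 0])
    y = np.interp(targets, s, track[:, 1])
    return np.stack([x, y], axis=1)

def build_prototype(tracks, n_points=20):
    """Mean ('prototypical') track and per-point deviation for one cluster."""
    R = np.stack([resample_track(t, n_points) for t in tracks])
    proto = R.mean(axis=0)
    # Spread of each resampled point's distance from the prototype
    dists = np.linalg.norm(R - proto, axis=2)
    sigma = dists.std(axis=0) + 1e-6                 # epsilon avoids zero spread
    return proto, sigma

def is_anomalous(track, proto, sigma, k=3.0, n_points=20):
    """Flag a track that exceeds k standard deviations at any point."""
    r = resample_track(track, n_points)
    d = np.linalg.norm(r - proto, axis=1)
    return bool(np.any(d > k * sigma))
```

With a cluster of near-straight tracks between the same entry and exit, a track that swerves far from the bundle exceeds the k-sigma band and is flagged, while tracks within the normal variation are not.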
