Recognition of Humans and Their Activities for Video Surveillance

Alok Kumar Singh Kushwaha, Rajeev Srivastava
DOI: 10.4018/978-1-4666-4868-5.ch009


Human Activity Recognition is an active area of research in computer vision with wide-ranging applications in video surveillance, motion analysis, virtual reality interfaces, robot navigation and recognition, video indexing, browsing, human-computer interaction (HCI), choreography, sports video analysis, etc. The analysis of vision-based human activities in videos is an area with increasingly important consequences, from security and surveillance to public place and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. The task is also challenging due to variations in motion performance, recording settings, and inter-personal differences. In this chapter, the authors explicitly address these challenges. They present a survey of existing work and describe some of the better-known methods in these areas. They also describe their own research and outline future possibilities. Detailed overviews of current advances in the field are provided. Image representations and the subsequent classification processes are discussed separately to focus on the novelties of recent research. Moreover, the authors discuss the limitations of the state of the art and outline promising directions of research.
Chapter Preview

Motivation For Human Activity Recognition

The 9/11 event in the USA and the 26/11 event in Mumbai, India, have demonstrated that there is a strong need for analysis and understanding of human activity in public areas in order to prevent terrorist activities. The aim is to analyze video sequences, which includes detecting and tracking moving humans and analyzing their activity and behavior. This analysis becomes the basis of applications in many areas such as security and surveillance, clinical applications, biomechanical applications, human-robot interaction, entertainment, and education. Security currently depends on CCTV cameras, but CCTV-based visual surveillance requires human intervention: operators must run the system and make the decisions. Today, video surveillance networks contain ever more CCTV cameras. For large infrastructures, such as a mass transit system, over a thousand surveillance cameras may be deployed. These installations produce a huge amount of video to transmit, view, and archive, making it impossible for a human monitor to analyze all of the recordings in order to detect suspicious activity or events. This is especially true since security control centre personnel are also required to manage other tasks, such as access control, issuance of badges/keys/permits, handling emergency calls, following up on fire alarms, radio communications control, etc. Several studies show the limits of human surveillance. After only 20 minutes of looking at and analyzing video surveillance screens, the attention of most people falls below an acceptable level (Hampapur et al. 2003). A monitor cannot attentively follow 9 to 12 cameras for over 15 minutes (Hampapur et al. 2003). Certain studies report that the ratio between the number of screens and the number of cameras can be between 1:4 and 1:78 in certain video surveillance networks (Dee et al. 2008).
The probability of reacting immediately to an event captured by a surveillance camera network is estimated at 1 out of 1,000 (Hampapur et al. 2003). That is why, historically, CCTV-based video surveillance has mainly been a post-event investigation tool. Continuously monitoring the data collected from many cameras is difficult and manpower-intensive, which creates the need for a computer vision system that automatically understands human actions and builds higher-level knowledge of the events occurring in the scene. Analysis in surveillance scenarios often requires the detection of abnormal human actions. Most normal human activities, such as walking and running, are periodic. Lack of periodicity is therefore an important cue that an activity deviates from the normal. Consider, for example, a typical event of surveillance interest: the exchange of briefcases by two agents. The scene essentially consists of an agent walking across the scene who then bends to lift up or leave the briefcase. This event can be described as a concatenation of walk, bend, and walk actions, where the bend deviates from normal behavior. However, abnormal events, and therefore abnormal human activities, are context dependent and may vary between situations. For example, in a shopping mall where people normally walk from one counter to another, running could be defined as an abnormal action and could be an event of interest for surveillance purposes. This calls for a unified framework for detecting and recognizing both periodic and non-periodic human actions.
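
The periodicity cue described above can be illustrated with a minimal sketch. It is not the authors' method; it assumes that some per-frame motion feature (for example, a limb angle or silhouette width, both hypothetical here) has already been extracted from a tracked person, and it uses normalized autocorrelation as one common way to score how periodic that signal is. The function name `periodicity_score` and the synthetic `walk` and `bend` signals are illustrative assumptions.

```python
import math


def periodicity_score(signal):
    """Score how periodic a 1-D motion signal is, in the range [0, 1].

    Computes the normalized (biased) autocorrelation, skips the trivial
    peak around lag 0 by waiting for the first negative value, and then
    returns the highest remaining peak up to half the signal length.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return 0.0  # constant signal: no motion, hence no periodicity

    acf = [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(n // 2 + 1)
    ]
    for lag, value in enumerate(acf):
        if value < 0.0:
            # Highest peak after the first zero crossing; clamp at zero.
            return max(0.0, max(acf[lag:]))
    return 0.0  # autocorrelation never went negative: no repetition found


# Hypothetical feature traces over 200 frames (synthetic, for illustration).
# "walk": a limb-angle-like signal that repeats every 20 frames.
walk = [math.sin(2.0 * math.pi * t / 20.0) for t in range(200)]
# "bend": a single isolated bump around frame 100, with no repetition.
bend = [math.exp(-0.5 * ((t - 100) / 5.0) ** 2) for t in range(200)]

if __name__ == "__main__":
    print("walk score:", periodicity_score(walk))
    print("bend score:", periodicity_score(bend))
```

Under these assumptions the repeating walk signal scores high and the isolated bend scores near zero, so a threshold on the score could flag the bend as non-periodic. Because abnormality is context dependent, such a score would be only one input to a larger framework: in the shopping-mall example, running is periodic yet still of interest.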
