Collective Event Detection by a Distributed Low-Cost Smart Camera Network


Jhih-Yuan Hwang (National Sun Yat-sen University, Taiwan) and Wei-Po Lee (National Sun Yat-sen University, Taiwan)
DOI: 10.4018/978-1-4666-8654-0.ch004

Modern surveillance systems must identify continuous human behaviors to detect various events in video streams. To enhance the performance of event recognition, in this chapter we propose a distributed system of low-cost smart cameras, together with a machine learning technique to detect abnormal events by analyzing the sequential behaviors of a group of people. Our system includes a simple but efficient strategy for organizing behavior sequences, a new indirect encoding scheme that represents a group of people with relatively few features, and a multi-camera collaboration strategy that performs collective decision making for event recognition. Experiments have been conducted, and the results confirm the reliability and stability of the proposed system in event recognition.
Chapter Preview


Using cameras to guard public areas has become an increasingly common surveillance practice. With the aid of video streams recorded by the surveillance equipment, security staff can detect sudden unusual events and respond promptly to emergencies, reducing risk. To reach an even higher safety level, more and more surveillance devices are now deployed to increase sensing coverage and to capture images from different visual angles. Yet monitoring video streams manually is tedious and costly work. To reduce both the workload of security staff and the cost, many advanced vision-based techniques for automatic video content analysis have been developed; based on their results, the surveillance system need only notify security staff when necessary.

To detect various events from video streams, today's surveillance systems are shifting from the analysis of individual images to the analysis of continuous human behaviors (Krishnan & Cook, 2014; Popoola & Wang, 2012; Kamal, Ding, Morye, Farrell, & Roy-Chowdhury, 2014). Consequently, more powerful computational equipment is in demand to process the rapidly increasing amount of data. Also, in many cases a single view is not sufficient to cover a targeted region, and a network of cameras is thus required to cope with an open area in which many people move arbitrarily. At present, most camera networks follow a centralized architecture, which often suffers from high communication cost and poor scalability (Munishwar & Abu-Ghazaleh, 2010; Rinner & Wolf, 2008). Therefore, an efficient surveillance system not only has to perform behavior recognition, but also has to overcome bandwidth limitations. A promising solution is to adopt a distributed smart camera sensor network (Rinner & Wolf, 2008; Song, et al., 2010), in which each smart camera is an embedded system with reasonable computing ability and storage. The camera nodes can process locally available images, perform data compression, and transmit the results to neighboring nodes in the same network for information sharing. The nodes communicate in a peer-to-peer manner, exchanging only abstract information. In this way, the overall computation can be carried out in a distributed fashion by a set of inexpensive devices.
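To make the idea of exchanging "only abstract information" concrete, the sketch below shows a node compressing its local frame analysis into a small message for its peers instead of streaming raw pixels. The field names (`n_people`, `mean_speed`) are illustrative assumptions, not the chapter's actual message format.

```python
import json

def summarize_frame(detections):
    """Compress a frame's local analysis into a compact abstract message
    that a smart camera can send to neighboring nodes, instead of raw
    image data. Field names here are illustrative only."""
    return json.dumps({
        "n_people": len(detections),
        "mean_speed": (sum(d["speed"] for d in detections) / len(detections)
                       if detections else 0.0),
    })

# A frame with two tracked pedestrians is reduced to a few bytes.
msg = summarize_frame([{"speed": 1.2}, {"speed": 0.8}])
```

Compared with forwarding raw frames, a message of this size keeps peer-to-peer bandwidth use nearly constant regardless of image resolution.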

The main goal of deploying a distributed smart camera system in a public area is to detect abnormal human behaviors or unusual events. Abnormal events are observable events that occur unexpectedly, abruptly, and unintentionally, and they create emergency situations that require fast responses (Roshtkhari & Levine, 2013). To achieve this goal, several important issues need to be addressed. The first is to extract pedestrian features from the video streams recorded by the cameras. With a set of properly defined features to represent the target data, a computational method (specifically, a machine learning method) can be employed to construct robust and reliable classifiers for event recognition. The second issue is to build behavior sequences from the selected features. Although many works have focused on precisely constructing behavior sequences for pedestrians across different image frames, most such approaches are computationally expensive; to deploy a distributed camera network in a real-life environment, simpler and more efficient strategies are needed. Since the goal here is to train a classifier to detect an abnormal event occurring to a group of people acting in a public space, the data representation for these persons must be as concise as possible to ensure the efficiency and effectiveness of the learning method. The traditional encoding scheme of concatenating all personal features from the group is computationally expensive, and the streaming nature of video data results in high-dimensional data that demands even more resources; a new encoding scheme is thus needed. In addition, individual cameras are often unable to capture complete behavior sequences, due to real-world environmental factors such as the blind spots of the camera network, light reflection, and occlusion between objects. An efficient strategy with relatively low resource needs (in terms of computing and communication) is therefore required to exploit device collaboration within a smart camera network.
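The contrast between concatenating per-person features and a compact group encoding can be sketched as follows. Here a group is summarized by a fixed-size aggregate vector (count, centroid, spread) whose length does not grow with the number of people; this is a generic illustration of indirect encoding, not the chapter's specific scheme.

```python
import math

def encode_group(positions):
    """Encode a group of pedestrian positions as a fixed-size aggregate
    feature vector: (count, centroid_x, centroid_y, spread).
    The vector length stays constant no matter how many people appear,
    unlike concatenating every person's features."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    spread = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                           for x, y in positions) / n)
    return (n, cx, cy, spread)

# A tight group and a dispersing group yield clearly different spreads,
# which a classifier can exploit (e.g., sudden dispersal as an event cue).
tight = encode_group([(10, 10), (11, 10), (10, 11)])
loose = encode_group([(0, 0), (20, 0), (0, 20)])
```

Because the encoding is fixed-dimensional, the downstream learner sees the same input size for crowds of any size, which keeps both training and inference cheap.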

Key Terms in this Chapter

Smart Camera: A self-contained vision system with built-in image sensor, capable of capturing images, extracting application-specific information from the images, generating event descriptions, and making decisions.

Region of Interest (ROI): A selected subset of samples within a dataset, identified for a particular purpose. In computer vision, the ROI defines the borders of an object under consideration so that some operation can be performed on it.

Histogram of Oriented Gradients (HOG): A feature descriptor used in computer vision and image processing for object detection; it counts occurrences of gradient orientations in localized portions of an image.
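The core computation behind HOG, a gradient-orientation histogram for one image cell, can be sketched in a few lines of NumPy. This is a simplification for illustration: full HOG (Dalal & Triggs) adds block grouping, overlapping windows, and normalization.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram for one image cell (the core HOG
    idea). Simplified sketch: real HOG also applies block normalization
    over groups of cells."""
    gy, gx = np.gradient(cell.astype(float))   # per-pixel gradients
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in standard HOG.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = (orientation / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # magnitude-weighted
    return hist

# A vertical edge produces horizontal gradients, so the mass lands in
# the 0-degree orientation bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
hist = hog_cell_histogram(cell)
```

Concatenating such histograms over a grid of cells yields the pedestrian descriptor typically fed to a classifier such as an SVM.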

Video Surveillance Systems: A video system for monitoring the behavior, activities, or other changing information of the targets. It is for the purpose of influencing, managing, directing, or protecting the targets.

Support Vector Machine (SVM): A supervised machine learning method, with associated learning algorithms, that analyzes data, recognizes patterns, and constructs classification models.
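As a self-contained illustration of the SVM idea, the sketch below trains a linear SVM with the Pegasos sub-gradient method (hinge loss plus L2 regularization) on a toy 2D problem with a bias feature. In practice one would use a standard SVM library; this minimal version only shows the mechanics.

```python
def train_linear_svm(data, labels, lam=0.01, epochs=200):
    """Minimal linear SVM trained with the Pegasos sub-gradient method:
    shrink w toward zero each step (regularization) and add y*x whenever
    the margin constraint y * <w, x> >= 1 is violated."""
    dim = len(data[0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)                       # step size schedule
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi
                 + (eta * y * xi if margin < 1 else 0.0)
                 for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Classify by the sign of the decision function <w, x>."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Toy separable data; the last component of each point is a bias term.
data = [(1.0, 2.0, 1.0), (2.0, 3.0, 1.0), (-1.0, -2.0, 1.0), (-2.0, -1.0, 1.0)]
labels = [1, 1, -1, -1]
w = train_linear_svm(data, labels)
```

The learned weight vector then separates the two classes; real event classifiers would train the same way on HOG-style feature vectors rather than raw 2D points.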

Feature Extraction: When input data is too large to be processed directly, it is transformed into a reduced set of representative features. This process of transforming the input data into a set of features is called feature extraction.

Collective Intelligence: A form of distributed intelligence that emerges from the collaboration, collective efforts, and competition of many individuals and appears in consensus decision making.
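One common way such consensus decision making is realized in a camera network is confidence-weighted voting: each camera reports a label with a confidence, and the network adopts the label with the highest total weight. The sketch below is a generic fusion rule, not necessarily the exact collaboration strategy used in the chapter.

```python
def fuse_decisions(votes):
    """Confidence-weighted majority vote across cameras.
    votes: list of (label, confidence) pairs, one per camera that
    observed the scene. Returns the label with the highest summed
    confidence."""
    scores = {}
    for label, conf in votes:
        scores[label] = scores.get(label, 0.0) + conf
    return max(scores, key=scores.get)

# Two moderately confident cameras outweigh one confident camera,
# so an event missed by a single view can still be detected collectively.
result = fuse_decisions([("normal", 0.9), ("abnormal", 0.6), ("abnormal", 0.5)])
```

Weighting by confidence (rather than counting votes equally) lets cameras with partial views, e.g. those affected by occlusion or blind spots, contribute without dominating the decision.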
