Video surveillance automation is used in two key modes: watching for known threats in real time and searching for events of interest after the fact. Real-time alerting is typically a localized function; for example, an airport security center receives and reacts to a “perimeter breach alert.” Investigations, by contrast, often encompass a large number of geographically distributed cameras, as in the London bombing or Washington sniper incidents. Enabling effective event detection, query, and retrieval of surveillance video, for both preemption and investigation, involves indexing the video along multiple dimensions. This chapter presents a framework for event detection and surveillance search that includes video parsing, indexing, query, and retrieval mechanisms. It explores video parsing techniques, which automatically extract index data from video; indexing, which stores that data in relational tables; retrieval, which uses SQL queries to retrieve events of interest; and the software architecture that integrates these technologies.
Video surveillance systems, which run 24/7 (24 hours a day, seven days a week), create a large amount of data, including videos, extracted features, alerts, and statistics. Designing systems to manage this extensive data and make it easily accessible for query and search is a very challenging and potentially rewarding problem. However, the vast majority of research in video indexing has taken place in the field of multimedia, in particular for authored or produced video such as news or movies, and for spontaneous but broadcast video such as sporting events. Efforts to apply video indexing to completely spontaneous video such as surveillance data are still emerging.
The work in video indexing of broadcast video has focused on such tasks as shot boundary detection, story segmentation, and high-level semantic concept extraction. The latter is based on classifying video, audio, and text into a small (10-20) but growing number of semantically interesting categories such as outdoor, people, building, road, vegetation, and vehicle. For broadcast video, the goal is to find a high-level indexing scheme to facilitate retrieval. The task objectives are very different for surveillance video, where the primary interest is to learn higher-level behavior patterns. In both broadcast and surveillance video, there exists a semantic gap between the feasible low-level feature set and the high-level semantics or ontology desired by the system users.
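To make the relationship between low-level index data and a higher-level query concrete, the following is a minimal sketch of index data stored in a relational table and retrieved with SQL. It is an illustration only, not the schema used by the framework in this chapter: the table name `track_index`, its columns, the helper functions `build_index` and `find_events`, and the sample rows are all hypothetical. SQLite is used here simply because it ships with Python.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory index table and load parsed track records.

    The schema is hypothetical: one row per tracked object, as a video
    parsing stage might emit it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE track_index (
               camera_id    TEXT,
               start_ts     INTEGER,  -- track start, seconds
               end_ts       INTEGER,  -- track end, seconds
               object_class TEXT,     -- low-level label, e.g. 'person'
               zone         TEXT      -- region of the scene
           )"""
    )
    conn.executemany("INSERT INTO track_index VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

def find_events(conn, object_class, zone, t0, t1):
    """Return tracks of a given class in a zone that overlap [t0, t1]."""
    cur = conn.execute(
        """SELECT camera_id, start_ts, end_ts
           FROM track_index
           WHERE object_class = ? AND zone = ?
             AND start_ts < ? AND end_ts > ?   -- interval overlap test
           ORDER BY start_ts""",
        (object_class, zone, t1, t0),
    )
    return cur.fetchall()

# Hypothetical sample data spanning several cameras.
SAMPLE_ROWS = [
    ("cam_01", 100, 160, "person", "perimeter"),
    ("cam_01", 200, 230, "vehicle", "parking"),
    ("cam_02", 150, 400, "person", "lobby"),
    ("cam_03", 300, 360, "person", "perimeter"),
]

conn = build_index(SAMPLE_ROWS)
hits = find_events(conn, "person", "perimeter", 120, 320)
print(hits)
```

In this sketch, a query such as "people near the perimeter during a given time window" is expressed entirely in terms of low-level attributes (class label, zone, timestamps). Mapping such attributes to the behavior patterns an investigator actually cares about is where the semantic gap described above arises.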