Keyframe-Based Vehicle Surveillance Video Retrieval

Xiaoxi Liu (Shandong University, China), Ju Liu (Shandong University, China), Lingchen Gu (Shandong University, China) and Yannan Ren (Shandong University, China)
Copyright: © 2019 | Pages: 10
DOI: 10.4018/978-1-5225-7912-0.ch026

Abstract

This article describes how, with the diversification of electronic equipment used in public security forensics, vehicle surveillance video has attracted attention as an emerging source of evidence. Vehicle surveillance videos contain useful evidence, and video retrieval can help locate it. To retrieve evidence videos accurately and efficiently, convolutional neural networks (CNNs) are widely applied to improve surveillance video retrieval performance. This article proposes a vehicle surveillance video retrieval method that combines deep features derived from a CNN with iterative quantization (ITQ) encoding; given any frame of a video, it generates a short video that can be used in public security forensics. Experiments show that the retrieved video describes the content before and after the input keyframe directly and efficiently, and that the resulting short video of an accident scene can serve as forensic evidence.

1. Introduction

In recent years, with the growing variety of electronic products, large volumes of surveillance video have flooded into daily life. Many industries now use surveillance cameras to record video, which makes it easier to obtain evidence for public security. Vehicle surveillance video recorded by in-vehicle recorders has several advantages as a new source of evidence. First, it records the whole process of the vehicle's activity, so it is more flexible than a traditional fixed roadside camera. Second, its viewing angle is superior to that of traditional road monitoring, so it can capture the scene more clearly. Third, in areas without road monitoring equipment, it can cover the blind spots. However, how to obtain effective evidence from such video has become one of the most urgent problems to solve. In this paper, we propose a vehicle surveillance video retrieval method that turns these recordings into an effective evidence resource. Given any single frame of an accident scene as input, the short video covering the incident or crime can be retrieved quickly.

Keyframe-based video retrieval is commonly cast as retrieving the relevant keyframes of videos. Typically, video clips are decomposed into keyframes, which are frames down-sampled from the videos. These keyframes record the content and time information of the accident. Therefore, by matching the query frame against the keyframes of the vehicle surveillance video and sorting the matched keyframes by their time information, we can obtain the short evidence video.
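
The keyframe idea can be illustrated with a minimal Python sketch. It assumes uniform down-sampling (every `step`-th frame) and a matching step that has already returned keyframes in similarity order; the `Keyframe` class, the function names, and the `cam01` identifier are hypothetical and are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyframe:
    video_id: str
    frame_index: int   # position of the frame in the source video
    timestamp: float   # seconds from the start of the video


def sample_keyframes(video_id, n_frames, fps, step):
    """Keep every `step`-th frame as a keyframe (uniform down-sampling)."""
    return [Keyframe(video_id, i, i / fps) for i in range(0, n_frames, step)]


def assemble_clip(matches):
    """Order matched keyframes by time so they play back as a short clip."""
    return sorted(matches, key=lambda k: k.timestamp)


# A 10-second video at 30 fps, one keyframe per second (assumed values).
keyframes = sample_keyframes("cam01", n_frames=300, fps=30.0, step=30)

# Pretend the matching step returned these keyframes in similarity order.
matched = [keyframes[5], keyframes[3], keyframes[4]]
clip = assemble_clip(matched)
```

Sorting by timestamp rather than by similarity score is what turns a ranked list of matches into a playable sequence.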

Current video retrieval algorithms evolved from content-based image retrieval (CBIR) and extract keyframe features with traditional descriptors such as the Scale-Invariant Feature Transform (SIFT), Harris (Harris & Stephens, 1988), and GIST (Oliva & Torralba, 2001). Most of these algorithms either cluster keyframe features into a Bag of Words (BoW) representation or use various classifiers to retrieve the similar keyframes that make up a short video. Moreover, hashing, as one of the most effective indexing tools, can further enhance retrieval performance. Hashing algorithms mainly include Locality-Sensitive Hashing (LSH) (Charikar, 2002), SortingKeys-LSH (SK-LSH), and ITQ (Gong, Gordo, & Perronnin, 2013; Gong & Lazebnik, 2011). Although these methods improve feature encoding in different ways, building the local features is extremely time-consuming. In addition, the high dimensionality of the features leads to the curse of dimensionality and lowers retrieval efficiency.
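
To make the role of hashing concrete, the following sketch shows random-hyperplane LSH, the sign-of-projection scheme associated with Charikar (2002). It is not ITQ, which instead learns a rotation of PCA-reduced data to reduce quantization error. The function names, the 8-dimensional feature vector, and the 16-bit code length are assumptions chosen for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw one random Gaussian hyperplane normal per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, hyperplanes):
    """Pack the signs of the projections into an integer, one bit per plane."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


planes = make_hyperplanes(dim=8, n_bits=16, seed=42)
feature = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 0.9, -0.4]  # toy feature vector

code = hash_code(feature, planes)
# Only the direction of the vector matters, so scaling keeps the code.
scaled_code = hash_code([2.0 * x for x in feature], planes)
```

Because each bit depends only on which side of a hyperplane the vector falls, vectors separated by a small angle tend to share most of their bits, which is what lets short binary codes stand in for high-dimensional features.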

In recent years, CNNs have shown outstanding performance in image classification, attracting the attention of researchers in image and video retrieval. CNNs have been found to perform very well as feature descriptors. In addition, deeper and wider CNN designs tend to yield better retrieval results. However, the deeper and wider the CNN is, the more demanding its hardware requirements are. Therefore, starting from a pre-trained CNN model and fine-tuning it on a dedicated, task-specific database achieves better descriptive ability in a restricted hardware environment (Wang, Ming, Liu, & Yin, 2017; Guo, Wang, & Lu, 2016; Guo, Wang, & Lu, 2015). In this paper, we propose to use the VGG-F model (Chatfield, Simonyan, Vedaldi, & Zisserman, 2014) pre-trained on ImageNet ILSVRC12 and to fine-tune its parameters to fit the dedicated database. Combined with 128-bit ITQ hash codes, this further improves retrieval performance. In the experiments, the required short vehicle surveillance evidence video is obtained accurately. This means that vehicle surveillance video can effectively serve as evidence of crimes and accidents, which demonstrates the performance of the proposed retrieval method.
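
Once every keyframe has a 128-bit binary code, retrieval reduces to ranking by Hamming distance. The sketch below assumes the codes have already been produced by some encoder (the CNN feature extraction and the ITQ training are not shown) and stores each code as a Python integer; the keyframe identifiers and toy code values are hypothetical.

```python
N_BITS = 128  # code length stated in the chapter


def hamming(a, b):
    """Number of bit positions in which two integer-packed codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database, top_k=3):
    """Return the top_k (keyframe_id, distance) pairs, closest first.

    `database` is a list of (keyframe_id, code) pairs; ties are broken
    by keyframe_id so the ordering is deterministic.
    """
    scored = sorted(
        ((hamming(query_code, code), kid) for kid, code in database)
    )
    return [(kid, dist) for dist, kid in scored[:top_k]]


query = (1 << N_BITS) - 1  # toy query code: all 128 bits set
database = [
    ("kf_a", query ^ 0b1),     # differs from the query in 1 bit
    ("kf_b", query ^ 0b1111),  # differs in 4 bits
    ("kf_c", 0),               # differs in all 128 bits
    ("kf_d", query),           # identical to the query
]

result = rank_by_hamming(query, database, top_k=3)
```

Here `result` lists `kf_d`, `kf_a`, and `kf_b` in that order, with distances 0, 1, and 4. An XOR followed by a bit count is far cheaper than a floating-point distance over the original CNN features, which is the practical motivation for compact hash codes.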
