1. Introduction
Crime and violence among people are considered the third leading cause of death in 53 countries, according to a report of the World Health Organization (WHO) European Region (Sethi et al., 2010). These alarming rates push governments to seek solutions to such dangerous problems. Video surveillance systems are used to analyze the behavior of objects (Amira & Zagrouba, 2018), which involves classifying objects in order to understand whether the events in a video are normal or abnormal. Abnormal activity detection plays a crucial role in surveillance applications (Huang et al., 2017; Wang et al., 2018; Cosar et al., 2017; Lloyd et al., 2017; Tripathi et al., 2019). The large-scale presence of surveillance systems motivates the development of automated systems that detect anti-social behavior such as vandalism, fights, and gun killings.
In most current surveillance systems, monitoring depends on human operators. This makes monitoring a very challenging task that is labor-intensive and prone to errors. These traditional systems suffer from many problems, such as weak security, low intelligence, high cost, and poor stability. It is difficult for operators to watch and analyze all dangerous situations, especially over long observation periods and across a large number of cameras (Research, 2003). The reports in (Research, 2003; Cohen et al., 2009; Dadashi, 2008) confirm that Closed-Circuit Television (CCTV) operators suffer from video blindness after 20 to 40 minutes of active monitoring. Over the last two decades, researchers and industry professionals have devoted their studies to developing surveillance systems that discover suspicious actions (Zhou & Tan, 2010; Liwei et al., 2010; Kishore et al., 2012; Mandrupkar et al., 2013).
Automation is required in complex situations to reduce the workload of the human operator and improve performance. Hence, traditional surveillance systems still need to be improved and converted into intelligent, smart systems (Shah et al., 2007; Tian et al., 2008). In an intelligent video surveillance system (IVSS), detection requires no human intervention: the system automatically triggers an alert if any suspicious action or illegal activity occurs. Accordingly, the operator only needs to focus on the video feed that raised the alert and take the appropriate action.
The goal of the proposed approach is to design a system capable of automatically detecting dangerous firearms, especially guns and pistols, in CCTV images in real time. The proposed approach uses a Convolutional Neural Network (CNN) trained to determine the presence of guns. The CNN is a deep learning (DL) algorithm (Abdelouahab et al., 2018), and DL is a subfield of machine learning that teaches computers to perform what humans do naturally. With the emergence and successful deployment of DL techniques in image classification, researchers have migrated from traditional techniques to DL techniques. DL has recently demonstrated a high ability in detection and classification, and it can detect the dominant features automatically rather than manually (Tiwari & Verma, 2015; Halima & Hosam, 2016; Tiwari & Verma, 2015; Sheen et al., 2001; Xue et al., 2002; Li et al., 2008). This is the main reason that prompted us to use it in the proposed approach. Nevertheless, DL suffers from two drawbacks: first, it requires very large datasets; second, it needs high-performance computing resources. To overcome these two constraints, transfer learning (TL) through fine-tuning is employed in the proposed approach. TL is the improvement of learning in a new task through the transfer of knowledge from a previously learned task, that is, re-using the knowledge learned from one problem on another (Torrey & Shavlik, 2009). When a network is trained from scratch, its weights are initialized randomly; when fine-tuning is used, the weights are initially set to those of the pre-trained network. TL seeks to save time and achieve better performance. Figure 1 explains how TL improves the training performance rate.
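The difference between the two initialization strategies can be illustrated with a minimal, framework-free sketch. It is not the implementation used in the proposed approach: the layer names (`features`, `classifier`), the layer sizes, the two-class target task, and the choice to re-initialize only the final classification layer are all assumptions made for illustration.

```python
import random


def init_from_scratch(layer_sizes, seed=0):
    """Training from scratch: every weight is drawn at random."""
    rng = random.Random(seed)
    return {
        name: [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        for name, (n_in, n_out) in layer_sizes.items()
    }


def init_for_fine_tuning(layer_sizes, pretrained, new_head="classifier", seed=0):
    """Fine-tuning: start from the pre-trained weights.

    Assumption: only the task-specific head (`new_head`) is re-initialized
    at random, because its output size differs between the two tasks.
    """
    weights = init_from_scratch(layer_sizes, seed)
    for name, matrix in pretrained.items():
        if name != new_head:
            # Copy the learned weights so later updates do not alter the source.
            weights[name] = [row[:] for row in matrix]
    return weights


# Hypothetical layer shapes, given as (inputs, outputs).
source_sizes = {"features": (4, 3), "classifier": (3, 10)}  # source task: 10 classes
target_sizes = {"features": (4, 3), "classifier": (3, 2)}   # target task: gun / no gun

# Stand-in for weights learned on the source task (random here, for illustration).
pretrained = init_from_scratch(source_sizes, seed=42)

scratch = init_from_scratch(target_sizes, seed=0)
fine_tuned = init_for_fine_tuning(target_sizes, pretrained, seed=0)
```

In this sketch, `fine_tuned["features"]` starts as a copy of the pre-trained values, whereas `scratch["features"]` starts from random values. Training would then proceed from these starting points in either case.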