Secure Gait Recognition-Based Smart Surveillance Systems Against Universal Adversarial Attacks

Smart surveillance systems enabled by the internet of everything (IoE) are now widely used in many fields to prevent various forms of abnormal behavior. The authors assess the vulnerability of surveillance systems based on human gait and suggest a defense strategy to secure them. Human gait recognition is a promising biometric technology, but it is significantly hindered by universal adversarial perturbations (UAPs) that may trigger system failure. More specifically, in this research study, the authors focus on a sample convolutional neural network (CNN) model designed for gait recognition and assess its susceptibility to UAPs. The authors compute the perturbations as non-targeted UAPs, which trigger a model failure and cause an incorrect label to be assigned to the input sample of a given subject. The findings show that a smart surveillance system based on human gait analysis is susceptible to UAPs, even if the norm of the generated noise is substantially less than the average norm of the images. In the next stage, the authors illustrate a defense mechanism to design a secure surveillance system based on human gait.


INTRODUCTION
The Internet of Everything (IoE) is a term in information technology that evolved from the internet of things (IoT) over time (Kubba & Hoomod, 2019). IoE links numerous items and things over the internet using embedded sensors to gather and analyze data in an intelligent manner [...] contexts may find it easier to implement UAP-based adversarial attacks. The presence of these adversaries raises concerns about the robustness, generalization, and reliability of CNNs, puts all deep-learning-assisted applications at potential risk, and reduces their use in safety-critical domains (Matyasko & Chau, 2018). Therefore, it is essential to assess the vulnerability of the CNN-based gait model to adversarial attacks, especially attacks based on UAPs, as this CNN-based gait model is ultimately deployed in surveillance systems. Furthermore, tactics for defending against hostile attacks (i.e., adversarial defense (McAuley & Leskovec, 2012)) are also necessary. In particular, susceptibility is a major concern in security surveillance systems based on biometrics. Moreover, of all biometrics, gait recognition is the most advanced biometric since it works with low-resolution images and does not require subject cooperation. Due to these characteristics, it is commonly used for surveillance purposes. In this research, we focus on the CNN model, which is an exemplary model for identifying persons based on their gait patterns, and we intend to assess the CNN's susceptibility to adversarial attack as a primary objective.

Figure 1. When a universal perturbation is added to a clean GEI image, it causes the CNN model to misclassify the perturbed GEI with high probability. Left images: original GEI images. Central image: universal perturbation. Right images: perturbed GEI images. Arrows: the labels of both clean and perturbed images are written on each arrow.
In addition, some research studies have exploited the vulnerability of gait recognition models (Engoor, Selvaraju, Christopher, Guruvayur Suryanarayanan, & Ranganathan, 2020; Prabhu & Whaley, 2017); however, the gait representation used in their work is based on either silhouettes or accelerometer data. In contrast to these studies, this research study employs a more compact representation of gait, namely the gait energy image (GEI), to investigate the vulnerability of the CNN as a secondary objective. Furthermore, the adversarial examples computed in these existing studies are based on the fast gradient sign method (FGSM) and temporal sparse attacks. In this study, however, a more sophisticated perturbation based on UAPs is computed to determine the vulnerability of the gait recognition model. The study also shows to what extent the model can still produce accurate results when a single perturbation, computed only once, is used to perturb the GEI images of different subjects. Moreover, this research study suggests a defense mechanism using adversarial training to increase the robustness of the CNN-based gait recognition model against UAPs. The suggested defense protocol shows that a secure surveillance system based on human gait can be designed. Specifically, taking adversarial defense into account, we assess how much the resilience of the gait recognition algorithm against UAPs improves with adversarial retraining (Carlini & Wagner, 2017a; Moosavi-Dezfooli, Fawzi, Fawzi, Frossard, & Soatto, 2017) (i.e., fine-tuning using adversarial GEI images).
Following are our contributions:
• To the best of our knowledge, we are the first to exploit the vulnerability of the CNN-based gait recognition model with GEI images against universal adversarial perturbations
• Universal adversarial perturbation is intended to successfully mislead the model, and an adversarial defense mechanism is suggested to increase model robustness
• This study shows a critical flaw in adversarial robustness research on CNN-based gait recognition that has been addressed using adversarial training as a defense mechanism

The rest of the paper is organized as follows: Section II describes the related work in this field, Section III presents the proposed methodology, Section IV reports and explains different results and experiments, while the last section includes the conclusion followed by references.

RELATED WORK
In this section, we review the literature on various types of adversarial attacks, followed by existing work on adversarial attacks on gait recognition and in other domains, and then discuss defense mechanisms against these attacks.
Recently, researchers have proposed several kinds of adversarial attacks. Some of these attacks also introduce threats and security vulnerabilities in federated learning environments (Mothukuri et al., 2021). The term adversarial sample was first introduced by Szegedy et al. (Szegedy et al., 2013) in 2014. These adversarial samples are found by solving an optimization problem and are highly effective against state-of-the-art deep neural networks, but computing them is very expensive. Later, extensions of this adversarial attack were introduced, namely the fast gradient sign method (FGSM) proposed by Goodfellow et al. (Goodfellow et al., 2014). Other variants of FGSM were proposed by Kurakin et al. (Kurakin, Goodfellow, & Bengio, 2016), named the one-step Target Class Method, the Basic Iterative Method (BIM), and the Iterative Least-Likely Class Method (ILCM). All these methods belong to the group of gradient-based methods; however, other methods also exist. For example, Papernot et al. designed another method of generating adversarial examples called the Jacobian Saliency Map Attack (JSMA).
The adversarial examples are generated from saliency maps obtained by computing the forward Jacobian matrix of the model. Carlini and Wagner (Carlini & Wagner, 2017b) further extend the idea of Papernot et al. and propose another method of generating adversarial examples. Their method defeats defensive distillation, a defense against these attacks. To achieve transferability of adversarial examples across different deep CNN architectures, Liu et al. (Y. Liu, Chen, Liu, & Song, 2016) designed the Model-based Ensembling Attack. Their research shows that the transferability of targeted adversarial examples is difficult to achieve. Furthermore, attacks based on optimization approaches are also proposed by Su et al. (Su, Vargas, & Sakurai, 2019) as well as Chen et al. (P.-Y. Chen, Zhang, Sharma, Yi, & Hsieh, 2017).
In addition, these adversarial attacks have been used to investigate the vulnerabilities of several state-of-the-art deep learning systems. For instance, Zhu et al. (Zhu, Lu, & Chiang, 2019) observe that applying different makeup effects to face images as a form of perturbation can fool a face recognition model. Furthermore, IT companies such as Google, Tesla, and Uber use deep learning-based methods in projects such as self-driving cars. To fool such a real system, Morgulis et al. utilize perturbations in the form of traffic signs (Morgulis, Kreines, Mendelowitz, & Weisglass, 2019). Further, Thys et al. (Thys, Van Ranst, & Goedemé, 2019) design adversarial patches to fool automated surveillance cameras. In their work, a perturbation in the form of a patch is capable of successfully hiding people from a person detector. Furthermore, some researchers have exploited the vulnerabilities of deep CNN models used in medical imaging. For instance, Kotia et al. (Kotia, Kotwal, & Bharti, 2019) determine the robustness of a CNN model designed for brain tumor detection from MRI images. In addition, some researchers have exploited the vulnerabilities of such models in natural language processing (NLP). Zhou et al. (Zhou, Guan, Bhat, & Hsu, 2019) demonstrate the vulnerability of an NLP model designed to distinguish real from fake news.
Specifically for the problem of gait recognition, very little research has investigated the vulnerability of deep learning-based gait recognition systems. For instance, a temporal sparse adversarial attack is designed by He et al. (He, Wang, Dong, & Tan, 2020) to fool a sequence-based gait recognition system. Their research indicates that sequence-based gait recognition is highly vulnerable to adversarial attack. Similarly, Prabhu et al. (Prabhu & Whaley, 2017) exploit the vulnerability of a gait recognition system employing a one-dimensional CNN by perturbing gait patterns acquired from an accelerometer. The perturbation in their study is computed using FGSM and significantly lowers the accuracy of the underlying model. In this research study, however, we attempt to determine the vulnerability of gait recognition systems using more practical perturbations called UAPs. A generic representation of computing UAPs is shown in Figure 2. Further, this study differs in that we attempt to exploit the vulnerability of a gait recognition system in which human gait features are expressed as GEI images, which are more compact representations of gait than sequence-based representations such as silhouettes. Investigating the vulnerability of a deep learning model trained on such a compact gait representation is a major research question in this study.
Moreover, to lessen the impact of adversarial attacks on various systems, researchers have proposed several countermeasures (Yuan, He, Zhu, & Li, 2019). Defensive distillation is one of the commonly used measures to protect a system from these adversarial attacks (Papernot, McDaniel, Wu, Jha, & Swami, 2016). Similarly, adversarial training is another popular method, in which deep learning models are trained with adversarial samples (Huang, Xu, Schuurmans, & Szepesvári, 2015).
These defense methods act as a safety mechanism against adversarial attacks. However, several studies have found that these defense mechanisms fail to identify adversarial samples when minor changes are made to the original attacks (Carlini & Wagner, 2016, 2017a). In the context of different automated systems, several researchers propose defense mechanisms; e.g., Sun et al. (Q. Sun, Rao, Yao, Yu, & Hu, 2020) propose a novel defense method against adversarial attacks on driving systems. Siddiqui et al. (Siddiqui & Boukerche, 2020) design a lightweight defense mechanism, namely Symmetric Image-Half Flip and Replace (SIHFR), against patch-based adversarial attacks on automated surveillance systems. Moreover, in the domain of IoT, Ahmed et al. propose generative ensemble learning for malware detection (Ahmed, Lin, & Srivastava, 2021a). Their ensemble model produces a collaborative categorization outcome that is resistant to adversarial attack. Wang et al. propose a simple and dependable authentication protocol to secure data exchange on cloud servers in wireless medical sensor networks. Their protocol is based on blockchain and PUF technology. Furthermore, Ahmed et al. (Ahmed, Lin, & Srivastava, 2021b) propose a defense mechanism using deep reinforcement learning to secure important information exchange over vehicular ad hoc networks. All of these studies point to effective protection mechanisms against attacks in various domains. Hence, for this problem, we suggest adversarial retraining as a defense strategy to improve the robustness of the gait recognition model. More recent research on adversarial attacks includes the work of Mumcu et al. (Mumcu, Doshi, & Yilmaz, 2022), who design an attack on a video anomaly detection model. Wang et al. (Y. Wang et al., 2021) propose a physical-world adversarial patch to fool an object detection model.
The object detection models they used are YOLOv2 and YOLOv3. On the other hand, Siddiqui et al. (Siddiqui & Boukerche, 2021) design a defense mechanism against these patch-based adversarial attacks for vehicle make and model recognition systems. Sun et al. (Y. Sun & Wang, 2022) design a presentation attack for palmprint recognition-based biometric systems. Their findings indicate a high success rate in fooling these systems. Rathore et al. (Rathore, Sahay, Nikam, & Sewak, 2021) design a Q-learning-based defense mechanism for recognizing malware on Android. For a deep learning model for diabetic retinopathy, Lal et al. (Lal et al., 2021) design an adversarial attack based on adversarial training. The resulting perturbations are added to retinal fundus images to fool the model. Likewise, Hickling et al. (Hickling, Aouf, & Spencer, 2022) design an explainable deep reinforcement learning-based defense mechanism for identifying adversarial attacks.

METHODOLOGY
As previously stated, deep learning-based gait recognition performs well in subject identification and may be widely deployed in surveillance systems. In this research, we exploit the vulnerability of the CNN proposed by Bukhari et al. (Bukhari et al., 2020) for gait recognition and fool it with adversarial examples. The proposed methodology consists of several steps. First, we train the designed CNN model on the train set, which consists of gait data that is preprocessed before being given as input. Afterward, we generate the universal perturbation vector and craft the adversarial images by adding that perturbation vector to the test set images. Then we load the trained CNN model to determine the class labels of the test set images. The step-by-step explanation is given below:

Preprocessing of Gait Data
Before training the deep learning model, we preprocess the gait data into a gait representation. The most popular gait representation is the Gait Energy Image (GEI). Equation (1) shows how the GEI images given as input to the CNN are calculated:

$$G(x, y) = \frac{1}{T} \sum_{t=1}^{T} B_t(x, y) \tag{1}$$

In the above equation, $T$ is the total number of silhouettes extracted from a video sequence by background subtraction, $x$ and $y$ are the pixel coordinates, and $t$ denotes the silhouette number; $B_t(x, y)$ denotes the $t$-th silhouette image and $G(x, y)$ the resulting GEI. More precisely, all silhouette images are first summed, and the sum is then divided by the total number of silhouettes. This is done for every subject in the dataset.
The resulting GEI images are less affected by noise than individual silhouettes and carry more compact gait features for identifying individuals. Information about an individual's motion appears in the dynamic (low-intensity) areas of the GEI, whereas the constant-intensity areas, also known as static areas, reveal information about the body's structure.
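The averaging step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `gait_energy_image` is hypothetical, and the silhouettes are assumed to be already extracted, size-normalized, aligned, and binary (values 0 or 1).

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average T aligned binary silhouettes into one GEI.

    silhouettes: array-like of shape (T, H, W) with values in {0, 1}.
    Returns an (H, W) float array with values in [0, 1].
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of shape (T, H, W)")
    # Sum over the T frames and divide by T, i.e. the per-pixel mean.
    return frames.mean(axis=0)

# Toy example: four 2x3 "silhouettes" (assumed data, for illustration only).
toy = np.array([
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 1]],
    [[1, 0, 0], [0, 1, 0]],
])
gei = gait_energy_image(toy)
```

In the toy result, pixels that are set in every frame keep the value 1 (static body structure), while pixels that are set in only some frames take intermediate values (moving parts).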

CNN Architecture
The CNN architecture proposed by Bukhari et al. shows remarkable performance in classifying individuals (Bukhari et al., 2020). This CNN consists of ten layers in total and is trained on $240 \times 240 \times 1$ GEI images. The architecture is divided into four distinct blocks. Each block contains a convolution layer with kernel size $3 \times 3$, followed by a max-pool layer with window size $2 \times 2$ to downscale the feature maps. The activation used after every convolutional layer is Leaky ReLU with $\alpha = 0.05$. The blocks use different numbers of filters: 16, 32, 64, and 124. In addition, the kernel weights are initialized with the Xavier initialization method. After the max-pool layer in the last block, a fully connected layer is deployed in which the number of neurons equals the number of classes in the dataset. The hyperparameters for this CNN are 30 epochs, a learning rate of 0.0001 with the Adam optimizer, and a training batch size of 4. Subsequently, in the second stage, we propose a variant of the universal perturbation, which is explained below in detail. The CNN architecture is illustrated in Figure 3.
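To make the layer arithmetic concrete, the following sketch traces the feature-map sizes through the four blocks. It is not the authors' implementation: the padding and stride are not stated in the text, so "same" padding with stride 1 for the convolutions and stride 2 for the pooling are assumptions, and the Xavier limit uses the common uniform variant, which is also an assumption. All function names are hypothetical.

```python
import math

def trace_feature_maps(input_size=240, filters=(16, 32, 64, 124), num_classes=124):
    """Follow the spatial size through four conv + max-pool blocks.

    Assumes 3x3 convolutions that preserve the spatial size ('same' padding,
    stride 1) and 2x2 max-pooling with stride 2 that halves it.
    """
    size, channels, shapes = input_size, 1, []
    for f in filters:
        channels = f      # convolution changes only the channel count here
        size //= 2        # 2x2 max-pool halves height and width
        shapes.append((size, size, channels))
    flat = size * size * channels                  # inputs to the dense layer
    fc_params = flat * num_classes + num_classes   # weights plus biases
    return shapes, flat, fc_params

def leaky_relu(x, alpha=0.05):
    """Leaky ReLU with the slope alpha = 0.05 given in the text."""
    return x if x > 0 else alpha * x

def xavier_uniform_limit(kernel=3, c_in=1, c_out=16):
    """Bound of the uniform Xavier/Glorot distribution for a conv kernel."""
    fan_in = kernel * kernel * c_in
    fan_out = kernel * kernel * c_out
    return math.sqrt(6.0 / (fan_in + fan_out))

shapes, flat, fc_params = trace_feature_maps()
```

Under these assumptions the spatial size goes 240, 120, 60, 30, 15, so the fully connected layer sees 15 x 15 x 124 values per GEI.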

Universal Adversarial Attack
Since Moosavi-Dezfooli et al. (Moosavi-Dezfooli, Fawzi, & Frossard, 2016) identified UAPs for image classification tasks, their significance has been shown in several fields. For non-targeted attacks, UAPs are computed using a simple iterative algorithm whose details are given in (Moosavi-Dezfooli, Fawzi, Fawzi, Frossard, et al., 2017). In this study, we employ the non-targeted universal perturbations available in the Adversarial Robustness 360 Toolbox (ART) (Nicolae et al., 2018). In non-targeted UAPs, the objective is to find a perturbation that, when added to a GEI image, leads the model to predict an arbitrary class other than the actual class. The algorithm considers a classifier $C(x)$ that returns the class (label ID) of a subject with the highest confidence score when the GEI image $x$ is provided as input. The algorithm starts with $\rho = 0$, i.e., no perturbation; over the iterations, the perturbation is gradually updated under the constraint that its $L_p$ norm is less than or equal to a small value $\xi$, as given by equation (2):

$$\lVert \rho \rVert_p \le \xi \tag{2}$$

In the above equation (2), $\rho$ denotes the perturbation while $\lVert \rho \rVert_p$ denotes its $L_p$ norm. Further, this process iteratively builds the adversarial perturbation from the GEI images $x$ provided at input, which are chosen from the collection of GEI images of all subjects. These repeated updates proceed until the maximum number of iterations $i_{max}$ is reached. Moreover, for each GEI image, we employ the fast gradient sign method (FGSM) (Goodfellow et al., 2014) to compute the universal perturbation, rather than the DeepFool technique (Moosavi-Dezfooli et al., 2016) used by the traditional UAP algorithm. The reason for choosing this method is that its computational cost is much lower than that of DeepFool.
Moreover, the FGSM method computes the adversarial perturbation $\rho$ for a GEI image $x$ by taking the gradient $\nabla_x L(x, y)$ of the cost function (also called the loss function) at the GEI image $x$ and subject label $y$ with respect to the pixel values of the image. For the $L_\infty$ norm, the non-targeted perturbation that induces misclassification is calculated by equation (3):

$$\rho = \epsilon \cdot \mathrm{sign}\left(\nabla_x L(x, C(x))\right) \tag{3}$$

In the above equation (3), the value $\epsilon > 0$ indicates the strength of the attack, i.e., the magnitude of the perturbation. The term $\nabla_x L(x, C(x))$ denotes the gradient of the cost function with respect to the GEI image $x$, evaluated with the label of that image provided by the classifier $C$. For the $L_1$ and $L_2$ norms, the adversarial perturbation is calculated using equation (4):

$$\rho = \epsilon \, \frac{\nabla_x L(x, C(x))}{\lVert \nabla_x L(x, C(x)) \rVert_p} \tag{4}$$

In the above scenario, the FGSM method is applied to the outcome $C(x + \rho)$ of the CNN model for the perturbed GEI $x + \rho$ at every iteration step. For non-targeted adversarial attacks, the perturbation for $x + \rho$ is generated by employing FGSM if the perturbed GEI is still assigned its original label, i.e., $C(x + \rho) = C(x)$. After computing the adversarial sample at this step, i.e., $x_{adv} \leftarrow x + \rho + \rho_{FGSM}$, where $\rho_{FGSM}$ is the FGSM perturbation of equation (3) or (4) computed at $x + \rho$, the perturbation is updated and projected so that $\lVert \rho \rVert_p \le \xi$. We also created random vectors (random UAPs) sampled uniformly from a sphere of a predetermined radius to compare the results of the computed UAPs with those of random perturbations.
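The iterative procedure above can be sketched as follows. This is a minimal NumPy illustration, not the ART implementation used in the study: the classifier is a toy linear softmax model standing in for the CNN, the inputs are flat vectors rather than GEI images, only the $L_2$ and $L_\infty$ norms are handled, and the perturbation is updated only when the FGSM step actually changes the prediction (a common formulation, assumed here). All names (`LinearSoftmax`, `fgsm_step`, `project`, `universal_perturbation`, `random_uap`) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class LinearSoftmax:
    """Toy stand-in for the classifier C (hypothetical; the paper uses a CNN)."""

    def __init__(self, W, b):
        self.W, self.b = W, b  # W: (features, classes), b: (classes,)

    def predict(self, x):
        return int(np.argmax(x @ self.W + self.b))

    def loss_gradient(self, x, y):
        # Gradient of the cross-entropy loss with respect to the input x.
        p = softmax(x @ self.W + self.b)
        p[y] -= 1.0
        return self.W @ p

def fgsm_step(grad, eps, norm):
    """Sign step for the L-infinity norm, normalized gradient step for L2."""
    if norm == np.inf:
        return eps * np.sign(grad)
    return eps * grad / (np.linalg.norm(grad) + 1e-12)

def project(rho, xi, norm):
    """Map rho back into the L_p ball of radius xi (p = 2 or infinity)."""
    if norm == np.inf:
        return np.clip(rho, -xi, xi)
    size = np.linalg.norm(rho)
    return rho if size <= xi else rho * (xi / size)

def universal_perturbation(model, X, xi, eps, norm=np.inf, max_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    labels = [model.predict(x) for x in X]   # C(x) on the clean inputs
    rho = np.zeros_like(X[0], dtype=float)   # start with no perturbation
    for _ in range(max_iter):
        for i in rng.permutation(len(X)):
            if model.predict(X[i] + rho) != labels[i]:
                continue                     # this sample is already fooled
            grad = model.loss_gradient(X[i] + rho, labels[i])
            x_adv = X[i] + rho + fgsm_step(grad, eps, norm)
            if model.predict(x_adv) != labels[i]:
                rho = project(x_adv - X[i], xi, norm)
    return rho

def random_uap(shape, xi, rng):
    """Random baseline: a direction drawn uniformly from the L2 sphere of radius xi."""
    v = rng.standard_normal(shape)
    return xi * v / np.linalg.norm(v)

# Toy usage with two 2-feature samples that both belong to class 0.
model = LinearSoftmax(np.eye(2), np.zeros(2))
X = np.array([[1.0, 0.5], [1.2, 0.6]])
rho = universal_perturbation(model, X, xi=0.5, eps=0.5, norm=np.inf)
```

A single vector `rho` is shared by all samples, which is what distinguishes a universal perturbation from a per-image FGSM attack.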

Evaluation Criteria
To assess the vulnerability of the gait recognition model to UAPs, we employ the fooling rate $R_f$ for non-targeted attacks. The fooling rate $R_f$ is defined as the fraction of GEI images of different subjects, in either the train or the test set, that are not correctly classified. In addition, we plot the confusion matrices for each experiment to examine the variation in predictions due to the UAPs.
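Both evaluation tools are simple to compute from label arrays. The sketch below follows the definition given above (fraction of images not correctly classified); the function names and the toy labels are assumptions for illustration.

```python
import numpy as np

def fooling_rate(true_labels, predicted_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    t = np.asarray(true_labels)
    p = np.asarray(predicted_labels)
    if t.shape != p.shape or t.size == 0:
        raise ValueError("label arrays must be non-empty and of equal shape")
    return float(np.mean(t != p))

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(true_labels), np.asarray(predicted_labels)), 1)
    return cm

# Toy labels for three subjects with two perturbed test samples each.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 0]
rate = fooling_rate(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, num_classes=3)
```

Off-diagonal entries of `cm` show which classes the perturbed samples are pushed into, which is how the concentration of errors in a few classes becomes visible.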

Adversarial Retraining
The experiments show that the CNN-based gait recognition model is vulnerable to adversarial attack. Hence, to enhance the robustness of the model, we carried out adversarial re-training of the gait recognition model. More precisely, we fine-tuned the gait recognition model with adversarial GEI images according to the approach described in (Carlini & Wagner, 2017a). First, we computed several sets of UAPs (i.e., 10) using the training set of the database. Next, we updated the original training set, initially used to train the CNN, by randomly picking half of the clean training GEI images and merging them with adversarial GEI images. Each adversarial GEI image is computed using a UAP chosen at random from the ten generated UAPs. Subsequently, the model is trained again (fine-tuned) on this modified train set for 10 additional epochs. Afterward, we computed the UAPs again using the train set against the newly trained CNN model to assess its vulnerability. The mechanism of adversarial defense is shown in Figure 4.
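The construction of the modified train set can be sketched as below. The text does not say which images the adversarial samples are derived from, so this sketch assumes they come from the other half of the training images; it also assumes pixel values in [0, 1] and clips after adding the UAP. The function name `build_retraining_set` is hypothetical.

```python
import numpy as np

def build_retraining_set(X, y, uaps, seed=0):
    """Mix clean and UAP-perturbed GEI images for fine-tuning.

    X: (N, H, W) clean images in [0, 1]; y: (N,) labels;
    uaps: (K, H, W) precomputed universal perturbations (K = 10 in the text).
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    half = len(X) // 2
    clean_idx, adv_idx = order[:half], order[half:]
    # Each adversarial image uses one UAP drawn at random from the K available.
    choice = rng.integers(0, len(uaps), size=len(adv_idx))
    X_adv = np.clip(X[adv_idx] + uaps[choice], 0.0, 1.0)
    X_new = np.concatenate([X[clean_idx], X_adv])
    y_new = np.concatenate([y[clean_idx], y[adv_idx]])  # labels stay unchanged
    return X_new, y_new

# Toy usage with random stand-in data (assumed shapes, not CASIA-B images).
rng = np.random.default_rng(42)
X_toy = rng.random((8, 4, 4))
y_toy = np.arange(8)
uaps_toy = rng.uniform(-0.05, 0.05, size=(10, 4, 4))
X_mix, y_mix = build_retraining_set(X_toy, y_toy, uaps_toy)
```

The returned arrays would then be passed to the usual training loop for the additional fine-tuning epochs.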

EXPERIMENTS AND DISCUSSIONS
In Section IV, we go over the experiments and our findings for all of the algorithms. This section discusses the vulnerability of gait recognition models in detail. All experiments are conducted on Google Colab and implemented in Python.

Dataset
The dataset used for experimentation is the CASIA gait dataset, provided by the Chinese Academy of Sciences. The dataset has three parts, namely CASIA-A, CASIA-B, and CASIA-C. Here we use the CASIA-B gait dataset, which is the largest multi-view gait dataset. In this dataset, ten sequences are available for each subject: in six of them the subject walks in an indoor environment with a normal walking style, in two the subject walks carrying a bag, and in two the subject wears a coat. The normal sequences are denoted "nm", the bag sequences "bg", and the coat sequences "cl". Furthermore, the dataset contains data from 124 different individuals, which is divided into gallery and probe sets in each experiment. In this research study, we employ the normal walking sequences of each subject in the database.

Performance of Baseline Model
To examine the tolerance of the CNN-based gait recognition model deployed in the surveillance system to the attack, we first train the designed CNN model on the normal walking sequences of all 124 subjects. First, the entire dataset of 124 individuals is partitioned into train and test sets, with each person having six normal walking sequences. The train set contains the sequences [nm-01 to nm-04], whereas the test set contains the sequences [nm-05 to nm-06]. The experiment is repeated five times, and the average test accuracy of the CNN model is 97.61%. The results are shown in Table 1. In addition, the average confidence scores on the test sets are also listed. The CNN model distinguishes individuals based on their gait patterns extremely well and with a high degree of certainty.
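The split described above can be written down directly, which also makes the set sizes explicit. This sketch assumes one GEI per (subject, sequence) pair; the function name and the tuple layout are hypothetical.

```python
def split_normal_sequences(num_subjects=124):
    """Assign nm-01..nm-04 to the train set and nm-05..nm-06 to the test set."""
    train_ids = {f"nm-{i:02d}" for i in range(1, 5)}
    train, test = [], []
    for subject in range(1, num_subjects + 1):
        for i in range(1, 7):  # six normal walking sequences per subject
            seq = f"nm-{i:02d}"
            (train if seq in train_ids else test).append((subject, seq))
    return train, test

train_set, test_set = split_normal_sequences()
```

With 124 subjects this gives 496 training items and 248 test items under the one-GEI-per-sequence assumption.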

Vulnerability of Model with Universal Adversarial Attack
Figure 4. An adversarial defense to increase the robustness of the gait recognition model

The CNN-based deep learning model achieves high accuracy in recognizing different persons; however, under UAPs the model performs poorly and is hence deemed vulnerable, as shown in Table 2. To compute the UAPs, we employed the entire train set images of all 124 subjects present in the database. The parameters of the UAP attack include the noise computation method, which is set to FGSM; since the attack is conducted in a non-targeted manner, the attack type parameter is set to un-targeted. Further, the algorithm runs for 15 iterations with the $L_2$ and $L_\infty$ norms, with the desired accuracy parameter set to 0.000001. After computing the UAPs, they are added to both the train and test sets to compute the fooling rates $R_f$. This measure indicates the percentage of images that are incorrectly classified. More precisely, on the test data the fooling rate $R_f$ with $\xi = 8$ for UAPs using the $L_2$ norm is about 47%. A greater $\xi$ results in an increased $R_f$: the $R_f$ of the UAPs is about 72% on the test set walking sequences for $\xi = 10$. For random UAPs, the value of $R_f$ on the train and test sets is about 6% and 8%, respectively. This shows that random UAPs have no substantial effect on the accuracy of the model compared with the computed UAPs. Similarly, with the $L_\infty$ norm, $R_f$ is about 30% and 74% on the test set with $\xi = 0.06$ and $\xi = 0.08$, respectively. In addition, we choose the value of $\xi$ such that the $L_2$ and $L_\infty$ norms of the resultant UAP do not exceed the mean $L_2$ and $L_\infty$ norms of the images in the train set. There is only a small difference between the fooling rates $R_f$ on the test set walking sequences for the two norms when the same parameter settings are used for both. For random UAPs, no significant difference in $R_f$ was found between $L_2$ and $L_\infty$ norm-based perturbations. In addition to the above, we have plotted the confusion matrices after the adversarial attack. The database contains data of 124 subjects; however, due to space constraints, we plot the confusion matrices for ten persons. The test set contains two normal walking sequences for each person, comprising 248 GEI samples, since a GEI is computed for each video sequence. Figure 5 (left) shows the confusion matrix of ten persons for the test set whose samples are perturbed with UAPs under the $L_\infty$ norm. It is observed that GEI images of different persons are wrongly classified.
For instance, both instances of person IDs 3, 4, 5, 7, and 8 are wrongly classified into some arbitrary classes. Similarly, Figure 5 (right) shows the confusion matrix of ten persons for the test set whose samples are perturbed with UAPs under the $L_2$ norm. Again, GEI images of various individuals are incorrectly labeled. For instance, both test samples of persons 1, 2, 4, 5, 6, and 9 are incorrectly classified by the model. Moreover, Figure 6 (a) and (b) show the resulting perturbations with different norms and the corresponding adversarial images computed using these perturbations. In Figure 6(b), row 1 corresponds to adversarial images of person ID-01, while row 2 shows the adversarial images of person ID-002. It is observed from Figure 6 that the resulting images appear very similar to the original images. The contextual and shape features of the person in the image are not disrupted. Hence, we conclude that the underlying CNN model is vulnerable even when the perturbations are barely noticeable. It is also observed that as $\xi$ increases, the noise becomes stronger and hence more visible in the images; on the other hand, the fooling rate also increases with $\xi$. Furthermore, it is necessary to convey how confident the CNN model is when making wrong decisions, i.e., when predicting the subject's label for an adversarial GEI image. The summary of confidence scores over all adversarial test set images is shown in Figure 7. The first two box plots in Figure 7 show the distribution of confidence scores over the complete test set for both norms. The model is about 60-85% confident while making wrong predictions.
More precisely, the y-axis of the plot in Figure 7 indicates the confidence scores, and the boxes of the first two box plots lie in the range 60-85%, which means that for most of the test samples the confidence scores of the model are in the range of 60-85%.
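The quantity summarized by such box plots can be computed from the model's softmax outputs. The sketch below is an assumption about how that summary might be produced, not the authors' plotting code: it collects the top-class probability of every misclassified sample and reports its quartiles. The function name and the toy probabilities are hypothetical.

```python
import numpy as np

def wrong_prediction_confidence(probs, true_labels):
    """Quartiles of the top-class probability over misclassified samples.

    probs: (N, num_classes) softmax outputs on adversarial images.
    Returns None when every sample is classified correctly.
    """
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(true_labels)
    wrong = probs.argmax(axis=1) != y
    conf = probs.max(axis=1)[wrong]
    if conf.size == 0:
        return None
    q1, median, q3 = np.percentile(conf, [25, 50, 75])
    return {"n_wrong": int(wrong.sum()), "q1": float(q1),
            "median": float(median), "q3": float(q3)}

# Toy softmax outputs for four samples and three classes.
toy_probs = [[0.70, 0.20, 0.10],
             [0.10, 0.80, 0.10],
             [0.15, 0.25, 0.60],
             [0.90, 0.05, 0.05]]
summary = wrong_prediction_confidence(toy_probs, [1, 1, 0, 0])
```

The box of a box plot spans `q1` to `q3`, so a box lying between 0.60 and 0.85 corresponds to the 60-85% range quoted above.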

Impact of Adversarial Training to Mitigate the Adversarial Attack
Adversarial training is a frequently used approach to counteract the effect of adversarial attacks. In this research study, we first investigated the vulnerability of the gait recognition model, and at a later stage we examined how much the defense mechanism of adversarial re-training increases the resilience of the gait recognition model against the UAP attack. Adversarial training did not degrade performance on clean data: the accuracy on clean test GEI images held steady at around 97.98%. We performed adversarial training against UAPs computed using different norms. For adversarial images computed using non-targeted UAPs under the $L_2$ norm with $\xi = 10$, the fooling rate $R_f$ decreases progressively. This experiment is conducted with the data of all 124 subjects, but confusion matrices for 10 subjects are depicted in Figure 9, which indicates that the model now makes correct predictions even when the samples are perturbed with UAPs. Hence, it is reasonable to conclude that adversarial training-based defense mechanisms help to increase the robustness of the model.
Furthermore, if we test the robustness of the fine-tuned model in terms of its confidence on adversarial images, we can see from Figure 7 that the model now correctly classifies the adversarial images with a confidence of approximately 90%, and the fooling rate drops to about 2% on test set images after several epochs of adversarial re-training, as shown in Figure 8. More precisely, the x-axis in Figure 8 shows the epochs, while the blue curves indicate the accuracy on the test set and the orange curves indicate the decrease in fooling rates over several epochs. Furthermore, with the $L_\infty$ norm, the fooling rate $R_f$ also decreases to about 2% over several epochs of adversarial re-training. After adversarial training, we again computed the UAPs to evaluate the robustness of the model fine-tuned on the modified train set. The results of the fine-tuned model against both computed and random UAPs are shown in Table 3. It is observed that the $R_f$ values are very low, which shows significant robustness of the model against UAPs.

Figure 6. UAPs against the gait recognition model and their corresponding adversarial images for two different persons with different norms and magnitudes

Discussions
The above results make clear that, although CNN-based gait recognition shows impressive results in classifying persons, these models carry security risks. The performance of CNNs drops if the input samples are perturbed with minimal noise. In addition, when the model is deployed in real-world (and possibly hostile) situations, adversaries can use these perturbations to break the model. Moreover, adversaries can cause CNN-based solutions to underperform at a reduced cost (i.e., with a single perturbation); in particular, when targeting CNNs with UAPs, they do not have to assess the distribution and variability of the input GEI images, because UAPs are image independent. Given that the vulnerability of CNNs to UAPs has been exploited in many use cases, it is hypothesized that the same vulnerability exists in CNN-based models designed for person identification through gait. It is also observed in various experiments that when UAPs are added to clean GEI images and given as input to the model, the model misclassifies the perturbed samples into a few specific arbitrary classes. This finding is in accordance with the tendency of CNN models to categorize most inputs into a few distinct categories under non-targeted UAPs, i.e., the presence of dominant categories in non-targeted UAP-based attacks. Since the method emphasizes maximizing the fooling rate, a rather high fooling rate is obtained when all GEI images are categorized into a few arbitrary specific classes. Hence, it is logical to deduce that security and surveillance based on human gait analysis are at potential risk due to the existence of these perturbations (Rudin).
Our first contribution is to demonstrate the vulnerability of the CNN-based gait recognition model using a more compact representation of gait features, namely the GEI. This representation carries more informative gait features and therefore helps more strongly to identify a person from their gait style. Whether a model trained on such accurate gait features can be fooled is thus a major research question. The findings above show that even a model trained on GEI images is susceptible to adversarial attack. Our second contribution is the design of a UAP-based adversarial attack to demonstrate the vulnerability of the gait recognition model. In the original method, the perturbation is computed using the DeepFool method; in the proposed study, we instead use the fast gradient sign method (FGSM) to compute this perturbation. The reason is that DeepFool is computationally expensive, as it performs successive iterations to compute the perturbation, whereas FGSM is less costly and computes the perturbation in one step. The proposed variant of the attack, based on FGSM-based universal perturbations, is less costly and sufficient to demonstrate the vulnerability of the gait recognition model. This variant can also be extended to other domains, such as medical imaging, to demonstrate the vulnerabilities of deep learning models. Further, universal perturbations are more practical and can pose a significant security risk to these systems. Hence, it is necessary to first check the vulnerability of the model so that its weaknesses are highlighted and overcome. 
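The idea of accumulating FGSM steps into one shared perturbation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: a linear softmax classifier stands in for the CNN so that the input gradient has a closed form, and the names `input_gradient`, `fgsm_universal_perturbation`, `eps`, `step`, and the toy data are all hypothetical. Whenever the current perturbation fails to fool a sample, one signed-gradient step is added and the result is clipped so that its L∞ norm stays within `eps`.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for logits = W @ x + b."""
    p = softmax(W @ x + b)
    p[y] -= 1.0          # dLoss/dlogits = softmax(logits) - onehot(y)
    return W.T @ p       # chain rule back to the input

def fgsm_universal_perturbation(W, b, X, labels, eps=0.1, step=0.02, epochs=5):
    """Accumulate FGSM steps into a single perturbation shared by all samples."""
    v = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, labels):
            # Update only when the current perturbation does not fool this sample.
            if np.argmax(W @ (x + v) + b) == y:
                g = input_gradient(W, b, x + v, y)
                # One FGSM step, then projection onto the L-infinity ball.
                v = np.clip(v + step * np.sign(g), -eps, eps)
    return v

# Toy two-feature problem: class 0 whenever the first feature is larger.
W_toy = np.array([[1.0, -1.0], [-1.0, 1.0]])
b_toy = np.zeros(2)
X_toy = np.array([[0.55, 0.50], [0.60, 0.53], [0.58, 0.49]])
y_toy = np.array([0, 0, 0])

uap = fgsm_universal_perturbation(W_toy, b_toy, X_toy, y_toy)
```

Each update costs a single gradient evaluation, which is the reason FGSM is cheaper than an iterative inner solver such as DeepFool. For image inputs, the perturbed sample would additionally be clipped to the valid pixel range.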
Hence, the proposed study uses a more advanced form of adversarial attack, namely the universal perturbation, and modifies the original algorithm by replacing the noise computation method with a less costly one. From the application perspective, this research study demonstrates the vulnerabilities of a rapidly evolving biometric technology, namely gait recognition, which can be used in video surveillance systems.
After highlighting the vulnerabilities of the gait recognition model, we also presented a mechanism to secure the model and increase its robustness. For this purpose, we fine-tuned the model for ten additional epochs with adversarial GEI images computed using UAPs. The resulting fine-tuned model is more robust and accurate against UAPs, as it strongly mitigates the impact of the adversarial attack. Thus, this research study concludes that a gait recognition model can be deployed safely in biometric-based video surveillance systems if its learning is improved with suitable defense mechanisms. Adversarial training proved useful for securing the gait recognition model against UAPs. The study demonstrates a major weakness in the adversarial robustness of deep-learning-based, gait-assisted video surveillance systems and, as motivation, also suggests a defense mechanism. Moreover, the study motivates researchers to pay careful attention to actual applications of CNNs to gait recognition, particularly to ways of overcoming known vulnerabilities.
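The fine-tuning step described above can be sketched as follows, assuming that the adversarial training set is built by adding the UAP to clean samples while keeping their true labels. A linear softmax model trained with full-batch gradient descent stands in for the CNN; the function names, the learning rate `lr`, and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(W, b, X, y):
    p = softmax_rows(X @ W.T + b)
    return float(-np.mean(np.log(p[np.arange(len(y)), y])))

def adversarial_fine_tune(W, b, X, y, uap, epochs=10, lr=0.1):
    """Fine-tune a linear softmax model on clean plus UAP-perturbed samples."""
    # Perturbed copies keep their true labels, so the model learns to ignore the UAP.
    X_aug = np.concatenate([X, np.clip(X + uap, 0.0, 1.0)])
    y_aug = np.concatenate([y, y])
    onehot = np.eye(W.shape[0])[y_aug]
    for _ in range(epochs):
        p = softmax_rows(X_aug @ W.T + b)
        grad_logits = (p - onehot) / len(y_aug)   # gradient of the mean loss
        W = W - lr * grad_logits.T @ X_aug
        b = b - lr * grad_logits.sum(axis=0)
    return W, b, X_aug, y_aug

# Toy data: two classes, two features, and a hypothetical precomputed UAP.
X_clean = np.array([[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]])
y_clean = np.array([0, 0, 1, 1])
uap_toy = np.array([-0.06, 0.06])

W0, b0 = np.zeros((2, 2)), np.zeros(2)
W1, b1, X_aug, y_aug = adversarial_fine_tune(W0, b0, X_clean, y_clean, uap_toy)
```

After fine-tuning, the UAP would normally be recomputed against the updated model and the fooling rate measured again, since a perturbation crafted for the old weights says little about the robustness of the new ones.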
Advanced computer vision algorithms, like CNNs, are already employed for high-stakes intelligent decisions in security and surveillance; nonetheless, they can cause devastating damage to security systems since they are frequently difficult to interpret. In addition, UAP-based attacks are white-box attacks, which means that attackers have access to the parameters of the model; in this context, the attacker has access to the gradient of the cost function as well as to a training set. Consequently, they pose a potential risk for open-source software, e.g., person identification through gait. Hence, to protect these systems from adversarial attacks, a very basic solution is to make them closed source and inaccessible to the public. Another way is to design systems that are black-box, i.e., closed application programming interfaces (APIs) that accept only input queries and return outputs. These closed APIs are preferable since they are less accessible to the public. APIs, however, may still be susceptible to adversarial attacks. The reason is that UAPs are generalized perturbations, and a perturbation computed using one CNN can fool another CNN model. Hence, it is possible to compute UAPs as a white-box attack in order to fool a black-box CNN system. Moreover, there exist many approaches for conducting black-box adversarial attacks, in which perturbations are computed using only the outputs of the model, such as confidence scores (J. Chen, Su, Shen, Xiong, & Zheng, 2019; Co, Muñoz-González, de Maupeou, & Lupu, 2019; C. Guo, Gardner, You, Wilson, & Weinberger, 2019). As a result, defensive tactics against adversarial attacks should be established. Fine-tuning CNN models on adversarial images is one of the straightforward defensive methods. Indeed, we found that fine-tuning the gait recognition model for 10 extra epochs using UAPs increased its robustness to UAP-based adversarial attacks. 
On the other hand, in some cases this repetitive fine-tuning strategy has large computational complexity, and it does not completely prevent susceptibility to UAPs. Furthermore, several research studies have proposed ways to breach the defense mechanism of adversarial retraining (Carlini & Wagner, 2017a). Principal component analysis (PCA) based dimensionality reduction, distributional detection, and normalization-based detection might be helpful as defense mechanisms; nevertheless, it is very difficult to detect adversarial samples using these approaches (Carlini & Wagner, 2017a). Protecting systems against adversarial attacks is a game of cat-and-mouse (Finlayson, Chung, Kohane, & Beam, 2018); therefore, it might be challenging to completely eliminate the potential risks posed by these adversarial attacks. On the other hand, the techniques for preventing these attacks have improved. For instance, recognizing adversarial attacks based on resilience to random noise at densely distributed input samples (Yu, Hu, Guo, Chao, & Weinberger, 2019), employing a discontinuous activation function that intentionally negates the gradients of the CNN (Xiao, Zhong, & Zheng, 2019), and using CNNs to clean data samples (Hwang, Park, Jang, Yoon, & Cho, 2019) could help mitigate some of these concerns.
In the existing literature, the vulnerability of different systems has been investigated using different types of attacks. All these systems are designed using deep learning-based methods. Table 4 and Table 5 provide a basic comparison of the vulnerability of various systems under various adversarial attacks. For instance, in the domain of healthcare applications, Cheng and Ji (Cheng & Ji, 2020) exploit the vulnerability of a CNN model that performs tumor detection using brain MRIs. They also employ universal adversarial perturbations to create adversarial MRIs that fool the CNN. Similarly, in recommender systems, Di Noia et al. (Di Noia, Malitesta, & Merra, 2020) employ a targeted adversarial attack to fool the recommender. In this attack, the behavior of the recommender model is disrupted so that it recommends the least related items to users. For face recognition, Dong et al. (Dong et al., 2019) exploit the vulnerability using a decision-based black-box attack. The perturbations in their attack are designed using only the outputs of the model, obtained by querying different inputs, without access to the model gradients. Similarly, for sequence data, Karim et al. (Karim, Majumdar, & Darabi, 2020) employ adversarial transformation networks to generate adversaries that fool a deep-learning-assisted time series classification model. Zhang et al. (Zhang, Zhou, & Li, 2020) employ a contextual adversarial attack to fool an object detection model. Their suggested approach can disrupt the image's contextual features and severely lower the mean average precision (mAP) and recall scores. Moreover, in the context of human gait recognition used for surveillance, some research studies have exploited the vulnerability of gait recognition. For instance, He et al. (He et al., 2020) suggest temporal-sparse adversarial attacks for sequence-based gait recognition. In their attack, the perturbation is added to silhouette images of different subjects. These silhouette images are part of a complete long sequence/video of the subject. The suggested attack performs well in exposing the vulnerability of sequence-based gait recognition models. Generally, the silhouette representation of gait carries less informative gait features than GEI images. Furthermore, Prabhu and Whaley (Prabhu & Whaley, 2017) employ the FGSM method to disrupt gait features obtained through an accelerometer and attain very good performance. In comparison with these studies, this study employs a more compact representation of gait, namely the GEI, and attempts to exploit the vulnerability of the CNN model. The adversaries are computed using a universal adversarial attack. In addition, we also suggest a defense mechanism to increase the robustness of the gait recognition model deployed in IoE-enabled smart surveillance systems under adversarial attack.

Theoretical and practical contributions
Owing to the clear advantages of gait biometrics, gait-based surveillance is now widely used. The first advantage is that it does not need the subject to cooperate during the identification process. The second is that human gait can readily be evaluated with low-resolution cameras. Because of these features, several researchers have developed gait recognition algorithms, and among them, gait recognition using deep learning performs the best. However, how well this higher performance holds up has not been tested under a more realistic attack, i.e., a universal adversarial attack. Therefore, the major contribution of this research study is to exploit the vulnerability of a deep-learning-based gait recognition system against a realistic adversarial attack. We generate the perturbations with an adversarial attack and then add them to GEI images. The generated adversarial images are then fed into the deep learning model to estimate how well the model is fooled. The results show that the gait recognition model is fooled when it is subjected to an adversarial attack. Secondly, existing studies have conducted adversarial attacks by perturbing the gait features present in silhouettes or accelerometer-based features; in contrast, we use a more effective representation of gait, i.e., the GEI, to show how well a model trained on GEI images can be fooled. Furthermore, as a solution, we also proposed a defense mechanism based on adversarial training. Fine-tuning the model on adversarial images is observed to protect the model from being fooled again. We have demonstrated the vulnerability of the gait recognition system in practice by first designing an effective gait recognition model, then designing the adversarial attack, and finally designing the defense mechanism as a solution to protect the model against adversarial attacks.

CONCLUSION
IoE has the potential to improve our daily lives by making various biometric-based surveillance systems a routine part of daily life. However, the vulnerability of these surveillance systems must be assessed, and defense mechanisms developed accordingly, before they can be put into operation. In this paper, we illustrated the vulnerability of a CNN-based gait recognition model used for surveillance purposes to non-targeted UAP-based attacks. This vulnerability has been demonstrated using a more compact representation of gait, namely the GEI image. Straightforward applications of CNNs to gait recognition can create security threats in different domains, and hence a defense is also required to secure these systems. Therefore, we have also suggested a defense protocol for designing a secure gait-based smart surveillance system by performing adversarial retraining of the model. Moreover, this work motivates researchers to consider all security risks associated with gait recognition biometric systems used for automated surveillance, and encourages them to design more powerful defense strategies that make their systems robust to adversarial attacks before they are practically deployed. Future work will entail the development of black-box attacks against gait recognition systems along with corresponding defense mechanisms.

CONFLICT OF INTEREST
The authors of this publication declare there is no conflict of interest.