ResNet and PCA-Based Deep Learning Scheme for Efficient Face Recognition

Face recognition is an emerging field of research. With the rise of deep learning, face recognition has become efficient and precise, setting new milestones. The performance, accuracy, and computational time of existing schemes can be enhanced by devising a new scheme. In this context, this paper proposes a multiclass classification framework for face recognition using the residual network (ResNet) and principal component analysis (PCA) with the Dlib library. The proposed framework achieves a face recognition accuracy of 99.6% and a reduction in computational time of 68.03% using principal component analysis.


INTRODUCTION
Face recognition is an emerging technology in artificial intelligence and computer vision, with much ongoing research aimed at developing fast, robust, and more accurate systems. It is used in many fields of application, such as security systems (Othman & Aydin, 2018), video surveillance, and identity verification on online platforms such as Facebook and Google Photos. Various algorithms for 2D and 3D face recognition are being developed (Abate et al., 2007; Masupha et al., 2015). Face recognition can be implemented in various real-life applications, including security and surveillance (Jaiswal et al., 2020), criminal identification, forensic applications (Kute et al., 2019), threat detection (Sarma et al., 2017), face verification, and attendance systems (Arsenovic et al., 2017).

Motivation
The performance, accuracy, and computational time of existing schemes can be enhanced by devising a new scheme. Nowadays, various online platforms use their trade-secret algorithms for face recognition and charge for them. The motivation of this paper is to develop a robust, fast, and optimized system for face recognition using freely available tools. The system developed can thus be integrated into any existing framework that needs face recognition.

Contribution
We devised a framework using the machine learning library Dlib for face detection and landmark estimation, a residual network for feature vector calculation, PCA for dimensionality reduction, and various multiclass classifiers for classification. The experiment reduces the computational times of the various classifiers by applying PCA and fine-tuning the classifiers' parameters. The performance of the classifiers is evaluated on a validation set containing data that does not overlap with the training dataset.

Organization
The rest of this paper is organized as follows. Section 2 comprises the related research in the field of learning-based facial recognition. The methodology of the proposed work is discussed in Section 3. Section 4 describes the proposed algorithms, and the implementation is discussed in Section 5. Section 6 deals with the experimental results and a discussion thereon. Finally, Section 7 provides conclusions with some future directions.

BACKGROUND AND RELATED WORK
The research area of face recognition continues to evolve and has become an important part of various real-life applications. Several fast and robust face recognition systems have been developed. This section presents the background and a brief survey of this field.

Basic Principles in Facial Recognition
Face recognition algorithms are generally based on extracting features that are present in images of faces. A facial recognition system compares the features stored in its database with the features extracted from an input face image. A confidence threshold is defined; when the similarity score crosses this threshold, the faces are declared similar and the result is reported. The process of facial recognition falls into two categories: one-to-one comparison for verification, and one-to-many comparison for identification.
More specifically, facial recognition systems combine a list of interdisciplinary technologies, including the acquisition of face images, locating the faces, preprocessing such as pose normalization (Al-Obaydy & Suandi, 2020), extraction of features, identity recognition, and identity search. In general terms, facial recognition is a system for identification through either search or comparison. Beyond this, face recognition techniques can also support expression analysis, such as fatigue detection (Li et al., 2021).

Some Face Recognition Methods
Many machine learning and deep learning-based schemes have been developed for face recognition systems. However, their performance, accuracy, and computational overhead can be optimized. Some such face recognition methods are discussed in this section. Table 1 presents a comparative analysis of the existing face recognition schemes.

Based on Geometrical Features of Faces
This method is based on calculating geometrical features from the image of a face (Ouarda et al., 2014). The entire configuration can be stated as a vector representing the locations and dimensions of the most important facial features, such as the chin, lips, nose, eyes, and mouth.
In terms of recognition speed and memory usage, these algorithms perform very well, but their recognition accuracy is poor.

Based on Eigen Faces (PCA)
Face recognition can be performed using eigenvectors computed from face images. To increase efficiency and reduce recognition time, Principal Component Analysis (Kaur & Himanshi, 2015; Zafaruddin & Fadewar, 2019) is applied to the eigenvectors, and all irrelevant information is truncated. A slightly different version, kernel PCA-based dimensionality reduction on eigenfaces, was presented by Kim et al., 2002.

Based on Laplacian Faces (PCA)
Whereas the eigenface-based recognition method seeks to preserve the global structure of the image features, the Laplacianface-based recognition method (He et al., 2005) seeks to preserve their local structure. In many practical classification problems, the local manifold arrangement is more significant than the global Euclidean arrangement, especially when nearest-neighbor-style classification is performed.

Based on Support Vector Machine
Classifying face images based on raw pixel values is another method for face recognition. Sani et al., 2009, used the Support Vector Machine to classify faces in the Yale database. Unlike other SVM-based face recognition approaches, they did not apply any preprocessing to the data, such as feature extraction, dimensionality reduction, or illumination correction.

Based on Compact Binary Face Descriptor
Many facial recognition systems use binary feature descriptors such as Local Binary Patterns (LBP) and their variants, owing to their outstanding robustness and discriminative power. The Compact Binary Face Descriptor (CBFD) (Lu et al., 2015) is a more advanced and efficient binary feature descriptor: pixel difference vectors (PDVs) are extracted from local patches, and a feature mapping projects these PDVs into lower-dimensional binary vectors in an unsupervised manner.

Based on Convolutional Neural Network
The artificial neural network is a nonlinear dynamic structure known for its strong self-organization and adaptive ability. Gou et al., 2019, compared various deep learning-based face recognition algorithms. After the rise of the convolutional neural network (CNN), the efficiency and robustness of face recognition systems improved dramatically (Goswami et al., 2019). With the highest recognition accuracy and precision, it has proved to be the best method so far. DeepID3 (Sun et al., 2015) uses a convolutional neural network for face recognition. Parkhi et al., 2015, produced a web-scraped dataset of 2.6M images of 2.6K people and trained a CNN on it. Sharma et al., 2016, developed a face recognition system using a CNN and tested it on the Face Recognition Grand Challenge (FRGC) dataset. Coskun et al., 2017, developed a face recognition system using a CNN and a softmax classifier, tested on the Georgia Tech Face Database. Liu et al., 2017, developed an open-source face recognition development kit named VIPLFaceNet using deep neural networks.

METHODOLOGY OF THE PROPOSED WORK
The principles for implementing various fragments of a face recognition system are described in this section as follows.

Detection of Faces and Landmark Estimation
Face detection is the method of detecting faces in a provided image. Dlib provides two main built-in methods for face detection: the Histogram of Oriented Gradients (HOG) (Cerna et al., 2013; Nigam et al., 2018; Li & Lin, 2017) and the Convolutional Neural Network (CNN) (Mliki et al., 2020). The CNN-based face detector is more advanced and gives the best results, but it is computationally very costly and cannot run in real time on a CPU. For the scope of this system, we therefore use Dlib's frontal face detector with HOG. The detector works as follows: the provided RGB image is first converted to grayscale for simplicity; the distribution of gradient directions is used as the feature set; and the features extracted by HOG are fed through a linear SVM. The frontal face detector returns a Dlib 'rectangle' object containing information such as coordinates, area, and center point. Figure 1 presents the HOG description of an image.
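To make the gradient-histogram idea concrete, the following NumPy sketch computes a toy HOG-style descriptor for a single cell. It is illustrative only: the function name and parameters are our own, and Dlib's detector additionally aggregates cells into blocks and applies block normalization before the linear SVM.

```python
import numpy as np

def hog_cell_histogram(gray, n_bins=9):
    """Toy HOG-style descriptor for one cell: a histogram of gradient
    orientations weighted by gradient magnitude (illustrative only)."""
    gx = np.gradient(gray.astype(float), axis=1)   # horizontal gradient
    gy = np.gradient(gray.astype(float), axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in classic HOG
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(orientation, bins=n_bins, range=(0, 180),
                           weights=magnitude)
    return hist / (np.linalg.norm(hist) + 1e-9)   # L2-normalize

# A synthetic 8x8 "cell" with a strong vertical edge: the horizontal
# gradient dominates, so the 0-degree bin carries all the weight.
cell = np.zeros((8, 8))
cell[:, 4:] = 255.0
descriptor = hog_cell_histogram(cell)
print(descriptor.shape)  # (9,)
```

In the full detector, many such per-cell histograms are concatenated into one feature vector per sliding window before classification.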
After the coordinates are obtained, the next step is the estimation of the facial landmarks. Landmark estimation is a tool for facial feature point extraction. The Dlib toolkit provides a pre-trained face landmark detector that approximates the locations of 68 (x, y)-coordinates mapping to facial structures; the conception of the landmarks is shown in Figure 2. These landmarks follow the 68-point iBUG 300-W dataset, which was used to train the Dlib facial landmark predictor. Figure 3 presents the predicted landmarks for an input image.

Face Feature Vector Computation
After the facial landmark points are acquired, they are used to align the face by affine transformation; if the face is not aligned to a frontal pose, recognition accuracy suffers. After alignment, the face is fed through the face recognition residual network, which produces a 128-dimensional feature vector that serves as the basis of face recognition. The Euclidean distance or cosine similarity between the vectors of two faces is computed to check their similarity against a predefined confidence threshold.
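The similarity check can be sketched as follows. The 0.6 Euclidean-distance threshold follows Dlib's commonly suggested default for its 128-D embeddings, but it should be treated as tunable, and the vectors here are random stand-ins for real embeddings:

```python
import numpy as np

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(vec_a, vec_b, threshold=0.6):
    """Declare a match when the embedding distance falls below the
    confidence threshold (0.6 is a common starting point for Dlib's
    128-D embeddings; tune it on your own data)."""
    return euclidean_distance(vec_a, vec_b) < threshold

rng = np.random.default_rng(0)
face = rng.normal(size=128)
near = face + rng.normal(scale=0.01, size=128)  # slight variation of the same face
far = rng.normal(size=128)                      # an unrelated face
print(same_person(face, near), same_person(face, far))  # True False
```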
How are the face feature vectors calculated? The Dlib library uses a residual network (He et al., 2015; Khan et al., 2018) whose final layer is a 128-dimensional embedding layer, so it generates a 128-dimensional feature vector. The resultant network is trained on about 3 million faces of 7,485 individual identities. The network model obtained is powerful enough to generate a feature vector for any human face. Figure 4 shows the workflow of the face feature vector calculation.

Face Recognition System
The face recognition system is quite complex, involving various steps and methods, which can be summarized as detecting the face, locating the face, preprocessing, extracting features, training on the features, and classification. In addition to these steps, dimensionality reduction is also a key step in the face recognition process. In this paper, Principal Component Analysis (PCA) is used for dimensionality reduction. The framework used in this system has two portions: the first consists of training and model generation, and the second performs face recognition. The workflow of this framework is shown in the flowchart in Figure 5.

Machine Learning Model Training
To train a machine learning model, the training data must first be imported. For each image in the training set, the face is detected and located; all the facial feature points are then located and fed through the ResNet+DNN network to generate a 128-D feature vector, and each vector is associated with its corresponding label, the name identity of the person. After these steps, the trained ML model is obtained.
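A minimal sketch of this training stage, with synthetic 128-D vectors standing in for the ResNet embeddings (the identity names and cluster parameters are invented for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in for the 128-D ResNet embeddings: each identity forms a tight
# cluster around its own center, which mimics how embeddings of the same
# person group together for a well-trained network.
rng = np.random.default_rng(42)
n_people, per_person = 5, 20
centers = rng.normal(size=(n_people, 128))
X = np.vstack([c + rng.normal(scale=0.05, size=(per_person, 128))
               for c in centers])
y = np.repeat([f"person_{i}" for i in range(n_people)], per_person)

# Train the multiclass classifier on (embedding, label) pairs,
# mirroring the paper's training stage.
clf = SVC(kernel="linear", C=100)
clf.fit(X, y)

# A new embedding of person_3 is classified by the trained model.
probe = centers[3] + rng.normal(scale=0.05, size=128)
print(clf.predict([probe])[0])
```

The same skeleton applies to KNN or logistic regression by swapping the classifier object.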

Recognition Section
For an input image or video, the first step is face detection and location. Next, the face feature vector is calculated for each face present in the image or video frame. Finally, the trained ML model is used to obtain the classification outcome, which is marked on the screen.

PROPOSED ALGORITHMS
The proposed scheme has four main algorithms, as follows. Algorithm 1 is for the face detection process. Algorithm 2 describes the landmark estimation process. The methodology of face feature vector calculation is explained in Algorithm 3, and the procedure of the face recognition system is discussed in Algorithm 4. Figure 6 presents the flowchart for the face detection, landmark estimation, and face feature vector calculation processes. The flowchart for the face recognition system is shown in Figure 7.

Algorithm 1: Face Detection
Input: Image
Output: Dlib 'rect' object
Begin
1. Input image.
2. Convert image to grayscale.
3. Extract HOG features.
4. Feed HOG features through the linear SVM classifier.
End

Algorithm 1 performs the task of face detection using the HOG features of the image. The input of this algorithm is an image file, and the output is a Dlib 'rect' object containing the corners of the bounding box, the center of the face, and the area of the bounding box.

Algorithm 2: Landmark Estimation
Input: Dlib 'rect' object and image
Output: 68 coordinate points of face landmarks
Begin
1. Input Dlib 'rect' object and image.
2. Feed the landmark estimator model with the rectangle coordinates and the image.
3. Obtain the 68 coordinates of the face landmarks.
End

Algorithm 2 estimates the 68 landmark points present in a face image. The input is the Dlib 'rect' object and the image file, which are passed through the pre-trained landmark estimation model; the output is the coordinates of all 68 landmark points of the face.

Algorithm 3: Face Feature Vector Calculation
Input: 68 coordinate points of face landmarks and image
Output: 128-D face feature vector
Begin
1. Input the 68 coordinate points and the image.
2. Align the face by affine transformation using the 68 points.
3. Feed the aligned face through the residual network to generate the 128-D feature vector.
End

The task of face feature vector computation is done in Algorithm 3. The 68 landmark coordinates and the image are provided as input; the face is aligned using the landmark points and an affine transformation, and the aligned face is then fed into the residual network for feature vector generation, which is the output of this algorithm.
The face recognition system is presented in Algorithm 4, which takes the labeled image dataset as input and gives the performance results as output. For every image in the dataset, face detection, landmark estimation, and feature vector generation are performed; the obtained 128-D feature vector, along with its label, is stored in a separate CSV file. PCA is performed on this newly created dataset of feature vectors and labels; the reduced dataset is split into training and test sets; grid search and cross-validation are performed on the training set to fine-tune the hyper-parameters; and the trained model is then tested on the test set. For validation purposes, another dataset of data non-overlapping with the original set is taken, and the performance of the model is obtained.

Algorithm 4: Face Recognition System
Input: Image dataset with labels
Output: Performance results
Begin
1. Input dataset.
2. For each image in the dataset:
2.1. Face detection and location.
2.2. Facial landmark estimation.
2.3. 128-D facial feature vector generation.
2.4. Store the encodings and labels in a CSV file.
3. Apply PCA to the dataset of encodings.
4. Split the data into a training set and a test set.
5. Apply grid search and m-fold cross-validation on the training set to optimize the hyper-parameters of the multiclass classifier.
6. Train the model using the parameters obtained in step 5.
7. Test the performance of the trained model on the test set created in step 4.
8. For each image in the validation set:
8.1. Face detection and location.
8.2. Facial landmark estimation.
8.3. 128-D facial feature vector generation.
9. Test the performance of the trained model on the validation set.
End

IMPLEMENTATION

Dlib Toolkit
The Dlib toolkit is a modern C++ library comprising a range of tools and algorithms for deep learning, machine learning, image processing, and face recognition. The toolkit is used extensively in automation and robotics, on mobile devices, and in various high-performance computing areas. It supports building a complete, high-performance facial recognition scheme.

Dataset Preparation
For the following experiment, a custom dataset of 6,800 face images of 657 persons of different races, ages, and genders is used. To create this dataset, some data was taken from the LFW dataset, and the rest was scraped from the web.

Face Detection and Landmark Estimation Experiment
In this phase of the experiment, the dataset is fed through the face detection and landmark estimation algorithms. Figure 8 shows the landmark estimation for six random persons. As the figure shows, the estimated landmarks do not depend on the angle, pose, or position of the face; the different landmarks fall at essentially the corresponding positions on each face.
To check the accuracy of the algorithm, the whole dataset of 6,800 images is fed through it; feature points are successfully extracted from 6,459 of them, for an accuracy of 94.98%. We conclude that when a face is not clear and visible, due to small size, occlusion, or poor image quality, the algorithm can fail to extract the face feature points.

Face Feature Vector Calculation
All the extracted feature points are fed through the ResNet+DNN network to obtain 128-D feature vectors. The feature vectors, along with their labels, are stored in a new dataset, and Principal Component Analysis (PCA) is then performed to reduce its dimensionality. Out of 128 features, around 64 principal components retain 0.98 of the variance, as shown in Figure 9.
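This variance-retention step can be sketched with scikit-learn's PCA, which accepts a fractional n_components and keeps the smallest number of components reaching that variance. The data below is synthetic and illustrative, so the retained component count will differ from the 64 observed on the real encodings:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 128-D "encodings" with a decaying variance profile, so that
# most variance is concentrated in a subset of directions.
rng = np.random.default_rng(7)
scales = np.linspace(3.0, 0.05, 128)
X = rng.normal(size=(1000, 128)) * scales

# Passing a float in (0, 1) tells PCA to keep just enough components
# to retain that fraction of the total variance.
pca = PCA(n_components=0.98)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], "components retained")
print(round(float(pca.explained_variance_ratio_.sum()), 3))
```

The fitted `pca` object is then reused to project new probe encodings into the same reduced space before classification.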

Process of Face Recognition
The process can be described as detecting faces, estimating facial landmarks, computing feature vectors, and classifying based on the face distance. The detailed workflow is shown in Figure 10. The classifier, whether SVM, KNN, or LR, compares the input data with the training data and makes a prediction based on its internal algorithm. The model derived from training is then tested on the validation set: 227 non-overlapping images of 104 people are selected at random and predictions are made.

PERFORMANCE EVALUATION
We use various confusion matrix-based performance metrics, viz., Accuracy (A), Precision (P), Recall (R), and F1-score (F1), to evaluate the performance of the schemes. We also compare the computational time of the schemes. The confusion matrix is represented in terms of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN); the metrics based on it (A, P, R, F1) are described as follows.
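As a quick illustration, the weighted multiclass versions of these metrics can be computed with scikit-learn; the labels below are invented and are not the paper's data:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Illustrative ground-truth and predicted identity labels.
# 'weighted' averaging weights each class by its number of instances,
# matching the weighted averages used below.
y_true = ["alice", "alice", "bob", "bob", "carol", "carol"]
y_pred = ["alice", "bob",   "bob", "bob", "carol", "carol"]

print("A :", accuracy_score(y_true, y_pred))                        # 5/6
print("P :", precision_score(y_true, y_pred, average="weighted"))   # 8/9
print("R :", recall_score(y_true, y_pred, average="weighted"))      # 5/6
print("F1:", f1_score(y_true, y_pred, average="weighted"))
```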

Accuracy (A)
Accuracy is the proportion of correctly classified data. It is computed as given in Eq. (1).
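In the confusion-matrix notation above, Eq. (1) takes the standard form:

```latex
A = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \qquad (1)
```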

Precision (P)
Precision is the ratio of true-positive predictions to all positive predictions. It is calculated according to Eq. (2). Eq. (3) gives the weighted average precision of the scheme.
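In the same notation, Eq. (2) and the weighted average of Eq. (3) are:

```latex
P = \frac{TP}{TP + FP} \times 100 \qquad (2)

P_{avg} = \frac{P_{c_1}\,\lvert c_1\rvert + P_{c_2}\,\lvert c_2\rvert + \dots}{\lvert c_1\rvert + \lvert c_2\rvert + \dots} \qquad (3)
```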

Recall (R)
Recall is the ratio of true-positive predictions to the sum of true-positive and false-negative predictions; it is sometimes also called sensitivity. It is calculated as in Eq. (4). The weighted average recall can be calculated with the help of Eq. (5).

R = \frac{TP}{TP + FN} \times 100 \qquad (4)

R_{avg} = \frac{R_{c_1}\,\lvert c_1\rvert + R_{c_2}\,\lvert c_2\rvert + \dots}{\lvert c_1\rvert + \lvert c_2\rvert + \dots} \qquad (5)

Here R_{c_1}, R_{c_2}, ... are the recalls for class 1, class 2, and so on, and |c_1|, |c_2|, ... are the numbers of instances in class 1, class 2, and so on.

F1 Score (F1)
The F1 score is computed from recall and precision. The computation of the F1 score is shown in Eq. (6), and Eq. (7) shows the calculation of the weighted average F1 score.
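With P and R as defined above, Eq. (6) and the weighted average of Eq. (7) are:

```latex
F1 = \frac{2 \times P \times R}{P + R} \qquad (6)

F1_{avg} = \frac{F1_{c_1}\,\lvert c_1\rvert + F1_{c_2}\,\lvert c_2\rvert + \dots}{\lvert c_1\rvert + \lvert c_2\rvert + \dots} \qquad (7)
```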

Experimental Results
The performance of the classifiers does not change after applying PCA, but the computational cost improves markedly. Either SVM or KNN can be used as the classifier for the face recognition system. If a lightweight, faster classifier is required, such as for a video feed or live stream, KNN is the go-to algorithm; SVM also performs very efficiently, but with a trade-off in computational time.

Performance Evaluation Based on Confusion Matrix Parameters
Table 2 presents the results of the various performance metrics used for the performance analysis of the proposed and the existing schemes. The performance of the three classifiers, viz., SVM, KNN, and LR, on the training set is shown in Figure 11, and the prediction time of the various algorithms is presented in Figure 13. The overall best-performing algorithm, in terms of both successful prediction and computational time, is PCA+KNN.

Performance Evaluation Based on M-Fold Cross-Validation
The results of the various m-fold cross-validations performed on the PCA-reduced dataset with the various algorithms are shown in Table 4. Grid search on the reduced dataset yields the fine-tuned hyper-parameters: for SVM, cost = 100, gamma = 0.1, and kernel = 'linear'; for KNN, metric = 'euclidean', n_neighbors = 2, and weights = 'distance'; and for LR, cost = 1000. The obtained models are then tested for over-fitting, which they passed.
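The fine-tuning step can be sketched with scikit-learn's GridSearchCV. The data here is synthetic, and the grid simply includes the SVM values reported above among its candidates; which combination wins depends on the data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the PCA-reduced (64-D) encodings: four
# identities, each a tight cluster around its own center.
rng = np.random.default_rng(3)
centers = rng.normal(size=(4, 64))
X = np.vstack([c + rng.normal(scale=0.1, size=(30, 64)) for c in centers])
y = np.repeat(np.arange(4), 30)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The grid includes the paper's reported best SVM values
# (C=100, gamma=0.1, linear kernel) among its candidates.
param_grid = {"C": [1, 100, 1000],
              "gamma": [0.01, 0.1],
              "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)  # m-fold cross-validation
search.fit(X_tr, y_tr)
print(search.best_params_)
print(round(search.score(X_te, y_te), 3))
```

Comparing the cross-validation score with the held-out test score, as done here, is also a simple check against over-fitting.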

Discussion
This section presents the experimental results of the proposed and the existing learning-based face recognition schemes, along with a discussion of their performance. For multiclass classification problems like face recognition, the Support Vector Machine (SVM), K-Nearest Neighbour (KNN), and Logistic Regression (LR) perform well; in this experiment, we compare the results of these classifiers on the normal and the PCA-reduced datasets. The obtained dataset of 6,459 face encodings is split into a training set of 4,869 encodings and a test set of 1,590 encodings.

Support Vector Machine
Grid search over the hyper-parameters cost, gamma, and kernel, together with m-fold cross-validation, yields the fine-tuned values of the hyper-parameters of the support vector classifier. The model is then trained on the training set, and an accuracy of 98.9% and a precision of 99.2% are obtained on the test set. After dimensionality reduction using PCA, the training time is reduced by 41.35% and the prediction time by 5.23%.

K-Nearest Neighbour
Grid search over the hyper-parameters metric, n_neighbors, and weights, together with m-fold cross-validation, yields the fine-tuned values of the hyper-parameters of the K-nearest neighbour classifier. The model is then trained on the training set, and an accuracy of 98.5% and a precision of 99% are obtained on the test set. After dimensionality reduction using PCA, the training time is reduced by 68.03% and the prediction time by 64.4%.

Logistic Regression
Grid search over the hyper-parameter cost, together with m-fold cross-validation, yields the fine-tuned value of the hyper-parameter of the logistic regressor. The model is then trained on the training set, and an accuracy of 97.8% and a precision of 98.7% are obtained on the test set. After dimensionality reduction using PCA, the training time is reduced by 48.46% and the prediction time by 50%.

CONCLUSION AND FUTURE DIRECTIONS
This paper presents an experimental assessment of a face recognition system using the Dlib library and ResNet. In this experiment, dimensionality reduction using PCA provides a maximum reduction in computational time of 68.03%. The proposed algorithm is tested on a custom-created database of 6,800 faces of 657 people with three different multiclass classifiers. The best-performing classifier in terms of accuracy is the support vector machine, with 99.6% accuracy on the validation dataset of 227 faces of 104 people that does not overlap with the original dataset. In terms of computational time, the best-performing classifier is KNN, with an accuracy of 99.1%. Thus, for real-time applications such as surveillance, the KNN classifier works best in terms of computational time, with only a slight trade-off in accuracy. In the future, this face recognition system can be implemented on a GPU; the rate at which face feature extraction and feature vector calculation are performed can be drastically improved using GPU processing power. Convolutional neural networks could also be used for the face detection process.

ACKNOWLEDGMENT
This research is partially funded by the Technical Education Quality Improvement Program (TEQIP III).
The Dlib face recognition model uses a residual network followed by a DNN to generate feature vectors. The Residual Network is a deep convolutional network with shortcut connections proposed by He et al., 2015; upon its release, it won three ImageNet competitions, in image classification, detection, and localization.

Figure 4. Workflow of face feature vector calculation

Figure 5. Flowchart of the face recognition system

Figure 6. Flowchart for face detection, landmark estimation, and face feature vector calculation

Figure 7. Flowchart for the face recognition system

Figure 12. Training time of various algorithms

Table 1. Comparative analysis of the existing face recognition schemes
Table 3 contains the computation times of training and prediction, and Figure 12 shows the training times of the various algorithms in milliseconds.