Rotational Invariance Using Gabor Convolution Neural Network and Color Space for Image Processing

Convolutional neural networks (CNNs) are deep learning methods utilized in image processing tasks such as image classification and recognition. They have achieved excellent results in various sectors; however, they still lack rotation invariance and spatial information. To establish whether two images are rotational versions of one another, one can rotate them exhaustively to see if they compare favorably at some angle. Due to the failure of current algorithms to rotate images and provide spatial information, the study proposes transforming color spaces and using the Gabor filter to address the issue. To gather spatial information, the HSV and CieLab color spaces are used, and Gabor filters are used to orient images at various orientations. The experiments show that the HSV and CieLab color spaces combined with a Gabor convolutional neural network (GCNN) improve image retrieval, with accuracies of 98.72% and 98.67% on the CIFAR-10 dataset.

CNNs are sensitive to image rotation, and even minor rotations of an input image can significantly degrade their performance. Among the most frequently employed features in image extraction is color, particularly for separating an object from an intricate natural background, because color provides strong, consistent visual features that are less reliant on the image's size. Color extraction is susceptible to issues like variable lighting conditions and occlusions, however, so it might not be the most reliable technique on its own. To solve these issues, in addition to the traditional RGB (Red, Green, Blue) space, alternative color spaces are employed to extract color features from the image to be recognized, including HSV (Hue, Saturation, Value) and L*a*b (CIELAB), among others. HSV is superior to other color spaces in that it can withstand variations in light intensity, but it lacks rotation invariance.
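HSV's robustness to uniform changes in light intensity can be seen in a quick sketch using Python's standard `colorsys` module (the pixel values here are illustrative):

```python
import colorsys

# One reference pixel, and the same pixel under dimmer lighting
# (all RGB channels scaled by 0.5); values are normalized to [0, 1].
bright = (0.8, 0.4, 0.2)
dim = tuple(c * 0.5 for c in bright)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dim)

# Hue and saturation are unchanged by the uniform brightness drop;
# only the value (brightness) channel reflects it.
print(abs(h1 - h2) < 1e-9, abs(s1 - s2) < 1e-9)  # True True
```

The hue and saturation channels thus isolate the color tone from its intensity, which is exactly the property exploited later when HSV features feed the network.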
Developing invariant features for image rotation has received a lot of attention over the years. Researchers have created a slew of rotation-invariant hand-crafted features for image retrieval, object recognition, textural image classification, and image patch matching, including the Scale Invariant Feature Transform (SIFT) (David, 2004), Local Binary Patterns (LBP) (Ojala et al., 2002), the Rotation-Invariant Feature Transform (RIFT) (Lazebnik et al., 2005), and so on. The visualization of first-layer weights in a CNN, for example, discloses that many of them are orientation copies of one another (Zeiler & Fergus, 2014). Furthermore, determining how many rotated versions should be generated for each training image is difficult. Thus, recent research has begun to focus on designing new network architectures that encode rotation invariance into classical CNNs and has proposed multiple modified models, such as Spatial Transformer Networks (STN) (Jaderberg et al., 2015), Transformation-Invariant Pooling (TI-Pooling) (Laptev et al., 2016), Rotation Equivariant Vector Field Networks (RotEqNet) (Marcos et al., 2017), and so on. Gabor DCNN filters have been the subject of numerous investigations due to their orientation nature. However, the Gabor filters are not explicitly incorporated into the convolution filters, and thus do not reduce the impact of color channel overlapping or channel shifts. In particular, Krizhevsky et al. (2017) directly use Gabor filters to generate Gabor features, which serve as the input to a Convolution Neural Network. Alternatively, Luan et al. (2018) use Gabor filters at most in the first and second convolution layers, mainly to reduce the complexity of training the Convolution Neural Network. Due to the failure of current algorithms to rotate images in an end-to-end CNN model and provide spatial information, this study proposes transforming color spaces and using the Gabor filter to address the issue.
We examined the effects of various color space feature extraction methods and of Gabor filters on classification in this study. Based on the findings, we recommend the HSV and CieLab color spaces, together with the GCNN, as a better model for rotating images at different angles while maintaining the image color. The proposed GCNN is significant because (1) the HSV and CieLab color spaces are orthogonal in shape, representing a single value of a color tone and avoiding channel overlapping; (2) they make it simpler to identify specific colors without changing the image's overall brightness by representing colors in a more intuitive and perceptually meaningful way; and (3) the model creates various independent images based on different orientations.

RELATED WORK
We first review previous works done on Gabor filter-based techniques as well as color spaces.
Among contemporary spatial-data extraction algorithms, Gabor filtering has drawn a lot of interest because of its potential to provide discriminative and helpful features (Bovik et al., 1990). In comparison to other filtering algorithms, Gabor filtering shows advantages in retrieving spatial data such as edges and textures (Randen & Husoy, 1999). Gabor filters are well known for capturing spatially localized information (Kumar & Pang, 2002). The frequency and orientation representations of Gabor filters are expressly suitable for texture representation and discrimination because they match the human visual system. Gabor filters have the advantage of being consistent in scale, rotation, and translation. Indeed, Yosinski et al. (2014) demonstrated that deep neural networks trained on images generally learn first-layer features resembling Gabor filters. This validates our intuition in using pre-determined Gabor filters as weight kernels in a Convolution Neural Network configuration. Several approaches based on Gabor Convolutional Networks are reviewed here. Rimiru et al. (2022) developed a method for determining the significance of color space, scale, and orientation in image classification, and discovered the best Gabor parameters for feature extraction. Krizhevsky et al. (2017) simply extract Gabor features as CNN model inputs. Yao, Chuyi, Dan, and Weiyu (2016) presented hybrid Gabor convolutional networks (HGCNs) that alternately combine binarized feature maps and binarized filters in DCNNs to minimize memory storage. Yao and Song (2022) presented Gabor convolutional networks (GCNs) by substituting Gabor convolutional layers (GCLs) for regular spatial convolutional layers. In each GCL, the convolution filters are modulated by Gabor filters with varied orientations and scales to enhance the extracted features.
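As a concrete illustration of the oriented filters discussed above, the following sketch builds a small Gabor filter bank in NumPy from the standard real-valued Gabor formula; the kernel size and the parameters σ, λ, γ, ψ are illustrative choices, not values taken from this study:

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

# A bank of U = 4 orientations (0°, 45°, 90°, 135°), as in a typical GCL setup.
bank = np.stack([gabor_kernel(7, u * np.pi / 4) for u in range(4)])
print(bank.shape)  # (4, 7, 7)
```

Each kernel in the bank responds most strongly to edges and textures aligned with its orientation, which is what makes such a bank useful as fixed weight kernels inside a CNN.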
However, because the convolved filters are no longer rotation invariant with respect to distinct orientations, the final image representation following the MAX operator is not rotation invariant. Zhuang et al. (2020) recently presented transformation-invariant Gabor convolutional networks (TI-GCNs) by substituting GCLs for conventional convolutional layers. To achieve some invariance, the extracted Gabor features are fed into a weight-sharing convolution module, followed by an element-wise transformation pooling module. The resulting vector then serves as input to a fully connected layer, possibly with dropout, and is further propagated to the output layer. Owing to the weight sharing between parallel Siamese layers, the actual model requires the same computational space. Nevertheless, using transformation pooling frequently in consecutive GCLs can lead to the loss of some discriminative pixel values and a reduction in final classification performance.
A couple of studies relate Gabor filters to deep Convolution Neural Networks; however, they do not directly incorporate Gabor filters into the convolution filters. In particular, Yao, Chuyi, Dan, and Weiyu (2016) use Gabor filters to create Gabor features, which are then fed into a Convolution Neural Network, and Hinton et al. (2012) use Gabor filters at most in the first two convolution layers in order to reduce the difficulty of training the Convolution Neural Network.
Several models that use the HSV and CieLab color spaces have also been proposed. Malik et al. (2018) proposed a mature tomato fruit detection algorithm based on improved HSV and the watershed algorithm; enhanced HSV conversion was used to separate the background and detect only red tomatoes. Suarez Baron et al. (2022) propose a mask method using the HSV color space to detach the leaf from the background, where the original image is first converted to the HSV color space. Tamatjita and Sihite (2022) used the HSV color space to extract ripe banana features. Chang and Mukai (2022) propose the use of CieLab to develop an algorithm that automatically mines assertive colors based on color features typically considered by human viewers when examining color tones. Manso et al. (2019) propose the segmentation and classification of foliar damage on Coffea arabica leaves in the YCbCr color space; the method includes color segmentation in the HSV color space to separate the leaf from the background. Alqudah and Alqudah (2022) provide a system for categorizing colorectal cancer using several machine learning techniques. The technique makes use of characteristics recovered from 3D Gray Level Co-occurrence Matrix (GLCM) matrices of three distinct color spaces, namely RGB, HSV, and L*A*B. The findings demonstrate the excellent rate at which the suggested methodology can identify CRC; combining texture information from all color space channels produced this increased rate. Goel et al. (2022) investigated the importance of color space for anomaly identification in wireless capsule endoscopy images by combining color and texture. Real data from AIIMS Delhi was used in the analysis, which shows that the HSV color space performs better than cutting-edge methods.

MATERIALS AND METHODS

Extracting Features
Most existing feature extraction techniques assume the RGB color space is optimal because it is the default color space. However, this assumption ignores the fact that the HSV and CieLab color spaces are crucial. This step is salient because the RGB color space specifies the image in primary colors, which is less efficient than the HSV color space when it comes to comparing images. The HSV color space defines an image similarly to how the human eye identifies images, based on attributes like color, vibrancy, and brightness. The two color spaces are picked out due to their single-value description of a color scheme: they enable the color gamut to be described primarily along a single color line with variations in tint and brightness, rather than as a three-dimensional range using the RGB color space, which could include individual unrelated colors. Following the image's conversion from RGB to the HSV or CieLab color space, features are extracted to create feature vectors. Finally, the feature vectors are used as input by the Gabor Convolutional Neural Network.
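The conversion-then-feature-vector step can be sketched as follows. The snippet converts an RGB image to HSV with the standard `colorsys` module and builds a simple per-channel histogram vector; the histogram representation is our illustrative choice here, not necessarily the study's exact feature extractor:

```python
import colorsys
import numpy as np

def hsv_feature_vector(rgb_image, bins=8):
    """RGB image (H, W, 3) with values in [0, 1] -> normalized HSV histogram vector."""
    flat = rgb_image.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in flat])
    # One histogram per HSV channel, concatenated into a single vector.
    hists = [np.histogram(hsv[:, c], bins=bins, range=(0.0, 1.0))[0]
             for c in range(3)]
    return np.concatenate(hists) / flat.shape[0]

rng = np.random.default_rng(3)
vec = hsv_feature_vector(rng.random((32, 32, 3)))
print(vec.shape)  # (24,)
```

The same pattern applies for CieLab; only the conversion function (and the histogram ranges) change.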

Gabor Convolutional Neural Network (GCNN) Model
To incorporate steerable characteristics into Gabor Convolutional Networks, orientation information is encoded in the learned filters and embedded into various layers simultaneously. A Gabor orientation filter (GoF) is a steerable filter created using Gabor filter banks to modulate the learned filters and then generate enhanced feature maps.

Gabor-Orientation Filters (GoFs)
Before modulation by the Gabor filters, the convolutional filters of a standard CNN, known as learned filters, are obtained through the backpropagation process. More details of the filter modulation are indicated in equation (1) and Figure 1.
For convenience of implementation, the number of channels in a GoF is set to the number of Gabor orientations U.
For the vth scale, the ith GoF is defined by modulating the learned filter C_{i,o} with each oriented Gabor filter G(u, v):

C_{i,u}^v = C_{i,o} ∘ G(u, v),  u = 1, …, U

where ∘ denotes element-wise multiplication. Therefore, the ith GoF C_i^v represents U 3D filters (see Figure 2, where U = 4). The value of v decreases in shallower layers, thus changing the Gabor filter scales in GoFs according to the layer.
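A minimal NumPy sketch of this modulation step, with random stand-ins for both the learned filter and the Gabor bank (the real bank would come from equation (1)):

```python
import numpy as np

rng = np.random.default_rng(0)
U = 4                                        # number of Gabor orientations
learned = rng.standard_normal((3, 3))        # learned filter C_{i,o} (from backprop)
gabor_bank = rng.standard_normal((U, 3, 3))  # stand-in for G(u, v), u = 1..U

# GoF: the single learned filter is modulated (element-wise product)
# by each oriented Gabor kernel, yielding U oriented copies.
gof = learned[None, :, :] * gabor_bank
print(gof.shape)  # (4, 3, 3)
```

Only `learned` is a trainable parameter; the U oriented copies are derived from it on the fly, which is what keeps the model compact.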

GCN Convolution
In a Gabor Convolution Network, GoFs are used to produce feature maps that explicitly enhance the scale and orientation of deep features. The output feature map F̂ in a Gabor Convolution Network is denoted as:

F̂ = GCconv(F, C_i^v)

where C_i^v is the ith Gabor orientation filter at scale v and F is the input feature map, as indicated in Figure 1. The channels of F̂ are acquired as:

F̂_{i,u} = F ⊗ C_{i,u}^v,  u = 1, …, U

where ⊗ denotes convolution. In the scenario where there are ten Gabor orientation filters with four Gabor orientations, the dimension of the output feature map is 10 × 4 × 30 × 30. Figure 2 indicates the forward convolution process of Gabor Convolutional Networks when the input feature map is extended to multiple channels (C_in ≠ 1).
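A sketch of this forward step for a single-channel 32×32 input and one GoF; with a 3×3 kernel, a "valid" convolution gives the 30×30 spatial size quoted above (the random filter values are placeholders):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, sufficient for this sketch."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
U = 4
feature_map = rng.standard_normal((32, 32))  # single-channel input F
gof = rng.standard_normal((U, 3, 3))         # one GoF with U orientation channels

# Each orientation channel of the GoF yields one channel of the output,
# so orientation is encoded explicitly in the feature map.
fmap = np.stack([conv2d_valid(feature_map, gof[u]) for u in range(U)])
print(fmap.shape)  # (4, 30, 30)
```

With ten GoFs, stacking ten such outputs gives the 10 × 4 × 30 × 30 feature map mentioned in the text.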

Updating GoF
In GCNs the weights used in the forward calculation are the Gabor orientation filters, but only the learned filters are kept as weights. Consequently, the learned filter C_{i,o} is the only one that requires an update in the backpropagation (BP) procedure. We summarize the sub-filter gradients in the Gabor orientation filters back to the respective learned filter as follows:

∂L/∂C_{i,o} = Σ_{u=1}^{U} (∂L/∂C_{i,u}^v) ∘ G(u, v)

where L represents the loss function. As the above equation indicates, the BP process is easily implemented, unlike in ORNs and deformable kernels, which normally require a complex process. By updating only the learned filters C_{i,o}, the GCN model stays compact and efficient, and is also more robust to changes in scale and orientation.
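Because each oriented sub-filter is an element-wise product of the learned filter and a Gabor kernel, the chain rule reduces the update to a modulated sum, as sketched below (the gradient values are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
U = 4
gabor_bank = rng.standard_normal((U, 3, 3))  # stand-in for G(u, v), u = 1..U
grad_gof = rng.standard_normal((U, 3, 3))    # dL/dC_{i,u}^v for each orientation u

# Sum the per-orientation sub-filter gradients, each re-modulated by its
# Gabor kernel, into the gradient of the single stored learned filter.
grad_learned = np.sum(grad_gof * gabor_bank, axis=0)
print(grad_learned.shape)  # (3, 3)
```

Only `grad_learned` is applied in the weight update; no gradient is stored for the derived oriented copies.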
We used the CIFAR-10 dataset to evaluate the GCNN. The CIFAR-10 dataset comprises 60,000 32×32 color images in 10 categories, with six thousand images in each category; fifty thousand training images and ten thousand test images are provided. The dataset includes a broad range of classes with variations in the scale and orientation of the objects. Figure 3 indicates the network layouts of the GCNs, GCNN (U = 4), and TI-GCNs utilized in this experiment. Max-pooling and ReLU were adopted after the convolution layers for every model, with a dropout layer after the fully connected (FC) layer to prevent over-fitting. Figure 4 shows an extended convolution layer followed by a rotational pooling layer. The steps of rotating convolution and pooling are shown in the top part of Figure 4: filters are rotated to various orientations and applied as in a traditional convolution layer, and each feature map is back-rotated in order to achieve rotational invariance. In the bottom section, a single-channel image is affected by two distinct filters; the final output feature maps are the same, despite the fact that the input to the rotational pooling is arranged differently.
• Created a database of normalized images 32×32 in size.

Algorithm 1: Gabor convolutional neural network
• Read the RGB-formatted Query input image.
• RGB images are converted to HSV and LAB color space formats.
• The first layer of the Gabor CNN takes as input image features from the multiple color spaces.
• Initialized the hyper-parameters and network structure parameters (input and output feature maps, filter size, channel depth).
• Constructed the Gabor filter (sinusoidal wave parameters γ and λ).
• Set the number of Gabor orientations and scales using Equations (1) and (2).
• Started training to produce Gabor orientation filters using Equation (3).
• Gabor orientation filters and input feature maps were forwarded to the Gabor Convolution Network to produce output feature maps using Equation (4).
• Learned filters were updated through the backpropagation process with the regression loss function until the maximum epoch, using Equation (5).
• Using the GCNN model, extracted query image features from the image database.
▪ To search for an image:
▪ While there is no convergence, for each image do:
  ▪ Determine the convolution features from the Gabor input image.
  ▪ Use the activation function (ReLU: f(x) = max(0, x)) to map them into non-linear space.
  ▪ To learn model parameters, continue the forward pass and backpropagation.
▪ End for
▪ End while
• From the image database, retrieve the similar images.
• Calculate and print the percentage of relevant images retrieved.
• Calculated precision, recall, and F1 score.
To compare the true positive and false positive rates for each color space, accuracy, recall, precision, and F1 score were employed at different orientations. The results reveal the power of the GCNN approach in terms of model efficiency when compared to the GCNs (Yao & Song, 2022) and TI-GCNs (Zhuang et al., 2020) algorithms. Tables 1-4 describe the accuracy, recall, precision, and F1 score of the different color spaces under 0°, 45°, 90°, and 135° orientations, respectively.
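The per-orientation scores reported in Tables 1-4 follow the standard definitions, sketched below with hypothetical retrieval counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from raw true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one class of retrieved images.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(p, r)  # 0.9 0.9
```

Precision measures how many retrieved images were relevant, recall how many relevant images were retrieved, and F1 their harmonic mean.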

DISCUSSION
In this study, we have presented a model for retrieving images. Color space conversion is used for preprocessing, the GCNN model performs the retrieval, and the results are compared with the state of the art. The proposed model achieved an accuracy of 98.72% and 98.67% for HSV and CieLab respectively. The experiment also demonstrates that retrieval effectiveness increases with orientation. Most existing feature extraction techniques assume the RGB color space is optimal because it is the default color space; however, this assumption ignores the fact that HSV is very useful. The HSV and CieLab color spaces are roughly orthogonal in shape, representing a single value of a color tone while minimizing the influence of changes in other channels and avoiding channel overlapping. Due to the advantage of chromaticity, the hue and a* channels of the image are displayed more completely than they are in the RGB image. The pixel values in these channels are generally larger than those in the R, G, and B channels, resulting in a large difference in pixel values between channels. Furthermore, the average value of pixels in each channel can be used to demonstrate that the color gamut is wider than the RGB color gamut, and the larger difference in pixel values is very useful for extracting detailed features. The two color spaces also make it simpler to modify specific colors without changing the image's overall brightness, by representing colors in a more intuitive and perceptually meaningful way. This demonstrates the necessity of using the HSV and Cie L*a*b* color spaces and their advantages for color representation.
The study also incorporated Gabor filter parameters into the layers of the DCNN. Most existing Gabor feature extraction techniques use the Gabor filter only as the input layer of the DCNN, while the other layers continue using backpropagation to learn the model, and hence consume a lot of time. In a standard CNN, the set of weights is determined by the input feature map, output feature map, and the width and length of the convolutional kernels. The proposed model replaces these parameters with the direction and scale parameters of the Gabor filter: the number of directions is chosen to inherit the Gabor filter into the convolution layer, creating a Gabor-filter-based convolution layer for a specified scale. The study demonstrates that different orientations have varied recall rates and precision; the recall of more diverse images increases with orientation. For example, compared to 135°, which has a precision and recall rate of 98.6, 0° has 98.4 and 98.3 respectively. The study suggests using a parallel algorithm because computation time increases with orientation changes: the more orientations, the more time the model needs to converge. In summary, the proposed model updates the learning weights during the training process. This strategy trains the system with a modest collection of filters to generate an improved deep learning model. The fusion benefits from deep learning's high accuracy and from the Gabor filters' ability to extract critical characteristics more quickly. As a result, the suggested model outperforms the state of the art in terms of learning ability, computing time, and accuracy.
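The rotational pooling idea discussed above — rotate the filter, convolve, then max-pool over orientations and positions — can be sketched in NumPy. Here `np.rot90` stands in for the 90°-spaced orientations, so the pooled maximum response is unchanged when the input image itself is rotated:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, sufficient for this sketch."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def global_max_response(image, base_filter):
    # Apply the filter at four 90°-spaced orientations, then max-pool
    # over both orientation and spatial position.
    return max(conv2d_valid(image, np.rot90(base_filter, u)).max() for u in range(4))

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

# Rotating the input only permutes the per-orientation responses,
# so the pooled maximum is (numerically) invariant.
print(np.isclose(global_max_response(img, k), global_max_response(np.rot90(img), k)))  # True
```

Exact invariance holds only for 90° steps; finer orientation banks give approximate invariance at intermediate angles, at the cost of the extra computation noted above.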

CONCLUSION
The objective of the study was to assess the use of texture and color approaches in image retrieval. The findings demonstrated that the Gabor convolutional neural network is capable of orienting images at various angles and retrieving new images with each angle shift. Also, to reduce the impact of other channel modifications and prevent channel overlaps, the image features that were derived using the HSV color space were used as input for the model. According to the findings, combining the Gabor filter with the HSV color space technique for image retrieval performed marginally better than combining the Gabor filter with the RGB or CieLab color spaces.