Android-IoT Malware Classification and Detection Approach Using Deep URL Features Analysis

Currently, malware attacks pose a high risk to compromise the security of Android-IoT apps. These threats have the potential to steal critical information, causing economic, social, and financial harm. Because of their constant availability on the network, Android apps are easily attacked by URL-based traffic. In this paper, an Android malware classification and detection approach using deep and broad URL feature mining is proposed. This study entails the development of a novel traffic data preprocessing and transformation method that can detect malicious apps using network traffic analysis. The encrypted URL-based traffic is mined to decrypt the transmitted data. To extract the sequenced features, the N-gram analysis method is used, and afterward, the singular value decomposition (SVD) method is utilized to reduce the features while preserving the actual semantics. The latent features are extracted using the latent semantic analysis tool. Finally, CNN-LSTM, a multi-view deep learning approach, is designed for effective malware classification and detection.

discovered 357, 019,453, in 2017, 669,974,865, and in 2018, 246,002,762 new malware variants. Yet more malware variants are attempting to bypass anti-virus tools and avoid detection by several malware detection systems.
The rapid expansion of mobile interaction places a significant burden on smartphone security management. According to a recent study 1, the number of apps in the Google Play Store has increased from 16K in Dec. 2009 to over 2 million in Feb. 2016. As a result, mobile traffic has topped 3.7 exabytes. The growth of the mobile ecosystem is seriously compromised by malicious apps. There has been a massive increase in mobile malware, especially targeting Android devices. Devastating digital payment thefts and other attacks threaten mobile security. Despite the Android platforms and mobile antivirus security measures, sophisticated mobile malware continues to infiltrate mobile systems. The widespread use of mobile devices also exposes users to multiple risks. So we urgently need Android-based mobile malware detection systems (McLaughlin et al., 2017). A malware family is a group of malicious apps sharing code. Various malware samples use the malware families' codebase. All samples with the same interpretation are combined.
Faruki et al. (Faruki et al., 2014) explore the characteristics of a huge assortment of malware and categorizes existing mobile malware detection methods into static, dynamic, and traffic-based categories. Static analysis has been used in several previous studies to discover data leakage, malware, and security breaches in Android apps (Zhu et al., 2018). Nevertheless, static analysis of malware is challenging due to code polymorphism and obfuscation. These methods are used to produce malware variants to avoid detection. Numerous different dynamic analysis techniques strive to alter the device's operating system to monitor and access confidential information at runtime (Bader, Lichy, Hajaj, Dubin, & Dvir, 2022;Ucci, Aniello, & Baldoni, 2019). Such methods are helpful, but they necessitate a massive number of executions to encompass all app behavioral patterns (Ahmed, Lin, & Srivastava, 2021).

Motivation
Many virus detection techniques concentrate on the network traffic generated by mobile apps. Malware is identified by abnormal network behavior patterns. This type of malware detection system has the potential to be effective because the majority of Android malware performs its malicious functions via network traffic (Zhou & Jiang, 2012). The malware must communicate with a remote server over the Internet to carry out malicious tasks. These traces can be used to identify and track down specific malware. Furthermore, malware detection strategies based on network characteristics are more straightforward to design and implement than static or dynamic analytic approaches. For example, methods based on traffic detection can be installed at an access point or gateway. These methods rely solely on user-generated network traffic data, ensuring that users do not lose access to their mobile resources. Furthermore, these solutions do not necessitate any user actions aside from granting licenses to the detection service (W. Li, Bao, Zhang, & Li, 2022;S. Wang et al., 2020). The goal of network traffic-based approaches is to find distinguishing features that can be used to classify malware more effectively. Selecting efficient features, on the other hand, is a difficult task. We concentrate our investigation on malware samples that use the HTTP/HTTPS protocol to send data. Because HTTP accounts for 70% of the network traffic generated by Android apps, we chose it for our research. (Dai, Tongaonkar, Wang, Nucci, & Song, 2013). However, because HTTP traffic is generated in encrypted form, extracting useful information from it is extremely difficult.

Contributions
Previously, researchers used statistical features and network traffic content to deduce malware behavior, which can be tedious and inefficient. Therefore, we develop a method for detecting malware using malicious URLs (HTTP/HTTPS). It does not require a lengthy and complex feature selection procedure. To accomplish the automatic feature selection, we employ a novel features extraction process (Mikolov, Chen, Corrado, & Dean, 2013;S. Wang et al., 2017a). This method can automatically mine URL feature representations at numerous levels to tackle the problem of selecting features while preserving the high-level semantics. The main contributions of the paper are given as follows: 1. On the URLs in network traffic, we conduct feature extraction and selection using a text-based approach. Our research involves the development of a novel traffic data preprocessing and transformation method that is effective in the detection of malware using network flows. 2. The N-gram features analysis approach is used to mine the features' sequencing and then the Singular Value Decomposition (SVD) method is applied for features reduction preserving the actual semantics. The Latent Semantic Analysis (LSA) tool is employed to extract the latent features. 3. To extract deep and broad feature learning, we design a multi-view deep learning approach called Convolution Neural Network-Long Short Term Memory (CNN-LSTM). The novel approach is capable of producing discriminant features and thus mitigates the difficulty associated with feature selection in network traffic-based malware detection methods.
The remaining sections of the paper are as follows: Section 2 describes recent and related literature, Section 3 describes the proposed approach, Section 4 describes the results and discussions, and Section 5 concludes.

BACKGRoUND
Several studies (Chiramdasu, Srivastava, Bhattacharya, Reddy, & Gadekallu, 2021;Varma, Raj, & Raju, 2017) show that the Android platform employs a variety of security measures, including authorization mechanisms to protect infecting target devices. But users must be sufficiently aware of security issues to benefit from authorization-based protection. These constraints of over-reliance on the consumer make Android malware invade and spread through mobile devices. Most of these scanners look at things like authorizations and harmful programs to see if an app is malicious or not. These anti-virus scanners protect against harmful software. But malicious programs are always evolving and diversifying. So malware detection must be improved. There are now several malware detection tools that can dissect APK files without executing them. Sanz et al. (Sanz et al., 2013) proposed a static-based analysis that captures the app's uses-permission and uses-feature details for malware classification, with an accuracy of 86.41%. Shanshan et al. (S. Wang et al., 2017a) suggested using network traffic as semantic content to detect malware. A mobile app's HTTP flow is a text file. Natural Language Processing (NLP) can extract semantics from HTTP traffic data stored in text files. The next step is to detect malware using network traffic textual properties. The suggested approach obtains the classification performance with an accuracy rate of 99%. Aresu et al. (Aresu, Ariu, Ahmadi, Maiorca, & Giacinto, 2015), look at HTTP traffic when Android malware communicates with distant malicious servers. It also uses clustering to build malware family signatures. This signature is used to detect malicious attacks. Wang et al. (S. Wang et al., 2017b), proposed the TextDroid approach that divides the HTTP packet's content by special characters, and after that, generates n-gram sequence information to study the ordering in the collected features. Further, it collected sequence-based information fed into a machine-learning model for malware detection. However, the detection performance for this semantic-based approach is only 76.99%. Shyong et al. (Shyong, Jeng, & Chen, 2020), identified Android apps using both static and dynamic network monitoring. Dynamic analysis identifies malware families by analyzing network data. The static technique classified valid and harmful content at 98.86%. Also, the dynamic app family classification is 96% accurate.
Several deep learning-based malware classification studies have attained outstanding results (S. Wang et al., 2018;Xu, Eckert, & Zarras, 2021). Chen et al. (Chen, Yu, Zhang, Zhang, & Xu, 2016), proposed a convolutional neural network (CNN) method that uses HTTP headers to identify different 4 types of apps based on their generated traces. The use of CNN speeds up the process of selecting features and produces precise results in traffic identification. The given method achieved an average classification accuracy of 98%. David et al. (David & Netanyahu, 2015) presented a deep belief networks approach known as DeepSign. It can produce immutable compact interpretations of malware actions, possibly allowing it to distinguish nearly all current malware variants efficiently with an accuracy rate of 98.6%. Shanshan et al. (S. Wang et al., 2020), presented a malware detection system based on app HTTP visits. Then a multi-view neural network detects malicious acts with depth. This method can focus smooth attention on specific input variable traits. This method's accuracy rates are 98.81% and 89.3%, respectively. Brandon et al. (Laughlin, Sankaranarayanan, & El-Khatib, 2020) describe a service plate form for delivering contextual data obtained from network flow. It creates a set of contexts for network-based features such as host addresses, apps in use, and the time of the event. The system finds the nearest neighbors in each context, evaluates the feature proportions, and employs an ensemble of the unsupervised outlier detection algorithm. The proposed method has an accuracy of 85.56%.
We use deep features analysis to process HTTP network analysis because it is the most widely used protocol for mobile apps. HTTP-based encrypted data packets are collected and filtered using a features extraction process, and then the SVD mathematical model is used to extract the most significant features in a reduced dimensional space. CNN-LSTM, a multi-view deep learning strategy, is developed to obtain extensive feature learning. Figure 1 depicts the proposed method. The Android HTTP traffic is collected and fed into the malware detection system. Because the data is encrypted, a packet parser is required to decrypt it. Then, n-gram analysis is used to extract features with ordering details, and TFIDF is used to mine each feature's importance. The approach also aims to extract meaningful features in a reduced dimensional space. The reduced features are used in a CNN-LSTM model for malware classification and detection. Examples of malware families are shown in Figure 2. Each sample has its way of executing malicious Android apps. For instance, Adware is a type of computer program that displays ads on screen while using a web browser. Similarly, a botnet can be used to launch Distributed Denial of Service (DDoS) attacks, steal information, spear phishing, and allow hackers to obtain access to the device (Rieck, Holz, Willems, Düssel, & Laskov, 2008).

URL Collection
An app can be infected with malware after visiting a malicious URL. Malware frequently uses URL information to receive inputs and conduct suspicious attacks. So, detecting malware via URLs is useful. Android devices collect HTTP-based network data while online. The data is encrypted and may contain malicious or benign activities. Wireshark, a packet parser, decrypts network packets so that Android apps can explore network flows. It contains a lot of noisy data after decryption, which could affect malware classification accuracy. We use Wireshark's packet filtering command to mine the URLs because we need to focus on HTTP-based network traces. In addition, in the next step, we mined HTTP and HTTPS network traffic for features analysis (Narudin, Feizollah, Anuar, & Gani, 2016).

URL Preprocessing and Features extraction
The network flow is a fundamental unit in malicious traffic detection. However, many app-generated HTTP flows are mixed. So, we must extract each comprehensive flow from the complex traffic file and segment them.

N-gram Analysis
In NLP, an N-gram is a random string of words. An N-gram can be used as a sub-sequence for elements in a specific order with at least n elements. Flow segmentation provides an N-gram ordering for each term set. For example, the term set of HTTP flow in uni-gram form is "apikey airpush appid 60563 longitude 0" The flow of bigram can be "apikey-airpush, airpush-appid, appid-60563, 60563-longitude, and so on". The flow of trigram can be "apikey-airpush-appid, airpush-appid-60563, etc." The N-gram can be used to find important term connections in a network flow. "apikey" and "airpush" are 1-gram segments that not only define a term but also show that they are unrelated. The bi-gram sequence "apikey-airpush" is influenced by the term "apikey". The N value chosen must appropriately reflect the principle of term presence in the HTTP flow header. TFIDF (Karbab & Debbabi, 2019) is another method for determining the relative importance of different N-gram features. The local and global weights show the significance of each N-gram feature in a single or multiple network flow, respectively. Mathematically, TFIDF is given as in equation 1.
, . , Where t , f , d , and D denote term, frequency, single network flow, and multiple networks flow, respectively. These parameters can extract the local and global weights collectively. Equation 2 shows how many times each N-gram feature is referenced. tf The inverse document frequency of each feature's local weight is derived from equation 4 using the normalized term frequency form.
Where t , d , D , and N show terms, single network flow, multiple networks flow in a group, and all networks flow, respectively. The TFIDF works in two main steps, such as selecting the source code document and then indexing it to a specific address for extracting the TF*IDF value. The indexing process takes time O(n), several times the source code document chosen is O(|D|). Therefore, the total computational time of the TFIDF method is O(|D| n).

Features Reduction
We need enough records to effectively train a deep learning model. Less spatially relevant data requires semantic connections based on LSA. It uses a discrete representation space to retrieve semantic content from text documents. In Latent Semantic Indexing (LSI), latent constructs are used to find semantic-based concepts. It divides N-gram features into matrices that can classify malicious traffic (Hofmann, 2013;Landauer, McNamara, Dennis, & Kintsch, 2013). A dense N-gram feature is reduced using the SVD algorithm. To find patterns across networks, we need to use semantic factors that accurately explain network traffic. The LSA method can derive latent variables from network flows. The SVD algorithm converts N-gram features into three matrices (Landauer et al., 2013), i.e. M , U , Σ , V T as shown in equation 5.
Where M is m*n matrix, U is an m*n singular matrix, Σ is n*n diagonal matrix, V T is the transpose of the V matrix. N-gram features are fed into the SVD algorithm employing the Bag of Words (BoW) model as depicted in Figure 3. In the SVD algorithm, features are reduced from a large number to a small number of meaningful ones. Each N-gram feature is categorized as either malware, benign, or a member of a specific family of malware. By using the LSA method, the latent features can then be extracted from their original location in three-dimensional space.
The time complexity of LSA is given as follows. The LSA mainly works on three main matrices, such as M , U , Σ , V T . These matrices are already discussed above with detailed information. The computational time is the multiplication of these matrices, which is O(M U Σ V T ).

Deep Features Selection
Though, the LSA-based features are highly effective but we still need deep features which can decrease the training cost of the proposed deep learning model. The CNN network is used to examine the reduced features and further mine the deep features. There are several studies available (Lee, Saxe, & Harang, 2019;Vasan, Alazab, Wassan, Safaei, & Zheng, 2020), that used the CNN model to classify malware features. The CNN model works better with a variety of data, like text, images, and videos. As presented in Figure 4, the approach can be adopted of a one-dimensional CNN network with convolutional layers, pooling layers, dropout layers, and a fully connected layer. The LSA-based features are input sequences to the CNN network. Tensor flow library 1.9 is used to build the CNN network. By iteratively spinning through the semantic features, convolution filters out the best deep features. Each filter creates a new feature map. The number of filters is determined by hyperparameter optimization. We used two CNN layers with 64 and 128 filters each. The max-pooling layer reduces the feature space's size, range, and computation cost. This layer also generates a feature map of the most Where c k 1 is the parameter bias of the kth neuron in the first layer, t i l−1 is the result of the ith neuron in layer l-1, X ik l−1 is the kernel strength from the ith neuron in layer l-1 to the kth neurons in layer l, and ''f()" is the activation function. The CNN network is capable of extracting deep features, which aids in the development of an efficient classification model.

Multi-View Deep Learning
Usually, in the CNN model, the features extracted from pooling layers are forwarded to the fully connected layer. In the proposed approach, the deep features are forwarded to an LSTM layer, rather than a fully connected layer. The deep features are extracted from CNN, and further long-short-term correlations are detected by LSTM. To classify network traffic, the method employs a multi-view approach of the CNN-LSTM model, using the strengths of both models (Lyu & Liu, 2021;Wang, Yu, Lai, & Zhang, 2016). Figure 4 describes the combined approach of the CNN-LSTM model mainly in two steps.
Step 1 involves the use of convolution and max-pooling layers; while step 2 involves the use of the LSTM layer. The CNN layers encode the LSA-based latent features, and the LSTM layers decode the information in the feature set. LSTM relies on the concept of cell state and the numerous gates it incorporates. Knowledge about a cell's state is transmitted beneath the sequence alignment network via the cell state. There is a main memory component in the LSTM model, as well as input, forget, and output communication gates. The memory module is vital to the designed model. The memory cell stores the previous state in its memory. The sigmoid function processes both the recent hidden state and current input. The values are 0 to 1. If the value is close to 0, it means forget gate. The input gate specifies how much network data is stored in the unit state for each time "t". The forget gate then decides whether or not the associated records can pass to the input gate. Remember that the hidden state contains previous input data. Begin by adding the previous cell state and the received signal. The tanh function multiplies the nonlinear activation output to determine the hidden state's relevant information. To make accurate predictions, use the hidden state's result. 6 describes the model's behavior.
The input at time "t" is expressed by the symbol x t , the influence matrices are expressed by the symbols x * and w * , and the bias and hidden conditions are expressed by the symbols b and h. The activation functions are illustrated by the symbols σ and tanh. It is followed by the letters i t , f t , o t , and c t , which stand for input gate, forget gate, output gate, and memory cell, respectively. Collectively, these gates perform the processing of the designed model.

Dataset Preparation
The proposed method is thoroughly examined using two datasets obtained from the Canadian Institute for Cybersecurity 2. The first dataset, the Canadian Institute of Cybersecurity Android Adware and General Malware (CICAAGM2017) (Lashkari, Kadir, Gonzalez, Mbah, & Ghorbani, 2017), is collected semi-automatically through the installation of Android apps on legitimate mobile phones. A total of 1900 apps are used to create the dataset, which is divided into three categories: adware, general malware, and benign. A detailed description of each app is given in Table 1 and Table 2.

evaluation Measures
We set widely standard training and testing ratios, such as 80%, and 20%, respectively. We used five kinds of evaluation measures such as precision, recall, f-measure, accuracy, and confusion matrix. When the model correctly predicts the benign class, this is referred to as a true positive outcome. While a true negative is an outcome in which the model correctly predicts the malware class. One type of false positive is when a researcher incorrectly concludes that a hypothesis or finding is correct when it is not (also called a type I error). If something is falsely ruled out, this is known as a false negative (also called a type II error). The general classification performance is assessed utilizing an accuracy matrix. This is equal to the sum of correctly classified instances divided by the total number of instances. The evaluation matrices are given in equations 8 and 9.   Similarly, parts (i-l) present the accuracy and loss epoch curves for the CNN-GRU model. In part a, the training and testing curves start at 50% and gradually increase to 90%. After that, there is a sharp drop from 90% to 96.50%, followed by a more or less constant drop. Part e shows different behavior for CNN-RNN. For instance, the training and testing curves start at 40% and 91%. On the 32nd epoch, the training curve behaves at 80% while the testing curve behaves at 90%. Then it's pretty  much constant with a 94% accuracy. A similar range of behavior is seen in Part I for CNN-GRU. The movement of these curves differs slightly when using deep features with bi-gram. For instance, CNN-LSTM, CNN-RNN, and CNN-GRU exhibit behavior ranging from 50% to 95%. More bi-gram features mean more data for training and testing. It may be overburdening the training time with noisy data. Nonetheless, the proposed approach can handle massive amounts of sequenced data by reducing and selecting features. The loss curves for uni-gram and bi-gram for CNN-LSTM, CNN-RNN, and CNN-GRU, respectively, fall between (80%, 10%), (90%, 10%), and (40%, 15%), and (78%15%). The confusion matrices for the CIC-AAGM2017 dataset used to examine uni-gram and bi-gram for malware detection, are shown in Figure 6. The diagonal cells represent the correct classification values while the non-diagonal cells represent the incorrect classification values. As an example, in parts (a, b), CNN-LSTM has classification and miss-classification values of (91%, and 9%), respectively. Similarly, the benign has (99, 1%) and (97, 3%). Using CNN-RNN, the classification and miss-classification values for malware are (90%, 10%) and benign (98%, 2%) and (96%, 4%) respectively. Furthermore, using CNN-GRU, the classification and miss-classification values for malware is (90%, 10%), (91%, 9%) and benign are (97%, 3%) and (95%, 5%) respectively.  Figure 7 depicts the accuracy and loss curves for malware detection in the CIC-AAGM2017 dataset. The proposed approach detects malware more effectively than malware classification. The accuracy curves for CNN-LSTM, CNN-RNN, and CNN-GRU using uni-gram features are between (50%, 98%), (90%, 97%), and (40%, 96%), respectively. The testing curve for CNN-LSTM drops to 58% on the 10th epoch, while the training curve drops to 78%, indicating possible data loss (part c). As shown in part g, when CNN-RNN is used on the 5th and 25th epochs, the training, and test curves drop, indicating data loss. The accuracy curves for bi-gram features, CNN-LSTM, CNN-RNN, and CNN-GRU, are between (50%, 100%), (86%, and 98%). (50%, 99%), respectively. The testing and training curves in CNN-RNN parts (f, h) behave very differently, indicating data loss. These unusual behaviors can affect malware detection accuracy. On the CIC-AAGM2017 dataset, with uni-gram and bi-gram deep features analysis, confusion matrices for malware classifcation are shown in Figure 8. When CNN-LSTM is used in parts (a, b), the adware has classification and miss-classification values of (99%, 1%) and (100%, 0%) respectively. Similarly, general malware has (98%, 2%) and (98%, 2%) respectively. The classification and miss-classification values for adware are (98%, 2%) and general malware is (98%, 2%) using CNN-RNN in parts (c, d). The classification and miss-classification values for adware are (98%) (2%), (99%,1%), general malware (98%, 2%), (97%, 3%). The CNN-LSTM approach with bi-gram features has better malware detection accuracy. The confusion matrices for malware detection using the CICMalDroid 2020 dataset are shown in Figure 9. The malware has classification and miss-classification values of (99, 1%), (96%, 4%), respectively, when CNN-LSTM is used in parts (a, b). Similarly, the benign has (99.9%, 1%), (99.9%, 1%), and (99.9%, 1%), respectively. Following that, using CNN-RNN in parts (c, d), the classification and miss-classification values for malware are (96%, 4%), (97%, 3%), and benign are (100%, 0%), (86%, 4%), respectively. Furthermore, using CNN-GRU in parts (e, f), the classification and missclassification values for malware are (98%, 2%), (94%, 6%), and benign are (90%, 10%), (96%, 4%), respectively. The confusion matrices for malware classification using the CICMalDroid 2020 dataset with uni-gram and bi-gram deep features analysis are shown in Figure 10. Using this dataset, malware  parts (a, b), the adware family has classification and miss-classification values of (100%, 0%), (100%, 0%), respectively. The banking family has (97%, 3%), (100%, 0%, and (100%, 0%), respectively. The riskware family has (97%, 3%), (95%, 5%), and (95%, 5%), respectively. The SMS family has (88%, 12%), (93%, 1%), and (93%, 1%), respectively. Using CNN-RNN in parts (a, b), the adware family has classification and miss-classification values of (100%, 0%), (100%, 0%), respectively. The banking family has (95%, 5%), (96%, 4%), and (96%, 4%), respectively. As can be seen, the proposed approach outperforms CIC-AAGM2017 in terms of classification results for CICMalDroid 2020.  Table 3 compares state-of-the-art malware classification methods using the CIC-AAGM2017 dataset. To analyze both uni-gram and bi-gram features, evaluation matrices such as precision, recall, f1-score, and classification accuracy are used. It can be seen that uni-gram features provide better classification accuracy for CNN-LSTM, i.e. 94.75%, than bi-gram features, i.e. 94%. Similarly, the CNN-RNN and CNN-GRU perform well with uni-gram, 94.12%, and 93.58%, respectively, as compared to bi-gram, 93.05%, and 93.27%. When compared to uni-gram analysis, bi-gram analysis generates roughly twice as many features. However, the classification accuracies of the uni-gram and bi-gram remain slightly different. It shows that our method can classify a large number of malware features with better classification results. On the other hand, malware detection results with bi-gram features outperform those with uni-gram features. For example, Table 4 compares state-of-the-art malware detection methods using the CIC-AAGM2017 dataset. The proposed approach is analyzed using evaluation matrices such as precision, recall, f1-score, and classification accuracy. The proposed method is tested on the CICMalDroid 2020 dataset using various evaluation matrices such as precision, recall, f1-score, and classification accuracy. Table 5 compares malware classification performance with state-of-the-art methods using uni-gram and bi-gram features. CNN-LSTM, CNN-RNN, and CNN-GRU classification accuracies with uni-gram are 98.8%, 97.93%, and 93.57%, respectively. Table 6 compares malware detection performance with the CICMalDroid 2020 dataset to other state-of-the-art methods. We have four types of malware: adware, banking malware, riskware malware, and SMS malware. The CNN-LSTM, CNN-RNN, and CNN-GRU have malware detection accuracies of 95.35%, 93.69%, and 92.73%, respectively, thanks to uni-gram deep features. The CNN-LSTM, CNN-RNN, and CNN-GRU have malware detection accuracies of 97.09%, 96.62%, and 95.1%, respectively, with bi-gram deep features. Bi-gram deep features are effective for malware detection, while uni-gram deep features are effective for malware classification. Table 7 shows the comparisons of the proposed approach with previously published works. These works mainly used HTTP network flows to classify the Android apps as malware or benign. Multi-view neural networks are used to develop a malware analysis approach that provides depth and breadth of the features. The HTTP-based malware classification accuracy is 98.81%. Our proposed approach outperforms with a malware detection accuracy of 99.46%.

T-SNe Features Visualization for Performance Validation
The t-SNE visualization method is intended to determine whether the features contain substantial or sparse information. Moreover, the t-SNE algorithm is designed to validate the effectiveness of the proposed approach. Maaten et al. (Van der Maaten & Hinton, 2008) developed a t-SNE algorithm for visualizing high-dimensional data. Figure 11 shows the balance of local and global semantic feature weights for various perplexity values. Two t-SNE visualization tests were created in R. We try to find the minimum level of perplexity required to distinguish malware from benign clusters. In the second test, the best malware and benign clusters are separated by perplexity. For instance, parts (a, c, e, g) show the lowest perplexity values and parts (b, d, f, e) show the optimal perplexity values. t-SNE uses iterations to distinguish between samples. To show the different malware and benign   clusters, we used 300 iterations for each perplexity. Semantic feature plots on 100,110,120,130 perplexity values are shown in the first column. An orange cluster indicates benign samples, while a light blue cluster indicates malware. Part a has no discernible cluster with overlapped data when perplexity is 100. On 110, 120, and 130 perplexity levels, the first t-SNE experiment shows the separation of malware and benign samples into visual clusters. As a result, the second experiment is run to distinguish malware from benign clusters. For the second t-SNE test, we chose four ideal perplexity values: 150, 155, 160, and 160. The second column shows how a low-dimensional semantic property can be used to group data into groups. The density of the dataset influences overall classification performance. Higher density predicts better because more descriptive information is available for training. Optimal perplexity values help isolate t-SNE visual clusters for better classification. Using acceptable perplexity values, a dataset can be separated and then classified using hyperparameters. This method is used to test the proposed approach's effectiveness because semantic features can easily be separated and classified as malicious or benign.

Network Flow
Malware that uses protocols other than HTTP cannot be detected using the proposed method because it only looks at HTTP in the network traces. We chose HTTP for our study because it accounts for 70% of the network traffic generated by Android apps. We can simply determine the general network behavior and whether the specified activity is malicious or not by analyzing HTTP-based traffic. According to the findings of the experiments, our technique detects the vast majority of legitimate malware attacks. The proposed method can also expose suspicious network behavior of apps, which is not achievable with other methods.

Encrypted Traffic
Malware is increasingly choosing to communicate with networks via encrypted traffic to avoid detection. Nevertheless, the issue of encrypted malicious traffic has been examined by several studies. To deal with encrypted traffic, we devised a method based on Wireshark commands that can decrypt the encrypted network traffic into readable form. As a result, the proposed method is also capable of detecting malware samples in HTTPS traffic.

Sample Size
Our sample size may not be sufficient to represent the entire ecology of malware variants. However, we examined two large datasets, CIC-AAGM2017 and CICMalDroid 2020, which contain a total of 10.2k malware and 3.2K benign samples, which is significantly larger than several other related aspects. Because we are using two large standard datasets without any alteration, we can confidently state that the proposed approach can operate on other systems as well.

Evaluation Measures
It's even arguable that the evaluation metrics we chose aren't appropriate. To mitigate these issues, we developed prominent research metrics, such as precision, recall, f-measure, and classification accuracy. To prove the efficacy of the proposed method, we prepared a detailed evaluation that compared CNN-LSTM to CNN-RNN and CNN-GRU. Additionally, the N-gram-based LSA features and CNN-based deep features are very effective for accurate malware classification.

Theoretical and Practical Contributions
The objective of network-based strategies is to find unique features that can be used to accurately classify malicious traffic. Multiple malicious scripts may be employed by network malware to harm mobile apps. Riskware is capable of incorporating malicious bytes required for remote session hijacking. For example, network traffic may contain "application/x-javascript" coding, which the remote device will execute to prevent regular access. Similarly, adware is a type of malware that hides on the target machine and displays ads in the form of malicious scripts. Some adware monitors online activity to deliver related ads. Moreover, unresolved IP addresses may be directly connected to some malicious software. This is a common indicator of malicious intent. It is possible to identify potentially harmful scripts in terms of behavioral segmentation through the use of network traffic. It can be time-consuming and ineffective for experts to infer malware behavior from statistical features of network traffic. We use a novel feature extraction process to achieve automatic feature selection. This approach addresses the issue of feature selection while preserving high-level semantics by automatically mining URL feature representations. We used deep features analysis of the HTTP protocol because it is the most common protocol. HTTP-based encrypted data packets are collected and filtered, and afterward, the SVD method is used to extract the most notable features. The CNN-LSTM model is designed to classify network-based malware.

CoNCLUSIoN
Because a large portion of Android malware uses network traffic to carry out its malicious functions, this type of malware detection strategy has the potential to be effective. A deep features analysis approach for malware classification and detection is developed in this study. The encrypted URLbased network traffic is collected and mined first before being decrypted for further analysis. Second, using segmentation and N-gram approaches, features with ordering details are extracted. The N-gram analysis greatly expands the number of features that can result in noisy data. Third, the SVD method is used to transform the features in reduced dimensional space. The LSA tool is then used to capture the semantics. These are latent features for reduced N-gram features with semantics The CNN network is used to extract deep features from latent variables to reduce computational and training costs.
The HTTP protocol analysis may not be sufficient to deal with all forms of malicious behavior. To extract data features from various types of packets, we must investigate all commonly used communication protocols. We believe that in the future, we will investigate malware network behavior using other protocols such as Transmission Control Protocol (TCP), User Datagram Protocol (UDP), and Domain Name Server (DNS). Furthermore, we will employ transfer learning by utilizing a pretrained model.

CoNFLICT oF INTeReST
The authors of this publication declare there is no conflict of interest.

FUNDING AGeNCy
This research received no specific grant from any funding agency in the public, commercial, or notfor-profit sectors.