1. Introduction
Android, the Linux-based open-source mobile operating system, is the most widely used mobile OS in the world. It holds a 73% share of the smartphone OS market and has over 2.5 billion active users. This success is due, on the one hand, to the open-source nature of Android and the wide availability of smartphones that run it and, on the other hand, to the large number of apps and games that are freely available and easily accessible to users. Figure 1 shows the number of available applications in the Google Play Store from December 2009 to March 2022.
Android applications are mainly distributed through the Google Play Store, the official Google app store, and through manufacturer-specific app stores such as those of Samsung, Huawei and Xiaomi. Android applications are also available, in the form of APK files, on many unofficial and insecure third-party websites. Applications downloaded from these websites can be very dangerous and may contain malicious code, since they are not verified by Google or by any device manufacturer. It is therefore necessary to detect malware applications in order to protect users' personal data and device integrity.
Figure 1. Number of available applications in the Google Play Store from December 2009 to March 2022
The primary goal of mobile malware is to gain access to user data stored locally on the device or in the cloud, as well as to user information involved in sensitive financial transactions in mobile banking apps. Mobile malware can be distributed in a variety of ways, including infected file attachments, files shared via Bluetooth and SMS phishing attacks. However, app stores are currently the primary malware distribution channel on mobile devices. According to G DATA's recent Mobile Security Report (G DATA, 2022), the company's security experts counted more than 2.5 million malicious Android apps in 2021. As a result of these factors, Android malware is becoming an increasingly serious problem for both enterprise and individual users.
To counter these attacks, researchers have proposed various methods and techniques for effectively detecting malware apps on Android. Many of these methods rely on popular machine learning classification algorithms to classify Android apps as benign or malicious. One of the most widely used techniques in the literature uses Android permissions as features to train one or more classification models; such techniques are known as permission-based methods.
In permission-based malware detection methods, Android permissions are extracted from application files and used to generate feature vectors, which are then fed as input to different machine learning algorithms.
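The feature-vector step can be illustrated with a minimal Python sketch. It assumes that the permissions of each app have already been extracted (for instance from the `<uses-permission>` entries of its AndroidManifest.xml) as lists of strings, and it assumes a binary encoding in which each column stands for one permission. The helper names `build_vocabulary` and `to_feature_vector` are hypothetical and not taken from the paper.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(apps: Iterable[Iterable[str]]) -> List[str]:
    # Every distinct permission requested by any training app, sorted so
    # that each permission always maps to the same column index.
    vocabulary = set()
    for permissions in apps:
        vocabulary.update(permissions)
    return sorted(vocabulary)


def to_feature_vector(permissions: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    # Binary encoding: 1 if the app requests the permission, 0 otherwise.
    # Permissions absent from the vocabulary are silently ignored.
    requested = set(permissions)
    return [1 if p in requested else 0 for p in vocabulary]


# Toy input: permission lists assumed to be already extracted from each APK.
apps = [
    ["android.permission.INTERNET", "android.permission.SEND_SMS"],
    ["android.permission.INTERNET", "android.permission.CAMERA"],
]
vocab = build_vocabulary(apps)
vectors = [to_feature_vector(a, vocab) for a in apps]
print(vocab)
print(vectors)  # [[0, 1, 1], [1, 1, 0]]
```

Building the vocabulary from the training apps only, and ignoring unseen permissions at prediction time, keeps every vector the same length, which is what the classifiers expect as input.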
In this paper, we conduct a comparative analysis of various supervised machine learning techniques for Android malware detection. First, a dataset was built from 5,000 malicious applications belonging to different malware families and 5,000 benign Android applications from multiple categories. Then, Android permissions were extracted from each application in the dataset and used to generate its feature vector. After that, the generated features were used to train six popular supervised machine learning algorithms (Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, K-Nearest Neighbors and Multi-layer Perceptron) and build the classification models. Finally, each classifier was evaluated using 10-fold cross-validation.
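The 10-fold cross-validation protocol can be sketched in dependency-free Python. This is a minimal illustration, not the authors' code: it wraps a k-nearest-neighbours classifier, one of the six algorithms listed above, around binary permission vectors. The use of Hamming distance, `k = 3`, the plain (non-stratified) shuffled folds, the toy data and all function names are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence

Vector = Sequence[int]


def hamming(a: Vector, b: Vector) -> int:
    # Number of permissions requested by exactly one of the two apps.
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X: List[Vector], train_y: List[int], x: Vector, k: int = 3) -> int:
    # Indices of the k training apps closest to x (ties broken by index order).
    nearest = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], x))[:k]
    # Majority vote among their labels (1 = malicious, 0 = benign).
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    if n < k:
        raise ValueError("need at least as many samples as folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are not ordered by class
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    return [idx[f::k] for f in range(k)]


def cross_validate(X: List[Vector], y: List[int], k: int = 10,
                   n_neighbors: int = 3, seed: int = 0) -> float:
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        # The remaining k-1 folds form the training set for this round.
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    # Mean accuracy over the k held-out folds.
    return sum(accuracies) / len(accuracies)


# Toy, made-up data. Columns: INTERNET, CAMERA, SEND_SMS, READ_SMS; label 1 = malicious.
X = [[1, 0, 1, 1] for _ in range(10)] + [[1, 1, 0, 0] for _ in range(10)]
y = [1] * 10 + [0] * 10
print(cross_validate(X, y))  # prints 1.0 on this perfectly separable toy data
```

Each sample is used for testing exactly once and for training in the other nine rounds, so the mean accuracy uses all of the data without testing on training samples. Swapping in another of the six classifiers only changes `knn_predict`. In practice, scikit-learn offers the same protocol through `cross_val_score(estimator, X, y, cv=10)`, which, for classifiers and an integer `cv`, uses stratified folds that preserve the malicious/benign ratio in every fold; the sketch above does not stratify.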
The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes the background topics relevant to our study. Section 4 presents the architecture of our proposed Android malware detection system. Section 5 describes the experimental settings, and Section 6 presents and discusses the experimental results. Finally, Section 7 concludes the paper.
Recently, many methods have been proposed to detect Android malware applications using machine learning techniques. Traditional Android malware analysis approaches can be classified into three main categories: static, dynamic and hybrid analysis. In this section, we briefly describe the most relevant proposed approaches, grouped according to the analysis method they use.