A Comparative Study of Machine Learning Techniques for Android Malware Detection

Mohamed Guendouz, Abdelmalek Amine
Copyright: © 2022 | Pages: 13
DOI: 10.4018/IJSI.309719

Abstract

The rapid growth and wide availability of Android applications in recent years have resulted in a spike in the number of sophisticated malicious applications targeting Android users. Because of the popularity of the Android OS and the number of open-source features it supports, cyber attackers prefer to target Android-based devices over other smartphones. Malicious programs endanger user privacy and device integrity. To address this issue, the authors investigate machine learning algorithms for detecting Android malware. They employ a static analysis approach, extracting permissions from each application's APK and generating feature vectors from the extracted permissions. They then train several machine learning algorithms to build classification models that distinguish between benign and malicious applications. According to the experimental findings, random forest and multi-layer perceptron achieve the best classification performance, with accuracies of 95.4% and 95.1%, respectively.

1. Introduction

Android, the Linux-based open-source mobile operating system, is the most widely used mobile OS in the world: it holds a 73% share of the smartphone OS market and has over 2.5 billion active users. This success is due, on the one hand, to the open-source nature of Android and the wide availability of smartphones that run it and, on the other hand, to the large number of apps and games that are freely available and easily accessible to users. Figure 1 shows the number of applications available in the Google Play Store from December 2009 to March 2022.

Android applications are mainly available for download from the Google Play Store, the official Google app store, and from manufacturer-specific app stores such as those of Samsung, Huawei and Xiaomi. They are also available on many unofficial and insecure third-party websites in the form of APK files. Applications downloaded from these websites can be very dangerous and might contain malicious code, since they are not verified by Google or any device manufacturer. It is therefore necessary to detect malware applications in order to protect users' personal data and device integrity.

Figure 1. Number of available applications in the Google Play Store from December 2009 to March 2022

The primary goal of mobile malware is to gain access to user data stored locally on the device or in the cloud, as well as to user information used in sensitive financial transactions in mobile banking apps. Mobile malware can be distributed in a variety of ways, including infected file attachments, files shared via Bluetooth and SMS phishing attacks. However, the primary malware distribution channel on mobile devices is currently app stores. According to G DATA's recent Mobile Security Report (G DATA, 2022), the company's security experts counted more than 2.5 million malware apps for Android devices in 2021. As a result, Android malware is becoming increasingly problematic for both enterprise and individual users.

To deal with these attacks, researchers have proposed various methods and techniques to effectively detect malware apps on Android. Many of these methods use popular machine learning classification algorithms to classify Android apps as benign or malicious. One of the most widely used techniques in the literature is to use Android permissions as features to train one or more classification models; techniques of this type are known as permission-based methods.

In permission-based malware detection methods, Android permissions are extracted from application files and then used to generate feature vectors, which later serve as input to different machine learning algorithms.
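The extraction and vectorization step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format), and the `VOCABULARY` list, the sample manifest and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace that Android uses for "android:" attributes in the manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def extract_permissions(manifest_xml):
    """Return the set of permission names requested in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    }


def to_feature_vector(permissions, vocabulary):
    """Encode a permission set as a binary vector over a fixed vocabulary."""
    return [1 if name in permissions else 0 for name in vocabulary]


# Hypothetical vocabulary; a real one would cover every permission of interest.
VOCABULARY = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.CAMERA",
]

# Hypothetical decoded manifest used only for illustration.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

permissions = extract_permissions(SAMPLE_MANIFEST)
vector = to_feature_vector(permissions, VOCABULARY)
print(vector)  # [1, 0, 1, 0]
```

Each position in the vector corresponds to one permission in the vocabulary, so every application maps to a vector of the same length regardless of how many permissions it requests.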

In this paper, we conduct a comparative analysis of various supervised machine learning techniques for Android malware detection. First, 5,000 malicious applications from different malware families and 5,000 benign Android applications from multiple categories were used to generate the dataset. Then, Android permissions were extracted from each application in the dataset and used to generate the feature vector. After that, six popular supervised machine learning algorithms (Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, K Nearest Neighbors and Multi-layer Perceptron) were trained on the generated features to build classification models. Finally, each classifier model was evaluated using the 10-fold cross-validation method.
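To make the evaluation protocol concrete, the following sketch runs 10-fold cross-validation with a small k-nearest-neighbors classifier over binary permission vectors. It is an illustration under stated assumptions, not the paper's experimental code: the data is synthetic, the classifier is a simplified pure-Python k-NN using Hamming distance, and all function names and parameters are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest in Hamming distance."""
    distances = sorted(
        (sum(a != b for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in distances[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def cross_validate(X, y, k=10, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def synthetic_app(malicious, rng):
    """Synthetic 8-bit permission vector (assumption, not real data).

    The first four bits are set more often for malicious apps; the last
    four bits are uninformative noise.
    """
    p = 0.8 if malicious else 0.2
    informative = [1 if rng.random() < p else 0 for _ in range(4)]
    noise = [1 if rng.random() < 0.5 else 0 for _ in range(4)]
    return informative + noise


rng = random.Random(42)
X, y = [], []
for label in (0, 1):  # 0 = benign, 1 = malicious
    for _ in range(100):
        X.append(synthetic_app(label == 1, rng))
        y.append(label)

fold_accuracies = cross_validate(X, y, k=10)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"mean accuracy over {len(fold_accuracies)} folds: {mean_accuracy:.3f}")
```

Every sample is held out exactly once, so the mean of the fold accuracies estimates performance on unseen applications. Comparing several classifiers amounts to repeating the same loop with a different prediction function on the same folds.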

The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes the topics related to our study. Section 4 presents the architecture of our proposed system for Android malware detection. Section 5 presents the experimental settings, and Section 6 presents and discusses the experimental results. Finally, Section 7 concludes the paper.

Recently, many machine learning techniques and methods have been proposed to detect Android malware applications. Traditional Android malware analysis approaches can be classified into three main categories: static, dynamic and hybrid analysis. In this section, we briefly describe the most relevant proposed approaches according to the analysis method they use.
