Vulnerability Assessment and Malware Analysis of Android Apps Using Machine Learning

Pallavi Khatri, Animesh Kumar Agrawal, Aman Sharma, Navpreet Pannu, Sumitra Ranjan Sinha
DOI: 10.4018/978-1-7998-3299-7.ch015

Abstract

The use of mobile devices is growing rapidly, and Android devices are the most popular among them. As the use of Android phones increases, more applications become available to users. Cybercriminals exploit these attractive, multi-functional applications to steal personal information and track users' activities. This chapter presents a two-way approach, combining static and dynamic analysis, for finding malicious Android packages (APKs) among different Android applications. Three cases are considered to create the dataset for further analysis: the severity level of the APK, the permission-based protection level, and the dynamic analysis of the APK. Subsequently, supervised machine learning techniques such as naive Bayes multinomial text, REPtree, voted perceptron, and SGD text are applied to the dataset to classify the selected APKs as malicious, benign, or suspicious.

Introduction

In recent years, the Android operating system has become popular on mobile devices and has overtaken the other platforms available in the market, with over 2.5 billion active users. The popularity of Android among developers as an open-source, easy-to-implement platform has led to the development of a wide variety of apps that can be downloaded from the Google Play Store and third-party stores. Cybercriminals are finding an easy gateway to a device through these permission-based apps. With the increased use of apps on Android-based devices, the probability of malware attacks keeps rising, and attacks carried out through apps are more difficult to detect using conventional methods. Numerous methods have been proposed and new techniques researched to analyze malware so that the source of a compromise can be identified, but the issue remains relevant because attackers keep devising innovative methods to compromise a device. To handle the increased proliferation of malware, researchers have worked extensively on machine learning based malware analysis and have been largely successful in classifying a given sample as malicious or benign. However, malware development is fast outpacing the identification and containment process, which gives impetus to more quality research in this field. Although stronger security mechanisms have been put in place to filter out malicious content before it is downloaded, they have not prevented the spread of malicious apps among Android users. Some of these applications gather large amounts of sensitive information from the user and from the Android device without the user's prior permission. To install an application on an Android device, the user must grant certain permissions so that the app has the user's explicit consent to access private or sensitive data.
After development, every Android app is compiled and packaged into an APK file for deployment. The APK file includes the code of the application (".dex" files), its dependencies (.jar) and resources, and the AndroidManifest.xml file. AndroidManifest.xml is an important file that describes the features and configuration of the application. It also includes information about permissions, activities, services, and content providers, which makes it important for malware analysis. Android malware analysis can be done statically and dynamically. Static analysis detects malicious behavior in the source code, data files, or binary files without executing the application. Its major drawback is that it cannot reliably analyze obfuscated code or detect dynamic code loading. Static analysis also supports signature-based analysis, in which the signature of an application (.apk) is checked to verify the integrity of that APK. Basic static analysis is straightforward and can be quick, but it is largely ineffective against sophisticated malware and does not analyze the behavior of the APK. Dynamic analysis is a malware detection technique that evaluates the malicious behavior of an app by executing the application in a real environment. Its main advantage is that it detects dynamic code loading and records the application's behavior at runtime. Any network activity, such as a connection to a command-and-control (C&C) server or any other outbound connection, and any pilferage of sensitive data show up during dynamic analysis. Malware analysis normally starts with static analysis because it gives a fair idea of the malicious behavior of an app. However, static analysis is incomplete without dynamic analysis, which helps corroborate the static findings and also gives vital evidence to a forensic investigator.
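The APK layout described above can be illustrated with a minimal Python sketch. It relies only on the fact that an APK is a ZIP archive; the function names `summarize_apk` and `make_demo_apk`, the grouping categories, and the placeholder archive are assumptions made for illustration and are not taken from the chapter. Note that the manifest inside a real APK is stored as binary XML, so reading its permissions requires a dedicated decoding tool, which this sketch does not attempt.

```python
import io
import zipfile
from collections import defaultdict


def summarize_apk(apk_file):
    """Group the entries of an APK (a ZIP archive) by their role.

    The category names are illustrative, not an official taxonomy.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(apk_file) as apk:
        for name in apk.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex"):
                groups["code"].append(name)
            elif name.startswith("res/") or name == "resources.arsc":
                groups["resources"].append(name)
            else:
                groups["other"].append(name)
    return dict(groups)


def make_demo_apk():
    """Build a tiny in-memory archive that mimics an APK's layout.

    The entries hold placeholder bytes, not real Android content.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"placeholder")
        archive.writestr("classes.dex", b"placeholder")
        archive.writestr("res/layout/main.xml", b"placeholder")
        archive.writestr("META-INF/MANIFEST.MF", b"placeholder")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for role, entries in summarize_apk(make_demo_apk()).items():
        print(role, entries)
```

Passing the path of a real `.apk` file to `summarize_apk` should work the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.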
This work uses a comprehensive approach to malware analysis in which static and dynamic analysis are combined with Machine Learning (ML) based techniques. An Android device with multiple apps installed on it is used to observe the behaviour of each app and identify whether it exhibits malicious behaviour. Various static analysis tools are used to extract features from an app, including permissions and APIs, and to assess its maliciousness. These features then form a training dataset that is used to train ML based algorithms. A very popular framework called WEKA (Waikato Environment for Knowledge Analysis) is used to execute the ML based algorithms for malware analysis of an app. The Play Store holds more than 2.1 million apps, and the number is growing continuously. With this many apps, it is not practical to analyse all of them. Apps can be classified by functionality into broad categories such as dating, antivirus, gaming, and social media. Hence, the best approach is to analyse a few apps from each of the popular categories and extract features from them so that they can be analysed for maliciousness. Here, the dating category has been chosen because of its popularity. The proposed solution uses this approach to apply four machine learning algorithms (Naïve Bayes multinomial text, REPtree, Voted Perceptron and SGD text) and predict the probability of an app being malicious.
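As a rough illustration of how extracted permissions might be turned into a training dataset for WEKA, the sketch below writes samples in WEKA's ARFF text format. The chapter does not show its actual dataset layout, so everything here is an assumption: the function name `to_arff`, the single string attribute holding space-separated permission names, the relation name, and the sample apps are all hypothetical. Only the three class labels come from the abstract.

```python
# Class labels named in the chapter's abstract.
LABELS = ["malicious", "benign", "suspicious"]


def to_arff(samples, relation="apk_permissions"):
    """Render (permissions, label) pairs as ARFF text.

    Assumed layout: one string attribute with space-separated
    permission names, plus a nominal class attribute.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute permissions string",
        "@attribute class {" + ",".join(LABELS) + "}",
        "",
        "@data",
    ]
    for permissions, label in samples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        # Keep only the short name, e.g. READ_SMS from
        # android.permission.READ_SMS, and sort for stable output.
        tokens = " ".join(sorted(p.rsplit(".", 1)[-1] for p in permissions))
        lines.append(f"'{tokens}',{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical samples; the labels are invented for illustration.
    demo = [
        (["android.permission.READ_SMS", "android.permission.INTERNET"],
         "malicious"),
        (["android.permission.INTERNET"], "benign"),
    ]
    print(to_arff(demo))
```

A string attribute is used because the text-oriented classifiers named above operate on string input; a dataset for REPtree or Voted Perceptron would more plausibly use one binary attribute per permission.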
