Community Based Feature Selection Method for Detection of Android Malware

Abhishek Bhattacharya (Institute of Engineering & Management, Kolkata, India) and Radha Tamal Goswami (Birla Institute of Technology, Mesra, India)
Copyright: © 2018 | Pages: 24
DOI: 10.4018/JGIM.2018070105

Abstract

The amount of malware has risen drastically over the last couple of years as smartphones and tablets running the Android operating system have gained popularity around the world. One popular static detection technique is permission/feature-based detection of malware through the AndroidManifest.xml file using machine learning classifiers. Ignoring important features or keeping irrelevant ones can confuse classification algorithms. Therefore, to reduce classification time and improve accuracy, different feature reduction tools have been used in the literature. Community detection is one of the major tools in social network analysis, but its application to malware detection is quite rare. In this article, the authors introduce a community-based feature reduction technique for Android malware detection. The proposed method is evaluated on two datasets consisting of 3004 benign components and 1363 malware components. The proposed community-based feature reduction technique produces a classification accuracy of 98.20% and an ROC value of up to 0.989.

Introduction

With an estimated 70% to 85% share of the smartphone operating system market, Android has become the most popular operating system for smartphones and other mobile devices. It is also the fastest growing mobile operating system: in 2013 alone, 760 million devices running Android OS were sold to customers, and in 2014 Android smartphone shipments reached 1.24 billion. The number has increased by approximately 30% over the last year. As Android devices rapidly become more popular, an increasing number of security threats targeting mobile devices has emerged. Smartphones are now susceptible to threats such as theft of user credentials, activation of malicious services without the user's knowledge, and denial of service. The Android operating system has become an easy target for attackers because its market share has increased. Moreover, Android applications are easy targets for reverse engineering, a characteristic of Java applications that is often abused by attackers who attempt to implant malicious code into benign applications. Unlike other mobile operating systems, Android maintains openness and places few constraints on users downloading and uploading apps. Android leaves the security of the device in the user's hands by letting him or her decide whether or not to install an app. Unfortunately, because of a lack of security knowledge, the user is not the right person to judge the intention of an application. In (VirusShare), it was shown that the number of known malware samples for Android increased by approximately 300% between 2012 and 2013, reaching about 273,000 (Juniper Networks, 2013; Trend Micro Incorporated, 2013). The main motives for writing malware range from amusement and spam to money-making data theft and payoff (Felt, Chin, Hanna, & Wagner, 2011). To shield mobile users from the severe threats posed by Android malware, different solutions have been proposed.
Static analysis, mostly used by antivirus companies, is based on assessing source code for suspicious patterns. Although some static analysis approaches have been successful, different obfuscation techniques have evolved to evade them. Dynamic analysis involves running the apk in an isolated environment in order to analyze its execution logs, but such techniques require more processing capacity and battery power. Every Android app requires a set of permissions, and these permissions are generally requested by the application during installation on a mobile device. Permission control should therefore be one of the major Android security mechanisms. However, not all app developers are careful enough to keep the set of required permissions to a minimum, and hence users are bound to grant some unnecessary permissions in order to install apps. The unnecessary permissions of an over-privileged app may be leaked to malicious apps (Huang, Tsai & Hsu, 2012). On the other hand, a lack of knowledge about the risks associated with permissions leaves users confused about whether or not to install an app (Sanz, Santos, Pedrero, Nieves & Bringas, 2013a). It is therefore quite feasible to identify malware based on the permission sets that apps request at installation time. As a common technique in data mining, feature selection has attracted much attention in recent times (Hassanien, Tolba & Azar, 2014; Lee & Lee, 2006). The permission vector of an Android app may contain around 135 features. Data of such high dimensionality is extraordinarily difficult to handle, as it may slow down the learning process and degrade learning efficiency (Hu, Yu & Xie, 2006). As with any classification problem, malware classification comprises two stages: feature reduction, and a decision stage that actually assigns objects to classes based on the extracted features (Ripon, Kamal, Hossain & Dey, 2016).
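To make the notion of a permission vector concrete, the following minimal Python sketch shows one way such a vector could be built from a manifest. It is an illustration under stated assumptions, not the authors' implementation: the four-entry PERMISSION_VOCAB, the helper names and the sample manifest are hypothetical, and the manifest is assumed to be already decoded to plain-text XML (inside an apk it is stored in a binary XML format).

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, shortened vocabulary for illustration only; the article
# mentions that a real permission vector may have around 135 features.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
]


def requested_permissions(manifest_xml):
    """Return the set of permission names declared via <uses-permission>."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        element.get(name_attr)
        for element in root.iter("uses-permission")
        if element.get(name_attr)
    }


def permission_vector(manifest_xml, vocab=PERMISSION_VOCAB):
    """Encode a manifest as a binary vector: 1 if the permission is requested."""
    requested = requested_permissions(manifest_xml)
    return [1 if permission in requested else 0 for permission in vocab]


# Assumed sample: a decoded (plain-text) manifest of a made-up app.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application />
</manifest>"""

print(permission_vector(SAMPLE_MANIFEST))  # prints [1, 1, 0, 0]
```

Stacking one such vector per app gives the app-by-permission matrix on which feature reduction and classification operate.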
Feature reduction techniques are therefore needed to reduce the dimensionality of the data. The basic assumption of feature reduction is that datasets contain redundant and unimportant attributes. Irrelevant or unimportant information should be removed while preserving classification and decision-making ability. Unsupervised classification, or community detection, is the process of grouping data from a graph according to certain similarity measures. Community detection is one of the major tools in social network analysis, with applications such as viral marketing and the sharing of information, sentiments and emotions, but its use for feature reduction in the context of malware detection is quite rare. In this paper, a permission-based static malware detection framework for the Android operating system is proposed, which involves dimensionality reduction and classification using machine learning algorithms. In summary, our main contributions are as follows. Feature similarity graphs have been produced from similarities computed with cosine similarity, Levenshtein distance, Manhattan distance and Euclidean distance, and community detection techniques such as Infomap, Louvain and VOS clustering have been applied to those similarity graphs to select the most prominent feature sets. In addition, empirical validation has been carried out with machine learning classifiers, comparing the performance of different Weka-based classifiers on different datasets and comparing the proposed community-based methods with existing attribute selection methods.
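The general idea of community-based feature reduction can be sketched in Python as follows. This is a toy illustration under loud assumptions, not the method evaluated in the article: it uses only cosine similarity, it replaces Infomap, Louvain and VOS clustering with plain connected components of a thresholded similarity graph as a crude stand-in, and the rule of keeping the most connected feature per community, the 0.8 threshold and the matrix X are all hypothetical choices, since this preview does not state how representatives are selected.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(X, threshold):
    """Link two features (columns of X) when their cosine similarity
    reaches the threshold. Returns an adjacency dict of sets."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    adjacency = {j: set() for j in range(n_features)}
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if cosine(columns[i], columns[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def communities(adjacency):
    """Stand-in for community detection: connected components found by
    depth-first search. Not Infomap, Louvain or VOS clustering."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def select_features(X, threshold=0.8):
    """Keep one representative feature per community (assumed rule:
    highest degree, ties broken by lowest column index)."""
    adjacency = similarity_graph(X, threshold)
    return sorted(
        max(group, key=lambda j: (len(adjacency[j]), -j))
        for group in communities(adjacency)
    )


# Toy matrix: rows are apps, columns are permissions (hypothetical data).
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
]
print(select_features(X))  # prints [0, 2, 3]
```

In this toy matrix, columns 0 and 1 are identical, so they fall into one community and only column 0 is kept. The reduced columns would then be passed to a classifier in place of the full permission vector.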