Introduction
With an estimated 70% to 85% share of the smartphone operating system market, Android has become the most popular operating system for smartphones and other mobile devices. It is also the fastest growing mobile operating system: in 2013, 760 million devices running Android were sold to customers, and in 2014 Android smartphone shipments reached 1.24 billion. The number has increased by approximately 30% over the last year. As Android devices rapidly become more popular, an increasing number of security threats targeting mobile devices has emerged. Smartphones are now susceptible to threats such as theft of user credentials, activation of malicious services without the user's knowledge, and denial of service. Android's growing market share has made it an attractive target for attackers. Moreover, Android applications are easy to reverse engineer, a well-known characteristic of Java applications that is often abused by attackers who attempt to implant malicious code into benign applications. Unlike other mobile operating systems, Android maintains openness and places few constraints on users who download and upload apps. Android leaves the security of the device in the user's hands by letting him or her decide whether to install an app. Unfortunately, because most users lack security knowledge, they are not well placed to judge the intent of an application. In (VirusShare), it was shown that the number of known malware samples for Android increased by approximately 300% between 2012 and 2013, reaching about 273,000 (Juniper Networks, 2013; Trend Micro Incorporated, 2013). The motives for writing malware range from amusement and spam to money-making data theft and payoff (Felt, Chin, Hanna, & Wagner, 2011). To shield mobile users from the severe threats posed by Android malware, different solutions have been proposed.
Static analysis, mostly used by antivirus companies, assesses source code by looking for suspicious patterns. Although some static analysis approaches have been successful, different obfuscation techniques have evolved to evade them. Dynamic analysis runs the APK in an isolated environment in order to analyze its execution logs, but such techniques require more processing capacity and battery power. Every Android app requires a set of permissions, and these permissions are generally requested when the application is installed on a mobile device. Permission control should therefore be one of the major Android security mechanisms. However, not all app developers are careful enough to keep the set of required permissions to a minimum, and hence users are bound to grant some unnecessary permissions in order to install apps. The unnecessary permissions of an over-privileged app may be leaked to malicious apps (Huang, Tsai & Hsu, 2012). In addition, a lack of knowledge about the risks associated with permissions leaves users confused about whether to install an app (Sanz, Santos, Pedrero, Nieves & Bringas, 2013a). It is therefore quite feasible to identify malware based on the permission sets that apps request at installation time. As a common data mining technique, feature selection has attracted much attention in recent times (Hassanien, Tolba & Azar, 2014; Lee & Lee, 2006). The permission vector of an Android app may contain around 135 features. Data of such high dimensionality is extraordinarily difficult to handle, because it may slow down the learning process and degrade learning efficiency (Hu, Yu & Xie, 2006). As with any classification problem, the classification task here comprises two stages: feature reduction, and a decision stage that actually assigns objects to classes based on the extracted features (Ripon, Kamal, Hossain & Dey, 2016).
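To make the notion of a permission vector concrete, the following minimal sketch encodes the permissions an app requests as a binary feature vector. The four-entry vocabulary and the helper name `permission_vector` are illustrative assumptions rather than part of the proposed framework; a realistic vocabulary would hold on the order of 135 permissions.

```python
# Hypothetical, shortened permission vocabulary (assumption for illustration).
# Each position in the resulting vector corresponds to one permission.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def permission_vector(requested, vocabulary=PERMISSIONS):
    """Return a 0/1 list marking which vocabulary permissions are requested."""
    requested = set(requested)
    return [1 if perm in requested else 0 for perm in vocabulary]


# Example app that requests two of the four permissions.
app = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
print(permission_vector(app))  # [1, 1, 0, 0]
```

Stacking one such vector per app yields the app-by-permission matrix on which feature reduction and classification operate.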
Feature reduction techniques are therefore needed to reduce the dimensionality of the data. The basic assumption of feature reduction is that datasets contain redundant and unimportant attributes. Irrelevant or unimportant information should be removed while preserving the ability to classify and make decisions. Unsupervised classification, or community detection, is the process of grouping the nodes of a graph according to certain similarity measures. Community detection is one of the major tools in social network analysis, with applications such as viral marketing and the sharing of information, sentiments, and emotions, but its use for feature reduction in the context of malware detection is quite rare. In this paper, a permission-based static malware detection framework for the Android operating system is proposed, which involves dimensionality reduction and classification using machine learning algorithms. In summary, our main contributions are as follows. First, feature similarity graphs are built from similarities computed with cosine similarity, Levenshtein distance, Manhattan distance, and Euclidean distance, and community detection techniques (Infomap, Louvain, and VOS clustering) are applied to those graphs to select the most prominent feature sets. Second, the approach is validated empirically with machine learning classifiers: the performance of different Weka-based classifiers is compared on different datasets, and the proposed community-based methods are compared with existing attribute selection methods.
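The idea of selecting features through a similarity graph can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses cosine similarity alone, the edge threshold of 0.8 is arbitrary, connected components stand in for the Infomap, Louvain, and VOS community detection methods named above, and the rule of keeping the most frequently requested permission per group is a hypothetical choice, not the selection rule of the proposed framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(columns, threshold):
    """Connect two features when their cosine similarity reaches the threshold."""
    graph = {i: set() for i in range(len(columns))}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if cosine(columns[i], columns[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def components(graph):
    """Group nodes into connected components (a simple stand-in for communities)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


def select_features(columns, threshold=0.8):
    """Keep one feature per group: the most requested, lowest index on ties."""
    groups = components(similarity_graph(columns, threshold))
    return [max(g, key=lambda i: (sum(columns[i]), -i)) for g in groups]


# Toy data: rows are apps, columns are permissions (values are made up).
apps = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 0],
]
columns = [list(col) for col in zip(*apps)]  # one vector per permission
print(select_features(columns))  # [0, 2, 3]
```

In this toy data the first two permissions are always requested together, so they fall into one group and only one of them is kept. A dedicated community detection algorithm would replace `components` and could split a densely connected graph into finer groups.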