Hybrid Feature Selection Model for Detection of Android Malware and Family Classification

Sandeep Sharma, Prachi, Rita Chhikara, Kavita Khanna
DOI: 10.4018/978-1-6684-9317-5.ch012

Abstract

Android applications offer services in many aspects of daily life, such as banking and personal, professional, and social activities. The growing use of Android applications makes them extremely vulnerable to various malware threats. A resilient, attack-resistant, machine-learning-based Android malware detector is needed to provide a safe working environment. This work applies feature selection to static and dynamic features and proposes a hybrid feature selection method that identifies the most informative features while eliminating irrelevant ones. Among the evaluated techniques, information gain (a filter method) and recursive feature elimination (a wrapper method) outperform the other feature selection techniques. Different classification algorithms are then trained on the features selected by the hybrid technique. Experimental results show that XGBoost achieved the highest accuracy, 98% for binary classification and 89% for multiclass classification, using only 50 features.

1. Introduction

The Android Operating System (OS) is a prominent mobile platform that has dominated the smartphone industry worldwide for more than a decade. According to Statista, Android held nearly 71.8% of the market in the last quarter of 2022, compared with around 27.6% for iOS (Laricchia, 2023). The popularity of the Android platform and its open-source nature encourage developers to design applications that fulfil users' day-to-day needs, such as online shopping, instant messaging, banking, and official meetings. These applications store vast amounts of users' personal data, including passwords, credit/debit card information, images, and messages.

With the proliferation of Android applications, malicious Android applications are also increasing at an alarming rate. Cybercriminals use the open nature of Android to exploit vulnerabilities present in the OS. They design malicious applications, or inject malicious code into existing applications, to steal sensitive information, send premium SMS messages, or remotely control or damage a machine (Faruki et al., 2015; Felt et al., 2011; Surendran et al., 2018). Adversaries now use tools such as automatic code generators and code obfuscators to create increasingly sophisticated Android malware. Such malware evades signature-based and manual-analysis-based detection methods, which often suffer from low detection accuracy and late detection. Consequently, it is important to design an automated method of Android malware detection that can protect smartphone users and their information from these malicious attacks.

Many industry experts and security researchers are working on effective malware detection techniques. These techniques are broadly classified as static, dynamic, and hybrid analysis. Static-analysis-based detection methods reverse engineer applications and analyze their source code to identify malicious components (Wagner & Dean, 2001). They offer high code coverage by exploring all execution paths within an application (Fraser & Arcuri, 2014) but fail to analyze encrypted, obfuscated, and dynamically loaded code. Dynamic-analysis-based techniques execute applications in an isolated environment to observe their runtime behavior (Kang & Srivastava, 2011). They can deal effectively with obfuscated as well as dynamically loaded code (Fraser & Arcuri, 2014) but fail to cover the complete code of an application. Consequently, hybrid-analysis-based techniques, which combine the two, have proven more useful in malware detection (Zhang et al., 2011; Zhang et al., 2012). However, most existing methods use datasets of applications collected over a short timeframe.

Therefore, it is important to devise a hybrid Android malware detection and classification technique that performs well on a wide range of Android applications developed over a longer period of time. Further, malware writers frequently reuse the code of existing applications to write new but similar applications, so classifying malware into families helps analysts separate out already known malware with minimal effort. Furthermore, Android applications expose a large number of static and dynamic features. Some of these features are redundant and do not provide any significant information for detecting malware or classifying it into its respective family.

As a result, the proposed work presents an effective method that can both detect and classify Android malware with high performance, using a small number of informative features extracted from an extensive set of applications developed between 2008 and 2020.

This work makes the following contributions:

  • Presents an Android malware detection and classification technique that uses hybrid features extracted from Android applications.

  • Proposes a hybrid (wrapper + filter) feature selection technique that selects the most effective features for identifying Android malware and classifying them into different families efficiently.

  • Selects the 50 most significant features for detecting Android malware, i.e., 22 static and 28 dynamic features.
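
The filter-plus-wrapper idea behind these contributions can be sketched in a few lines of Python with scikit-learn. This is only an illustrative sketch under stated assumptions, not the chapter's implementation: the preview does not say how the two stages are combined, so the sketch simply runs a filter stage followed by a wrapper stage. The function name `hybrid_select`, the intermediate cut-off of 100 features, the synthetic dataset, and the random forest used inside recursive feature elimination are all hypothetical choices. `mutual_info_classif` serves as a stand-in for information gain, and the random forest stands in for the classifiers evaluated in the chapter (such as XGBoost).

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split


def hybrid_select(X, y, n_filter=100, n_final=50, seed=0):
    """Return column indices kept by a filter stage followed by a wrapper stage."""
    # Filter stage: rank features by estimated mutual information with the
    # label (used here as a stand-in for information gain) and keep the top
    # n_filter. The cut-off of 100 is an assumption made for this sketch.
    score = partial(mutual_info_classif, random_state=seed)
    filt = SelectKBest(score_func=score, k=n_filter).fit(X, y)
    kept = filt.get_support(indices=True)

    # Wrapper stage: recursive feature elimination repeatedly refits the
    # estimator and drops the least important features until n_final remain.
    estimator = RandomForestClassifier(n_estimators=50, random_state=seed)
    wrapper = RFE(estimator, n_features_to_select=n_final, step=10)
    wrapper.fit(X[:, kept], y)

    # Map the wrapper's choices back to column indices of the original matrix.
    return kept[wrapper.get_support(indices=True)]


# Synthetic stand-in for a matrix of static and dynamic app features.
X, y = make_classification(
    n_samples=600, n_features=200, n_informative=20, n_redundant=20,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Select features on the training split only, to avoid leaking test labels.
selected = hybrid_select(X_train, y_train)

# Train a classifier on the selected columns and score it on held-out data.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train[:, selected], y_train)
accuracy = clf.score(X_test[:, selected], y_test)
print(f"{len(selected)} features selected, test accuracy = {accuracy:.3f}")
```

Running the filter first keeps the more expensive wrapper stage cheap, because recursive feature elimination only has to refit the model on the features that survived the ranking. The accuracy printed here reflects the synthetic data only and says nothing about the results reported in the chapter.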
