Beyond Traditional Learning: Leveraging BERT for Enhanced Android Malware Detection

Rebet Keith Jones
DOI: 10.4018/979-8-3693-3226-9.ch012

Abstract

This study evaluates the bidirectional encoder representations from transformers (BERT) model for Android malware detection, comparing its performance against more traditional models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. The research uses two significant datasets, the Drebin dataset and the CIC AndMal2017 dataset, both known for their extensive collections of Android malware and benign applications. The models are evaluated on accuracy, precision, recall, and F1 score. The study also addresses the challenge of concept drift in malware detection by incorporating active learning techniques to adapt to evolving malware patterns. The results indicate that BERT outperforms the traditional models, demonstrating higher accuracy and adaptability, which the study attributes primarily to its advanced natural language processing (NLP) capabilities. This study contributes to the fields of cybersecurity and NLP.
Chapter Preview

Problem Statement

Android cybersecurity faces a proliferation of malware that has escalated in both volume and complexity. The rise is not merely a matter of quantity: the malware itself has grown more sophisticated, posing a significant hurdle for existing detection methodologies (Allix et al., 2016). Nor is the threat static; the landscape of Android malware is in continuous flux.

A particularly vexing challenge in this domain is concept drift in malware detection. Traditional machine learning models, once hailed as robust solutions, now struggle to keep pace with the rapid evolution of malware (Kim et al., 2022). Their effectiveness degrades markedly over time, a trend substantiated empirically by declining performance metrics. For instance, Yang et al. (2021) observed the F1 score of a malware detection system fall from an initial 0.99 to 0.76 within six months, underscoring the urgency of addressing this issue.
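The degradation described above can be made concrete with a small monitoring sketch. The code below is illustrative only and is not the chapter's method: the `WindowCounts` structure, the `drifted_windows` helper, the 0.05 tolerance, and the confusion counts are all assumptions, with the counts chosen so that the first and last windows reproduce F1 values of 0.99 and 0.76. It computes F1 per evaluation window and flags any window whose score has fallen too far below the initial baseline.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Confusion counts for one evaluation window (hypothetical structure)."""
    label: str
    tp: int  # malware correctly flagged
    fp: int  # benign apps flagged as malware
    fn: int  # malware missed


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def drifted_windows(windows, max_drop=0.05):
    """Return labels of windows whose F1 is more than max_drop below the first window's F1."""
    if not windows:
        return []
    first = windows[0]
    baseline = f1_score(first.tp, first.fp, first.fn)
    flagged = []
    for w in windows[1:]:
        if baseline - f1_score(w.tp, w.fp, w.fn) > max_drop:
            flagged.append(w.label)
    return flagged


# Made-up counts for illustration; not data from the chapter or from Yang et al. (2021).
windows = [
    WindowCounts("month 0", tp=99, fp=1, fn=1),    # F1 = 0.99
    WindowCounts("month 3", tp=95, fp=3, fn=5),    # F1 is about 0.96
    WindowCounts("month 6", tp=76, fp=24, fn=24),  # F1 = 0.76
]
print(drifted_windows(windows))
```

In a setup like this, a flagged window would be the natural trigger for retraining or for requesting new labels, which is where an active learning step could plug in.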

Further complicating the matter is the increasing difficulty in differentiating between malware and benign applications. The evolving nature of benign applications, which often exhibit behavior similar to malware, has led to a rise in false positives. This presents a significant challenge to traditional detection methods, which are not equipped to adapt swiftly to these changes (Allix et al., 2015).
