Malware Analysis Using Classification and Clustering Algorithms

Balaji K. M., Subbulakshmi T.
Copyright: © 2022 |Pages: 26
DOI: 10.4018/IJeC.290290

Abstract

Malware analysis and detection are important tasks, as malware grows more difficult to handle with every new instance. The threats and problems faced by the public around the globe are also increasing rapidly, and detecting zero-day attacks and polymorphic viruses remains a challenging task. These growing threats create a need for detection techniques, which has led to the now common approach of machine learning. The purpose of this survey is to identify the most effective feature extraction and classification methods (including algorithms) with maximum accuracy, and to understand the clustering properties of malware datasets by considering appropriate algorithms. This work also provides an overview of the malware used. The experimental results of the proposed model showed that the KNN classifier was the most accurate, with an accuracy of 0.962355.

Introduction

The threat that malicious software poses to the digital world is growing rapidly. According to AV-TEST, the total number of new malware samples was expected to exceed 700 million by 2020 (AV-TEST, 2020). It is nearly impossible to control such a massive amount of malware. Therefore, networking and security researchers use malware identification and detection systems, which involve two stages: detection and analysis. This can be achieved through a static, dynamic, or integrated approach. The main goal of malware analysis is to capture and record properties that can then be used to improve security measures and make evasion by malware as difficult as possible. Figure 1 shows the classification of malware; malware can take any form, such as a script, a segment of code, or any other binary. The purpose of malware is to gain control of the system, disrupt the services of computer systems, take over their available functions, steal restricted information, and damage resources.

Figure 1. Classification of malware

Illegal applications sometimes act as a protective cover for malware. Attempting to obtain such illegal software from websites may download the malware itself; this is commonly the case with cracked or pirated software. Malicious software is not limited to executable code: it can also act as a downloader for malicious files such as Portable Document Format (PDF) files or links. According to VirusTotal, 47.80% of malicious files are executables (more than 100M files with original information; more than 16M portable executables from distinct URLs; more than 20M files with rich telemetry data; more than 700,000 emails for rich contextual information). The focus here is therefore on analysing these executables. Numerous kinds of malware exist, and they can be categorized as Trojan horse, Virus, Worm, Adware, and Backdoor. Some cannot be assigned to a single group because their attributes fit several categories; these are sometimes referred to as generalized malicious files. Malware files are analysed using static and dynamic techniques.

Figure 2. Records obtained from cybercrime magazine

Figure 2 shows the statistical records gathered from cybercrime magazines (Morgan, 2019) on ransomware attacks against businesses. Total damage costs were estimated to exceed 20 billion by 2021, with ransomware expected to attack a business every 11 seconds by the end of 2021.

The three basic methods for analysing malware are as follows.

Static Analysis

Static analysis is a method in which executable files are tested for malware without executing them in a controlled environment. Executable files have numerous static features, such as segments and memory minimization. The pefile library in Python extracts static features from Portable Executable (PE) files.
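
The header-level features described above can be read directly from the bytes of a PE file, without running it. The following is a minimal sketch that uses only Python's standard `struct` module rather than a dedicated PE library. The function name `extract_static_features` and the selection of fields are illustrative assumptions, not the feature set used by the authors.

```python
import struct

COFF_HEADER_SIZE = 20     # fixed size of the COFF file header
SECTION_HEADER_SIZE = 40  # fixed size of one section table entry


def extract_static_features(data: bytes) -> dict:
    """Read a few static features from raw PE bytes without executing them.

    Hypothetical helper for illustration; it only parses header fields.
    """
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")

    # The 4-byte field at offset 0x3C holds the offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")

    # COFF file header: machine, section count, timestamp, symbol table
    # pointer, symbol count, optional header size, characteristics.
    (machine, num_sections, timestamp, _sym_ptr, _num_syms,
     opt_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # The section table starts right after the optional header.
    table_start = pe_offset + 4 + COFF_HEADER_SIZE + opt_header_size
    sections = []
    for index in range(num_sections):
        offset = table_start + index * SECTION_HEADER_SIZE
        name, virtual_size, _virtual_addr, raw_size, _raw_ptr = (
            struct.unpack_from("<8sIIII", data, offset))
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
            "virtual_size": virtual_size,
            "raw_size": raw_size,
        })

    return {
        "machine": machine,
        "timestamp": timestamp,
        "characteristics": characteristics,
        "num_sections": num_sections,
        "sections": sections,
    }
```

A file on disk could be passed in as `extract_static_features(open(path, "rb").read())`, where `path` is a placeholder. The resulting dictionary would still need to be turned into a numeric feature vector before being given to a classifier.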

Dynamic Analysis

Dynamic analysis is a method in which malicious files are analysed in a dynamically controlled environment. When the malicious code executes, it modifies the registry keys of the host and corrupts the Operating System (OS). Cuckoo Sandbox or Noriben can be used to perform dynamic analysis of malicious files. The key point is to use the sandbox to isolate the host system from the testing environment and to extract the required data from the malware's execution. These sandboxes produce complete reports on the execution of the malware file. The reports contain numerous sections, each covering a distinct kind of information. Features obtained from the reports include registry keys, files, summary, Internet Protocol (IP) addresses, and Domain Name System (DNS) queries (Ijaz et al., 2019), among many others.
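
A sandbox report can be reduced to the feature groups listed above with a few lines of code. The sketch below assumes a Cuckoo-style JSON report in which behavioural data sits under `behavior.summary` and network data under `network`. The key names (`regkey_written`, `file_created`, `hosts`, `dns`) differ between sandbox versions and should be treated as assumptions, and `SAMPLE_REPORT` is synthetic placeholder data, not output from a real analysis.

```python
import json

# Synthetic placeholder report; the layout is assumed, not taken from a real run.
SAMPLE_REPORT = {
    "behavior": {
        "summary": {
            "regkey_written": [r"HKEY_CURRENT_USER\Software\Example\Run"],
            "file_created": [r"C:\Temp\dropped.bin", r"C:\Temp\dropped.bin"],
        }
    },
    "network": {
        "hosts": ["192.0.2.10"],
        "dns": [{"request": "example.test", "type": "A"}],
    },
}


def load_report(path: str) -> dict:
    """Load a JSON report written by the sandbox (path is a placeholder)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def extract_report_features(report: dict) -> dict:
    """Collect registry keys, files, IP addresses, and DNS queries.

    Missing sections yield empty lists, so partial reports do not fail.
    """
    summary = report.get("behavior", {}).get("summary", {})
    network = report.get("network", {})
    return {
        "registry_keys": sorted(set(summary.get("regkey_written", []))),
        "files": sorted(set(summary.get("file_created", []))),
        "ip_addresses": sorted(set(network.get("hosts", []))),
        "dns_queries": sorted(
            {query["request"] for query in network.get("dns", [])
             if "request" in query}
        ),
    }


features = extract_report_features(SAMPLE_REPORT)
```

Deduplicating and sorting each group keeps the output stable across runs, which is convenient when the lists are later encoded as features for classification or clustering.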
