Feature Reduction and Optimization of Malware Detection System Using Ant Colony Optimization and Rough Sets

Ravi Kiran Varma Penmatsa, Akhila Kalidindi, S. Kumar Reddy Mallidi
Copyright: © 2020 |Pages: 20
DOI: 10.4018/IJISP.2020070106


Malware is a malicious program that can cause a security breach of a system. Malware detection and classification is an active research topic in information security. Executable files are the major source of input for static malware detection. Machine learning techniques are very efficient in behavioral-based malware detection, but they need a dataset of malware with different features. In Windows, malware can be detected by analyzing portable executable (PE) files. This work contributes to identifying the minimum feature set for malware detection by employing a rough-set-dependent feature significance combined with Ant Colony Optimization (ACO) as the heuristic search technique. A malware dataset named claMP, with both integrated features and raw features, was used as the benchmark dataset for this work. The analytical results show that data size optimization of 97.15% and 92.8% was achieved with a minimum loss of accuracy for the claMP integrated and raw datasets, respectively.


For the past three decades, malware has posed a continuous threat to networks and systems. Malware can be defined as software or malicious code injected into a target system or network to make the system work abnormally (Christodorescu et al., 2005). Viruses, Trojans, backdoors, worms, rootkits, spyware, and adware are some of the forms malware takes. In general, any malware is commonly called a virus, a term first framed by Cohen (1987). Each piece of malware is designed with a common goal: to damage the system or to gain illegitimate access to it in order to retrieve sensitive information. The type of malware, and of the anti-malware or malware detection system, depends on the hardware/software platform and the operating system. Attackers mainly aim to infect systems or to morph malware so that it evades malware detectors.

At present, most systems use signature-based methods to identify malicious code. This technique uses a database of expressions or sequences that are considered to be malware. Malware is detected and an alert is triggered if the signature of the screened code or program matches an entry in the database. The major drawback of this technique is that the sequences need to be updated daily. When malicious code whose sequence is not already in the database enters the system, it goes undetected and becomes a major threat to the system. Recent work has shown that metamorphism and polymorphism are employed for code obfuscation to successfully evade detection of viruses and other malware (Christodorescu et al., 2005).
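The matching step described above, and the way a small mutation defeats it, can be illustrated with a minimal sketch. This is not the method of any particular anti-malware product: the signature names and byte patterns in `SIGNATURES` and the helper `scan` are hypothetical and exist only for illustration.

```python
# Minimal sketch of signature-based scanning (illustrative assumptions only).
# The signature names and byte patterns below are invented for this example.
SIGNATURES = {
    "Demo.Sample.A": bytes.fromhex("deadbeef0badf00d"),
    "Demo.Sample.B": b"EVIL_PAYLOAD_MARKER",
}


def scan(data, signatures):
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


# A program that contains a known sequence triggers an alert.
known = b"\x90\x90" + SIGNATURES["Demo.Sample.A"] + b"\xc3"

# A trivially mutated variant (one junk byte inserted inside the sequence)
# no longer matches any stored signature and goes undetected.
mutated = known.replace(b"\xde\xad", b"\xde\x90\xad")

print(scan(known, SIGNATURES))
print(scan(mutated, SIGNATURES))
```

The second call finds nothing even though the two inputs differ by a single byte, which is the weakness that obfuscated malware exploits.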

Malware identification takes three forms: static analysis, real-time analysis, or a mixture of both. Malware detection methods in static analysis are categorized as signature based, heuristic based, and behavioral based. Figure 1 shows a taxonomy of malware detection techniques. Amro and Ali (2016) describe the signature-based technique as the most commonly used because it produces a very low error rate.

Figure 1.

A taxonomy of malware detection techniques


In this work, features extracted from the PE header of Windows executable files are used. Figure 2 shows a snapshot of an executable file’s PE header as viewed in a hex editor. The PE header has four sections embedded within it (Liao, 2012):

  • The DOS header

  • The PE file header, or Common Object File Format (COFF) header

  • The optional header

  • The section header

Figure 2.

Snapshot of an executable file’s PE header in hex editor


The following features can be extracted from the different headers within the PE header.

DOS Header Features

Table 1 describes the features that can be identified with the help of the DOS header. The feature e_magic is a very basic one: it sits at the beginning of the file, generally holds the hex value 4D5A, which means ‘MZ’ (Zatloukal & Znoj, 2017), and indicates that the file is an MS-DOS executable file.

Table 1.
Features extracted from DOS header
Feature       Description                                    Type
e_magic       Magic number                                   Numeric
e_cblp        Bytes on the last page of a file               Numeric
e_cp          Total pages a file contains                    Numeric
e_cparhdr     Header size in paragraphs                      Numeric
e_maxalloc    Maximum number of extra paragraphs required    Numeric
e_sp          Initial sp value                               Numeric
e_lfanew      File address of new exe header                 Numeric
e_csum        Checksum value                                 Numeric
e_minalloc    Minimum number of extra paragraphs required    Numeric
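
As an illustration of how the Table 1 features might be read from a file, the sketch below unpacks the 64-byte DOS header with Python's standard `struct` module. It is not the authors' extraction code. The field order follows the publicly documented `IMAGE_DOS_HEADER` layout, and the names `DOS_HEADER`, `FIELD_NAMES`, `TABLE1_FEATURES` and `extract_dos_header_features` are assumptions introduced for this example, as are the synthetic sample values.

```python
import struct

# IMAGE_DOS_HEADER, little-endian, 64 bytes: 14 WORDs, 8 reserved bytes (e_res),
# 2 WORDs, 20 reserved bytes (e_res2), then the 32-bit e_lfanew offset.
DOS_HEADER = struct.Struct("<14H8x2H20xi")

FIELD_NAMES = (
    "e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc",
    "e_maxalloc", "e_ss", "e_sp", "e_csum", "e_ip", "e_cs", "e_lfarlc",
    "e_ovno", "e_oemid", "e_oeminfo", "e_lfanew",
)

# The nine features listed in Table 1.
TABLE1_FEATURES = (
    "e_magic", "e_cblp", "e_cp", "e_cparhdr", "e_maxalloc",
    "e_sp", "e_lfanew", "e_csum", "e_minalloc",
)


def extract_dos_header_features(data):
    """Return the Table 1 features from the first 64 bytes of a PE file."""
    if len(data) < DOS_HEADER.size:
        raise ValueError("input is shorter than a DOS header")
    fields = dict(zip(FIELD_NAMES, DOS_HEADER.unpack_from(data, 0)))
    # The bytes 4D 5A ('MZ') read as a little-endian WORD give 0x5A4D.
    if fields["e_magic"] != 0x5A4D:
        raise ValueError("missing 'MZ' magic number")
    return {name: fields[name] for name in TABLE1_FEATURES}


# Synthetic header built for the demonstration; the values are made up.
sample = DOS_HEADER.pack(0x5A4D, 0x90, 3, 0, 4, 0, 0xFFFF, 0,
                         0xB8, 0, 0, 0, 0x40, 0, 0, 0, 0x80)
features = extract_dos_header_features(sample)
print(features)
```

For a real file, the same function could be applied to the first 64 bytes read in binary mode, for example `extract_dos_header_features(open(path, "rb").read(64))`, where `path` is a placeholder.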
