Metamorphic malware detection using opcode frequency rate and decision tree

Mahmood Fazlali (Department of Computer Science, Cyberspace Research Institute, Shahid Beheshti University, GC, Tehran, Iran), Peyman Khodamoradi (Department of Computer Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran), Farhad Mardukhi (Department of Computer Engineering, Razi University, Kermanshah, Iran), Masoud Nosrati (Department of Computer Engineering, Kermanshah Branch, Islamic Azad University, Kermanshah, Iran) and Mohammad Mahdi Dehshibi (Pattern Research Center, Tehran, Iran)
Copyright: © 2016 | Pages: 20
DOI: 10.4018/IJISP.2016070105


Malware is any type of malicious code that has the potential to harm a computer or a network. Modern malware exhibits mutation characteristics, namely polymorphism and metamorphism, which allow it to generate an enormous number of variants. The rising number of metamorphic malware samples makes it hard to analyze them for signature extraction and database updates. Although signature-based methods are broadly used in security products, they cannot detect new, unseen morphs of malware, because the structure of the malware, and with it the signature, changes in each infection. In this paper, a novel method is proposed in which the proportion of opcodes is used for detecting new morphs. Decision trees are used to classify and detect malware variants based on the rate of opcode frequencies. The proposed method is evaluated on three metrics: speed, efficiency and accuracy. The experiments showed that speed and time complexity are not challenging factors, because extracting opcode frequencies from the source assembly file is fast. Empirical validation reveals that the proposed method outperforms all of the commercial antivirus programs evaluated, with a high level of efficiency and accuracy.
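
The opcode frequency rate mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `opcode_frequency_rates` and the input format (one instruction per line, opcode as the first token, `;` starting a comment, labels ending in `:`) are assumptions made only for this example.

```python
from collections import Counter


def opcode_frequency_rates(asm_text):
    """Return each opcode's share of all instructions in an assembly listing.

    Hypothetical helper: assumes one instruction per line, the opcode as the
    first token, ';' starting a comment and label lines ending in ':'.
    Real disassembler output would need more careful parsing.
    """
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0].strip()  # drop trailing comments
        if not code or code.endswith(":"):    # skip blank lines and labels
            continue
        counts[code.split()[0].lower()] += 1  # first token is the opcode
    total = sum(counts.values())
    if total == 0:
        return {}
    return {opcode: n / total for opcode, n in counts.items()}


# Invented sample listing, used only to exercise the helper.
sample = """
start:
    mov eax, 1      ; load a constant
    add eax, ebx
    mov ecx, eax
    jmp start
"""
rates = opcode_frequency_rates(sample)
```

A dictionary like `rates` could then be mapped onto a fixed list of opcodes to obtain the feature vector that a decision-tree classifier consumes.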

1. Introduction

The term “malware” is a blend of “malicious” and “software” and refers to programs designed to cause damage or perform other undesirable actions, such as unauthorized access to resources or data collection (Vinod, Jaipur, Laxmi, & Gaur, 2009). Malware includes viruses, worms, Trojans, backdoors, bootkits, rootkits, botnets and other malicious programs (Griffin, Schneider, Hu, & Chiueh, 2009), (Wong & Stamp, 2006).

The growth of World Wide Web applications has coincided with an increase in malware attacks. The abundance of these attacks has made them a challenging issue for system security. Malware developers continually seek new vulnerabilities through which to infect more systems. New applications of malicious code are also emerging that aim to steal personal information in order to commit fraud and other cybercrimes.

Malware developers strive to contend with security programs in order to carry out their illegal activities. As a result, the security of computer systems relies heavily on keeping antivirus programs up to date (Egele, Scholte, Kirda, & Kruegel, 2012).

The proliferation of malware, and of new techniques for spreading it, has made such programs harder to detect with ordinary methods such as static and signature-based approaches. Techniques such as polymorphism and metamorphism aim to complicate detection by restructuring the malware program without impairing its functions (Mathur & Hiranwal, 2013).

Metamorphic malware (Chouchane & Lakhotia, 2006) is one of the most threatening types of malware: it generates a new code structure in each infection while leaving the functions of the malicious code intact. Repeated mutation makes the malware difficult to detect. Furthermore, the number of emerging viruses that use this technique is rising (Anderson, Quist, Neil, Storlie, & Lane, 2011). Such malware performs functions such as destroying data, stealing information and taking over computer resources. Attacks on every kind of computer system, including PCs, mobile devices and the cloud, are imminent (Nosrati & Karimi, 2016). Money is another motivating factor for developing malware.

The signature-based approach is a well-known category of malware detection methods that is widely used by antivirus developers (Aycock, 2006). A signature is a string of bits that appears specifically in the structure of a given malware. Aware of signature-based approaches, malware programmers have invented new techniques for circumventing detection (Lin & Stamp, 2011), (Szor, 2005).
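
A small sketch can show why a fixed signature is brittle against metamorphism. The byte strings below are invented for illustration and are not taken from any real sample or antivirus database; the function name `matches_signature` is likewise hypothetical. The variant differs from the original only by an inserted `nop` instruction, which leaves behavior unchanged but breaks the contiguous byte pattern.

```python
# Hypothetical signature: x86 encoding of "mov eax, 1" followed by "add eax, ebx".
SIGNATURE = bytes.fromhex("b80100000001d8")


def matches_signature(binary: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive signature check: report a match if the bytes occur contiguously."""
    return signature in binary


# Invented samples: mov eax, 1 / add eax, ebx / ret
original = bytes.fromhex("b80100000001d8c3")
# Same instructions with a single nop (0x90) inserted between mov and add.
variant = bytes.fromhex("b8010000009001d8c3")

original_detected = matches_signature(original)  # signature is present
variant_detected = matches_signature(variant)    # signature is broken by the nop
```

Here `original_detected` is `True` and `variant_detected` is `False`, even though both byte sequences compute the same result.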

Existing methods of metamorphic malware detection fall into two categories: dynamic analysis and static analysis (Konstantinou & Wolthusen, 2008). Dynamic analysis executes the code in order to observe its behavior. Because the suspicious code may contain an infected part, the execution environment becomes a problem: running malicious code might spread the infection beyond the current machine. In addition, executing the code on a dedicated machine incurs overhead (Bayer, Moser, Kruegel, & Kirda, 2006), (Sabaghi, Dashtbayazi, & Marjani, 2016), (Amid & Mesri Gundoshmian, 2015). If malicious code can be detected through feature analysis or pattern recognition, then static analysis is preferable to the dynamic approach. Likewise, if detecting the threatening code requires unusual execution circumstances, static analysis might perform better. On the other hand, static analysis techniques rely on the source of the program, so their performance depends on the availability of the source code (Al Daoud, Jebril, & Zaqaibeh, 2008).

So far, detection methods have run into a common problem: they perform acceptably against known viruses, but the database of virus signatures must be updated continually; otherwise, their efficiency degrades. Updating the database is a time-consuming task (Al Daoud et al., 2008).
