Detection of Drive-by Download Attacks Using Machine Learning Approach

Monther Aldwairi (Jordan University of Science and Technology, Department of Network Engineering and Security, Irbid, Jordan), Musaab Hasan (Zayed University, College of Technological Innovation, Abu Dhabi, U.A.E) and Zayed Balbahaith (Zayed University, College of Technological Innovation, Abu Dhabi, U.A.E)
Copyright: © 2017 |Pages: 13
DOI: 10.4018/IJISP.2017100102


Drive-by download refers to attacks that automatically download malware to a user's computer without the user's knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not efficient against such attacks. Researchers have proposed many detection approaches, most of them based on passive blacklisting. A few have proposed dynamic classification techniques, but these suffer from clear shortcomings. In this paper, we propose a novel approach to detect web pages infected with drive-by downloads based on features extracted from their source code. We test 23 different machine learning classifiers using a data set of 5435 webpages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a base for implementing and developing anti drive-by download programs. We develop a graphical user interface program that allows the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy of 90.1%, with a true positive rate of 96.24% and a false positive rate of 26.07%.
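To make the three reported metrics concrete, the following minimal sketch shows how accuracy, true positive rate, and false positive rate are conventionally computed from a binary confusion matrix. The class name, function names, and counts are hypothetical and chosen for illustration only; they are not the paper's data and do not reproduce its reported figures.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion matrix, treating 'malicious' as the positive class."""
    tp: int  # malicious pages flagged as malicious
    fn: int  # malicious pages missed by the classifier
    fp: int  # benign pages wrongly flagged as malicious
    tn: int  # benign pages correctly passed


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all pages that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


def true_positive_rate(c: ConfusionCounts) -> float:
    # Fraction of malicious pages that were detected.
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Fraction of benign pages that were wrongly flagged.
    return c.fp / (c.fp + c.tn)


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=90, fn=10, fp=5, tn=95)
print(accuracy(counts), true_positive_rate(counts), false_positive_rate(counts))
```

Because accuracy averages over both classes, a classifier can combine high accuracy with a high false positive rate when malicious pages make up most of the test set, which is why the three figures are best read together.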
Article Preview


Every day, Internet users are targeted by a large number of attackers who constantly search for vulnerabilities in order to perform various attacks with different motivations and intentions (Harley & Bureau, 2008). Narvaez, Endicott-Popovsky, Seifert, Aval and Frincke (2010) considered drive-by download attacks to be among the most important of these attacks; in them, the attacker uses legitimate and illegitimate websites to spread malicious code. A file is downloaded to the user's machine without any user action by exploiting a web browser vulnerability. The file usually contains malicious code that runs on the target computer. This malware could be used to steal confidential data, create a backdoor, or serve any other malicious intent. Leit and Cova (2011) believe that drive-by downloads are involved in the spread of most recent malware infections.

Figure 1. Drive-by download attack flow


Matsunaka, Urakawa, and Kubota (2013) found that a user can be subjected to this attack simply by clicking a link in a phishing email, a malicious hyperlink, or an unwanted popup window. Figure 1 shows one possible scenario for launching a drive-by download attack. First, a malicious website is set up, called the landing or mothership website. This website could mimic a legitimate website, or it could be an actual legitimate website into which malicious code has been injected. Once the websites are injected with the attack code, they act as the first point in a chain of redirections to multiple intermediate websites. The point of the redirections is to hide the actual exploit servers and mislead investigators. The users are finally redirected to the exploit website, which hosts more elaborate malicious code that searches for vulnerabilities and flaws based on the version of the user’s web browser and operating system. Once a vulnerability is located, the malware distribution website exploits it to download and install the desired malware directly onto the user’s device without the user's knowledge. Not all drive-by attacks follow exactly the same flow, but the main idea remains valid.

The ultimate goal of a drive-by download attack is to take control of the client’s system by exploiting vulnerabilities in web browsers or their extensions, forcing the system to perform undesirable operations. Takata, Akiyama, Yagi, Hariu, and Goto (2015) found that attacks could result in data or financial loss because of the attacker's control over the victim’s computer. Most drive-by download attacks are carried out in four main steps. Redirection is the first step, in which the user is taken through a chain of redirections that delivers him to the malware distribution site. Obfuscation is the second step, in which attackers hide the malicious scripts under several layers of obscurity. The third step is environment preparation, in which the attacker seeks permission to control memory in order to inject the malicious code into the browser’s memory and then jump to the injected code to execute it. Exploitation is the last step, in which the attack is performed and the vulnerabilities in browser plugins are exploited. The compromised client then responds to remote commands from a Command and Control (C&C) server run by a bot herder (Cova, Kruegel & Vigna, 2010).
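The redirection and obfuscation steps tend to leave traces in a page's source code, which is what makes source-based feature extraction plausible. The sketch below counts a few such indicators with regular expressions. The feature names and patterns are assumptions chosen for illustration; this excerpt does not list the features the authors actually extracted, so this should not be read as their feature set.

```python
import re
from typing import Dict

# Hypothetical indicators of redirection and obfuscation (assumed, not the
# paper's feature list).
FEATURE_PATTERNS = {
    "iframe_tags": re.compile(r"<iframe\b", re.IGNORECASE),
    "script_tags": re.compile(r"<script\b", re.IGNORECASE),
    "eval_calls": re.compile(r"\beval\s*\("),
    "unescape_calls": re.compile(r"\bunescape\s*\("),
    "location_redirects": re.compile(r"\b(?:window|document)\.location\b"),
    "meta_refresh": re.compile(
        r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.IGNORECASE
    ),
}


def extract_features(html: str) -> Dict[str, int]:
    """Count occurrences of each indicator in the page source."""
    return {name: len(p.findall(html)) for name, p in FEATURE_PATTERNS.items()}


# Made-up page source for illustration; example.test is a placeholder host.
sample = (
    '<html><head><meta http-equiv="refresh" '
    'content="0;url=http://example.test/next"></head>'
    '<body><iframe src="http://example.test/hidden" width="0" height="0">'
    "</iframe>"
    '<script>eval(unescape("%61%6c%65%72%74"));</script></body></html>'
)

features = extract_features(sample)
print(features)
```

A vector of counts like this could then be passed to any classifier. Regular expressions are used here only to keep the sketch self-contained; a real extractor would more likely parse the HTML and JavaScript properly.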

Monitoring traffic and system activities provides a convenient way to identify attacks. Intrusion detection systems (IDSs) are used for this purpose. Aldwairi (2006) used hardware-based IDSs to provide efficient and fast detection; however, they are expensive, complex, and not easy to reconfigure. Aldwairi and Alansari (2011) used a software-based IDS with an exclusion mechanism to skip benign packets. Despite the considerable speedup, it was not as fast as the hardware-based solutions. Researchers have considered both anomaly-based and signature-based IDSs. Anomaly-based IDSs provide strong protection because they are able to identify new and previously unknown attacks. However, Aldwairi, Khamayseh, and Al‐Masri (2015) found that anomaly-based IDSs suffer from high false positive and false negative rates. Signature-based IDSs provide higher accuracy in identifying attacks; however, they are limited to predefined patterns and cannot identify previously unknown zero-day attacks.
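The contrast between the two IDS families can be illustrated with a toy sketch. The signature list, the baseline values, and the z-score threshold below are all hypothetical, and the code is not drawn from any of the cited systems; it only shows why signature matching misses unseen patterns while a statistical baseline can flag them.

```python
from statistics import mean, pstdev
from typing import List

# Hypothetical byte signatures of known attacks (assumed for illustration).
KNOWN_SIGNATURES = [b"\x90\x90\x90\x90", b"cmd.exe /c"]


def signature_match(payload: bytes) -> bool:
    """Signature-based check: flags only payloads containing a known pattern."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def is_anomalous(value: float, baseline: List[float], threshold: float = 3.0) -> bool:
    """Anomaly-based check: flags values far from the baseline mean (z-score)."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # With a constant baseline, any deviation counts as anomalous.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical baseline of response sizes (bytes) seen during normal traffic.
baseline_sizes = [500, 520, 480, 510, 490]

print(signature_match(b"run cmd.exe /c whoami"))  # known pattern
print(signature_match(b"novel-exploit-bytes"))    # unseen pattern is missed
print(is_anomalous(900, baseline_sizes))          # far from the baseline
print(is_anomalous(505, baseline_sizes))          # within normal variation
```

The anomaly check would also flag a benign but unusually large response, which is the false positive weakness noted above.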
