Phishing Website Detection With Semantic Features Based on Machine Learning Classifiers: A Comparative Study

Ammar Almomani, Mohammad Alauthman, Mohd Taib Shatnawi, Mohammed Alweshah, Ayat Alrosan, Waleed Alomoush, Brij B. Gupta
Copyright: © 2022 |Pages: 24
DOI: 10.4018/IJSWIS.297032

Phishing attacks, including web phishing and spear phishing, are among the main cybersecurity threats, and phishing websites remain a persistent problem. One of the main contributions of this study is the extraction and use of URL and domain identity features, abnormal features, HTML and JavaScript features, and domain features as semantic features for detecting phishing websites, which makes classification more controllable and more effective. The study applied machine learning algorithms to detect phishing websites and compared their performance. We used 16 machine learning models with 10 semantic features, extracted from two datasets, that represent the most effective features for detecting phishing webpages. GradientBoostingClassifier and RandomForestClassifier achieved the best accuracy in the comparison (about 97%). In contrast, GaussianNB and the stochastic gradient descent (SGD) classifier had the lowest accuracy of the classifiers compared, at 84% and 81% respectively.
1. Introduction

Phishing is an illegal technique used to steal customers' identity information and financial account credentials. Social engineering schemes use spoofed e-mails that purport to come from legitimate companies and agencies. These e-mails are designed to lead users to reveal financial data, including usernames and passwords, on fake websites. Technical subterfuge schemes plant malicious software on victims' machines to steal data directly, using tools that capture usernames and passwords for online accounts. Corrupted local browsers misdirect customers to fake websites, or to legitimate websites through attacker-controlled proxies that track and capture the customers' keystrokes (Al-Momani et al., 2011; Ammar Almomani et al., 2013; Ammar Almomani, Obeidat, Alsaedi, Obaida, & Al-Betar, 2015; Ammar Almomani, Wan, Altaher, et al., 2012; Ammar ALmomani, Wan, Manasrah, et al., 2012; A Almomani et al., 2013; B. B. Gupta, Arachchilage, & Psannis, 2018; B. B. Gupta, Tewari, Jain, & Agrawal, 2017).

Phishing detection based on the Semantic Link Network (SLN) and semantic features, which semantically organizes web resources to identify a phishing web page and its phishing target, has become one of the most popular techniques in recent years (R. M. Mohammad & AbuMansour, 2017; Verma & Hossain, 2013; Wenyin, Fang, Quan, Qiu, & Liu, 2010). A significant number of our everyday activities (e.g., social networking, online banking, and electronic business) have been receiving much attention, owing to the growth of global networking and communication technologies. The free, transparent, and unrestricted internet infrastructure creates an attractive environment for cyber-attacks and exposes critical network vulnerabilities that affect even seasoned software users. Although users' knowledge and expertise matter, users cannot completely prevent phishing scams (Al-Nawasrah, Almomani, Atawneh, & Alauthman, 2020; Alauthman, Almomani, Alweshah, Omoush, & Alieyan, 2019; A Almomani, Alauthman, Omar, & Firas, 2017).

Attackers often take the personality characteristics of the end user into account to increase the effectiveness of phishing attacks, exploiting these characteristics to trick even relatively experienced users (Alauthman et al., 2019). End-user-targeted cyber-attacks cause massive losses of sensitive information and money for individuals, amounting to billions of dollars each year (Alauthman, Aslam, Al-Kasassbeh, Khan, Al-Qerem, Choo, et al., 2020).

The term "phishing" is derived from the metaphor of "fishing" for targets, and phishing attacks have received a lot of attention from researchers in recent years. Carrying out phishing attacks is tempting for hackers, who set up fake websites built to look just like popular, legitimate websites on the internet. Although these sites have identical visual user interfaces, their URLs necessarily differ from the URLs of the original pages. A patient and knowledgeable user can detect most of these malicious sites by inspecting the URLs.
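
The URL inspection described above can be approximated in code. The following is a minimal sketch, not the feature extraction used in this study: the function names, the three warning signs, and the example hosts are all illustrative assumptions, and only the Python standard library is used.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if none can be parsed."""
    return (urlsplit(url).hostname or "").lower()


def is_ip_host(host: str) -> bool:
    """True when the host is a literal IPv4 or IPv6 address, not a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_warning_signs(url: str, legitimate_host: str) -> List[str]:
    """List simple, hypothetical warning signs for a URL that claims to
    belong to the site served from `legitimate_host`."""
    host = host_of(url)
    legit = legitimate_host.lower()
    signs = []
    # The host must be the legitimate host itself or one of its subdomains.
    if host != legit and not host.endswith("." + legit):
        signs.append("host differs from the legitimate site")
    # Legitimate sites are rarely addressed by a raw IP address.
    if is_ip_host(host):
        signs.append("host is a raw IP address")
    # Text before '@' in the authority part is user info, not the host.
    if "@" in urlsplit(url).netloc:
        signs.append("'@' in the authority part")
    return signs


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/login",
        "http://example.com.evil.test/login",
        "http://192.0.2.10/login",
    ):
        print(candidate, url_warning_signs(candidate, "example.com"))
```

A hand-written check like this only flags a few obvious cases. It is meant to illustrate why URL-based features are informative, not to replace a trained classifier.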
