Introduction
Phishing is an online scam in which cybercriminals pose as trusted actors to trick their victims into sharing sensitive data, potentially including personal information, or into installing malware. Phishing is considered a “social engineering” attack because its success relies on human error rather than on flaws in the victims' hardware or software (Gabrailova, 2021). According to the latest reports from the Anti-Phishing Working Group (APWG, 2021), phishing has evolved significantly, causing severe economic losses worldwide. Phishing sites are also growing in both number and complexity. Figure 1 shows the number of phishing sites and phishing emails from August 2019 to March 2021. These counts cover only the phishing sites and emails detected by APWG (2021) and its partners.
Figure 1. Phishing activity from the 3rd quarter of 2019 to the 2nd quarter of 2021 (APWG, 2021)
Cybercriminals use various techniques that existing anti-phishing mechanisms sometimes fail to detect. These include URL shortening, the use of subdomains, link manipulation, and URL redirects (Communications Security Establishment, 2020), all of which rely on social engineering to deceive users. According to Liu et al. (2021), anti-phishing methods extract an increasing number of features, but the rationale for extracting these features is not clear. The existing features do not sufficiently reflect the nature of phishing, which steals sensitive information through spoofing. As a result, the features are valid only in a few limited and specific scenarios, such as particular datasets or a browser plug-in. In addition, researchers tend to use a reduced combination of approaches in order to reduce the processing time of their systems (Lee et al., 2015; Jain and Gupta, 2016; Choon et al., 2017; Dalgic et al., 2018; Shirazi et al., 2018; Orunsolu et al., 2019; Halgaš et al., 2020; Azeez et al., 2020; Opara et al., 2020).
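To make the URL-based techniques listed above more concrete, the following minimal sketch flags a few of their lexical traces in a single URL. It is an illustration only and not the method proposed in this work: the function name `url_indicators`, the `SHORTENER_HOSTS` set, and the four indicators are assumptions chosen for the example, and the subdomain count is deliberately naive.

```python
from urllib.parse import urlparse

# Hypothetical, non-exhaustive set of URL-shortening hosts (assumption for
# illustration; not taken from the article).
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}


def url_indicators(url: str) -> dict:
    """Return simple lexical indicators for one URL (illustrative sketch)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    labels = host.split(".") if host else []
    # Everything after the host, lower-cased, to look for an embedded URL.
    tail = (parsed.path + "?" + parsed.query).lower()
    return {
        # URL shortening: the host is a known shortening service.
        "is_shortened": host in SHORTENER_HOSTS,
        # Use of subdomains: labels beyond "name.tld". Naive: this ignores
        # multi-part public suffixes such as "co.uk".
        "subdomain_count": max(len(labels) - 2, 0),
        # Link manipulation: text before "@" in the authority is user info,
        # so the real host is whatever follows it.
        "has_at_symbol": "@" in parsed.netloc,
        # URL redirect: another absolute URL embedded in the path or query.
        "embeds_url": "http://" in tail or "https://" in tail,
    }


if __name__ == "__main__":
    for example in (
        "https://bit.ly/abc",
        "http://paypal.com.secure.login.example.net/x",
        "http://trusted.example@evil.example/",
        "https://example.com/redirect?url=https://evil.example/login",
    ):
        print(example, url_indicators(example))
```

Indicators of this kind are cheap to compute but inspect only the URL string, which is one reason they may hold in some scenarios and fail in others.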
In the present work, we propose a new design and implementation that includes new features for detecting and analyzing phishing on social media, particularly on Twitter. The following points distinguish our contribution: