An Abnormal External Link Detection Algorithm Based on Multi-Modal Fusion

Zhiqiang Wu
Copyright: © 2024 | Pages: 15
DOI: 10.4018/IJISP.337894

Abstract

Detecting unsafe website links is an important means of securing a website's external links. Past approaches relied mainly on blacklisting and on machine learning with hand-engineered features, which suffer from slow detection and weak model generalization. The development of neural networks has brought new solutions to external link security detection. To address the performance bottleneck caused by the variable content length of web pages, this article introduces a website external link security detection algorithm based on multi-modal fusion. It extracts text, dynamic script, and image features separately and constructs a deep fusion model that combines these multi-modal features. Compared with previous research results, the proposed method outperforms traditional single-modal methods and can quickly and accurately identify malicious web pages, improving accuracy by 2.7% and the F1 value by 0.026.

Introduction

With the rapid development of information technology and the popularization of the Internet, the number of websites has increased exponentially. To provide users with richer information resources and to promote cooperation and interaction with other websites or institutions, websites generally include many external links. Because of information updates, domain name changes, hacker attacks, and other causes, an external link may end up pointing to an insecure website, which poses a security risk to users. Such risks include malicious links, pornographic or gambling sites, and web pages containing malicious code, any of which may lead to disclosure of the user's personal information, computer infection, economic losses, and other problems (Tenis & Santhosh, 2021). In addition, linking to external websites that contain harmful information can seriously damage the reputation of the organization: users may doubt its professionalism, trustworthiness, and network security capabilities, which in turn affects users' access to and use of the organization's website. Ensuring the security of a website's external links is therefore crucial.

Regular inspection of a website's external links is an important means of ensuring their security. However, given the large number of websites and pages, manual inspection by website security managers is unrealistic. With the development of computer technology, automated security detection of external links has attracted wide attention, and scholars at home and abroad have proposed many detection schemes. The earliest detection method used the blacklist technique, in which a blacklist of all known harmful domain names is constructed in advance. When a user visits a website, the system checks whether its domain is on the blacklist in order to detect harmful external links. This method has the advantage of high detection accuracy, but it depends on timely maintenance of the blacklists and whitelists, so it lags behind new threats and cannot effectively judge the security of unknown web pages (Darwish et al., 2023). To solve this problem, some scholars have proposed a method based on dynamic behavior analysis, which examines the behavior of the website host, such as access records and executing processes, to determine whether the host behaves abnormally and thereby identify abnormal external links. This method can detect unknown viruses and malicious code, but detection is slow because it must simulate the running state of malicious web pages and analyze the result.
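The blacklist check described above can be illustrated with a minimal Python sketch. It is not the implementation used in any of the cited works: the domain names, the `BLACKLIST` set, and the functions `extract_host` and `is_blacklisted` are hypothetical, and matching parent domains as well as the exact host is an assumption made for this example.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist for illustration only; a real deployment would load
# and regularly refresh this set from a maintained source.
BLACKLIST = frozenset({"malicious.example", "phishing.example"})


def extract_host(url):
    """Return the lowercased host name of a URL, or "" if it has none."""
    host = urlsplit(url).hostname
    return host.lower() if host else ""


def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the URL's host or any of its parent domains is listed."""
    host = extract_host(url)
    if not host:
        return False
    labels = host.split(".")
    # Test "a.b.c", then "b.c", then "c", so that subdomains of a listed
    # domain are also flagged (an assumption of this sketch).
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))
```

A lookup of this kind is fast, but a harmful domain that has not yet been added to the set returns `False`, which is the lag and the blindness to unknown pages noted above.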

With the development of data mining and machine learning technology, machine learning-based methods for external link security detection have been proposed (Jerjes et al., 2023; Venugopal et al., 2021). These methods generalize to some extent, but because the choice of web page features strongly affects recognition performance, the feature engineering stage requires considerable effort. In addition, traditional machine learning techniques cannot learn the contextual semantic features of web text, which creates a bottleneck in recognition performance.

In the past few years, external link detection has shifted toward deep learning-based approaches, driven by rapid advances in machine learning and artificial intelligence. Existing work mostly uses text features. Because the length of Chinese text on web pages varies (Naim et al., 2023), models are generally trained on short text features such as the Uniform Resource Locator (URL) and tags plus only part of the page text in order to keep training feasible, which limits the practicality of the trained model. In addition, with the development of communication technology, many web pages contain not only text but also a large amount of multimedia content, such as pictures, videos, and sounds. Text information alone is therefore insufficient for judging whether a web page contains malicious information. In view of these problems, this research treats website link security detection as a binary classification problem. By integrating webpage text, dynamic script, and image features, it proposes an intelligent detection algorithm for website link security based on multimodal fusion. The main work of this paper includes:
