Analyzing Newspaper Articles for Text-Related Data for Finding Vulnerable Posts Over the Internet That Are Linked to Terrorist Activities


Romil Rawat, Vinod Mahor, Bhagwati Garg, Shrikant Telang, Kiran Pachlasiya, Anil Kumar, Surendra Kumar Shukla, Megha Kuliha
Copyright: © 2022 | Pages: 14
DOI: 10.4018/IJISP.285581

Abstract

Classifying online documents is one of the most critical activities in revealing terrorism-related information. The internet provides users with a wide variety of useful knowledge, and the volume of web material keeps growing, which makes finding potentially hazardous records very difficult. Merely extracting keywords from records is inadequate for defining their contents. Many methods have been studied for building automatic document classification systems; these are mainly computational and knowledge-based approaches. Because of the complexity of natural language, these approaches do not provide sufficient results. To address this shortcoming, we propose a structure-based approach that uses the WordNet hierarchy and the frequency of n-gram data to measure word similarity. The approach was evaluated on sampled New York Times articles using four query terms from four different areas. According to the experimental findings, the proposed approach successfully removes background words and phrases from documents and recognizes texts connected to terrorism.

1. Introduction

Text is the most common type of content on the internet, since writers can use it to express a wide range of opinions. The amount of data on the internet has recently exploded with the spread of wireless internet and mobile applications, which let people publish at almost any time and from almost any place. Google, Bing, and Yahoo have indexed over eight billion web pages, along with content from Online Social Network (OSN) sites. Such a wealth of data can satisfy a vast range of user demands, and almost any piece of knowledge can now be found on the internet (Abedin, 2019). Many computer scientists are working on more useful and reliable approaches for producing results tailored to users' needs. On the other hand, this massive volume of data not only confuses people looking for what they need but also poses a significant security risk. Online documents use HTML to display information in a web browser, and this markup carries little semantic knowledge. As a consequence, people searching the web for evidence must spend a significant amount of time deciding whether the results they find are reliable (Aires et al., 2020)(Alfifi et al., 2018). This is a heavy workload for humans, so computers that can perform it automatically are needed. Furthermore, records contain personal information such as the author's identity, the user's organization index, and formatted documentation (Ashcroft et al., 2015)(Rawat, Rajawat, Mahor et al, 2021)(Roggio, 2014). Text documents are currently the most widely used format for sending and exchanging documents; however, they contain a variety of privacy-related material that may lead to data leakage (Baek et al., 2008)(Rawat, Mahor, Chirgaiya, & Rathore, 2021)(Rawat, Mahor, Chirgaiya, Shaw et al, 2021). Terrorists often use internet technologies such as email, posts, likes, and comments to exchange information and extend their organizational networks.
For example, analysis of terrorism-related web content revealed that the 'Hamburg Cell' was used to plan and target the September 11th attacks in the USA (Becker et al., 2019). This demonstrates that, by reviewing web content, it is possible to predict terrorism-related events and deter future attacks. However, terrorism-related behavior cannot be identified accurately by merely collecting keywords and meaningful terms. Many researchers have used mathematical approaches such as Term Frequency (TF) or knowledge bases such as WordNet (Yu et al., 2008) in their studies (Choi & Kim, 2012). However, because human language consists of more than individual words, accuracy based on word frequency and knowledge bases is limited; in a knowledge-based strategy, the quality of the knowledge base itself also affects the outcome. Many experiments have applied Bayes theory (Choi et al., 2014), decision trees (Davidson et al., 2017), Latent Semantic Analysis (LSA) (Dhariwal et al., 2011), Support Vector Machines (SVM) (Gambäck & Sikdar, 2017), and other methods to help computers understand human language, yet document comprehension remains a difficult challenge for automated systems.

This paper presents an approach that uses the WordNet hierarchy to extract meaningful terms for training on terrorism-related posts. The frequency of bigram data is then determined using background word sets. We used four article categories: terrorism, violence, healthcare, and virus. The experiment used the Keselj distance (Baek et al., 2008), a well-known n-gram-based measure, together with Semantic Weight similarity to distinguish papers about terrorism.
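The bigram-frequency and Keselj-distance steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes word bigrams, relative frequencies as the n-gram profile, a small hypothetical background word set, and the commonly used form of the Keselj dissimilarity, in which each n-gram in the union of two profiles contributes (2(f1 - f2) / (f1 + f2))^2 and a missing n-gram counts as frequency 0. The WordNet term extraction and Semantic Weight similarity are not shown, and the category names and reference texts are toy examples.

```python
import re
from collections import Counter

# Hypothetical background word set (assumption); the paper's actual
# background word sets are not given in this section.
BACKGROUND_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to",
                    "is", "was", "for", "by", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_profile(text: str, background=BACKGROUND_WORDS) -> dict:
    """Relative frequencies of word bigrams after dropping background words.

    Background words are removed before pairing, so a bigram may span a
    removed word (e.g. "claimed the bomb" yields ("claimed", "bomb")).
    """
    tokens = [t for t in tokenize(text) if t not in background]
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return {}
    counts = Counter(bigrams)
    total = len(bigrams)
    return {gram: count / total for gram, count in counts.items()}


def keselj_distance(p1: dict, p2: dict) -> float:
    """Keselj dissimilarity between two n-gram profiles.

    Sums (2 * (f1 - f2) / (f1 + f2)) ** 2 over the union of n-grams;
    an n-gram absent from one profile has frequency 0 there.
    """
    distance = 0.0
    for gram in set(p1) | set(p2):
        f1 = p1.get(gram, 0.0)
        f2 = p2.get(gram, 0.0)
        # f1 + f2 > 0 because every profile frequency is positive.
        distance += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return distance


# Toy reference profiles for two hypothetical categories.
references = {
    "terrorism": bigram_profile(
        "the bomb attack killed civilians and the bomb attack "
        "was claimed by militants"),
    "healthcare": bigram_profile(
        "the hospital vaccine trial showed the vaccine trial "
        "was safe for patients"),
}

doc = bigram_profile("militants claimed the bomb attack near the market")

# Assign the document to the category with the smallest distance.
label = min(references, key=lambda name: keselj_distance(doc, references[name]))
print(label)  # prints: terrorism
```

Here the document shares the bigram ("bomb", "attack") with the terrorism reference and nothing with the healthcare reference, so its distance to the terrorism profile is smaller. Because each n-gram found in only one profile adds the maximum term of 4, the measure rewards shared n-grams strongly, which is one reason removing background words first can matter: otherwise frequent function-word bigrams would dominate the overlap.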

The remainder of the paper is organized as follows: Section II reviews the literature; Section III outlines the analysis of text for identifying cyber-terrorism-related posts and articles; Section IV presents the experiments; and Section V concludes the paper and discusses future work.
