1. Introduction
Text is the most common type of content on the internet. The volume of online data has recently exploded with the advancement of wireless internet and mobile applications, which allow access with little regard for time or place. Google, Bing, and Yahoo have indexed over eight billion web pages and Online Social Network (OSN) sites. Such a wealth of data can satisfy a vast range of user demands, and almost any piece of knowledge can now be found on the internet (Abedin, 2019). Many computer scientists are working to create more useful and reliable approaches for producing results tailored to users' needs. This massive volume of data, however, not only overwhelms people searching for what they need but also poses a significant security risk. Online documents use HTML to display information in the web browser, but the data lacks semantic knowledge. Consequently, when people search the web for evidence, they must spend a considerable amount of time deciding whether the results they find are reliable (Aires et al., 2020; Alfifi et al., 2018). This is a significant workload for humans, so computers that can perform the task automatically are needed. Furthermore, records contain personal information such as the author's identity, the user's organization index, and formatted documentation (Ashcroft et al., 2015; Rawat, Rajawat, Mahor et al., 2021; Roggio, 2014). The text document is currently the most widely used format for sending and exchanging documents. However, it contains a variety of privacy-related material that may lead to data leakage (Baek et al., 2008; Rawat, Mahor, Chirgaiya, & Rathore, 2021; Rawat, Mahor, Chirgaiya, Shaw et al., 2021). Terrorists often use internet technologies, such as email, posts, likes, and comments, to exchange information and extend their organizational networks.
Analysis of terrorism-related web content revealed, for example, that the 'Hamburg Cell' used the internet to plan and coordinate the September 11th attacks in the USA (Becker et al., 2019). This demonstrates that reviewing web content can help predict terrorism-related events and deter future attacks. However, terrorism-related behavior cannot be accurately identified merely by collecting keywords and meaning terms. Many researchers have based their studies on statistical approaches such as Term Frequency (TF) or on knowledge bases such as WordNet (Yu et al., 2008; Choi & Kim, 2012). However, since human written language consists of more than just words, accuracy ratings that depend on word frequency alone are unreliable, and with a knowledge-based strategy the accuracy of the knowledge base itself affects the outcome. Many experiments have applied Bayes theory (Choi et al., 2014), decision trees (Davidson et al., 2017), Latent Semantic Analysis (LSA) (Dhariwal et al., 2011), Support Vector Machines (SVM) (Gambäck & Sikdar, 2017), and other methods to help computers understand human language. Document comprehension nevertheless remains a difficult challenge for such systems.
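The limitation of frequency-only approaches noted above can be made concrete with a minimal term-frequency sketch (an illustration only, not the method of any cited work): TF treats a document as a bag of words, so word order and context are discarded entirely.

```python
from collections import Counter

def term_frequency(text):
    """Plain term frequency: count of each word divided by total words.
    This bag-of-words view discards word order and context, which is
    exactly the limitation frequency-based approaches suffer from."""
    words = text.lower().split()
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}
```

For instance, "plan the attack" and "attack the plan" produce identical TF vectors, illustrating why frequency alone cannot distinguish meaning.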
In this paper, a method that uses the WordNet hierarchy to extract meaning terms from terrorism-related training posts is presented. The frequency of bigram data is then determined using background word sets. We used four article categories: terrorism, violence, healthcare, and virus. The experiment used the Keselj distance (Baek et al., 2008) to distinguish terrorism-related documents, comparing well-known n-gram-based similarity measures with semantic-weight similarity.
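The n-gram comparison described above can be sketched as follows. This is a minimal illustration assuming word-level bigram profiles and the standard Keselj profile dissimilarity (the sum over n-grams of the squared, normalized frequency difference); the paper's own tokenization and profile sizes are not specified here.

```python
from collections import Counter

def ngram_profile(text, n=2):
    """Build a relative-frequency profile of word n-grams (bigrams by default)."""
    words = text.lower().split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def keselj_distance(p1, p2):
    """Keselj profile dissimilarity between two n-gram frequency profiles:
    sum over all n-grams of ((f1 - f2) / ((f1 + f2) / 2)) ** 2.
    Identical profiles give 0; disjoint n-grams each contribute 4."""
    dist = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        dist += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return dist
```

A document would then be assigned to the category (e.g., terrorism vs. healthcare) whose profile yields the smallest distance.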
The remainder of the paper is organized as follows: Section II presents the literature review; Section III outlines the analysis of text for the identification of cyber-terrorism-related posts and articles; Section IV presents the experiments; finally, Section V concludes the paper and discusses future work.