Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web

Iván Castillo-Zúñiga, Francisco Javier Luna-Rosas, Laura C. Rodríguez-Martínez, Jaime Muñoz-Arteaga, Jaime Iván López-Veyna, Mario A. Rodríguez-Díaz
Copyright: © 2020 | Volume: 16 | Issue: 1 | Pages: 18
ISSN: 1552-6283 | EISSN: 1552-6291 | EISBN13: 9781799805236 | DOI: 10.4018/IJSWIS.2020010104
Cite Article

MLA

Castillo-Zúñiga, Iván, et al. "Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web." IJSWIS vol.16, no.1 2020: pp.69-86. http://doi.org/10.4018/IJSWIS.2020010104

APA

Castillo-Zúñiga, I., Luna-Rosas, F. J., Rodríguez-Martínez, L. C., Muñoz-Arteaga, J., López-Veyna, J. I., & Rodríguez-Díaz, M. A. (2020). Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web. International Journal on Semantic Web and Information Systems (IJSWIS), 16(1), 69-86. http://doi.org/10.4018/IJSWIS.2020010104

Chicago

Castillo-Zúñiga, Iván, et al. "Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web," International Journal on Semantic Web and Information Systems (IJSWIS) 16, no.1: 69-86. http://doi.org/10.4018/IJSWIS.2020010104

Abstract

This article presents a methodology for analyzing Internet data that combines techniques from Big Data analytics, natural language processing (NLP), and the semantic web in order to extract knowledge from large amounts of information on the web. To test the effectiveness of the proposed method, webpages about cyberterrorism were analyzed as a case study. The procedure implemented a genetic strategy in parallel, which integrates a crawler to locate and download information from the web with a vocabulary-retrieval step that uses NLP techniques (tokenization, stop-word removal, TF, TF-IDF) together with stemming and synonym methods. To support knowledge discovery, a dataset was built by describing a linguistic corpus with semantic ontologies that capture the characteristics of cyberterrorism. This dataset was then analyzed with the Random Forests (parallel), Boosting, SVM, neural network, k-NN, and Bayes algorithms. The results show an accuracy of 95.62% in detecting cyberterrorism vocabulary, validated through cross-validation, and a reported time saving of 576% with parallel processing.
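
The vocabulary-retrieval techniques named in the abstract (tokenization, stop-word removal, TF, TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the sample documents, and all function names are hypothetical, and the IDF variant used here (natural log of the number of documents divided by the document frequency) is one common choice among several. Stemming and synonym handling are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "to", "in", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequency(tokens):
    """Relative frequency of each term within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(docs_tokens):
    """log(N / df) for each term, where df counts documents containing it."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """Return one {term: TF-IDF score} dictionary per input document."""
    docs_tokens = [remove_stop_words(tokenize(d)) for d in docs]
    idf = inverse_document_frequency(docs_tokens)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in docs_tokens
    ]


# Hypothetical sample corpus standing in for downloaded webpages.
docs = [
    "The attack on the network",
    "The network is secure",
    "An attack and a threat",
]
scores = tf_idf(docs)
for doc, doc_scores in zip(docs, scores):
    ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    print(doc, "->", ranked)
```

Terms that occur in fewer documents receive a higher weight, which is why TF-IDF is often used to surface domain-specific vocabulary before a classifier is trained on the resulting features.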
