Introduction
The accelerated growth of the Internet and the widespread use of social networks and cloud computing have led to the generation of large volumes of data, in which the opportunity to commit a crime is latent. One of the greatest threats to society today is cyberterrorism, a new form of violence carried out by terrorist groups on the Internet with the aim of harming people, groups or nations (Alqahtani, 2015).
Allister et al. (2010) state that cyberterrorism is the convergence of terrorism and cyberspace in unlawful attacks and threats, carried out via ICTs, that aim to damage individuals, groups or nations. Sánchez (2015) describes it as a violent action that instills terror, carried out by one or more people on the Internet or through the improper use of communications technologies. Poveda & Torrente (2016) define cyberterrorism as the deliberate use of computing-related technologies to threaten or attack people, property and infrastructure in order to instill terror and achieve a political, ideological, social or religious purpose. Finally, Salellas (2012) notes that cyberspace is being used by terrorist groups such as Al Qaeda, ETA in Spain, neo-Nazi groups in Belgium and the Netherlands, Supreme Truth in Japan, and the KKK in the United States for propaganda, financing, recruitment, and the collection and exchange of information.
In general, information is essential in countering this threat, and prevention measures are determined by the information gap between victims and cyberterrorists (Schenone, 2014). The analysis of the enormous amount of data generated on the Internet, and the identification of possible cyberterrorism vocabulary within it, has been addressed from different approaches. These include programs, developments, algorithms and processes that are not well defined and that implement partial solutions not yet fully accepted by the scientific community. For these reasons, this research proposes a new approach to processing large volumes of data with the aim of identifying cyberterrorism vocabulary.
To extract knowledge from information and turn it into value, it is necessary to manage data effectively and to apply processing techniques that can handle large volumes of information with acceptable response times. These techniques also make it possible to analyze a variety of complex, semi-structured and unstructured data, such as documents, images, videos and music (Joyanes, 2013), in order to obtain accurate data on a particular topic; in this way, the characteristics of Big Data are taken into account (Chawda & Thakur, 2016). On the other hand, semantic problems must be considered, since they hinder the interpretation of words and meanings, the integration of scattered and unrelated information, and the retrieval of data affected by synonymy, polysemy and multilingualism (Pastor, 2013).
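The synonymy obstacle can be illustrated with a minimal Python sketch. It is not the approach proposed in this research: the watch-list terms, their synonym groupings and the function names (`exact_matches`, `normalized_matches`) are hypothetical and chosen only for illustration. A literal keyword match misses words such as "funding" or "enlistment", whereas mapping each surface form to a canonical term before counting recovers them.

```python
import re
from collections import Counter

# Hypothetical watch-list: canonical term -> surface forms treated as synonyms.
# These terms and groupings are illustrative assumptions, not a curated lexicon.
SYNONYMS = {
    "recruitment": {"recruitment", "recruiting", "enlistment"},
    "financing": {"financing", "funding", "fundraising"},
    "propaganda": {"propaganda"},
}

# Reverse index: surface form -> canonical term.
CANONICAL = {form: term for term, forms in SYNONYMS.items() for form in forms}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def exact_matches(docs: list[str], vocabulary: set[str]) -> Counter:
    """Count tokens that literally equal a watch-list term (no synonym handling)."""
    counts = Counter()
    for doc in docs:
        counts.update(tok for tok in tokenize(doc) if tok in vocabulary)
    return counts


def normalized_matches(docs: list[str]) -> Counter:
    """Map each token to its canonical term before counting, so synonyms are merged."""
    counts = Counter()
    for doc in docs:
        counts.update(CANONICAL[tok] for tok in tokenize(doc) if tok in CANONICAL)
    return counts


docs = [
    "Online funding campaigns and recruiting videos were reported.",
    "The forum posted propaganda about enlistment.",
]

print(exact_matches(docs, set(SYNONYMS)))
print(normalized_matches(docs))
```

On these two sample sentences the literal match finds only "propaganda", while the normalized count also credits "financing" once and "recruitment" twice. The mapping addresses synonymy only: a polysemous word such as "attack" would be counted whether it refers to violence or to a heart attack, and a multilingual corpus would need a separate mapping per language.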
Recent studies by Allister et al. (2010), Kolajo & Daramola (2017), Bosques & Garza (2016), Pu et al. (2015), Semberecki & Maciejewski (2016), Weir et al. (2016), and Sarnovsky & Vronc (2014) have addressed the processing of large volumes of data, the analysis of information on the Semantic Web, and Natural Language Processing (NLP); they are summarized in Table 1.