Supervised Sentiment Analysis of Science Topics: Developing a Training Set of Tweets in Spanish

Patricia Sánchez-Holgado, Carlos Arcila-Calderón
Copyright: © 2020 | Pages: 15
DOI: 10.4018/JITR.2020070105

Twitter is one of the largest sources of real-time information on the Internet and is continuously fed by millions of users around the world. Each of these users publishes text messages containing opinions, concerns, information, or simply accounts of their daily lives. Analyzing data at this scale is a challenge, and finding ways to understand what such data can reveal about society and the market is a goal in its own right. The science communication sector is still discovering what Web 2.0 and social networks can offer for reaching all audiences. This article develops a model, built with machine learning techniques, for classifying Spanish-language Twitter messages on science topics. Training this type of model requires creating a specific Spanish-language corpus on science, which is one of the most laborious tasks. The classifier is able to predict the sentiment of a message in real time on Twitter, with a confidence interval greater than 80%. In evaluation, it reaches 72% accuracy.

Context And Motivation

There is growing interest in studying public opinion using the large-scale data produced by social media (Bollen, Mao, & Pepe, 2011; O’Connor, Balasubramanyan, Routledge, & Smith, 2010; Whitman Cobb, 2015). However, most of these studies rely on manual classification or on automated content analysis using dictionaries that label words (for example, by assigning each word an a priori negative or positive value) (Feldman, 2013). Other approaches derived from artificial intelligence, such as supervised machine learning (Vinodhini & Chandrasekaran, 2012), are scarce in communication research (Van Zoonen & Van der Meer, 2016), in the social sciences, and in private consultancies working on public opinion, political studies, and marketing. Additionally, new technological efforts are aimed at combining automated sentiment analysis based on machine learning with streaming technologies, which are capable of producing a significant amount of data.
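The contrast between the two families of methods mentioned above can be made concrete with a small sketch. It is an illustration only, not the classifier developed in this article: the four-word lexicon, the four training sentences, and the names `lexicon_label` and `NaiveBayes` are all invented for the example, and Naive Bayes is used simply as one common supervised algorithm.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-lexicon: each word carries a fixed, a priori polarity.
LEXICON = {"bueno": 1, "excelente": 1, "malo": -1, "terrible": -1}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def lexicon_label(text):
    """Dictionary approach: sum the a priori polarities of known words."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class NaiveBayes:
    """Supervised approach: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training set (not the corpus described in the article).
TRAIN = [
    ("la vacuna es un avance excelente", "positive"),
    ("gran descubrimiento en astronomía", "positive"),
    ("el recorte en ciencia es terrible", "negative"),
    ("fracaso del experimento", "negative"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
```

In this toy setting, `lexicon_label("gran descubrimiento")` falls back to "neutral" because neither word is in the lexicon, whereas `model.predict("gran descubrimiento")` can draw on the labeled examples. This is why the supervised route depends on a domain-specific training corpus.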

Three billion people around the world regularly express their thoughts and opinions through social networks. Twitter is one of the most prominent. It is a microblogging service that combines features of blogs, instant messaging, and social networks, and it has grown exponentially since its launch in 2006. Twitter users generate content as short texts of at most 280 characters (until November 2017 the limit was 140 characters), on any topic and in real time. Most messages are public, although the platform also allows private messages. A tweet can reach a very large audience within a few minutes because users reshare messages across an endless network.

At the beginning of 2018, the number of monthly active users had already reached 330 million, generating a volume of about 500 million tweets per day. In Spain, the number of users is close to 5 million.

Twitter is used to share information and to describe daily activities (Java, Song, Finin, & Tseng, 2007), and it allows users to express opinions and interests in real time (García Esparza, O’Mahony, & Smyth, 2012). Its influence is evident in its presence in practically all areas of social, political, economic, and educational life and on any subject (sports, culture, leisure, science, industry, etc.) (Kwak, Lee, Park, & Moon, 2010).
