Public health surveillance has gained importance with the global COVID-19 pandemic. Tracking public opinions and positions on social media automatically is important, because this information can be used to improve public health. Sentiment analysis and stance detection are two social media analysis methods that can be applied to health-related social media posts for this purpose. In this chapter, the authors perform sentiment analysis and stance detection on Turkish tweets about COVID-19 vaccination. A Turkish tweet dataset about COVID-19 vaccination, annotated with both sentiment and stance labels, is created. Two machine learning approaches (SVM and Random Forest) are applied to this dataset, and the results are compared. Widespread COVID-19 vaccination is claimed to be useful for coping with the pandemic. Therefore, the results of automatic sentiment and stance analysis of Twitter posts on COVID-19 vaccination can help public health professionals in their decision-making processes.
Introduction
The novel Coronavirus Disease 2019 (COVID-19) pandemic has affected many people and countries worldwide. Many studies based on patient examinations are being published in medical journals. People are also expressing their ideas and opinions about the pandemic on social media platforms such as Twitter, Facebook, Instagram, and Reddit, and there are studies that publish social media datasets about the COVID-19 pandemic. Beyond the pandemic itself, social media users also express opinions about its various aspects. For instance, they share their views on measures intended to prevent the spread of the disease, such as face masks, social distancing, COVID-19 vaccines, remote working, and online education.
Automatic social media analysis can be used to extract useful information about the pandemic from the related datasets and social media platforms. Several studies point out that social media analysis can facilitate automatic public health surveillance and can provide useful and timely information for public health professionals (Küçük et al., 2017; Edo-Osagie et al., 2020; Küçük et al., 2021).
Sentiment analysis is a social media analysis method that is also commonly known as opinion mining in the related literature (Liu, 2010; Agarwal et al., 2011; Sun et al., 2017). At the end of the sentiment analysis process, the input text is generally classified as Positive, Negative, or Neutral. In several related studies, None (or Neither) is also added to the list of sentiment class labels.
Stance detection is another social media analysis problem, related to sentiment analysis. It classifies the stance (position) of an input text, such as a tweet, towards a given target. The input text is usually classified as Favor, Against, or None (Neither) at the end of the stance detection procedure (Mohammad et al., 2016a; Mohammad et al., 2016b; Küçük & Can, 2020). Stance detection is also known as stance prediction, stance analysis, stance classification, and stance identification in the related literature. In some studies, Neutral is also used as a stance class.
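The two labeling schemes described above can be pictured as a small data structure. The following is only an illustrative sketch: the class names, the `labels_diverge` helper, and the example texts are hypothetical and are not taken from any dataset. It shows that a sentiment label describes the polarity of the text, whereas a stance label is always defined relative to a target, so the two labels need not coincide.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NONE = "None"


class Stance(Enum):
    FAVOR = "Favor"
    AGAINST = "Against"
    NONE = "None"


@dataclass(frozen=True)
class AnnotatedTweet:
    text: str
    target: str            # the stance target the annotation refers to
    sentiment: Sentiment   # polarity of the text itself
    stance: Stance         # position of the text towards `target`


TARGET = "COVID-19 vaccination"

# Invented placeholder texts for illustration only (hypothetical, not real data).
examples = [
    AnnotatedTweet("Got my second dose today, feeling relieved.",
                   TARGET, Sentiment.POSITIVE, Stance.FAVOR),
    AnnotatedTweet("Angry that appointments are so hard to get; "
                   "everyone should be vaccinated.",
                   TARGET, Sentiment.NEGATIVE, Stance.FAVOR),
    AnnotatedTweet("Vaccination centers open at 9 am tomorrow.",
                   TARGET, Sentiment.NONE, Stance.NONE),
]


def labels_diverge(tweet: AnnotatedTweet) -> bool:
    """Return True when polarity and stance do not line up (e.g. Negative yet Favor)."""
    aligned = {
        (Sentiment.POSITIVE, Stance.FAVOR),
        (Sentiment.NEGATIVE, Stance.AGAINST),
        (Sentiment.NONE, Stance.NONE),
    }
    return (tweet.sentiment, tweet.stance) not in aligned


divergent = [t for t in examples if labels_diverge(t)]
```

In this sketch, the second example carries a Negative sentiment but a Favor stance, which is one reason the two tasks are usually annotated and evaluated separately.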
In this book chapter, we create a Turkish tweet dataset on COVID-19 vaccination. We first annotate this dataset with sentiment labels (Positive, Negative, None) and stance labels (Favor, Against, None). Next, we perform sentiment analysis and stance detection on this dataset using different machine learning approaches. The resulting automatic sentiment and stance classifications can be used by public health experts. The contributions of our study are listed below:
1. To the best of our knowledge, we present the first Turkish tweet dataset about COVID-19 vaccination that is annotated with both sentiment and stance classes. The dataset can be used by sentiment analysis and stance detection researchers. It has been annotated with the common polarity classes of “Positive”, “Negative”, and “None”, as well as with the common stance classes of “Favor”, “Against”, and “None”. Previous work on stance detection on Turkish tweets has considered only the two stance classes Favor and Against (binary classification) (Küçük & Can, 2020). In our work, multi-class stance classification towards the target, COVID-19 vaccination, is performed using our dataset labeled with three stance classes.
2. Two different machine learning approaches (SVM and Random Forest) are tested on this dataset and their performance results are compared. These learning approaches are selected because they are commonly used in related work on stance detection and sentiment analysis. Other researchers can use the dataset as a test dataset and the corresponding test results as a baseline for comparison.
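A comparison of the kind described in the second contribution could be set up along the following lines. This is a minimal sketch and not the configuration used in this chapter. The choices are assumptions made for illustration: TF-IDF features, the linear SVM variant (`LinearSVC`), the hyperparameters, and the helper names `build_models` and `fit_and_predict`. The texts and labels are invented placeholders, not items from the dataset. It relies on scikit-learn.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented placeholder texts with three-class stance labels (hypothetical data).
train_texts = [
    "vaccines protect everyone, get your dose",
    "so glad my parents are vaccinated",
    "i will never take this vaccine",
    "the vaccine is not safe, refuse it",
    "vaccination center opens at nine",
    "new appointment slots were announced",
]
train_labels = ["Favor", "Favor", "Against", "Against", "None", "None"]
test_texts = ["everyone should get the vaccine", "i refuse the vaccine"]


def build_models(seed=0):
    """Return one text-classification pipeline per learner (assumed TF-IDF features)."""
    return {
        "SVM": make_pipeline(TfidfVectorizer(), LinearSVC(random_state=seed)),
        "Random Forest": make_pipeline(
            TfidfVectorizer(),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }


def fit_and_predict(models, texts, labels, unseen_texts):
    """Fit each pipeline on the same training data and predict the unseen texts."""
    predictions = {}
    for name, model in models.items():
        model.fit(texts, labels)
        predictions[name] = [str(label) for label in model.predict(unseen_texts)]
    return predictions


predictions = fit_and_predict(build_models(), train_texts, train_labels, test_texts)
for name, predicted in predictions.items():
    print(name, predicted)
```

Training both learners on identical features and identical splits keeps the comparison fair. With a real annotated dataset, the predictions would typically be scored with per-class precision, recall, and F-measure under cross-validation rather than inspected by eye.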