Semantic Characterization of Tweets Using Topic Models: A Use Case in the Entertainment Domain

Andrés García-Silva (Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain), Víctor Rodríguez-Doncel (Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain) and Oscar Corcho (Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain)
Copyright: © 2013 | Pages: 13
DOI: 10.4018/ijswis.2013070101


In the entertainment domain, users tweet about their expectations and opinions regarding upcoming, current and past experiences, while companies advertise and promote the shows. Characterizing tweets in these terms, which matters to both customers and companies, goes beyond traditional sentiment analysis, where the polarity of the sentiment expressed in an opinion is usually identified as positive, negative or neutral. The authors investigate different tweet representation models, including bags of words and probabilistic topic models, to shed light on the semantics of the messages. Their experiments show that topic-based models generated with Latent Dirichlet Allocation (LDA) in most cases yield better categorizations than TF-IDF based features, particularly when these models are enriched with natural language features and Twitter-specific slang.
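The two representations compared above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenizer, the raw-count × log(N/df) TF-IDF weighting, and the collapsed Gibbs sampler for LDA with hypothetical defaults (`k=2`, `alpha=0.1`, `beta=0.01`, `iters=200`) were all chosen for brevity. Each tweet ends up either as a sparse TF-IDF vector or as a dense topic mixture θ whose k components could then be fed to a classifier.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase words, keeping #hashtags and @mentions whole."""
    return re.findall(r"[#@]?\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (raw tf * log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]


def lda_doc_topics(docs, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return each document's topic mixture theta."""
    rng = random.Random(seed)
    tokenized = [tokenize(d) for d in docs]
    vocab = {w: i for i, w in enumerate(sorted({t for toks in tokenized for t in toks}))}
    v = len(vocab)
    ndk = [[0] * k for _ in tokenized]   # tokens of document d assigned to topic j
    nkw = [[0] * v for _ in range(k)]    # occurrences of word w assigned to topic j
    nk = [0] * k                         # total tokens assigned to topic j
    z = []                               # current topic of every token

    # Random initial assignment of every token to a topic.
    for d, toks in enumerate(tokenized):
        z.append([])
        for t in toks:
            j = rng.randrange(k)
            z[d].append(j)
            ndk[d][j] += 1
            nkw[j][vocab[t]] += 1
            nk[j] += 1

    for _ in range(iters):
        for d, toks in enumerate(tokenized):
            for i, t in enumerate(toks):
                w, old = vocab[t], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][old] -= 1
                nkw[old][w] -= 1
                nk[old] -= 1
                # Unnormalized full conditional p(z_i = j | all other assignments).
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                new = rng.choices(range(k), weights=weights)[0]
                z[d][i] = new
                ndk[d][new] += 1
                nkw[new][w] += 1
                nk[new] += 1

    # Smoothed estimate of theta from the final sample.
    return [[(ndk[d][j] + alpha) / (len(toks) + k * alpha) for j in range(k)]
            for d, toks in enumerate(tokenized)]


if __name__ == "__main__":
    tweets = [
        "Richard III at the globe was great. Samuel Barnett and Janes Garnon were "
        "fantastic but need I say that Mark Rylance was sublime",
        "Going to see the wonderful Mark Rylance at The Globe Thursday, very excited",
        "Three last chances to enjoy acclaimed #RichardIII this wkend",
    ]
    for tweet, vec, theta in zip(tweets, tfidf_vectors(tweets), lda_doc_topics(tweets)):
        top = sorted(vec, key=vec.get, reverse=True)[:3]
        print(top, [round(p, 2) for p in theta])
```

On a corpus this small the sampled topics carry little meaning; topic features only become informative over many tweets, and a real setup would more likely rely on an established topic-modeling library than on this hand-written sampler.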


Interest in analyzing social media has grown considerably in the last few years, as it has proved to be a valid tool for gauging the sentiment of the masses towards commercial brands, political options, public affairs, etc. Profiling consumers' attitudes by scanning social media gives a biased measure of mass opinion, since it reaches only a particular stratum of the population, but it is a cheap, easy and fast signal that can be used to adjust, in a timely manner, the marketing campaigns, pricing policies and communication strategy of anybody with a public face.

Twitter is an ideal candidate for study because of its large number of users, its ubiquitous availability and the heterogeneity of its messages. Much work has been done in the last few years in the related fields of opinion mining and sentiment analysis on Twitter. The general motivations of Twitter users and the very nature of their messages were well established by Java, Song, Finin, and Tseng (2007) and Krishnamurthy, Gill, and Arlitt (2008), while the diffusion of word-of-mouth opinions was studied by Jansen, Zhang, Sobel, and Chowdury (2009). Sentiment analysis algorithms proposed in academia have materialized in a number of applications now used by market analysts, community managers and social researchers in general. These tools (TweetFeel, Twendz, Sentiment140, Social Mention, Twitometro, etc.) usually provide a polarity score measuring the attitude towards a brand or any other queried topic.

In the sector of plays and musicals (which generates gross sales of £500M a year in London, with 250,000 tickets sold for musicals alone), opinion mining is particularly relevant, as performances are widely talked about. The whole entertainment industry depends strongly on public opinion, and word-of-mouth advertising is crucial for a show to succeed within the overall entertainment offer. Tweet analysis has been shown to anticipate box-office revenues merely by observing the number of references to a movie per day (Asur & Huberman, 2010), and tweets have been related to the ratings given on IMDB (Oghina, Breuss, Tsagkias, & de Rijke, 2012).

Opinions about shows published on Twitter can also be mined to evaluate how a show has been received by the public. These messages are precisely dated, but their absolute time matters less than whether each one was issued before or after its author attended the event. Furthermore, separating genuine opinions from advertisements, which are posted by community managers or other interested parties, is instrumental in obtaining a reliable assessment of opinion. Determining whether a tweet refers to a future or a past event, relative to the user's experience, is a problem that needs more attention: few efforts have been made to sort out which messages actually carry an opinion about a lived event, express an expectation about a future event, or convey some form of atemporal advertisement.

For example, the tweet “Richard III at the globe was great. Samuel Barnett and Janes Garnon were fantastic but need I say that Mark Rylance was sublime” is a typical opinion, “Going to see the wonderful Mark Rylance at The Globe Thursday, very excited” an exemplary expectation tweet, and “Three last chances to enjoy acclaimed #RichardIII this wkend” an advertisement for the same production.
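The three example tweets differ in surface cues: past-tense verbs in the opinion, a "going to" construction in the expectation, and a hashtag, promotional wording and the slang "wkend" in the advertisement. The sketch below shows one way such natural language and Twitter-specific features could be extracted and appended to a topic mixture (for instance an LDA document-topic vector). The cue lists and the function names `cue_features` and `enriched_vector` are hypothetical, chosen for illustration; this preview does not describe the paper's actual features.

```python
import re

# Hypothetical cue lists, chosen only to illustrate the idea; the paper's actual
# natural language and slang features are not described in this preview.
FUTURE_CUES = {"going", "will", "tomorrow", "tonight", "excited"}
PAST_CUES = {"was", "were", "saw", "watched", "loved"}
PROMO_CUES = {"tickets", "book", "chances", "enjoy", "acclaimed", "offer"}
SLANG = {"wkend", "tmrw", "tonite", "omg"}


def cue_features(tweet):
    """Count simple temporal, promotional and Twitter-specific cues in one tweet."""
    text = tweet.lower()
    tokens = re.findall(r"[#@]?\w+", text)
    words = {t for t in tokens if t[0] not in "#@"}
    return {
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "has_url": int("http://" in text or "https://" in text),
        # Distinct cue words present (set intersection, so repeats count once).
        "future_cues": len(words & FUTURE_CUES),
        "past_cues": len(words & PAST_CUES),
        "promo_cues": len(words & PROMO_CUES),
        "slang": len(words & SLANG),
    }


def enriched_vector(tweet, topic_mixture):
    """Concatenate a topic mixture with the cue counts, in a fixed (sorted) key order."""
    feats = cue_features(tweet)
    return list(topic_mixture) + [feats[name] for name in sorted(feats)]


if __name__ == "__main__":
    for tweet in [
        "Richard III at the globe was great. Samuel Barnett and Janes Garnon were "
        "fantastic but need I say that Mark Rylance was sublime",
        "Going to see the wonderful Mark Rylance at The Globe Thursday, very excited",
        "Three last chances to enjoy acclaimed #RichardIII this wkend",
    ]:
        print(cue_features(tweet))
```

With these lists, the opinion tweet gets two past cues, the expectation tweet two future cues, and the advertisement one hashtag, one slang term and three promotional cues. Hand-written lists like these are brittle, which is one reason to combine them with learned representations rather than rely on them alone.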

This classification may benefit both customers and companies in this domain. Expectations could help build hype around shows, and opinions could be used to persuade undecided customers, while competitors' ads could either be filtered out so that they do not reach a company's customer base, or be aggregated so that customers can pick the best offer.
