Time-Varying Dynamic Topic Model: A Better Tool for Mining Microblogs at a Global Level

Jun Han (Beihang University, Beijing, China), Yu Huang (Beihang University, Beijing, China), Kuldeep Kumar (Bond University, Gold Coast, Australia) and Sukanto Bhattacharya (Deakin Business School, Deakin University, Geelong, Australia)
Copyright: © 2018 |Pages: 16
DOI: 10.4018/JGIM.2018010106

Abstract

In this paper the authors build on prior literature to develop an adaptive, time-varying, metadata-enabled dynamic topic model (mDTM) and apply it to a large Weibo dataset, using an online Gibbs sampler for parameter estimation. Their approach simultaneously captures more of the inherent dynamic features of microblogs than other online document mining methods in the extant literature, thereby setting it apart from them. In summary, the authors' results show better performance of mDTM in terms of the quality of the mined information compared to prior research and showcase mDTM as a promising tool for the effective mining of microblogs in a rapidly changing global information space.
Article Preview

1. Introduction

The global information space is a rapidly morphing one in which newer, faster and more individualized modes of person-to-person electronic information transmission emerge on a fairly regular basis. Microblogging is one such mode of global information transmission that has become very popular for spreading and sharing breaking news, personal updates and spontaneous ideas. Microblogging platforms such as Weibo and Twitter allow users to exchange small packets of information content and reflect the general public's reactions to major events occurring around the globe. Owing to the explosion of online data generated via social networking, a vast amount of user-generated content has accumulated on popular social networking websites. This has spawned a strong demand for automated text mining models that can delve into this large online collection of short textual elements. Blei, Ng & Jordan (2003) showed for the first time that statistical admixture topic models were a promising text mining tool in this regard.

Topic models are generative probabilistic models which posit that each topic (theme) generates words according to certain probabilities and that each document, or collection of data, is a fixed-dimensional mixture of topics. The Dirichlet distribution is usually used to model the variability in the topic mixing vector of the documents and the word mixing vector of the topics, although other alternatives have been explored in the literature (Li & McCallum, 2006; Blei & Lafferty, 2007). Standard topic models such as latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora (Blei, Ng & Jordan, 2003), are usually applied to model long textual documents and are not suitable for microblog posts, as these are short, noisy and highly correlated with their authors. Intuitively, posts published by the same user have a higher probability of belonging to the same topic. Rosen-Zvi, Griffiths, Steyvers & Smyth (2004) expanded topic distributions from the document level to the user level to include authorship information. Building on this, Zhao et al. (2011) proposed twitter-LDA (twitter latent Dirichlet allocation), which assumes that each post is assigned a single topic and that some words can be background words. Diao, Jiang, Zhu & Lim (2012) further improved twitter-LDA by presenting TimeUserLDA (time and user latent Dirichlet allocation) to detect topics that arrive in bursts. However, all these studies took the number of topics to be fixed, and thus failed to capture the inherently dynamic characteristics of microblog posts.
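The standard LDA generative process summarized above (Dirichlet-distributed topic mixtures per document, Dirichlet-distributed word distributions per topic) can be sketched in a few lines. This is a minimal illustration only; the vocabulary size, topic count and hyperparameter values below are hypothetical placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 8          # vocabulary size (illustrative)
K = 3          # number of topics -- fixed in standard LDA
alpha = 0.5    # symmetric Dirichlet prior on per-document topic mixtures
beta = 0.1     # symmetric Dirichlet prior on per-topic word distributions

# Each topic is a probability distribution over the vocabulary
phi = rng.dirichlet([beta] * V, size=K)       # shape (K, V)

def generate_document(n_words):
    """Draw one document under the LDA generative process."""
    theta = rng.dirichlet([alpha] * K)        # this document's topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)            # choose a topic for this word
        w = rng.choice(V, p=phi[z])           # choose a word from that topic
        words.append(w)
    return words

doc = generate_document(20)
```

Note that `K` is fixed before any document is generated, which is exactly the limitation the preceding paragraph attributes to these models when applied to dynamic microblog streams.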

To accommodate the temporal information in document collections, a number of temporal topic models have been proposed. Wang & McCallum (2006) developed the Topics over Time (TOT) model and found trends in time-sensitive topics using a continuous distribution over timestamps; however, the number of topics was fixed over time in the TOT model. Blei & Lafferty (2006) proposed the dynamic topic model (DTM), which focuses on the change in topic composition, i.e. the word distributions, but here too the number of topics was fixed. Moreover, the authors assumed that the parameters at each time instance are conditionally normally distributed with mean equal to the corresponding parameters at the previous time instance. However, since the normal distribution is not conjugate to the multinomial distribution, their model does not yield a simple solution to the problems of inference and estimation. Ahmed & Xing (2010) proposed an infinite dynamic topic model (iDTM) that can accommodate the evolution of all aspects of the latent structure, such as the number, distribution and popularity of topics. The iDTM also addressed the birth and death of topics over a timeline under a Markovian assumption and was therefore a significant improvement over the preceding models. Only a handful of other published works address the temporal evolution of topics (Masada et al., 2009; Hong, Dom, Gurumurthy & Tsioutsiouliklis, 2011).
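The DTM evolution assumption described above — each time slice's natural parameters drawn from a normal distribution centred on the previous slice's values, then mapped to the probability simplex — can be illustrated with a short sketch. The dimensions and drift variance here are hypothetical, chosen only to make the Gaussian random walk concrete:

```python
import numpy as np

rng = np.random.default_rng(1)

V = 8          # vocabulary size (illustrative)
T = 5          # number of time slices
sigma = 0.1    # std. dev. of the Gaussian drift between slices

def softmax(x):
    """Map unconstrained natural parameters onto the probability simplex."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Natural parameters of one topic evolve as a Gaussian random walk:
#   beta_t | beta_{t-1} ~ N(beta_{t-1}, sigma^2 I)
beta = np.zeros((T, V))
beta[0] = rng.normal(0.0, 1.0, size=V)
for t in range(1, T):
    beta[t] = beta[t - 1] + rng.normal(0.0, sigma, size=V)

# The topic's word distribution at each slice is the softmax of beta_t
word_dists = np.array([softmax(b) for b in beta])
```

Because the Gaussian prior on `beta` is not conjugate to the multinomial likelihood of the observed words, posterior inference under this chain has no closed form — the difficulty the paragraph above notes for the DTM.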
