Introduction
The rapid growth of technology has led to information overload from online sources such as blogs (Chen, Tsai, & Chan, 2007), social networks (Tsai, Han, Xu, & Chua, 2009), mobile information (Tsai et al., 2010), and Web services (Tsai et al., 2010). Novelty mining can help solve the problem of information overload by retrieving novel yet relevant information, based on a topic given by the user (Ng, Tsai, & Goh, 2007; Ong, Kwee, & Tsai, 2009), and can be applied to many business problems, such as corporate intelligence (Tsai, Chen, & Chan, 2007) and cyber security (Tsai, 2009; Tsai & Chan, 2007). Although users can retrieve all the novel documents, each document still needs to be read to find the novel sentences within it (Tsai & Chan, 2011). Therefore, to serve users better, later studies of novelty mining were performed at the sentence level (Kwee, Tsai, & Tang, 2009; Tang & Tsai, 2009; Tang, Tsai & Chen, 2010; Tsai, Tang, & Chan, 2010; Zhang & Tsai, 2009b). Furthermore, the Web is changing from a data-centric Web into a Web of semantic data and a Web of services (Yee, Tiong, Tsai, & Kanagasabai, 2009). Web services are significant in the business domain, where they are used as a means of communicating or exchanging data between businesses and clients (Kwee & Tsai, 2009).
Previous studies on social media mining (Tsai, Chen, & Chan, 2008; Liang, Tsai, & Kwee, 2009) use existing Web and text mining techniques without considering the additional dimensions present in social media. Because of this, the techniques are only able to analyze one or two dimensions of the blog data (Tsai & Chan, 2010). In this paper, we propose unsupervised probabilistic models for mining the multiple dimensions present in social media. The models are used in a novel social media classification framework, which categorizes social media according to their most likely topic.
Problem Definition
This paper addresses the problem of multidimensional social media mining, which is a major challenge in the data mining community. Although blogs share many similarities with Web and text documents, existing techniques need to be reevaluated and adapted for the multidimensional representation of blog data, which exhibit attributes not present in traditional documents. The proposed techniques aim to leverage the blog dimensions of authors and links to improve the results of mining information from blog data.
Related work on social media mining includes techniques that focus on sentiment or opinion mining, that is, judging whether a particular blog post is negative, positive, or neutral toward a particular object. One of the main tasks in the Text Retrieval Conference (TREC) Blog Track was the Opinion Retrieval Task, which involved finding blog posts that express an opinion about a given topic (Ounis et al., 2006; Macdonald, Ounis, & Soboroff, 2007).