1. Introduction
An important application of Web 2.0 is the blog, a frequently updated personal diary or online journal that arranges its entries chronologically (O'Leary, 2011). It provides a platform on which bloggers write posts about related events and subscribers read or comment on those posts. Using posts as a communication tool between bloggers and subscribers has two advantages: (1) it strengthens the depth of comments, and (2) it strengthens the loyalty of the blog community (Nardi, Schiano, Gumbrecht, & Swartz, 2004).
From the 1990s to 2015, the number of posts on the Internet grew from thousands to millions, and the growth has been exponential (Pingdom, 2015; Prayiush, 2015). An effective way to search so many posts is a blog search engine, which helps bloggers find useful posts. General search engines and blog search engines differ in two ways. First, they index different objects: a general search engine indexes all pages, whereas a blog search engine indexes posts (Thelwall & Hasler, 2007; Tsai, 2011). As a result, bloggers only need to look at posts rather than all pages, which reduces the time they spend judging relevance (Mishne & De Rijke, 2006). Second, the two types of search engine treat time differently (Han, Shin, Jung, & Park, 2009). A general search engine usually shows only the time at which the page content was last updated, whereas a blog search engine can display post content with its different update times. Bloggers can therefore follow the history of a post and respond quickly (Nakajima, Zhang, Inagaki, Kusano, & Nakamoto, 2009).
According to analyses and statistics reported in the literature, the average length of a user query is about 3.08 words (Taghavi, Patel, Schmidt, Wills, & Tew, 2012). With such short queries, it is difficult for a blog search engine to determine precisely what the user wants to find. Short queries usually face two semantic problems: synonymy (different terms have the same meaning) and polysemy (one term has several meanings) (Mishne & De Rijke, 2006; Thelwall & Hasler, 2007). In either case, the user has to extend or adjust the query to find a suitable post. For users who are unfamiliar with search engine features, adjusting the query can be time-consuming and frustrating.
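The two problems can be illustrated with a minimal sketch of exact keyword matching. The posts, the post ids, and the `keyword_search` helper below are invented for illustration and are not taken from any real blog search engine.

```python
# Toy collection of posts; the texts and ids are made up for this example.
posts = {
    "p1": "my new car needs an engine repair",
    "p2": "the automobile show had a classic engine on display",
    "p3": "apple released a new phone",
    "p4": "baked an apple pie with fresh fruit",
}

def keyword_search(query, posts):
    """Return the ids of posts that contain every query term (exact match)."""
    terms = query.lower().split()
    return sorted(
        pid for pid, text in posts.items()
        if all(term in text.lower().split() for term in terms)
    )

# Synonymy: p2 is about the same subject but uses "automobile", so it is missed.
synonym_hits = keyword_search("car", posts)

# Polysemy: "apple" matches both the company post and the fruit post.
polysemy_hits = keyword_search("apple", posts)

print(synonym_hits)   # ['p1']
print(polysemy_hits)  # ['p3', 'p4']
```

In the first query the user would have to add "automobile" by hand, and in the second the user would have to add a term such as "fruit" to narrow the results, which is the query adjustment described above.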
Latent semantic analysis models (hereafter, semantic models) are widely used to find latent semantic relationships between terms. In recent years, well-known semantic models such as Latent Semantic Analysis (LSA), Probabilistic LSA (PLSA), and Latent Dirichlet Allocation (LDA) have been shown to address the synonymy and polysemy problems between terms effectively (Blei, Ng, & Jordan, 2003; Fu, Qin, & Liu, 2015; Hofmann, 2001; Landauer, Foltz, & Laham, 1998). However, these semantic models do not effectively identify the semantic relationships between documents (Fu et al., 2015). This limitation matters because similar documents may discuss the same topic. In addition, these semantic models cannot deal effectively with time-related problems because they do not include any time parameters (Yuan, Cong, Ma, Sun, & Thalmann, 2013). This limitation also matters because the topics discussed across documents change over time.
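As a rough illustration of how LSA can relate synonymous terms, the sketch below applies a truncated singular value decomposition to a tiny term-document matrix using NumPy. The matrix, the vocabulary, and the choice of two latent dimensions are assumptions made for this example, not data or settings from the cited studies.

```python
import numpy as np

terms = ["car", "automobile", "engine", "apple", "fruit"]

# Toy term-document count matrix (rows = terms, columns = posts); invented data.
# "car" and "automobile" never appear in the same post, but both co-occur
# with "engine".
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # apple
    [0, 0, 1, 1],  # fruit
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# LSA step: keep only the k largest singular values and vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of latent dimensions, chosen by hand for this toy case
term_vecs = U[:, :k] * s[:k]  # term coordinates in the latent space

car, auto, apple = (terms.index(t) for t in ("car", "automobile", "apple"))

raw_sim = cosine(A[car], A[auto])                         # 0: no shared post
lsa_sim = cosine(term_vecs[car], term_vecs[auto])         # close to 1
lsa_unrelated = cosine(term_vecs[car], term_vecs[apple])  # close to 0

print(raw_sim, lsa_sim, lsa_unrelated)
```

In the raw matrix the two synonyms look unrelated because they share no post, while in the reduced space they fall on the same latent dimension through their shared neighbour "engine". The sketch compares terms only and has no notion of time, which is consistent with the two limitations noted above.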