Based on The Document-Link and Time-Clue Relationships Between Blog Posts to Improve the Performance of Google Blog Search

Lin-Chih Chen
Copyright: © 2019 | Pages: 24
DOI: 10.4018/IJSWIS.2019010103


Both blog search engines and general search engines automatically crawl pages from the web and produce relevant search results based on the user's query. The first difference between the two is that a blog search engine deals only with blog posts and filters out other types of pages, so bloggers need to consider only posts rather than all the pages indexed by a general search engine. The second difference is that posts involve more time-related issues than pages do. Semantic analysis models are widely used to analyze the semantic relationships that may arise in a document. In this article, the authors propose a new semantic analysis model to find possible time relationships between posts. The paper makes two main contributions: first, it builds a high-performance search system that considers the discussion topic and updated time of posts; second, it considers the time relationships between posts, so that relevant blog topics can be ranked by the popularity of the posts.
Article Preview

1. Introduction

An important application of Web 2.0 is the blog, a frequently updated personal diary or online journal that arranges its content chronologically (O'Leary, 2011). It provides a platform on which bloggers write posts about related events and subscribers read or comment on those posts. Using posts as a communication tool between bloggers and subscribers has two advantages: (1) it strengthens the depth of comments; (2) it strengthens the loyalty of the blog community (Nardi, Schiano, Gumbrecht, & Swartz, 2004).

From the 1990s to 2015, the number of posts on the Internet grew exponentially, from thousands to millions (Pingdom, 2015; Prayiush, 2015). An effective way to search this many posts is to use a blog search engine, which helps bloggers find useful posts. One difference between general search engines and blog search engines is that they index different objects (Thelwall & Hasler, 2007; Tsai, 2011). Because a blog search engine indexes only posts, bloggers need to focus on posts rather than all pages, which reduces the time they spend judging results (Mishne & De Rijke, 2006). Another difference is that the two types of search engine consider different time factors (Han, Shin, Jung, & Park, 2009). A general search engine usually shows only the last updated time of the page content, whereas a blog search engine can display post content with different updated times. As a result, bloggers can follow the history of a post and respond quickly (Nakajima, Zhang, Inagaki, Kusano, & Nakamoto, 2009).

According to the relevant literature, the average length of a user query is about 3.08 words (Taghavi, Patel, Schmidt, Wills, & Tew, 2012). It is difficult for a blog search engine to determine precisely what users want from such short queries. Short queries usually face two semantic problems: synonymy (different terms have the same meaning) and polysemy (one term has different meanings) (Mishne & De Rijke, 2006; Thelwall & Hasler, 2007). In either case, the user needs to extend or adjust the query to find a suitable post. However, adjusting the query can be slow and frustrating for users who are unfamiliar with search engine features.
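The two problems can be illustrated with a small exact-match index. The following is a minimal sketch, not the article's system: the post ids, post texts, and the helper names `build_index` and `search` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy posts; ids and texts are invented for illustration.
POSTS = {
    "p1": "my new car has a quiet engine",
    "p2": "selling a used automobile with low mileage",
    "p3": "the jaguar is a big cat native to the americas",
    "p4": "test drive of the new jaguar sports car",
}


def build_index(posts):
    """Map each term to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for term in text.lower().split():
            index[term].add(post_id)
    return index


def search(index, query):
    """Return ids of posts that contain every query term (exact match only)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(POSTS)

# Synonymy: "car" misses p2, which says "automobile" instead.
print(sorted(search(index, "car")))         # ['p1', 'p4']

# Polysemy: "jaguar" mixes the animal (p3) with the car (p4).
print(sorted(search(index, "jaguar")))      # ['p3', 'p4']

# The user has to refine the query by hand to separate the two senses.
print(sorted(search(index, "jaguar car")))  # ['p4']
```

With exact term matching, the burden of resolving both problems falls on the user, which is the situation the paragraph above describes.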

The latent semantic analysis model (hereafter, semantic model) is widely used to find latent semantic relationships between terms. In recent years, well-known semantic models such as Latent Semantic Analysis (LSA), Probabilistic LSA (PLSA), and Latent Dirichlet Allocation (LDA) have been shown to address the synonymy and polysemy problems between terms effectively (Blei, Ng, & Jordan, 2003; Fu, Qin, & Liu, 2015; Hofmann, 2001; Landauer, Foltz, & Laham, 1998). However, these semantic models do not effectively identify the semantic relationships between documents (Fu et al., 2015). This problem is important because similar documents may share the same discussion topic. In addition, these semantic models cannot effectively deal with time-related problems because they do not include any time parameters (Yuan, Cong, Ma, Sun, & Thalmann, 2013). This problem is also important because the topics discussed across documents change over time.
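As a rough illustration of how a semantic model relates synonymous terms, the sketch below applies plain LSA (a truncated SVD of a term-document matrix) to a toy corpus using NumPy. It is not the model proposed in the article: the terms, the four documents, the rank `k=2`, and the helper names `cosine` and `lsa_term_vectors` are assumptions made for this example, and the sketch contains no document-link or time information.

```python
import numpy as np

# Hypothetical toy corpus (invented for illustration).
# Columns: d1 "car engine", d2 "automobile engine", d3 "blog post", d4 "blog post"
TERMS = ["car", "automobile", "engine", "blog", "post"]
A = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # blog
    [0, 0, 1, 1],   # post
], dtype=float)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsa_term_vectors(matrix, k):
    """Project terms into a k-dimensional latent space via truncated SVD."""
    U, S, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * S[:k]


car, auto = TERMS.index("car"), TERMS.index("automobile")

# In the raw matrix the two synonyms never co-occur, so their similarity is 0.
raw_sim = cosine(A[car], A[auto])

# In the rank-2 latent space both terms are pulled together through their
# shared context term "engine", so their similarity is close to 1.
term_vecs = lsa_term_vectors(A, k=2)
latent_sim = cosine(term_vecs[car], term_vecs[auto])

print(f"raw similarity:    {raw_sim:.2f}")
print(f"latent similarity: {latent_sim:.2f}")
```

The decomposition uses term co-occurrence only. Nothing in it records when a document was written or how documents relate to one another, which is the limitation the paragraph above points out.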
