Exploring Semantic Characteristics of Socially Constructed Knowledge Repository to Optimize Web Search

Dengya Zhu (Curtin University, Australia) and Heinz Dreher (Curtin University, Australia)
DOI: 10.4018/978-1-4666-2494-8.ch013


The short search-term queries preferred by most users often produce a list of Web search results with low precision from the user's perspective. The purpose of this research is to improve the relevance of Web search results through search-term disambiguation and ontological filtering of search results based on socially constructed search concepts. A Special Search Browser (SSB) is developed in which semantic characteristics of the socially constructed knowledge repository are extracted to form a category-document set. A k-nearest-neighbor (kNN) classifier, trained on the extracted category-documents, classifies the Web results, and the categories a user selects determine which results are presented. Experimental results based on five experts’ judgments of 250 hits from the Yahoo! API, covering five selected ambiguous search-terms, demonstrate that using the socially constructed search concepts to categorize and filter search results improves precision by 23.5 percentage points, from Yahoo’s 41.7% to SSB’s 65.2%.
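The classify-then-filter idea described above can be sketched in a few lines. This is a minimal illustration, not the SSB implementation: the term-frequency vectors, the cosine similarity measure, the similarity-weighted vote, the value k=3, and the toy category-documents are all assumptions made for the example, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(snippet, category_docs, k=3):
    """Assign a category to a result snippet by similarity-weighted kNN vote.

    category_docs is a list of (category, text) pairs standing in for the
    category-document training set (hypothetical data in this sketch).
    Returns None when no training document shares a term with the snippet.
    """
    query = Counter(tokenize(snippet))
    scored = sorted(
        ((cosine(query, Counter(tokenize(text))), category)
         for category, text in category_docs),
        reverse=True,
    )
    votes = Counter()
    for score, category in scored[:k]:
        votes[category] += score
    if not votes or max(votes.values()) <= 0:
        return None
    return votes.most_common(1)[0][0]


def filter_results(snippets, selected_categories, category_docs, k=3):
    """Keep only the snippets classified into a user-selected category."""
    return [s for s in snippets
            if knn_classify(s, category_docs, k) in selected_categories]


# Hypothetical category-documents for the ambiguous search-term "jaguar".
CATEGORY_DOCS = [
    ("Animals", "jaguar big cat wildlife rainforest predator"),
    ("Animals", "cat species habitat zoo wildlife"),
    ("Automotive", "jaguar car luxury sedan engine dealer"),
    ("Automotive", "car engine vehicle dealer price"),
]

RESULTS = [
    "Jaguar luxury car dealer prices",
    "jaguar rainforest predator habitat",
]

if __name__ == "__main__":
    print(filter_results(RESULTS, {"Automotive"}, CATEGORY_DOCS))
```

A user who selects the hypothetical "Automotive" category would see only the first snippet, which is how filtering by selected category can raise precision for an ambiguous search-term.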
Chapter Preview


The introduction and subsequent explosion of the Web has dramatically changed how we access and use information. The Internet is becoming a part of life for most people in the world. However, as indicated by Baeza-Yates and Ribeiro-Neto (1999), most users have difficulty expressing their information needs in search-term format: they prefer short queries to Boolean expressions (Jansen & Spink, 2006). To address this issue, most search engines encourage users to enter very short search terms as queries, and then return a list of search results ranked, by technologies such as traditional information retrieval models and PageRank (Page et al., 1998), according to their relevance to the given query. However, because the volume of information on the Web is now enormous, a search based on short search terms usually leads the search engine to return a list of thousands, or even millions, of results. Searchers are thus frustrated when facing such a long list, especially when half of the search results are irrelevant to their information needs (Gauch, Chaffee, & Pretschner, 2003). It is now commonly recognized that information search services are far from perfect. The challenges facing search engines are summarized in Table 1.

Table 1. Challenges of web search engines (Zhu, 2007)
C1 | Information overload | Millions of Web hits
C2 | Mismatching hits | High recall, low precision, many irrelevant results
C3 | Flat list of results | Results are presented in a flat list; users have to pick out useful items from the list, like finding a needle in a haystack
C4 | Mismatching mental model | An automatically formed hierarchy used to re-organize Web hits usually mismatches the human mental model
C5 | Homogeneity | Search engines present “the same for all” hits, not personalized
C6 | Low recall of Web navigation | Web navigation is more accurate, but the recall is very low
