The Perspectives of Improving Web Search Engine Quality
Jengchung V. Chen (National Cheng Kung University, Taiwan), Wen-Hsiang Lu (National Cheng Kung University, Taiwan), Kuan-Yu He (National Cheng Kung University, Taiwan) and Yao-Sheng Chang (National Cheng Kung University, Taiwan)
Copyright: © 2008
With the fast growth of the Web, users often suffer from information overload, since many existing search engines rely on the conventional mechanism of keyword matching and respond to queries with many irrelevant documents that merely contain the query terms. In fact, both users and search engine developers had anticipated a mechanism that would reduce information overload by understanding user goals clearly. In this chapter, we introduce some past research in Web search and current trends that focus on improving search quality from the perspectives of “what”, “how”, “where”, “when”, and “why”. We also briefly introduce some effective search quality improvements based on link-structure search algorithms, such as PageRank and HITS. At the end of the chapter, we introduce our proposed approach to improving search quality, which employs syntactic structures (verb-object pairs) to automatically identify potential user goals from search-result snippets. We believe that understanding user goals more clearly and reducing information overload will become major developments in commercial search engines, since the amount of information and resources continues to increase rapidly and user needs are becoming more diverse.
Key Terms in this Chapter
User Behavior: Users’ interaction with the search engine.
User Goal Identification: To identify what the user wants to do when submitting a query.
Keyword Matching: A search mechanism which considers a document relevant if it shares common terms with the query.
Search Quality: The degree to which a search engine provides users with useful search results.
Natural Language Processing: A field that studies the problems of automatically generating and understanding natural human languages.
Click-Through Data: Information that reveals user behavior, from submitting a query to finally finding the target Web pages.
Information Retrieval: To retrieve information useful or relevant to the query.
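
The keyword matching definition above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names `tokenize` and `keyword_match`, the overlap-count ranking, and the sample documents are all assumptions made for illustration. It shows how a document can be returned simply because it shares a term with the query, even when it does not serve the user's goal.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of word tokens (hypothetical helper)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_match(query, documents):
    """Return documents sharing at least one term with the query.

    Ranking by the number of shared terms is an assumption for this sketch,
    not something the chapter specifies.
    """
    query_terms = tokenize(query)
    scored = []
    for doc in documents:
        shared = query_terms & tokenize(doc)
        if shared:
            scored.append((len(shared), doc))
    # Stable sort: documents with more shared terms come first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

# Made-up sample documents for illustration only.
docs = [
    "Download the latest Java runtime",
    "Java is an island in Indonesia",
    "Cheap flights to Bali",
]
results = keyword_match("java download", docs)
for doc in results:
    print(doc)
```

In this sketch the second document is returned only because it contains the term "java", although a user who wants to download software would likely consider it irrelevant. That is the kind of mismatch that user goal identification is meant to reduce.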