A Web query log is a file that records the activities of users of a search engine. Compared with the traditional information retrieval setting, in which documents are the only available information source, query logs provide an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering, and personalized search, can be employed to improve the performance of Web search.
Web usage mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Since the majority of usage data is stored in Web logs, usage mining is often also referred to as log mining. Web logs can be divided into three categories according to where the data are collected: server logs, client logs, and proxy logs. A server log provides an aggregate picture of the usage of a service by all users, while a client log provides a complete picture of the usage of all services by a particular client, with the proxy log falling somewhere in between (Srivastava, Cooley, Deshpande, & Tan, 2000).
Query log mining can be viewed as a special kind of Web usage mining. While there is a large body of work on mining Website navigation logs for site monitoring, site adaptation, performance improvement, personalization, and business intelligence, there is relatively little work on mining search engines’ query logs to improve Web search performance. Early research showed that relevance feedback can significantly improve retrieval performance if users provide sufficient and correct relevance judgments for queries (Xu & Croft, 2000). In real search scenarios, however, users are usually reluctant to give explicit relevance feedback. Meanwhile, a large number of users’ past query sessions have accumulated in the query logs of search engines. Each query session records a user query and the corresponding pages the user selected to browse. A query log can therefore be viewed as a valuable source of users’ implicit relevance judgments, which can be used to detect users’ query intentions more accurately and to improve the ranking of search results.
One important assumption behind query log mining is that the clicked pages are “relevant” to the query. Although click information is not as accurate as the explicit relevance judgments of traditional relevance feedback, the user’s choice does suggest a certain degree of relevance. With a large amount of log data accumulated over time, query logs can be treated, from a statistical point of view, as a reliable resource containing abundant implicit relevance judgments.
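The statistical view of clicks as implicit relevance judgments can be illustrated with a minimal sketch. It assumes a simplified log in which each query session holds one query string and the pages clicked for it; the names `QuerySession`, `click_relevance`, and `min_sessions`, as well as the example URLs, are hypothetical and chosen only for illustration. The score computed here, the fraction of sessions for a query in which a page was clicked, is one simple way to aggregate clicks and is not a method prescribed by the works cited above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySession:
    """One logged session: a query and the result pages the user clicked (assumed layout)."""
    query: str
    clicked_urls: tuple


def click_relevance(sessions, min_sessions=1):
    """Estimate, per query, the fraction of its sessions in which each page was clicked.

    Queries seen in fewer than `min_sessions` sessions are skipped, since a
    handful of clicks is weak statistical evidence of relevance.
    """
    session_counts = Counter()           # query -> number of sessions
    click_counts = defaultdict(Counter)  # query -> page -> sessions with a click
    for session in sessions:
        query = session.query.strip().lower()  # crude normalization (an assumption)
        session_counts[query] += 1
        for url in set(session.clicked_urls):  # count each page once per session
            click_counts[query][url] += 1

    scores = {}
    for query, n in session_counts.items():
        if n < min_sessions:
            continue
        scores[query] = {url: c / n for url, c in click_counts[query].items()}
    return scores


# Hypothetical sessions for an ambiguous query.
sessions = [
    QuerySession("jaguar", ("cars.example/jaguar",)),
    QuerySession("Jaguar", ("cars.example/jaguar", "zoo.example/jaguar")),
    QuerySession("jaguar", ("zoo.example/jaguar",)),
    QuerySession("jaguar", ("cars.example/jaguar",)),
]
scores = click_relevance(sessions)
```

In this toy log, the page clicked in three of the four sessions receives a higher score than the page clicked in two of them. Any single click may be noisy, but the aggregated fractions become more stable as the number of sessions grows, which is why the threshold `min_sessions` would typically be set well above 1 on real logs.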