Exploiting the data stored in search logs of Web search engines, Intranets, and Websites can provide important insights into the information searching tactics of online searchers. This understanding can inform information system design, interface development, and information architecture construction for content collections. This chapter presents a review of and foundation for conducting Web search transaction log analysis. It outlines a three-stage search log analysis methodology (i.e., collection, preparation, and analysis) and presents each stage in detail, discussing its goals, metrics, and processes. The critical terms in transaction log analysis for Web searching are defined. Suggestions are provided on ways to leverage the strengths and address the limitations of transaction log analysis for Web searching research.
Review of Literature
What is a Search Log?
Not surprisingly, a search log is a file (i.e., log) of the communications (i.e., transactions) between a system and the users of that system. Rice and Borgman (1983) present transaction logs as a data collection method that automatically captures the type, content, or time of transactions made by a person from a terminal with that system. Peters (1993) views transaction logs as electronically recorded interactions between on-line information retrieval systems and the persons who search for the information found in those systems.
For Web searching, a search log is an electronic record of interactions that have occurred during a searching episode between a Web search engine and users searching for information on that Web search engine. A Web search engine may be a general-purpose search engine, a niche search engine, a searching application on a single Web site, or variations on these broad classifications. The users may be humans or computer programs acting on behalf of humans. Interactions are the communication exchanges that occur between users and the system. Either the user or the system may initiate elements of these exchanges.
How are These Interactions Collected?
The process of recording the data in the search log is relatively straightforward. Using a software application, Web servers record the interactions between searchers (more precisely, the Web browsers on searchers' computers) and search engines, storing them on the server in a log file (i.e., the transaction log). Thus, most search logs are server-side recordings of interactions. Major Web search engines handle millions of these interactions per day. The types of data and interactions recorded depend on the log file formats that the server software supports.
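To make the server-side recording step concrete, the sketch below appends one interaction per line to a text log. The field order (timestamp, client IP address, query, referrer), the tab delimiter, and the function name record_interaction are hypothetical choices for illustration only; each search engine defines its own layout and software.

```python
import io
from datetime import datetime, timezone


def record_interaction(log, client_ip, query, referrer="-", now=None):
    """Append one search interaction to a text log as a tab-separated line.

    Field order (hypothetical): timestamp, client IP, query, referrer.
    """
    when = now if now is not None else datetime.now(timezone.utc)
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Tabs and newlines inside the query would break the one-record-per-line
    # layout, so runs of whitespace are collapsed to single spaces.
    clean_query = " ".join(query.split())
    log.write("\t".join((timestamp, client_ip, clean_query, referrer)) + "\n")


# Demo with an in-memory buffer; a server would instead append to a file
# opened with open(path, "a", encoding="utf-8").
log = io.StringIO()
record_interaction(log, "192.0.2.10", "search  log\tanalysis",
                   now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
print(log.getvalue(), end="")
```

Writing one self-contained line per interaction keeps the log append-only and easy to process later, which is one reason server-side logs are typically line-oriented.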
Typical transaction log formats include the access log, the referrer log, and the extended log. The W3C (http://www.w3.org/TR/WD-logfile.html) is one organizational body that defines transaction log formats. Search logs, however, are a special type of transaction log file. Their format has the most in common with the extended log format, which records data such as the client computer’s Internet Protocol (IP) address, the user query, the search engine access time, and the referrer site, among other fields.
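In the W3C extended log format, header directives begin with "#", a "#Fields:" directive names the columns, data fields are separated by whitespace, string values may be enclosed in double quotes, and "-" marks a missing value. The sketch below reads a log in that general style into one dictionary per interaction. It approximates rather than fully implements the W3C draft, and the sample lines, including the field name query, are hypothetical stand-ins for whatever columns a particular search engine records.

```python
import shlex


def parse_extended_log(lines):
    """Parse a W3C-extended-style log into one dict per data line.

    The '#Fields:' directive names the columns; other '#' lines
    (e.g. '#Version:', '#Date:') are directives and are skipped.
    """
    fields = None
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if fields is None:
            raise ValueError("data line appears before a #Fields: directive")
        # shlex.split keeps a double-quoted value (e.g. a multi-word query) together.
        values = shlex.split(line)
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(values)}: {line!r}")
        records.append(dict(zip(fields, values)))
    return records


# Hypothetical sample; "query" stands in for an engine-specific query column.
sample = [
    "#Version: 1.0",
    "#Fields: date time c-ip query cs(Referer)",
    '2024-01-01 12:00:00 192.0.2.10 "search log analysis" -',
    '2024-01-01 12:00:07 192.0.2.10 "transaction log" http://example.com/',
]
for rec in parse_extended_log(sample):
    print(rec["c-ip"], rec["query"])
```

Driving the parser from the "#Fields:" directive, rather than hard-coding column positions, lets the same code read logs whose servers were configured to record different fields.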
Key Terms in this Chapter
Search Log: An electronic record of interactions that have occurred during a searching episode between a Web search engine and users searching for information on that Web search engine.
Interactions: The physical expressions of communication exchanges between the searcher and the system.
Search Log Analysis (SLA) Process: A three-stage process of collection, preparation, and analysis.
Search Log Analysis (SLA): The use of data collected in a search log to investigate particular research questions concerning interactions among Web users, the Web search engine, or the Web content during searching episodes.
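The three-stage SLA process can be sketched as three small functions chained together. This is a minimal illustration under stated assumptions, not the chapter's method: the tab-separated input layout (timestamp, client IP, query), the cleaning rules in prepare, and the metrics computed in analyze (query count, mean terms per query, most frequent queries) are all hypothetical choices made for this example.

```python
from collections import Counter
from statistics import mean


def collect(raw_text):
    """Collection: gather raw log lines (here from a string; in practice, log files)."""
    return raw_text.splitlines()


def prepare(lines):
    """Preparation: parse each line and clean it for analysis.

    Assumes a hypothetical tab-separated layout: timestamp, client IP, query.
    Malformed lines and empty queries are dropped; queries are lower-cased
    and whitespace-normalized so that trivially different forms match.
    """
    records = []
    for line in lines:
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        timestamp, client_ip, query = parts
        query = " ".join(query.lower().split())
        if query:
            records.append({"timestamp": timestamp, "client_ip": client_ip, "query": query})
    return records


def analyze(records, top_n=3):
    """Analysis: compute a few illustrative query-level metrics."""
    if not records:
        return {"queries": 0, "mean_terms_per_query": 0.0, "top_queries": []}
    terms = [len(r["query"].split()) for r in records]
    return {
        "queries": len(records),
        "mean_terms_per_query": mean(terms),
        "top_queries": Counter(r["query"] for r in records).most_common(top_n),
    }


raw = (
    "2024-01-01T12:00:00Z\t192.0.2.10\tSearch Logs\n"
    "2024-01-01T12:00:05Z\t192.0.2.10\tsearch  logs\n"
    "2024-01-01T12:01:00Z\t198.51.100.7\tweb searching tactics\n"
    "2024-01-01T12:02:00Z\t198.51.100.7\t   \n"
    "malformed line\n"
)
print(analyze(prepare(collect(raw))))
```

Keeping the stages as separate functions mirrors the collection, preparation, and analysis split: each stage can be rerun or replaced (for example, with different cleaning rules) without touching the others.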