This chapter summarizes the progress of search engine user behavior analysis from search engine transaction log analysis to estimation of user behavior. Correct estimation of user information searching behavior paves the way to more successful and even personalized search engines. However, estimation of user behavior is not a simple task. It is closely related to natural language processing and human-computer interaction, and it requires preliminary analysis of user behavior and careful user profiling. This chapter details the studies performed on the analysis and estimation of search engine user behavior, surveys the analytical methods that have been or can be used, and discusses the challenges and research opportunities related to the analysis and estimation of search engine user behavior and transaction log queries.
Search engines are the most important tools for reaching information on the Web, and using them effectively is a challenge (Liaw and Huang, 2006). Analyzing search engine queries, and user behavior through those queries, is an important task because it directly supports the development of better-performing and personalized search engines. Analysis of user behavior matters because every service provider, search engines included, benefits from knowing its customer base and how customers use its services. Once the behavior of the user base of a search engine has been analyzed, search engine structures and algorithms better suited to those users can be developed.
In addition, a new trend in search engine research is the development of personalized search engines. Including personalization features in search engines has been recognized as a major research area (Liu et al., 2004). Radlinski and Dumais (2006) state that personalizing search results for individual users is increasingly being recognized as an important future direction for searching. Agichtein, Brill, Dumais and Ragno (2006) state that accurate modeling and interpretation of user behavior have important applications to ranking, click spam detection, search personalization, and other tasks.
However, it is a real challenge to capture user information behavior, since people have different and changing information needs, and they utilize different information seeking strategies to solve their information seeking problems (Gremett, 2006). Many search studies at the human information behavior level explore the factors that influence search within the context of human information seeking (Spink and Jansen, 2004). Excellent reviews on searching exist, which we will point to within the chapter. It should also be mentioned that the chapter is restricted to studies on search engine transaction log analysis and search engine user behavior analysis and does not cover usage mining in general, which is a very wide topic.
Analyzing user interactions with the search engine is not sufficient on its own, however. The results of user query analysis must also be carried over to real-time information retrieval algorithms that are able to estimate the users’ upcoming actions and transactions with the search engine. In this direction, search engine transaction log analysis and user behavior analysis have progressed from pure analysis of user queries to studies on the estimation of content-based user behavior and the development of personalized information retrieval algorithms.
This chapter summarizes the progress from search engine transaction log analysis and user behavior analysis to the estimation of search engine user behavior. The chapter begins with a detailed literature review of search engine user behavior studies and continues with a detailed presentation of the methodologies used for analyzing search behavior. The studies on the estimation of search user behavior are then summarized, along with an explanation of the methodologies used in these studies. The chapter concludes with a discussion of future research opportunities.
Key Terms in this Chapter
Session Identification: Session identification is the discovery of groups of sequential log entries that are related to a common user or topic. See also: New Topic Identification.
Analysis of Variance: Analysis of variance is a procedure in which the total variation in the dependent factor is partitioned into meaningful components (Walpole, Myers and Myers, 1998).
Monte-Carlo Simulation: Monte-Carlo simulation is a static simulation scheme that employs random numbers, and is used for solving stochastic or deterministic problems, where time plays no substantial role (Law and Kelton, 1991).
Support Vector Machines: Support vector machines are a methodology from statistical learning theory that is based on generating functions from a set of labeled training data.
Regression: Regression is an approach that generates a model characterizing the relationship between independent and dependent factors of a system from sample data representing a certain observable fact.
Markov Models: A Markov model, or Markov chain, is a stochastic process that takes values in a finite number of states.
Neural Networks: A neural network is “a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use.” (Haykin, 1994).
Poisson Sampling: The Poisson sampling process is a useful random sampling process because it has the properties of (1) unbiased sampling, (2) proportional sampling, (3) comparability of heterogeneous Poisson sampling arrivals, and (4) flexibility on the stochastic arrival process from which the sample is selected.
New Topic Identification: New topic identification is discovering when the user has switched from one topic to another during a single search session, so that sequential log entries related to a common topic can be grouped (He, Goker and Harper, 2002). See also: Session Identification.
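To make the notion of session identification more concrete, the following minimal sketch groups the log entries of each user into sessions with a simple time-gap heuristic. The heuristic, the 30-minute cutoff, the entry layout (user, timestamp, query), and the function name identify_sessions are all assumptions made for illustration; they are not the method of any study cited in this chapter. Content-based new topic identification would need additional signals beyond elapsed time.

```python
from datetime import datetime, timedelta

# Assumed cutoff: a pause longer than this starts a new session.
SESSION_GAP = timedelta(minutes=30)


def identify_sessions(entries, gap=SESSION_GAP):
    """Group (user_id, timestamp, query) entries into per-user sessions.

    Hypothetical helper: a new session starts whenever the pause since the
    same user's previous entry exceeds `gap`.
    """
    sessions = []
    last_seen = {}  # user_id -> (timestamp of last entry, session index)
    # Sort by user, then by time, so each user's entries are sequential.
    for user_id, ts, query in sorted(entries, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(user_id)
        if prev is None or ts - prev[0] > gap:
            sessions.append({"user": user_id, "queries": []})
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx]["queries"].append(query)
        last_seen[user_id] = (ts, idx)
    return sessions


# Made-up log entries, for illustration only.
log = [
    ("u1", datetime(2006, 1, 1, 9, 0), "jaguar car"),
    ("u1", datetime(2006, 1, 1, 9, 5), "jaguar price"),
    ("u1", datetime(2006, 1, 1, 11, 0), "weather forecast"),
    ("u2", datetime(2006, 1, 1, 9, 2), "support vector machines"),
]
sessions = identify_sessions(log)
for s in sessions:
    print(s["user"], s["queries"])
```

In this made-up log, the two queries of user u1 that are five minutes apart fall into one session, while the query issued almost two hours later opens a new one. A pure time cutoff cannot detect a topic switch inside a short interval, which is why the two terms above are defined in terms of both users and topics.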