Historic Perspective of Log Analysis

W. David Penniman (Nylink, USA)
Copyright: © 2009 | Pages: 21
DOI: 10.4018/978-1-59904-974-8.ch002

This historical review of the birth and evolution of transaction log analysis applied to information retrieval systems provides two perspectives: first, a detailed discussion of the early work in this area, and second, an account of how this work has migrated into the evaluation of World Wide Web usage. The author describes the techniques and studies of the early years and suggests how that knowledge can be applied to current and future studies. A discussion of privacy issues, with a framework for addressing them, is presented, as well as an overview of the historical “eras” of transaction log analysis. The author concludes that combining transaction log analysis of the type used early in its application with additional, more qualitative approaches will be essential for a deep understanding of user behavior (and needs) with respect to current and future retrieval systems and their design.
Chapter Preview

Introduction: General Perspective And Objectives Of Chapter

This chapter is not an evaluation of current practice, but rather a look at the history of transaction logs and their evolution as a tool for studying user interaction. Much has been written about this tool, but only a few researchers first introduced it for that purpose. This chapter is dedicated to those individuals (with apologies to any who are not cited, but were using this tool before it became well known and evident in the literature). At the same time, praise must be given to those who followed and ensured that transaction log analysis evolved to its present state, with a rich new “laboratory” represented by the Internet and the World-Wide Web.

Within this chapter, a variety of authors and studies are sampled to give a sense of the way in which transaction logs were first applied, how the study of on-line public access catalogs (OPACs) contributed to the evolution of transaction log analysis (and vice versa), and how particular projects (such as OPAC studies by the Council on Library Resources (CLR) and “IIDA” funded by the National Science Foundation (NSF)) contributed to our understanding of user interaction. The chapter draws on previous surveys, cited in the following paragraphs and sections, as well as on the author’s own experience with transaction log analysis in the early days of its application.

As stated by Peters, Kurth, Flaherty, Sandore, and Kaske (1993, p. 38):

Researchers most often use transaction logs data with the intention of improving the IR system, human utilization of the system, and human (and perhaps also system) understanding of how the system is used by information seekers. Transaction log analysis can provide system designers and managers with valuable information about how the system is being used by actual users. It also can be used to study prototype system improvements.

Penniman (1975a, p. 159), in one of the early studies using transaction logs, stated, “The promise (of transaction logs) is unlimited for evaluating communicative behavior where human and computer interact to exchange information.”

The promise of analyzing transaction logs has always been at least twofold: first, to describe what users actually do while interacting with a system, and second, to use this understanding to predict the next actions they should take to use the system effectively (or to correct a difficulty they have encountered). Transaction logs continue to offer promise in both of these areas. The arena in which this tool can be applied, however, is now much larger. We now have the world (or at least the World-Wide Web) as a laboratory.


Background: Information Retrieval Goes Online

In the late 1960s, before there was an Internet, a handful of online information retrieval system providers were clamoring for attention (and a user base). Most systems had sprung from government-funded projects or were intended to serve such projects. Users were often restricted to a single proprietary system, and competition was fierce to market the “best” system, even though most, in fact, appeared quite similar in features and functions (Walker, 1971; Gove, 1973). The ultimate system was yet to be, and still has not been, designed. If it were, it would certainly have the features so well articulated by Goodwin (1959), who wrote when retrieval was primarily a manual process or at best relied on batch-processing search software on large mainframes, with extensive human intervention between end user and information source. It was within this environment that Goodwin articulated the features of an “ideal” retrieval system as one in which the user would receive desired information:

  • At the time it is needed (not before or after)

  • In the briefest possible form

  • In order of importance

  • With necessary auxiliary information

  • With reliability of information clearly indicated (which implies some critical analysis)

  • With the source identified

  • With little or no effort (i.e. automatically)

  • Without clutter (undesired or untimely information eliminated)

  • With assurance that no response means the information does not exist

Key Terms in this Chapter

Stochastic Process: A process that is probabilistic rather than deterministic in behavior. In the current context, a user state can be estimated, but not determined with certainty, when a sequence of previous states is available (e.g., a partial transaction log).

Transaction Log Analysis: The study of electronically recorded interactions between online information retrieval systems and the persons who search for information found in those systems. (Peters et al., 1993, p. 38; narrow definition as applied to library and information science research)

Protocol Analysis: The systematic evaluation of protocols using automated or manual content analysis tools. (Penniman and Dominick 1980, p. 31)

Protocol: In this domain, a protocol is the “verbatim” record of user/system interaction for the entire user session (or selected portions) generally with time stamps on each action and perhaps some indication of system resources in use at the time. (Penniman and Dominick 1980, p. 23)

Markov Process: A stochastic process in which the transition probabilities can be estimated on the basis of first-order data. Such a process is also stationary in that probability estimates do not change across the sample (generally across time).

Search Engine: A software program that searches one or more databases and gathers the results related to the search query

Transaction: A two-item set consisting of a query and a response, in which the IR system contributes either the query or the response and in which the response may be null. This definition allows human-to-machine, machine-to-human, and machine-to-machine transactions. It also allows for unanswered queries. (Peters et al., 1993, p. 39)

Analysis – Zero Order: An analysis of transactions in which only the current state is evaluated. This is usually characterized by studies in which frequency counts of particular states are reported irrespective of their context.

Analysis – Higher Order: An analysis of transaction patterns in which a sequence of more than two states is evaluated and the current state is predicted on the basis of previous states (for example, a second-order process analysis would look at two previous states to predict the current state, a third-order analysis would look at three previous states, and so forth).

Analysis – First Order: An analysis of transaction patterns in which state pairs are evaluated and the immediately previous state is used to predict the current state.

Adaptive Prompting: A context-sensitive method of issuing diagnostics based on patterns of actions as well as individual actions by the user. (Penniman, 1976, p. 3)

Transaction Log: An autonomous file (or log) containing records of the individual transactions processed by a computerized IR system. (Peters et al., 1993, p. 39)
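
The zero-order and first-order analyses defined above can be illustrated with a minimal sketch in Python. It is not taken from any of the studies cited in this chapter: the state labels (LOGIN, SEARCH, DISPLAY, ERROR, LOGOFF), the two sample sessions, and the function names are all hypothetical, and each session is assumed to have already been reduced to an ordered list of user states.

```python
from collections import Counter, defaultdict


def zero_order_counts(states):
    """Zero-order analysis: count how often each state occurs, ignoring context."""
    return Counter(states)


def first_order_transitions(sessions):
    """First-order analysis: estimate P(current state | previous state)
    from adjacent state pairs within each session."""
    pair_counts = defaultdict(Counter)
    for session in sessions:
        # Pair each state with the one that immediately follows it.
        for previous, current in zip(session, session[1:]):
            pair_counts[previous][current] += 1

    probabilities = {}
    for previous, following in pair_counts.items():
        total = sum(following.values())
        probabilities[previous] = {
            current: count / total for current, count in following.items()
        }
    return probabilities


# Hypothetical sessions; the state labels are invented for illustration only.
sessions = [
    ["LOGIN", "SEARCH", "DISPLAY", "SEARCH", "DISPLAY", "LOGOFF"],
    ["LOGIN", "SEARCH", "ERROR", "SEARCH", "DISPLAY", "LOGOFF"],
]

counts = zero_order_counts(state for session in sessions for state in session)
transitions = first_order_transitions(sessions)

print(counts["SEARCH"])                  # 4
print(transitions["SEARCH"]["DISPLAY"])  # 0.75
```

In this toy sample, a SEARCH is followed by a DISPLAY in three of four cases and by an ERROR in the fourth. A higher-order analysis would condition on two or more previous states in the same way, for example by keying the counts on tuples of states, at the cost of needing far more data for stable estimates.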
