Privacy Concerns for Web Logging Data

Kirstie Hawkey (University of British Columbia, Canada)
Copyright: © 2009 | Pages: 19
DOI: 10.4018/978-1-59904-974-8.ch005


This chapter examines two aspects of privacy concerns that must be considered when conducting studies that include the collection of Web logging data. After providing background about privacy concerns, we first address the standard privacy issues when dealing with participant data. These include the privacy implications of releasing data, methods of safeguarding data, and issues encountered with the re-use of data. Second, we discuss the impact of data collection techniques on a researcher’s ability to capture natural user behaviors. Key recommendations are offered about how to enhance participant privacy when collecting Web logging data so as to encourage these natural behaviors. The author hopes that understanding the privacy issues associated with the logging of user actions on the Web will assist researchers as they evaluate the inherent tradeoffs between the type of logging conducted, the richness of the data gathered, and the naturalness of captured user behavior.
Chapter Preview


Privacy is an important consideration when conducting research that uses Web logs to capture and analyze user behaviors. Two aspects of privacy will be discussed in this chapter. First, it is important that governmental regulations, such as the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada, and organizational regulations, such as a university’s local research ethics board (REB) policies, are met. These regulations dictate requirements for the storage and safeguarding of participant data as well as the use, re-use, and transfer of that data. Second, researchers may find that providing privacy-enhancing mechanisms for participants can affect the success of a study. Privacy assurances can ease study recruitment and encourage natural Web browsing behaviors. This is particularly important when capturing rich behavioral data beyond what is ordinarily recorded in server transaction logs, as is generally the case for client-side logging. It is this second aspect of privacy that is the primary focus of this chapter.
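As one illustration of a privacy-enhancing mechanism for client-side logging, the Python sketch below lets a participant pause recording and name domains that should never be logged. The class name `PrivacyAwareLogger`, its fields, and the choice to keep an `[excluded]` placeholder rather than dropping the visit are hypothetical assumptions for illustration, not a mechanism described in this chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass
class PrivacyAwareLogger:
    """Hypothetical client-side logger with participant-controlled privacy options."""

    excluded_domains: set[str] = field(default_factory=set)  # sites the participant never wants logged
    paused: bool = False  # the participant can suspend logging entirely
    records: list[dict] = field(default_factory=list)

    def _is_excluded(self, url: str) -> bool:
        host = (urlparse(url).hostname or "").lower()
        # Match an excluded domain itself and any of its subdomains.
        return any(host == d or host.endswith("." + d) for d in self.excluded_domains)

    def log_visit(self, url: str, action: str) -> None:
        if self.paused:
            return  # nothing at all is recorded while paused
        timestamp = datetime.now(timezone.utc).isoformat()
        if self._is_excluded(url):
            # Assumed design choice: keep only the fact that a visit occurred, not where.
            self.records.append({"time": timestamp, "action": action, "url": "[excluded]"})
            return
        self.records.append({"time": timestamp, "action": action, "url": url})


if __name__ == "__main__":
    logger = PrivacyAwareLogger(excluded_domains={"examplebank.com"})
    logger.log_visit("https://www.examplebank.com/login", "navigate")  # logged as "[excluded]"
    logger.log_visit("https://news.example.org/weather", "navigate")   # logged in full
    logger.paused = True
    logger.log_visit("https://health.example.net/forum", "navigate")   # not logged
    for record in logger.records:
        print(record)
```

Keeping a placeholder preserves session structure (an analyst can see that a visit happened and when) while hiding its destination; dropping excluded visits entirely would be the more protective, but less informative, alternative.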

There are privacy concerns associated with viewing and releasing Web browsing data. Web browsers are typically used for a wide variety of tasks, both personal and work-related (Hawkey & Inkpen, 2006a), and the potentially sensitive information visible within Web browsers and in data logs is tightly integrated with a person’s actions within the browser (Lederer, Hong, Dey, & Landay, 2004). Increasingly, the Internet has become a means by which people engage in activities that support their emotional needs, such as surfing the Web, visiting personal support forums, blogging, and investigating health concerns (Westin, 2003). Content captured within Web browsers or in server logs may therefore include sensitive items such as socially inappropriate activities, confidential business items, and personal activities conducted on company time, as well as more neutral items such as situation-appropriate content (e.g., weather information).

Visual privacy issues have been investigated with respect to traces of prior Web browsing activity that are visible within Web browsers during co-located collaboration (Hawkey, 2007; Hawkey & Inkpen, 2006b). Dispositional variables, such as age, computer experience, and inherent privacy concerns, combine with situational variables, such as device and location, to create contextual privacy concerns. Within each location, the social norms and Web usage policies, the role of the person, and the potential viewers of the display and users of the device affect both Web browsing behaviors and privacy comfort levels in a given situation. The affected behaviors include both the Web sites visited and the use of convenience features such as history settings and auto-complete. Furthermore, most participants in these studies reported that, if given advance warning of collaboration, they would take actions to further limit which traces are potentially visible.

Recently, the sensitivity of search terms has been a topic in the mainstream news. In August 2006, AOL released the search terms used by 658,000 anonymous users over a three-month period (McCullagh, 2006). These search terms revealed a great deal about the interests of AOL’s users, and their release was considered a privacy violation. Although only a few users could be identified by combining information found within the search terms they used, AOL soon removed the data from public access. The incident highlighted both the breadth of search terms with respect to content sensitivity and how much the terms could reveal about users’ concerns and personal activities.
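Identifying a user by combining search terms depends on linkage: the terms can only be combined if they are attributed to the same person. The Python sketch below, using a hypothetical toy log and hypothetical function names, contrasts giving each user one stable numeric code with giving each query a fresh code. It illustrates the linkage problem in general and is not a description of how AOL prepared its data.

```python
from collections import defaultdict
import itertools


def pseudonymize(log, keep_linkage=True):
    """Replace user names in (user, query) pairs with numeric codes.

    keep_linkage=True gives each user one stable code, so all of that user's
    queries remain linked; keep_linkage=False gives every query a fresh code.
    """
    counter = itertools.count(1)
    codes = {}
    result = []
    for user, query in log:
        if keep_linkage:
            if user not in codes:
                codes[user] = next(counter)
            code = codes[user]
        else:
            code = next(counter)
        result.append((code, query))
    return result


def queries_by_code(pseudonymized):
    """Group queries by code: what an analyst (or attacker) could combine."""
    grouped = defaultdict(list)
    for code, query in pseudonymized:
        grouped[code].append(query)
    return dict(grouped)


if __name__ == "__main__":
    toy_log = [  # hypothetical queries
        ("alice", "plumbers in springfield"),
        ("alice", "springfield clinic hours"),
        ("bob", "weather tomorrow"),
        ("alice", "smith family reunion"),
    ]
    print(queries_by_code(pseudonymize(toy_log)))
    print(queries_by_code(pseudonymize(toy_log, keep_linkage=False)))
```

With a stable code, one user’s three queries can be read together, and individually innocuous details (a town, a surname) accumulate. With per-query codes they cannot, but sessions and repeat searches can no longer be studied either: one instance of the tradeoff between data richness and participant privacy.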

Key Terms in this Chapter

Privacy: “The claim of an individual to determine what information about himself or herself should be known to others.” (Westin, 2003).

Anonymized Data: Data that has been collected with identifying information, but has had subsequent removal of any links between the data and identifying information so that the researcher can no longer discern the specific owner of the data.

Web Browsing Environment: The context within which Web browsing occurs. For studies of Web usage this includes the Web browser and its associated tools (e.g., history, specialized toolbars), the task, and the motivation for conducting the browsing.

Anonymous Data: Data that is collected without any associated identifying information.

Contextual Privacy Concerns: Privacy concerns vary in any given instance according to the inherent privacy concerns of the user and the situational factors at play. These include the viewer of the information, level of control retained over the information, and the type of information. Furthermore, these factors can vary according to the device in use and the location.

Proxy Logging: Software that serves as an intermediary between the user’s Web browser and the Web site servers. Users generally have to log in to the proxy, and the proxy server can be used to augment retrieved Web pages.

Inherent Privacy Concerns: An individual’s general privacy concerns; their disposition to privacy. Factors which may impact a person’s disposition to privacy include their age and computer experience.

Server-Side Logging: Software that records Web browsing behavior at the server. Data collection is generally limited to navigation information.

Web Browsing Behaviors: User behaviors on the Web including their browsing activities and Web browser interactions. Privacy concerns have been found to impact Web browsing behaviors.

Client-Side Logging: Software that records Web browsing behavior at the user’s computer. This is generally achieved either through a custom Web browser or through browser plug-ins such as toolbars or browser helper objects.
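Even though server-side logging is generally limited to navigation information, each entry still carries identifiers such as the client address and any authenticated user name. The Python sketch below assumes entries in the Common Log Format written by Web servers such as Apache and strips or coarsens those identifiers before analysis. The function names are hypothetical, and truncating addresses to a /24 (IPv4) or /48 (IPv6) network is an assumed, illustrative choice. On its own it does not guarantee that the result meets the Anonymized Data definition above, because request paths can still identify someone.

```python
import ipaddress
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def coarsen_host(host: str) -> str:
    """Zero the low-order bits of an IP address; drop resolved hostnames."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "[removed]"  # a hostname can itself identify a person or organization
    prefix = 24 if addr.version == 4 else 48  # assumed truncation levels
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def anonymize_clf_line(line: str) -> dict:
    """Parse one log entry and remove its direct identifiers."""
    match = CLF.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    record = match.groupdict()
    record["host"] = coarsen_host(record["host"])
    record["ident"] = "-"  # identd response, if any
    record["user"] = "-"   # authenticated user name
    return record


if __name__ == "__main__":
    entry = '192.0.2.57 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(anonymize_clf_line(entry))
```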
