The chapter discusses the traditional expectations about privacy protection and argues that current models for the governance of data do not adequately fulfil these expectations. The traditional models of privacy protection are based on the assumption that strict anonymisation of released statistical data is the way to protect privacy and ensure public trust in the research enterprise. It will be argued that the main barriers to preserving privacy and maintaining public trust are the capabilities of information technology, on the one hand, and the availability of numerous data sources, on the other. Furthermore, both types of resource enable certain types of organisation to ‘read’ and categorise other people. The realities of data-processing technologies challenge the dichotomy, present in the legal framework for data protection, between ‘personal’ and research data; moreover, this dichotomy is not useful in the protection of informational privacy. The chapter will refer to several examples of uses of data in what are in effect ‘socio-technical systems’, which arguably challenge accepted methods of privacy protection in this area.
So act that you use humanity, whether in your own person or in the person of any other, always at the same time as end, never merely as a means. (Ak 4:429)
Kant, I. (1997 translation). Groundwork of the Metaphysics of Morals. Cambridge University Press.
Research organisations maintain that strict anonymisation of disseminated results is the bedrock of privacy protection and the best way to ensure public trust. In this chapter it will be argued that the realities of data processing within certain ‘socio-technical systems’ mean that anonymisation, as applied to statistical data, does not by itself satisfactorily achieve these aims. The chapter considers the traditional approaches taken by research organisations to protecting the privacy of data subjects and argues that these must be rethought in the light of the availability and use of sophisticated data-processing technologies and multiple data sources. Research organisations rely on a traditional model of anonymisation and informed consent to ensure the ethical treatment of data, and this approach is still the standard (Lowrance, 2002). This model ostensibly allows data subjects to control the circumstances in which they provide data and to ensure that the direct consequences arising from the provision of data will be limited. However, there are many challenges to the efficacy of this model in protecting the values it is intended to protect, including privacy and related benefits (Vedder, 2001). The chapter will discuss ways of understanding privacy and consider how certain types of data reuse, such as profiling, fall outside the original organisational context, challenge accepted norms of data classification and, as a result, undermine the ability of the current data protection framework to protect privacy. Nissenbaum’s (1998) concept of ‘contextual integrity’ will be used to explore likely expectations with regard to privacy. The chapter will refer to the use of outputs of National Statistical Institutes (NSIs) in ‘socio-technical systems’ such as that constituted by the information super-bureau Experian. NSIs provide a good example of a visible public sector organisation that compiles and disseminates statistical or anonymised data.
Key Terms in this Chapter
Anonymisation: This process involves removing identifiers from the data. It can be done in a number of different ways, often in combination. These include removing variables (the most obvious application of this method is the removal of direct identifiers from the data file); global recoding (which consists of aggregating the values observed in a variable into pre-defined classes, for example, recoding age into five-year age groups); and local suppression (which consists of replacing the observed value of one or more variables in a certain record with a missing value). Anonymisation is one way of minimising the risk of identity disclosure when distributing microdata.
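The three techniques named in this definition can be sketched on a toy set of records. This is a minimal illustration, not the procedure of any particular NSI: the records, the field names (`name`, `id_number`, `age`, `region`, `occupation`) and the rule that a value seen fewer than two times counts as "rare" for local suppression are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical microdata: field names and values are invented for illustration.
RECORDS = [
    {"name": "Alice", "id_number": "A1", "age": 34, "region": "North", "occupation": "nurse"},
    {"name": "Bob", "id_number": "B2", "age": 37, "region": "North", "occupation": "nurse"},
    {"name": "Carol", "id_number": "C3", "age": 52, "region": "South", "occupation": "astronaut"},
]

DIRECT_IDENTIFIERS = {"name", "id_number"}


def remove_variables(rows, variables):
    """Removing variables: drop the named columns (here, direct identifiers)."""
    return [{k: v for k, v in row.items() if k not in variables} for row in rows]


def global_recode_age(rows, width=5):
    """Global recoding: map every exact age to a pre-defined band, e.g. '30-34'."""
    recoded = []
    for row in rows:
        start = (row["age"] // width) * width
        recoded.append({**row, "age": f"{start}-{start + width - 1}"})
    return recoded


def local_suppress(rows, variable, min_count=2):
    """Local suppression: replace the value with a missing value (None) in
    records where it is rare. The min_count threshold is an assumed rule."""
    counts = Counter(row[variable] for row in rows)
    return [
        {**row, variable: None} if counts[row[variable]] < min_count else row
        for row in rows
    ]


def anonymise(rows):
    """Apply the three techniques in sequence."""
    rows = remove_variables(rows, DIRECT_IDENTIFIERS)
    rows = global_recode_age(rows)
    return local_suppress(rows, "occupation")


if __name__ == "__main__":
    for row in anonymise(RECORDS):
        print(row)
```

In this toy output the third record keeps its age band and region but loses its occupation. It is still the only record from the South, however, which shows that these steps reduce, rather than eliminate, the possibility of singling out a record.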
Profiling: This relates to the recording and classification of behaviors. It occurs through aggregating information, often collated from a number of sources, to build profiles of individuals that are used to sell products and to model and predict behavior. These profiles may be used by marketers for targeted advertising. Companies may link profiles to individuals’ identities.
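How profiles can be assembled from several sources, and linked to identities, can be sketched as a simple record linkage: an "anonymised" release is matched against a separately held, identified list on attributes the two share. Every name, field and value below is hypothetical, and the rule of linking only when exactly one identified record matches is an assumption made to keep the sketch simple.

```python
# Hypothetical illustration: all names, fields and values are invented.
QUASI_IDENTIFIERS = ("age_band", "district", "sex")

# An 'anonymised' release: no names, but demographic detail is retained.
RELEASED = [
    {"age_band": "30-34", "district": "D1", "sex": "F", "condition": "asthma"},
    {"age_band": "35-39", "district": "D1", "sex": "M", "condition": "diabetes"},
]

# A separately obtained, identified source (e.g. a customer list).
IDENTIFIED = [
    {"name": "Alice", "age_band": "30-34", "district": "D1", "sex": "F", "purchases": ["inhaler"]},
    {"name": "Bob", "age_band": "35-39", "district": "D1", "sex": "M", "purchases": ["running shoes"]},
    {"name": "Dan", "age_band": "35-39", "district": "D1", "sex": "M", "purchases": ["garden tools"]},
]


def key_of(record, keys):
    """The tuple of shared attributes used to compare records."""
    return tuple(record[k] for k in keys)


def build_profiles(released, identified, keys=QUASI_IDENTIFIERS):
    """Attach released attributes to a named person when the match is unique."""
    profiles = {}
    for rec in released:
        matches = [p for p in identified if key_of(p, keys) == key_of(rec, keys)]
        if len(matches) == 1:  # ambiguous (e.g. Bob vs Dan) or missing matches are skipped
            person = matches[0]
            extra = {k: v for k, v in rec.items() if k not in keys}
            profiles[person["name"]] = {**person, **extra}
    return profiles


if __name__ == "__main__":
    print(build_profiles(RELEASED, IDENTIFIED))
```

Only the first released record is linked here, because it matches exactly one named person; the second matches two people and is left unlinked. Adding a further source that distinguishes Bob from Dan would be enough to link it as well.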
Categorize: This means assigning an entity to a category. It involves classifying or labeling entities so that they can be assigned to a class or category. This can be done using existing categories (for example, age) or specially designed ones, which can be used, for example, to segment populations on the basis of a number of their characteristics.
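The difference between an existing category and a specially designed one can be sketched as two small classification functions. The `Person` fields, the age bands, and the segment names and rules are all hypothetical, chosen only to show how several characteristics can be combined into one label.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    tenure: str    # "own" or "rent" (hypothetical field)
    children: int
    area: str      # "urban" or "rural" (hypothetical field)


def age_category(person: Person) -> str:
    """Existing category: conventional age bands (bands chosen for illustration)."""
    if person.age < 18:
        return "under 18"
    if person.age < 65:
        return "18-64"
    return "65 and over"


def segment(person: Person) -> str:
    """Specially designed category combining several characteristics.

    The segment names and rules are invented for illustration.
    """
    if person.tenure == "own" and person.children > 0:
        return "homeowning family"
    if person.age < 35 and person.tenure == "rent" and person.area == "urban":
        return "young urban renter"
    return "unclassified"


if __name__ == "__main__":
    p = Person(age=28, tenure="rent", children=0, area="urban")
    print(age_category(p), "|", segment(p))
```

The designed segment draws on four characteristics at once, so the label says more about the person than any single variable does.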
Context: Context involves the organization, or set of researchers or professionals, that collected the data and the explicit or implicit agreements established with data subjects. Context can mean the physical situation, but it also involves a number of understandings and expectations about what will be done with data given in a particular situation under a specific set of conditions. An example of a breach of context would be if data provided to one’s doctor for medical purposes were used by a credit company to assess an individual’s financial viability.
Socio-Technical Systems: These are associations of information technology, organizations and people. The term embodies the recognition that people and technologies interact. It also refers to the interaction between societal structures and values, on the one hand, and human behaviors, on the other.
Personal Data: This is, simply, identifiable data. Article 2(a) of EU Directive 95/46/EC defines ‘personal data’ as information relating to an identified or identifiable individual, that is, data that can be linked to a person directly or indirectly, for example through an identification number or through one or more characteristics specific to that person’s identity.
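Indirect identification can be illustrated with a uniqueness count in the style of k-anonymity (a technique not discussed in this chapter, named here only for orientation): records with no names are grouped by a combination of ordinary attributes, and any record that is alone in its group can potentially be singled out. The records and the choice of quasi-identifiers below are hypothetical.

```python
from collections import Counter

# Hypothetical records with no direct identifiers (no names or ID numbers).
ROWS = [
    {"age_band": "30-34", "sex": "F", "district": "D1", "income": "high"},
    {"age_band": "30-34", "sex": "F", "district": "D1", "income": "low"},
    {"age_band": "35-39", "sex": "M", "district": "D1", "income": "medium"},
    {"age_band": "80-84", "sex": "M", "district": "D2", "income": "low"},
]
# Assumed quasi-identifiers: attributes likely to appear in other sources.
QUASI_IDENTIFIERS = ("age_band", "sex", "district")


def class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


def singled_out(rows, keys):
    """Records whose combination is unique (a class of size 1, i.e. k = 1)."""
    sizes = class_sizes(rows, keys)
    return [r for r in rows if sizes[tuple(r[k] for k in keys)] == 1]


if __name__ == "__main__":
    for r in singled_out(ROWS, QUASI_IDENTIFIERS):
        print(r)
```

Neither flagged record carries a name, yet each is the only one with its combination of age band, sex and district, so anyone holding another source with those three attributes could link it to a person.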
Statistical Data: Statistical data is legally distinct from the ‘personal’ data covered by data protection legislation. Statistical data is said to answer questions about numbers, amounts and percentages rather than about individuals.
Privacy: Liberal political theory recognizes the capacity for autonomy in the rational individual and tends to advocate protecting the individual’s ability to exercise this capacity. From this is derived the notion that privacy is one way in which the individual can be protected from manipulation by others. An important point is that the concept of privacy in this chapter almost always means informational privacy, that is, privacy as it relates to information disclosed by an individual. How the concept of informational privacy is derived will be crucial to understanding the way in which other central concepts are used.
Data-Dichotomy: This relates to the distinction between ‘personal’ and statistical data, or the split between identified and non-identified data. The dichotomy between statistical and personal data is constantly reiterated and is relevant to how individual privacy in relation to data is protected.