Interaction Data: Confidentiality and Disclosure

Oliver Duke-Williams (University of Leeds, United Kingdom)
DOI: 10.4018/978-1-61520-755-8.ch003


As we saw in Chapter 1, interaction data sets have been derived from a number of sources, including censuses, other surveys and a range of administrative sources. These data typically form large, sparsely populated matrices. Where the matrices do have non-zero values, those numbers are often small. This is highly significant where confidentiality is concerned – small numbers in aggregate data are generally seen as representing an increased risk of disclosure. This chapter looks at confidentiality issues with particular regard to interaction data. Different types of disclosure are considered, together with the reasons why interaction data are thought to pose particular disclosure problems. Methods of disclosure control are outlined, and then two particular methods are studied: those used in the 1991 and the 2001 UK Censuses. The methods used and the extent of their effects are described, and suggestions for how best to use the affected data sets are given.
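The point about sparse matrices with small non-zero values can be made concrete with a minimal sketch. The zone names, the counts and the threshold of 3 below are all invented for illustration; they are not taken from any census data set, and the threshold is not a rule used by any statistical agency.

```python
# Hypothetical origin-destination flows: (origin, destination) -> number of people.
# Pairs that are absent from the dictionary are treated as zero flows.
flows = {
    ("A", "A"): 120,
    ("A", "B"): 3,
    ("B", "A"): 1,
    ("C", "D"): 2,
    ("D", "D"): 85,
}
zones = ["A", "B", "C", "D", "E"]


def summarise(flows, zones, threshold=3):
    """Report how sparse a flow matrix is and how many non-zero cells are small.

    'threshold' is an illustrative cut-off: cells with 0 < count < threshold
    are counted as small.
    """
    total_cells = len(zones) ** 2
    nonzero = {pair: n for pair, n in flows.items() if n > 0}
    small = {pair: n for pair, n in nonzero.items() if n < threshold}
    return {
        "total_cells": total_cells,
        "nonzero_cells": len(nonzero),
        "sparsity": 1 - len(nonzero) / total_cells,
        "small_cells": len(small),
    }


summary = summarise(flows, zones)
print(summary)
```

In this toy matrix most cells are empty, and two of the five populated cells hold counts of 1 or 2, which is the pattern that makes interaction data a particular concern for disclosure.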
Chapter Preview


The first chapter of this book introduced a variety of interaction data sets that exist in the UK, generated from censuses and a variety of administrative data sources. These vary in nature in many ways – some are based on population samples, whilst others are based on a theoretically complete census; in the case of migration data, some data sources capture migrants over a transitional period, whilst others capture information about specific migration events. They are also made available in different ways and with varying degrees of freedom of access. A common theme, however, is that because the data are about individuals, safeguards are taken to ensure confidentiality in the data released for research.

This chapter develops ideas about confidentiality of interaction data and measures that have been taken to prevent disclosure of information about individuals. The data sets that are discussed in this chapter share the general characteristic that they are built up from records that refer to individuals. This introduction starts by describing some of the key terms that shape the chapter. What do we mean by ‘confidentiality’ and ‘disclosure’? More specifically, what do we mean by these terms in the context of interaction data composed of individual records from censuses, surveys and administrative registers? Given ideas of confidentiality and disclosure, how do statistical agencies go about ensuring that confidentiality is maintained? In defining these terms, it will be seen that in large part this is done through modifications to the data before they are released. This chapter goes on to consider some specific ways in which interaction data have been modified in order to make them ‘safe’ to release. How significantly have the data been affected? What can users do to accommodate these modifications in their research? The chapter focuses on procedures used in the 1991 and 2001 Censuses.

What is confidentiality? Confidentiality is a term that refers to preventing disclosure of information to unauthorised parties. It applies to many different sorts of data. In this chapter, confidentiality will be discussed with respect to personal data, although it is also often considered in relation to business data. For personal data, confidentiality is a concept closely related to privacy. Public and private agencies generally have legal and ethical obligations to ensure that they maintain confidentiality of the data that they collect. It is usually argued that for statistical agencies, being seen to ensure confidentiality is an important element of building public trust, and that increased levels of trust lead to improved response rates. However, Singer et al. (1993), studying the 1990 US Census, argued that trust in confidentiality had only a limited effect on response rates and that this relationship varied for black and white respondents. The effect of trust may vary depending on the nature of the survey taken: in the case of a sample study, the individual has the ability to opt out, whereas in the case of a census the individual faces legal coercion to complete a census form.

Confidentiality for public data has two main aspects. The first is that confidentiality must be maintained over raw data. Thus, statistical agencies must ensure that data are stored and processed (either in-house or via sub-contractors) in a secure manner, without inadvertent or deliberate disclosure. Confidentiality of raw data is usually ensured by appropriate data security arrangements, and by the threat of legal penalties against employees or sub-contractors should they disclose information. Most recent media stories about problems of data protection, such as the child benefit data loss (Poynter, 2008), focus on actual or potential confidentiality breaches through failures of internal data security.

The second aspect of confidentiality arises when the data are released, and it is on this area that this chapter focuses. A combination of tactics is used to ensure confidentiality in released data. Some data sets require individual or corporate users to sign license agreements; these typically contain legal undertakings not to disclose information relating to individuals. However, legal protections alone are not usually considered sufficient to ensure that confidentiality will be maintained, and thus further measures are also taken. These further measures take the form of statistical disclosure control methods, which modify the data that are to be publicly released in order to reduce the risk of disclosure. There are a number of different approaches to disclosure control, and there are multiple variants of general approaches. Some of these are described below in the section on imputation and disclosure control.
