Disclosure Control of Confidential Data by Applying PAC Learning Theory


Ling He, Haldun Aytug, Gary J. Koehler
DOI: 10.4018/978-1-61350-471-0.ch018


This paper examines privacy protection in a statistical database from the perspective of an intruder who uses learning theory to discover private information. With the rapid development of information technology, massive data collection is easier and cheaper than ever before. The challenge is to provide database users with reliable and useful data while protecting the privacy of confidential information. This paper discusses how to prevent disclosure of the identity of unique records in a statistical database. The authors’ research extends previous work and shows how much protection is necessary to prevent an adversary from discovering confidential data with high probability and small error.
Chapter Preview

Traditional Approaches For Disclosure Control Methods

A compromise of a database occurs when confidential information is disclosed exactly, partially, or inferentially in such a way that the user can link the data to an entity. Inferential disclosure, or statistical inference (Más, 2000), refers to a situation in which an unauthorized user can infer confidential data by running sequential queries, with a probability that exceeds a predetermined disclosure threshold. For example, assume a hospital database has a binary field called HIV-Status. A user can issue several SUM (HIV-Status) queries against this database. Individually, these queries may not pose a threat; when combined, however, they may allow the adversary to infer the HIV-Status of a patient (for a full example see Garfinkel, Gopal, & Goes, 2002). This is known as an inference problem, which falls within our research focus.
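The inference attack above can be sketched in a few lines of Python. This is a hypothetical illustration (the patient names and data are invented, and `sum_hiv` stands in for a database's SUM aggregate), not the full example from Garfinkel et al. (2002): two queries that are each harmless in isolation differ by exactly one record, so their difference reveals that record's confidential value.

```python
# Toy database: each record has a non-confidential attribute (age)
# and a confidential binary attribute (HIV-Status).
patients = {
    "Alice": {"age": 34, "hiv": 0},
    "Bob":   {"age": 51, "hiv": 1},
    "Carol": {"age": 29, "hiv": 0},
}

def sum_hiv(names):
    """Answer a SUM(HIV-Status) query over the selected records."""
    return sum(patients[n]["hiv"] for n in names)

# Query 1 covers all patients; Query 2 covers all patients except Bob.
q1 = sum_hiv(["Alice", "Bob", "Carol"])  # -> 1
q2 = sum_hiv(["Alice", "Carol"])         # -> 0

# Neither query names Bob's status, but their difference discloses it.
bob_hiv = q1 - q2
print(bob_hiv)  # -> 1
```

A query-restriction mechanism would refuse one of the two queries precisely because their answer sets overlap too closely.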

Adam and Wortmann (1989) classify statistical disclosure control (SDC) methods for statistical databases (SDBs) into four categories: Conceptual, Query Restriction, Data Perturbation, and Output Perturbation. Perturbations are achieved by applying either an additive or a multiplicative technique. An additive technique (Muralidhar, Parsa, & Sarathy, 1999) adds noise to the confidential data. Multiplicative data perturbation (Muralidhar, Batra, & Kirs, 1995) protects the sensitive information by multiplying the original data by a random variable with mean 1 and a pre-specified variance.
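The two perturbation families can be sketched as follows. This is a minimal illustration, not the specific schemes of the cited papers: the noise distributions, the Gaussian choice, and the salary data are assumptions made for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def additive_perturb(values, sigma=1.0):
    """Additive perturbation: add zero-mean noise to each value."""
    return [v + random.gauss(0.0, sigma) for v in values]

def multiplicative_perturb(values, sigma=0.1):
    """Multiplicative perturbation: multiply each value by a random
    variable with mean 1 and pre-specified variance sigma**2."""
    return [v * random.gauss(1.0, sigma) for v in values]

salaries = [52000, 61000, 47500, 83000]
print(additive_perturb(salaries, sigma=500))
print(multiplicative_perturb(salaries, sigma=0.05))
```

Note the design difference: additive noise distorts all records by roughly the same absolute amount, while multiplicative noise scales with the value, so large confidential values receive proportionally larger protection.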

Data shuffling, a perturbation technique proposed and further studied by Muralidhar and Sarathy (2006) and Muralidhar, Sarathy, and Dandekar (2006), offers a high level of data utility while reducing disclosure risk by shuffling data among observations. Data shuffling maintains the advantages of perturbation methods while outperforming other data protection methods.
