Disclosure Control of Confidential Data by Applying PAC Learning Theory


Ling He (Saginaw Valley State University, USA), Haldun Aytug (University of Florida, USA) and Gary J. Koehler (University of Florida, USA)
Copyright: © 2010 | Pages: 13
DOI: 10.4018/jdm.2010100106


This paper examines privacy protection in a statistical database from the perspective of an intruder using learning theory to discover private information. With the rapid development of information technology, massive data collection is easier and cheaper than ever before. The challenge is how to provide database users with reliable and useful data while protecting the privacy of the confidential information. This paper discusses how to prevent disclosure of the identity of unique records in a statistical database. The authors’ research extends previous work and shows how much protection is necessary to prevent an adversary from discovering confidential data with high probability at small error.
Article Preview

Traditional Approaches For Disclosure Control Methods

A compromise of a database occurs when confidential information is disclosed exactly, partially, or inferentially in such a way that the user can link the data to an entity. Inferential disclosure, or statistical inference (Más, 2000), refers to the situation in which an unauthorized user can infer the confidential data by running sequential queries, with a probability that exceeds a predetermined disclosure threshold. For example, assume a hospital database has a binary field called HIV-Status. A user can issue several SUM(HIV-Status) queries against this database. Individually, these queries may not pose a threat; combined, however, they might let the adversary infer the HIV-Status of a patient (for a full example see Garfinkel, Gopal, & Goes, 2002). This is known as an inference problem, which falls within our research focus.
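The inference problem can be made concrete with a small sketch. The table, the field names (`age`, `zip`, `hiv_status`), and the `sum_hiv` helper below are hypothetical and chosen only for illustration; they are not taken from Garfinkel, Gopal, and Goes (2002). The sketch assumes the database answers SUM queries exactly, so two queries whose query sets differ by a single record reveal that record's value when their answers are subtracted.

```python
# Hypothetical toy table; every name and value here is invented for illustration.
patients = [
    {"name": "A", "age": 34, "zip": "32601", "hiv_status": 0},
    {"name": "B", "age": 47, "zip": "32601", "hiv_status": 1},
    {"name": "C", "age": 52, "zip": "32601", "hiv_status": 1},
    {"name": "D", "age": 29, "zip": "32608", "hiv_status": 0},
]


def sum_hiv(predicate):
    """Answer a SUM(HIV-Status) query over the rows that satisfy predicate."""
    return sum(row["hiv_status"] for row in patients if predicate(row))


# Query 1: everyone in zip code 32601.
q1 = sum_hiv(lambda r: r["zip"] == "32601")

# Query 2: the same group, excluding the only 47-year-old.
q2 = sum_hiv(lambda r: r["zip"] == "32601" and r["age"] != 47)

# Neither answer names an individual, but their difference is exactly
# the HIV-Status of the one record that separates the two query sets.
inferred_status = q1 - q2
print(q1, q2, inferred_status)
```

Each query on its own returns only an aggregate; the disclosure comes from combining them, which is why restricting individual queries is not sufficient protection by itself.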

Adam and Wortmann (1989) classify statistical disclosure control (SDC) methods for statistical databases (SDBs) into four categories: Conceptual, Query Restriction, Data Perturbation, and Output Perturbation. Perturbation is achieved by applying either an additive or a multiplicative technique. An additive technique (Muralidhar, Parsa, & Sarathy, 1999) adds noise to the confidential data. Multiplicative data perturbation (Muralidhar, Batra, & Kirs, 1995) protects the sensitive information by multiplying the original data by a random variable with mean 1 and a pre-specified variance.
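The two perturbation styles can be sketched as follows. This is a minimal illustration, not the procedure of the cited papers: the function names, the sample `salaries`, and the choice of Gaussian noise are assumptions. The text specifies only that the multiplicative factor has mean 1 and a pre-specified variance; zero-mean noise for the additive case is likewise assumed here.

```python
import random


def additive_perturb(values, noise_sd, rng):
    """Add zero-mean Gaussian noise (an assumed distribution) to each value."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def multiplicative_perturb(values, variance, rng):
    """Multiply each value by a random factor with mean 1 and the given variance.

    The Gaussian shape of the factor is an assumption made for this sketch.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    sd = variance ** 0.5
    return [v * rng.gauss(1.0, sd) for v in values]


# Hypothetical confidential attribute.
salaries = [52000.0, 61000.0, 58500.0, 120000.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy_add = additive_perturb(salaries, noise_sd=2000.0, rng=rng)
noisy_mul = multiplicative_perturb(salaries, variance=0.01, rng=rng)
print(noisy_add)
print(noisy_mul)
```

With additive noise the size of the distortion is the same for every record, whereas with multiplicative noise it scales with the magnitude of the original value, so large values are masked by proportionally larger changes.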

Data shuffling, a perturbation technique proposed and further studied by Muralidhar and Sarathy (2006) and Muralidhar, Sarathy, and Dandekar (2006), offers a high level of data utility while reducing disclosure risk by shuffling data among observations. It maintains all the advantages of perturbation methods and is reported to outperform other data protection methods.
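The basic idea of reassigning confidential values among observations can be illustrated with a deliberately simplified sketch. It is not the data shuffling procedure of Muralidhar and Sarathy (2006): a uniform random permutation is swapped in for their method of choosing the reassignment, and the `shuffle_confidential` helper and the sample records are hypothetical. The sketch shows only one property, that the released column contains exactly the original values, so its marginal distribution is unchanged.

```python
import random


def shuffle_confidential(records, field, rng):
    """Return copies of records with the values of `field` permuted among them.

    A uniform random permutation is used purely for illustration; it ignores
    any relationship between `field` and the other attributes.
    """
    values = [rec[field] for rec in records]
    rng.shuffle(values)
    return [{**rec, field: value} for rec, value in zip(records, values)]


# Hypothetical records; "salary" plays the role of the confidential attribute.
records = [
    {"id": 1, "dept": "A", "salary": 52000},
    {"id": 2, "dept": "A", "salary": 61000},
    {"id": 3, "dept": "B", "salary": 58500},
    {"id": 4, "dept": "B", "salary": 120000},
]

shuffled = shuffle_confidential(records, "salary", random.Random(0))

# The multiset of salaries is preserved exactly; only their owners change.
print(sorted(r["salary"] for r in shuffled))
```

Because a uniform permutation breaks the association between the confidential attribute and the remaining attributes, it would not by itself deliver the data utility the text attributes to data shuffling; the sketch is meant only to show what "shuffling data among observations" does to a single column.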
