Traditional Approaches For Disclosure Control Methods
A compromise of a database occurs when confidential information is disclosed exactly, partially, or inferentially in such a way that the user can link the data to an entity. Inferential disclosure, or statistical inference (Más, 2000), refers to the situation in which an unauthorized user can infer confidential data by running sequential queries, with a probability that exceeds a predetermined disclosure threshold. For example, assume a hospital database has a binary field called HIV-Status. A user can issue several SUM (HIV-Status) queries against this database. Individually, these queries may not pose a threat; when combined, however, they might allow an adversary to infer the HIV-Status of a patient (for a full example see Garfinkel, Gopal, & Goes, 2002). This is known as the inference problem, which falls within our research focus.
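The combination of individually harmless SUM queries can be illustrated with a minimal sketch. The table, the patient identifiers, the ages, and the helper `sum_hiv_status` are all hypothetical and chosen only for illustration; the example is not taken from Garfinkel, Gopal, and Goes (2002).

```python
# Hypothetical toy table: (patient_id, age, hiv_status). All values are made up.
patients = [
    ("p1", 34, 0),
    ("p2", 41, 1),
    ("p3", 41, 0),
    ("p4", 58, 1),
]


def sum_hiv_status(rows, predicate):
    """Answer a SUM(HIV-Status) query over the rows that satisfy the predicate."""
    return sum(status for (pid, age, status) in rows if predicate(pid, age))


# Query 1: SUM over all patients. On its own it reveals only an aggregate.
q1 = sum_hiv_status(patients, lambda pid, age: True)

# Query 2: SUM over all patients whose age is not 58.
# The adversary is assumed to know that exactly one patient (p4) is aged 58.
q2 = sum_hiv_status(patients, lambda pid, age: age != 58)

# The difference of the two aggregates isolates the single excluded record.
inferred_status_p4 = q1 - q2  # equals 1 for this toy table
```

Neither query returns an individual record, yet their difference equals the confidential value of the one patient who appears in the first query set but not in the second.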
Adam and Wortmann (1989) classify SDC methods for SDBs into four categories: Conceptual, Query Restriction, Data Perturbation, and Output Perturbation. Perturbations are achieved by applying either an additive or multiplicative technique. An additive technique (Muralidhar, Parsa, & Sarathy, 1999) adds noise to the confidential data. Multiplicative data perturbation (Muralidhar, Batra, & Kirs, 1995) protects the sensitive information by multiplying the original data with a random variable, with mean 1 and a pre-specified variance.
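The two perturbation techniques can be sketched as follows. This is a minimal illustration, not the procedure of the cited papers: the normal noise distribution, the function names, the noise parameters, and the salary values are assumptions made for the example. The text above fixes only the mean (1) and a pre-specified variance for the multiplicative case.

```python
import random
import statistics


def additive_perturb(values, noise_sd, rng):
    """Add zero-mean noise to each confidential value (normal noise is an assumption)."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def multiplicative_perturb(values, noise_var, rng):
    """Multiply each value by a random variable with mean 1 and variance noise_var.

    A normal multiplier is assumed here purely for illustration.
    """
    noise_sd = noise_var ** 0.5
    return [v * rng.gauss(1.0, noise_sd) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
salaries = [52000.0, 61000.0, 47500.0, 88000.0, 73000.0]  # hypothetical data

masked_additive = additive_perturb(salaries, noise_sd=2000.0, rng=rng)
masked_multiplicative = multiplicative_perturb(salaries, noise_var=0.01, rng=rng)
```

With additive noise the size of the distortion is the same regardless of the magnitude of the original value, whereas with multiplicative noise the distortion scales with the value being masked.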
Data shuffling, a perturbation technique proposed and further studied by Muralidhar and Sarathy (2006) and Muralidhar, Sarathy, and Dandekar (2006), offers a high level of data utility while reducing disclosure risk by shuffling data among observations. According to these authors, data shuffling retains the advantages of perturbation methods and performs better than other data protection methods.
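The idea of shuffling values among observations can be sketched with a simplified rank-based reassignment. This is a stand-in for illustration only, not the procedure of Muralidhar and Sarathy (2006): the function `rank_shuffle`, the normal noise, the noise level, and the age values are assumptions. Each record receives the original value whose rank matches the rank of that record's noisy value, so the released values are a permutation of the originals.

```python
import random


def rank_shuffle(values, noise_sd, rng):
    """Reassign original values among records according to the ranks of noisy copies.

    Simplified illustration: noise is normal, and ties are broken by record position.
    """
    noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
    # Record indices ordered from smallest to largest noisy value.
    order = sorted(range(len(values)), key=lambda i: noisy[i])
    sorted_originals = sorted(values)
    shuffled = [None] * len(values)
    for rank, record_index in enumerate(order):
        # The record with the k-th smallest noisy value gets the k-th smallest original.
        shuffled[record_index] = sorted_originals[rank]
    return shuffled


ages = [23, 35, 31, 47, 52, 29, 40, 61]  # hypothetical confidential attribute
shuffled_ages = rank_shuffle(ages, noise_sd=5.0, rng=random.Random(42))
```

Because the output is a permutation of the input, the marginal distribution of the attribute (its mean, variance, and every quantile) is unchanged, while the value attached to any particular record may now belong to a different observation.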