The enormous expansion of data collection and storage facilities has created an unprecedented increase in the need for data analysis and processing power. Data mining has long been the catalyst for automated and sophisticated data analysis, and recent advances in data mining and knowledge discovery have had a controversial impact in both the scientific and the technological arena. On the one hand, data mining can analyze vast amounts of information within a minimal amount of time, far exceeding the expectations of even the most imaginative scientists of the last decade. On the other hand, the sheer processing power of the intelligent algorithms introduced by this new research area puts at risk sensitive and confidential information that resides in large and distributed data stores.

Privacy and security risks arising from the use of data mining techniques were first investigated in an early paper by O'Leary (1991). Clifton & Marks (1996) were the first to propose remedies for protecting sensitive data and sensitive knowledge from data mining. In particular, they suggested a variety of measures, such as controlled access to the data, fuzzification of the data, elimination of unnecessary groupings in the data, data augmentation, and data auditing. A subsequent paper by Clifton (2000) delivered concrete early results in the area by demonstrating an interesting approach to privacy protection that relies on sampling. A main result of that paper shows how to determine the right sample size for the public data (the data disclosed to the public, with sensitive information trimmed off) while estimating the error that sampling introduces into the significance of the rules.
Agrawal and Srikant (2000) established a new research area, privacy-preserving data mining, whose goal is to address the privacy and confidentiality issues that originate in the mining of data. The authors proposed an approach known as data perturbation, which discloses a modified database containing noisy data instead of the original database; the modified database nevertheless produces patterns very similar to those of the original.
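The perturbation idea can be illustrated with a minimal sketch: each sensitive numeric value is released with independent additive noise, so individual records are masked while aggregate patterns remain approximately intact. This is an illustrative toy, not the authors' exact reconstruction procedure; the function name `perturb`, the noise range, and the sample values are assumptions chosen for the example.

```python
import random

def perturb(values, noise_range=0.5, seed=0):
    """Return a noisy copy of numeric values: each entry receives
    independent uniform noise drawn from [-noise_range, +noise_range].
    (Illustrative sketch of additive-noise data perturbation.)"""
    rng = random.Random(seed)
    return [v + rng.uniform(-noise_range, noise_range) for v in values]

# Hypothetical sensitive attribute values (e.g., normalized salaries).
original = [0.2, 0.4, 0.4, 0.6, 0.8, 1.0]
noisy = perturb(original)

# Individual entries are distorted, but an aggregate statistic such as
# the mean is approximately preserved, which is what allows similar
# patterns to be mined from the disclosed database.
mean = lambda xs: sum(xs) / len(xs)
print(round(mean(original), 2), round(mean(noisy), 2))
```

In the full approach the miner also needs a way to estimate the original data distribution from the noisy values; the sketch above shows only the disclosure side of the scheme.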
One of the main problems that has been investigated within the context of privacy-preserving data mining is the so-called association rule hiding. Association rule hiding builds on the data mining area of association rule mining and studies the problem of hiding sensitive association rules from the data. The problem can be formulated as follows.