1. Introduction
With the rapid growth of database technologies, various organizations have collected immense amounts of personal data for analysis. Data mining techniques are used to extract useful information from such collections, which may include medical, voter, and census databases. However, the collected data may contain sensitive personal information, and mining it could endanger individual privacy by disclosing a person's sensitive data. Sensitive data therefore needs to be protected before data mining is performed. For this reason, privacy-preserving data mining has become an important issue in recent years (Agrawal & Srikant, 2000; Lindell & Pinkas, 2003).
Two main approaches to preserving privacy have been discussed in the literature. The first relies on cryptographic techniques (Zhan, 2007; Upmanyu et al., 2010; Jagannathan et al., 2010), and the second on non-cryptographic techniques (Sweeney, 2002; Samarati, 1998; Byun et al., 2007; LeFevre et al., 2006; Loukides & Shao, 2007; Chiu & Tsai, 2007; Lin & Wei, 2008; Kabir et al., 2011; Gionis & Tassa, 2009; Machanavajjhala et al., 2006; Li & Li, 2007; Wong et al., 2006; Goldberger & Tassa, 2010; LeFevre et al., 2005; Moon et al., 2001; Meyerson & Williams, 2004; Aggrawal et al., 2005; Iyenger, 2002; Nergiz & Clifton, 2006; Ghinita, 2009; Bayardo & Agrawal, 2005; Kiffer & Gehrke, 2006; Samarati, 2001; Gionis et al., 2008). We focus here on non-cryptographic approaches because they incur a lower computational cost than their cryptographic counterparts (Zhan, 2007; Upmanyu et al., 2010; Jagannathan et al., 2010).
One non-cryptographic approach is the k-anonymity model (Sweeney, 2002; Samarati, 1998; Samarati, 2001). The k-anonymity model protects individuals from identification by applying any combination of data generalization and suppression (Sweeney, 2002). It partitions the records into groups such that each group contains at least k similar records, so that no record can be distinguished from the other members of its group. Each such group of similar records represents a cluster.
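To make the grouping idea concrete, the following is a minimal sketch, not the method of any work cited above. It assumes a hypothetical toy table in which age and zip code act as quasi-identifiers and disease is the sensitive attribute. The greedy sort-and-chunk partitioning, the range/prefix-masking generalization, and the function names `partition`, `generalize`, and `is_k_anonymous` are all illustrative assumptions. Practical k-anonymization algorithms choose the groups more carefully so as to minimize information loss.

```python
from collections import Counter

# Hypothetical toy record: (age, zip_code, disease).
# age and zip_code are treated as quasi-identifiers; disease is sensitive.
Record = tuple[int, str, str]


def partition(records: list[Record], k: int) -> list[list[Record]]:
    """Greedy sketch: sort by quasi-identifiers (a crude notion of
    similarity), cut into chunks of k, and fold any remainder (< k
    records) into the previous chunk so every group has >= k records."""
    if len(records) < k:
        raise ValueError("need at least k records")
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


def generalize(group: list[Record]) -> list[tuple[str, str, str]]:
    """Give every record in the group identical quasi-identifier values:
    age is generalized to a min-max range, and zip code keeps its common
    prefix while the remaining digits are suppressed with '*'.
    Assumes all zip codes have the same length."""
    ages = [r[0] for r in group]
    age_range = f"{min(ages)}-{max(ages)}"
    zips = [r[1] for r in group]
    prefix = ""
    for chars in zip(*zips):  # walk the zip codes column by column
        if len(set(chars)) != 1:
            break
        prefix += chars[0]
    masked_zip = prefix + "*" * (len(zips[0]) - len(prefix))
    return [(age_range, masked_zip, r[2]) for r in group]


def is_k_anonymous(table, k: int) -> bool:
    """True if every combination of quasi-identifier values (the first
    two columns) occurs in at least k rows."""
    counts = Counter((row[0], row[1]) for row in table)
    return all(c >= k for c in counts.values())


records = [
    (23, "13053", "Flu"),
    (27, "13068", "Cancer"),
    (35, "14850", "Flu"),
    (38, "14853", "Heart disease"),
    (52, "13053", "Cancer"),
]
k = 2
groups = partition(records, k)  # two clusters of sizes 2 and 3
table = [row for g in groups for row in generalize(g)]
for row in table:
    print(row)
print(is_k_anonymous(records, k), is_k_anonymous(table, k))  # False True
```

Here the raw table is not 2-anonymous because every (age, zip code) pair is unique, whereas the generalized table splits into the clusters ("23-27", "130**") and ("35-52", "1****"), each with at least two records. The sensitive attribute is left untouched, which is exactly the behavior of plain k-anonymity.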