High Performance Datafly based Anonymity Algorithm and Its L-Diversity

Zhi-ting Yu (School of Computer Engineering and Science, Shanghai University, Shanghai, China), Quan Qian (School of Computer Engineering and Science, Shanghai University, Shanghai, China), Chun-Yuan Lin (Department of Computer Science and Information Engineering, Chang Gung University, Taoyuan, Taiwan) and Che-Lun Hung (Department of Computer Science and Communication Engineering, Providence University, Taichung, Taiwan)
Copyright: © 2015 | Pages: 16
DOI: 10.4018/IJGHPC.2015070106

Abstract

Data anonymity, as an effective privacy protection method, has been widely used in real applications. High-performance data anonymity algorithms are especially attractive for massive-data applications. In this paper, the authors propose a novel and efficient Datafly-based data anonymity algorithm (Divide-Datafly). Experimental results show that the proposed algorithm is not only more efficient than Datafly and Incognito, but also incurs less information loss than KACA. Moreover, to improve the security of the anonymized data, the authors present L-Divide-Datafly, which combines Divide-Datafly with efficient distance-based clustering. Experimental results show that L-Divide-Datafly performs well in both execution time and information loss.
Article Preview

Data encryption, differential privacy, k-anonymity and many other technologies have been proposed to protect users' data privacy. The idea of k-anonymity was proposed by Samarati and Sweeney (Samarati and Sweeney, 1998). The key idea of k-anonymity is to make individuals indistinguishable in a released table: each tuple representing an individual must be identical, on its identifying attributes, to at least k-1 other tuples. This method has been widely used because of its simplicity.
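The definition above can be checked mechanically. The following is a minimal sketch, not taken from the paper: the function name `is_k_anonymous`, the attribute names and the toy table are all illustrative assumptions. It groups tuples by their identifying attributes and verifies that every group has at least k members.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of identifying values occurs at least k times."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical released table (already generalized).
table = [
    {"age": "20-29", "zip": "200**", "disease": "flu"},
    {"age": "20-29", "zip": "200**", "disease": "cold"},
    {"age": "30-39", "zip": "201**", "disease": "flu"},
    {"age": "30-39", "zip": "201**", "disease": "asthma"},
]

print(is_k_anonymous(table, ["age", "zip"], 2))
print(is_k_anonymous(table, ["age", "zip"], 3))
```

In this toy table each (age, zip) combination appears exactly twice, so the check passes for k = 2 and fails for k = 3.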

k-anonymity algorithms can be divided into three types: global recoding, multidimensional recoding and local recoding. Global recoding algorithms, such as Datafly (Sweeney, 2002b), Incognito (Lefevre et al., 2005), TopDown (Fung et al., 2005) and BottomUp (Wang et al., 2004), require that each attribute be generalized in the same way for every tuple in the dataset. Although these algorithms have low computational complexity, they may cause over-generalization. Multidimensional recoding, such as Mondrian, maps a set of values to another set of values, some of which are more general than the corresponding pre-mapping values, but this model does not consider attribute hierarchies. Local recoding algorithms allow the values of an attribute to be generalized to different domains. The information loss of local recoding algorithms is low, but their execution time is longer than that of global recoding algorithms, and this model does not consider attribute hierarchies either. Typical local recoding algorithms are KACA (Li et al., 2006), MDAV (Torra, 2004) and its L-diversity model (Jianmin et al., 2008). Finding an optimal k-anonymization is NP-hard, so existing research uses heuristic strategies to obtain approximately optimal solutions.
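To make global recoding concrete, here is a simplified sketch in the spirit of the Datafly heuristic: repeatedly generalize the attribute with the most distinct values until the table is k-anonymous. It is not the Divide-Datafly algorithm of this paper, and it omits Datafly's suppression of outlier tuples. The hierarchies, attribute names and function names are hypothetical.

```python
from collections import Counter

# Hypothetical generalization hierarchies: level 0 is the raw value,
# each higher level is coarser, and the last level hides the value entirely.
HIERARCHIES = {
    "age": [
        lambda v: str(v),
        lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}",
        lambda v: "*",
    ],
    "zip": [
        lambda v: v,
        lambda v: v[:3] + "**",
        lambda v: "*",
    ],
}

def recode(rows, levels):
    """Global recoding: one generalization level per attribute, applied to every row."""
    return [{a: HIERARCHIES[a][levels[a]](row[a]) for a in levels} for row in rows]

def smallest_group(rows, attrs):
    """Size of the smallest group of rows sharing the same attribute values."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return min(counts.values())

def greedy_global_recoding(rows, k):
    """Raise the level of the attribute with the most distinct values until k-anonymous."""
    if k < 1 or len(rows) < k:
        raise ValueError("need k >= 1 and at least k rows")
    attrs = list(HIERARCHIES)
    levels = {a: 0 for a in attrs}
    while True:
        out = recode(rows, levels)
        if smallest_group(out, attrs) >= k:
            return out, levels
        # With at least k rows, full generalization always satisfies k-anonymity,
        # so some attribute can still be generalized whenever we reach this point.
        candidates = [a for a in attrs if levels[a] < len(HIERARCHIES[a]) - 1]
        target = max(candidates, key=lambda a: len({r[a] for r in out}))
        levels[target] += 1

rows = [
    {"age": 23, "zip": "20010"},
    {"age": 27, "zip": "20013"},
    {"age": 31, "zip": "20110"},
    {"age": 38, "zip": "20115"},
]
anonymized, levels = greedy_global_recoding(rows, k=2)
print(levels)
print(anonymized)
```

Because every row is recoded with the same level per attribute, a single outlier can force the whole column to a coarser level, which is the over-generalization risk noted above.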

It is difficult to protect privacy with the k-anonymity model alone, because there are attacks that k-anonymity cannot resist, such as the homogeneity attack, the similarity attack and the probability attack. Many models have been proposed to resist these attacks, such as p-sensitive k-anonymity (Truta & Vinay, 2006), (alpha, k)-anonymity (Wong et al., 2006), L-Diversity (Machanavajjhala et al., 2007), (a,d)-Diversity (Wang & Shi, 2009), t-closeness (Li et al., 2007), (ω; γ,k)-anonymity (Huang et al., 2014) and (l,t)-closeness anonymization (Yang et al., 2015).
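As an illustration of the homogeneity attack, the sketch below checks the simplest form of L-diversity (distinct L-diversity): every group of tuples sharing the same identifying values must contain at least l different sensitive values. The function name, attribute names and data are assumptions made for this example and do not come from the paper.

```python
from collections import defaultdict

def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Return True if every equivalence class holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())

# Hypothetical 2-anonymous table: the first group is homogeneous,
# so anyone known to be in it is revealed to have "flu".
released = [
    {"age": "20-29", "zip": "200**", "disease": "flu"},
    {"age": "20-29", "zip": "200**", "disease": "flu"},
    {"age": "30-39", "zip": "201**", "disease": "cold"},
    {"age": "30-39", "zip": "201**", "disease": "asthma"},
]

print(is_distinct_l_diverse(released, ["age", "zip"], "disease", 2))
```

The table satisfies 2-anonymity but fails this check, which is why a diversity requirement is layered on top of k-anonymity.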

In this paper, we analyze the defects of global recoding and propose a new algorithm, Divide-Datafly. Through experiments, we compare the proposed algorithm with Datafly, Incognito and KACA. The experimental results on three different datasets show that the Divide-Datafly algorithm is suitable for datasets with numerical attributes: it improves the speed of anonymization and reduces the information loss. We also put forward an L-diversity model of the proposed algorithm based on a clustering method, and present experiments that analyze its execution time and information loss.
