ADT: Anonymization of Diverse Transactional Data

Vartika Puri, Parmeet Kaur, Shelly Sachdeva
Copyright: © 2021 |Pages: 23
DOI: 10.4018/IJISP.2021070106

Abstract

Data anonymization is commonly used to protect an individual's identity when their personal or sensitive data is published. A well-known anonymization model for defining the privacy of transactional data is the k^m-anonymity model. This model ensures that an adversary who knows up to m items of an individual cannot determine which record in the dataset corresponds to that individual with a probability greater than 1/k. However, existing techniques generally rely on similarity between items in the dataset tuples to achieve k^m-anonymization and are not suitable when the transactional data contains tuples with few common values. The authors refer to this type of transactional data as diverse transactional data and propose an algorithm, anonymization of diverse transactional data (ADT). ADT uses slicing and generalization to achieve k^m-anonymity for diverse transactional data. ADT has been experimentally evaluated on two datasets, and the results indicate that it yields higher privacy protection and a lower loss in data utility than existing methods.

1. Introduction

A large number of modern applications and systems involve transaction processing. These transactions refer to events such as commercial purchases, banking operations, and the entry or update of health records. Each such event generates data, known as transactional data, which may be recorded to generate useful information in the future. A few examples of such data's utility include generating product recommendations from a user's past purchase history, managing inventory from sales records, and detecting fraud from users' financial transactions. However, publishing or publicly sharing any individual's data may have serious privacy implications (Barbaro & Zeller, 2006; Narayanan & Shmatikov, 2008). This is especially true when the data is sensitive, for example, when it contains a user's financial or health records. Further, data privacy is an important facet of data security and needs the utmost attention. This has led to a vast body of research in the domain of privacy-preserving data publishing (Terrovitis et al., 2008; He & Naughton, 2009; Zhang et al., 2012; Kohlmayer et al., 2012). The methods for privacy-preserving data publishing include data encryption and data anonymization, of which anonymization is a popular choice. Anonymization removes the link between particular records and the particular individual to whom they belong. The approach of anonymization is sufficient to prevent privacy attacks on published data. Therefore, this paper proposes a method based on anonymization for the privacy-preserving publication of transactional data.

Some established techniques, such as OLA (Zhang et al., 2012) and Flash (Kohlmayer et al., 2012), are available to anonymize structured or relational data. It has been observed (Puri et al., 2019) that anonymization techniques for relational data do not apply to transactional data because the latter is sparse and lacks structure. Hence, different models are required to define the privacy-preserving publication of transactional data. A few models, such as complete k-anonymity (He & Naughton, 2009) and k^m-anonymity (Terrovitis et al., 2008), have been developed to define the privacy of transactional data. Complete k-anonymity assumes that every combination of attributes may be sensitive and should occur at least k times. Anonymizing data to achieve complete k-anonymity requires multiple additions or deletions of items from the dataset and thus results in a high amount of information loss. Information loss is said to occur if the anonymized data is no longer useful for statistical analysis and mining purposes or does not provide information similar to that of the original data. In comparison, the k^m-anonymity model does not assume that every combination of attributes is sensitive; instead, it requires that every combination of m items occur at least k times. Since the k^m-anonymity model limits anonymization to combinations of up to m items, its information loss is low compared to that of complete k-anonymity. It is therefore commonly used to protect transactional data from identity disclosure attacks (Terrovitis et al., 2008).
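
The k^m-anonymity condition described above can be illustrated with a minimal Python sketch. It is not the ADT algorithm and not code from the paper: the function names `km_violations` and `is_km_anonymous`, the toy transactions, and the mapping of "milk" and "beer" to "drink" are all assumptions made for illustration. The sketch counts how many transactions contain each combination of up to m items and reports the combinations that occur fewer than k times.

```python
from collections import Counter
from itertools import combinations


def km_violations(transactions, k, m):
    """Return every itemset of size <= m that occurs in fewer than k transactions.

    Only itemsets that appear in at least one transaction are counted,
    so each reported support lies between 1 and k - 1.
    """
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # sorted so each itemset has one canonical tuple
        for size in range(1, m + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    return {itemset: n for itemset, n in support.items() if n < k}


def is_km_anonymous(transactions, k, m):
    """True when no itemset of size <= m has support below k."""
    return not km_violations(transactions, k, m)


# Hypothetical toy dataset: three purchase transactions.
toy = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "beer"},
]

# Hypothetical one-level generalization: both drinks become "drink".
to_general = {"milk": "drink", "beer": "drink"}
generalized = [{to_general.get(item, item) for item in t} for t in toy]

print(km_violations(toy, k=2, m=2))
print(is_km_anonymous(generalized, k=2, m=2))
```

In the toy data, an adversary who knows that a person bought "beer" can single out the third transaction, so the dataset is not 2^2-anonymous. After the assumed generalization every transaction becomes {"bread", "drink"}, and each combination of up to two items occurs three times. The sketch enumerates all combinations and is only practical for small m and short transactions.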
