Dynamic Itemset Hiding Algorithm for Multiple Sensitive Support Thresholds

Dynamic Itemset Hiding Algorithm for Multiple Sensitive Support Thresholds

Ahmet Cumhur Öztürk, Belgin Ergenç
Copyright: © 2018 |Pages: 23
DOI: 10.4018/IJDWM.2018040103
(Individual Articles)
No Current Special Offers


This article describes how association rule mining is used for extracting relations between items in transactional databases and is beneficial for decision-making. However, association rule mining can pose a threat to the privacy of the knowledge when the data is shared without hiding the confidential association rules of the data owner. One of the ways hiding an association rule from the database is to conceal the itemsets (co-occurring items) from which the sensitive association rules are generated. These sensitive itemsets are sanitized by the itemset hiding processes. Most of the existing solutions consider single support thresholds and assume that the databases are static, which is not true in real life. In this article, the authors propose a novel itemset hiding algorithm designed for the dynamic database environment and consider multiple itemset support thresholds. Performance comparisons of the algorithm is done with two dynamic algorithms on six different databases. Findings show that their dynamic algorithm is more efficient in terms of execution time and information loss and guarantees to hide all sensitive itemsets.
Article Preview

1. Introduction

Data mining is the process of extracting knowledge from data with the help of statistics, artificial intelligence, machine learning and database systems. Association rule mining is one of the data mining tasks. It was first proposed by (Agrawal and Srikant, 1994) and is used for discovering correlated items transactional databases. Association rule mining process has mainly two steps; the first step is called frequent itemset (co-occurring items) generation and the second step is called rule generation where meaningful rules are generated from the discovered frequent itemsets. The second step is straightforward and similar in all proposed algorithms; as a result, association rule mining algorithms focus on the first step which is computationally expensive. For this reason, terms association rule mining and itemset mining are used interchangeably.

Lately many organizations use itemset mining tasks for short or long-term planning and strategical decision making. In modern business, organizations also share data with each other or with third parties in order to provide extraction of knowledge for mutual benefit. Similarly, itemset mining tasks are applied on this shared data however this may pose security threat for strategical and sensitive information of data owners.

The importance of this threat is well explained with a scenario given in (Clifton and Marks, 1996). Suppose BigMart supermarket chain is negotiating with DedTrees Paper Company for selling their products, and DedTrees offers to reduce their price if BigMart agrees to share sales database. After BigMart agrees to share the sales database, DedTrees applies itemset mining task on this database and finds out that people who purchase skim milk also purchase Green paper. Dedtrees Company then runs a coupon marketing campaign that gives 50 cents off for each purchase a Dedtrees product. The campaign cuts heavily sales of Green paper and as a result Green paper has to increase prices because of the low sales amount. In the next negotiation with DedTrees, they are unwilling to reduce their prices because they reach their goal. As a result, the BigMart suffers serious losses to competitors. This scenario shows that before the data is shared with other parties, the database owner should take precautions to protect its strategical and sensitive knowledge from being discovered by itemset mining task. Privacy preserving itemset mining is the problem of preserving the sensitive itemsets from being discovered in case of data sharing.

The most popular approach for sanitizing sensitive and frequent itemsets is to decrease their frequency (support) under predefined support threshold. Such a modification operation of converting the original database D into a sanitized database D’ is called frequent itemset hiding. A well designed frequent itemset hiding algorithm should hide all given sensitive itemsets while keeping the loss of non-sensitive itemsets, production of new artificial frequent itemsets and the distortion done on the database at minimum. Most proposed frequent itemset hiding approaches allow user to define a single support threshold for each sensitive itemset and assume that the databases are static (Amiri, 2007; Li et al., 2007; Weng et al., 2008; Dehkordi and Dehkordi, 2016; Verykios et al., 2004; Pontikakis et al., 2004; Hong et al., 2013; Cheng et al., 2016). Single support threshold barrier does not suit the nature of different itemsets; in transactional databases, frequency of some sensitive itemsets may be too high while some sensitive itemsets may be too low. Assigning unique single sensitive support threshold for each sensitive itemsets may result in decreasing the frequency of some itemsets more than required.

Complete Article List

Search this Journal:
Volume 20: 1 Issue (2024)
Volume 19: 6 Issues (2023)
Volume 18: 4 Issues (2022): 2 Released, 2 Forthcoming
Volume 17: 4 Issues (2021)
Volume 16: 4 Issues (2020)
Volume 15: 4 Issues (2019)
Volume 14: 4 Issues (2018)
Volume 13: 4 Issues (2017)
Volume 12: 4 Issues (2016)
Volume 11: 4 Issues (2015)
Volume 10: 4 Issues (2014)
Volume 9: 4 Issues (2013)
Volume 8: 4 Issues (2012)
Volume 7: 4 Issues (2011)
Volume 6: 4 Issues (2010)
Volume 5: 4 Issues (2009)
Volume 4: 4 Issues (2008)
Volume 3: 4 Issues (2007)
Volume 2: 4 Issues (2006)
Volume 1: 4 Issues (2005)
View Complete Journal Contents Listing