Article Preview
Top1. Introduction
Data mining is the process of extracting knowledge from data with the help of statistics, artificial intelligence, machine learning and database systems. Association rule mining is one of the data mining tasks. It was first proposed by (Agrawal and Srikant, 1994) and is used for discovering correlated items transactional databases. Association rule mining process has mainly two steps; the first step is called frequent itemset (co-occurring items) generation and the second step is called rule generation where meaningful rules are generated from the discovered frequent itemsets. The second step is straightforward and similar in all proposed algorithms; as a result, association rule mining algorithms focus on the first step which is computationally expensive. For this reason, terms association rule mining and itemset mining are used interchangeably.
Lately many organizations use itemset mining tasks for short or long-term planning and strategical decision making. In modern business, organizations also share data with each other or with third parties in order to provide extraction of knowledge for mutual benefit. Similarly, itemset mining tasks are applied on this shared data however this may pose security threat for strategical and sensitive information of data owners.
The importance of this threat is well explained with a scenario given in (Clifton and Marks, 1996). Suppose BigMart supermarket chain is negotiating with DedTrees Paper Company for selling their products, and DedTrees offers to reduce their price if BigMart agrees to share sales database. After BigMart agrees to share the sales database, DedTrees applies itemset mining task on this database and finds out that people who purchase skim milk also purchase Green paper. Dedtrees Company then runs a coupon marketing campaign that gives 50 cents off for each purchase a Dedtrees product. The campaign cuts heavily sales of Green paper and as a result Green paper has to increase prices because of the low sales amount. In the next negotiation with DedTrees, they are unwilling to reduce their prices because they reach their goal. As a result, the BigMart suffers serious losses to competitors. This scenario shows that before the data is shared with other parties, the database owner should take precautions to protect its strategical and sensitive knowledge from being discovered by itemset mining task. Privacy preserving itemset mining is the problem of preserving the sensitive itemsets from being discovered in case of data sharing.
The most popular approach for sanitizing sensitive and frequent itemsets is to decrease their frequency (support) under predefined support threshold. Such a modification operation of converting the original database D into a sanitized database D’ is called frequent itemset hiding. A well designed frequent itemset hiding algorithm should hide all given sensitive itemsets while keeping the loss of non-sensitive itemsets, production of new artificial frequent itemsets and the distortion done on the database at minimum. Most proposed frequent itemset hiding approaches allow user to define a single support threshold for each sensitive itemset and assume that the databases are static (Amiri, 2007; Li et al., 2007; Weng et al., 2008; Dehkordi and Dehkordi, 2016; Verykios et al., 2004; Pontikakis et al., 2004; Hong et al., 2013; Cheng et al., 2016). Single support threshold barrier does not suit the nature of different itemsets; in transactional databases, frequency of some sensitive itemsets may be too high while some sensitive itemsets may be too low. Assigning unique single sensitive support threshold for each sensitive itemsets may result in decreasing the frequency of some itemsets more than required.