Collusion-Free Privacy Preserving Data Mining

T. Purusothaman (Government College of Technology, India), M. Rajalakshmi (Coimbatore Institute of Technology, India) and S. Pratheeba (Indian Institute of Science, India)
DOI: 10.4018/978-1-4666-0158-1.ch015


Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. Because local frequent itemsets from the individual data sources are combined into global ones, sensitive information about each data source needs strong protection. Several privacy-preserving data mining approaches for distributed environments have been proposed, but in the existing approaches collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. The algorithm splits and sanitizes the itemsets and communicates them to random sites in two different phases, which makes it difficult for colluders to retrieve sensitive information. Results show that the consequences of collusion are greatly reduced without affecting mining performance, and confirm optimal communication among sites.


Major technological developments and innovations in the field of information technology have made it easy for organizations to store huge amounts of data at an affordable cost. Data mining techniques help extract useful information for strategic decision making from voluminous data, which may be either centralized or distributed (Agrawal & Srikant, 1994; Han & Kamber, 2001).

The term data mining refers to extracting, or mining, knowledge from massive amounts of data. Data mining functionalities such as association rule mining, cluster analysis, classification, and prediction specify the different kinds of patterns mined. Association Rule Mining (ARM) finds interesting associations or correlations among a large set of data items. Finding association rules among huge numbers of business transactions can support business decisions such as catalog design and cross marketing. A typical example of ARM is market basket analysis, which analyzes customer buying habits from the associations between the different items in their shopping baskets. This analysis can help retailers develop marketing strategies. ARM involves two stages:

  • i) Finding frequent itemsets
  • ii) Generating strong association rules

Association Rule Mining: Basic Concepts

Let I = {i1, i2, …, im} be a set of m distinct items. Let D denote a database of transactions, where each transaction T is a set of items such that T ⊆ I. Each transaction has a unique identifier, called its TID. A set of items is referred to as an itemset, and an itemset that contains k items is a k-itemset. The support of an itemset is the ratio of the number of transactions in the data source that contain the itemset to the total number of transactions in the data source; it shows how frequently the itemset occurs. The itemset X is said to have support s if s% of the transactions contain X. The support of an association rule X→Y, where X is the antecedent and Y is the consequent, is given by

Support = (Number of transactions containing X ∪ Y) / (Total number of transactions)
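The support definition can be illustrated with a minimal Python sketch. The toy transactions and the function names `support` and `rule_support` are assumptions made for illustration and do not come from the chapter.

```python
# Toy transaction database D (hypothetical data, not from the chapter).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)


def rule_support(antecedent, consequent, transactions):
    """Support of the rule X -> Y, i.e. the support of X ∪ Y."""
    return support(set(antecedent) | set(consequent), transactions)


# "bread" occurs in 3 of the 4 transactions.
print(support({"bread"}, transactions))
# "bread" and "milk" occur together in 2 of the 4 transactions.
print(rule_support({"bread"}, {"milk"}, transactions))
```

Here support is returned as a fraction between 0 and 1; multiplying by 100 gives the percentage form s% used in the text.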

An itemset is said to be frequent when its support in the database exceeds a user-specified minimum support. Confidence shows the strength of the relation between the antecedent and the consequent. The confidence of an association rule X→Y is given by

Confidence = (Number of transactions containing X ∪ Y) / (Number of transactions containing X)

An association rule is said to be strong when its confidence exceeds a user-specified minimum confidence. Only association rules whose support and confidence exceed the minimum support and minimum confidence are mined. Many algorithms have been proposed for generating frequent itemsets, including Apriori, Pincer search, and the frequent pattern tree (Agrawal & Srikant, 1994; Lin & Kedem, 2002; Han, Pei, Yin & Mao, 2004).
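The two stages of ARM can be sketched end to end on a toy database. This is a naive level-wise search in the spirit of Apriori, without its subset-pruning step, and it is not the collusion-free distributed algorithm the chapter proposes. The data, thresholds, and function names are hypothetical.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, not from the chapter).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search; returns {itemset: support}."""
    items = sorted(set().union(*transactions))
    level = [frozenset([item]) for item in items]
    frequent = {}
    while level:
        # Keep the candidates whose support exceeds the threshold
        # (strictly greater, following the wording of the text).
        current = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s > min_support:
                current[candidate] = s
        frequent.update(current)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(current)
        level = list({a | b for a, b in combinations(keys, 2)
                      if len(a | b) == len(a) + 1})
    return frequent


def strong_rules(frequent, min_confidence):
    """Stage 2: split each frequent itemset into X -> Y and keep strong rules."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for x in map(frozenset, combinations(sorted(itemset), size)):
                # confidence = support(X ∪ Y) / support(X); X is frequent
                # because every subset of a frequent itemset is frequent.
                confidence = s / frequent[x]
                if confidence > min_confidence:
                    rules.append((set(x), set(itemset - x), s, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.4)
rules = strong_rules(frequent, min_confidence=0.6)
for x, y, s, c in rules:
    print(f"{sorted(x)} -> {sorted(y)}  support={s:.2f}  confidence={c:.2f}")
```

Dividing the two support values gives the same result as the count-based confidence formula, because the total number of transactions cancels out. A production implementation would typically replace the repeated full scans with the candidate pruning of Apriori or with an FP-tree.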
