A distributed association rule mining technique for a vertically partitioned data set spread across several sites. Let I = {i1, i2, …, in} be a set of items and T = {T1, T2, …, Tn} be a set of transactions, where each Ti ⊆ I. A transaction Ti contains an item set X ⊆ I only if X ⊆ Ti. An association rule is of the form X ⇒ Y (X ∩ Y = ∅) with support S and confidence C if S% of the transactions in T contain X ∪ Y and C% of the transactions that contain X also contain Y. In a horizontally partitioned database, the transactions are distributed among n sites. Support(X ⇒ Y) = (number of transactions containing X ∪ Y) / (total number of transactions). The global support count of an item set is obtained by combining (taking the union of) the local support counts of all sites: Support_g(X) = Support_1(X) ∪ Support_2(X) ∪ … ∪ Support_n(X). Confidence(X ⇒ Y) = Support(X ∪ Y) / Support(X). The global confidence of a rule can be expressed in terms of the global support: Confidence_g(X ⇒ Y) = Support_g(X ∪ Y) / Support_g(X). The aim of distributed association rule mining is to discover all rules whose global support and global confidence are greater than the user-specified minimum support and minimum confidence. The steps of the algorithm use the secure sum and secure set union methods described earlier. The algorithm is based on the Apriori algorithm, which uses the frequent item sets of size k-1 to generate the frequent item sets of size k.
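The definitions above can be illustrated with a minimal Python sketch. It is not the algorithm of the source chapter: it assumes horizontal partitioning with disjoint transaction sets, so that local support counts simply add up, and it uses a plain sum where the source calls for the secure sum protocol. All function names (local_count, global_support, generate_candidates, frequent_itemsets, confidence) are hypothetical, and treating the minimum support as inclusive is also an assumption.

```python
from itertools import combinations


def local_count(transactions, itemset):
    """Count the local transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def global_support(sites, itemset):
    """Global support as a fraction of all transactions across all sites.

    Assumption: a plain sum of local counts stands in for secure sum.
    """
    total = sum(len(site) for site in sites)
    count = sum(local_count(site, itemset) for site in sites)
    return count / total


def generate_candidates(frequent_prev, k):
    """Apriori join and prune: build k-item sets from frequent (k-1)-item sets."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            union = a | b
            # Keep the union only if it has k items and every
            # (k-1)-subset is itself frequent.
            if len(union) == k and all(
                frozenset(sub) in frequent_prev
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def frequent_itemsets(sites, min_support):
    """Return {item set: global support} for all globally frequent item sets."""
    items = {item for site in sites for t in site for item in t}
    current = {frozenset([i]) for i in items}
    result = {}
    k = 1
    while current:
        frequent = {}
        for candidate in current:
            support = global_support(sites, candidate)
            if support >= min_support:  # inclusive threshold (assumption)
                frequent[candidate] = support
        result.update(frequent)
        k += 1
        current = generate_candidates(set(frequent), k)
    return result


def confidence(sites, x, y):
    """Global confidence of the rule X => Y."""
    return global_support(sites, x | y) / global_support(sites, x)


# Two hypothetical sites, each holding its own transactions.
site_1 = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
site_2 = [frozenset("ab"), frozenset("bc"), frozenset("abc")]

frequent = frequent_itemsets([site_1, site_2], min_support=0.4)
rule_confidence = confidence([site_1, site_2], frozenset("a"), frozenset("b"))
```

With this toy data, every single item and every pair is globally frequent at a minimum support of 0.4, while {a, b, c} appears in only 2 of 6 transactions and is pruned. A real deployment would replace the plain sums in global_support with a privacy-preserving protocol so that no site reveals its local counts.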
Learn more in:
Secure Data Analysis in Clusters (Iris Database)