Introduction
Modernization and access to an unprecedented amount of data have changed the way organizations work and build their strategies. The value of this data lies in more than its size: it gives organizations an opportunity to find insights in new and emerging types of data and content, and so to make their businesses more agile. Analysis of such large volumes of data also answers questions that were previously considered beyond reach (Fan & Bifet, 2013). A number of organizations also undertake collaborative data mining to extract new levels of business value (French & Magill, 2011) from their cumulative Big Data.
However, collaborative data mining requires that individual privacy be preserved. The distributed systems used by competing parties must provide privacy and security along with the reliability and timeliness they promise; in practice, they must be both efficient and privacy preserving for every participating party. To preserve privacy while undertaking collaborative data mining, and so realize coopetition, i.e. a blend of competition and cooperation (Pedersen, Saygin & Savas, 2007), a number of privacy preserving approaches have been proposed in Distributed Data Mining (DDM). Finding an efficient privacy preserving technique to realize coopetition is an important research interest today (Coenen, 2011; Xu & Yi, 2011).
The existing privacy preserving algorithms (Kargupta, Das & Liu, 2007; Sekhavat & Fathian, 2010; Kantarcioglu & Clifton, 2004; Kantarcioglu, 2008; Vaidya, 2008; Samet & Miri, 2009; Vaidya & Clifton, 2005; Ge et al, 2010; Verykios & Gkoulalas-Divanis, 2008) do not deal with temporal patterns or temporal data. In (Nanavati & Jinwala, 2012a), we introduced a construction to find total or partial global cycles among cyclic association rules while preserving privacy. In Nanavati et al (2012a), the authors first determine the locally frequent and cyclic association rules and then privately find the generic cycles with respect to the other organizations.
In (Nanavati & Jinwala, 2012a), the proposed techniques for multi-party scenarios find global cycles privately using Efficient Private Matching (EPM) based on homomorphic encryption (Freedman, Nissim & Pinkas, 2004; Cristofaro & Tsudik, 2012) and Shamir’s secret sharing technique (Shamir, 1979). Here we propose a novel model for Secure Multiparty Computation (SMC) with Shamir’s additive secret sharing using a Semi Trusted and a Fully Trusted third party. We also present a detailed comparative analysis of these methods, based on public key homomorphic encryption and on secret sharing, with and without a third party. In this paper we aim to:
- Substantiate the theoretical proposal in (Nanavati & Jinwala, 2012a; Nanavati & Jinwala, 2012b) with an empirical evaluation and analysis of the models therein.
- Propose a novel construction for the non-collusive additive Secret Sharing scheme (Ge et al, 2010) using a Semi Trusted and a Fully Trusted Third Party model, which is more efficient than the model without a third party as the number of parties and the amount of data increase. These new techniques are much more efficient in computation and communication cost than the two schemes proposed in (Nanavati & Jinwala, 2012a; Nanavati & Jinwala, 2012b).
- Give a comparative theoretical and empirical analysis of four different techniques for PPDARM in a temporal setup, using synthetic and real temporal datasets for different numbers of parties and different numbers of items.
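To make the idea of additive secret sharing mentioned above concrete, the following minimal sketch simulates several parties computing the sum of their private counts without revealing any individual count. It illustrates only the general technique and is not the construction proposed in this paper or in (Ge et al, 2010). The names `make_shares`, `secure_sum`, `MODULUS` and the example counts are hypothetical, and the sketch assumes honest, non-colluding parties simulated in a single process.

```python
import secrets

# Assumption: a public modulus larger than any possible total count.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to it modulo `modulus`."""
    # The first n-1 shares are uniformly random; the last one fixes the sum.
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate n parties jointly computing the sum of their private values."""
    n = len(private_values)
    # Row i holds the shares created by party i; entry j is sent to party j.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Each party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums yields the global total.
    return sum(partial_sums) % modulus


# Hypothetical local support counts of one itemset at three organizations.
local_support_counts = [120, 45, 310]
global_count = secure_sum(local_support_counts)
```

Each published partial sum is masked by random shares from the other parties, so on its own it reveals nothing about any single local count, while the combined result equals the true global count.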