A Performance Study of Secure Data Mining on the Cell Processor


Hong Wang (Tohoku University, Japan), Hiroyuki Takizawa (Tohoku University, Japan) and Hiroaki Kobayashi (Tohoku University, Japan)
Copyright: © 2012 | Pages: 13
DOI: 10.4018/978-1-61350-323-2.ch416


This article examines the potential of the Cell processor as a platform for secure data mining on future volunteer computing systems. Volunteer computing platforms can provide massive computing power. However, privacy and security concerns prevent their use for mining sensitive data. The Cell processor provides hardware security features, and these features can be used to achieve secure volunteer data mining. In this article, we present a general security scheme for volunteer computing and a secure parallelized K-Means clustering algorithm for the Cell processor. We also evaluate the performance of the algorithm on the Cell secure system simulator. The evaluation results indicate that the proposed secure clustering outperforms a non-secure clustering algorithm running on a general-purpose CPU, but incurs a large performance overhead from the decryption process required by the Cell security features. Possible optimizations for the secure K-Means clustering are discussed.
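The preview does not give the chapter's parallel K-Means algorithm, so the following is only a minimal sketch of the generic data-parallel formulation such an implementation would likely build on. Each worker (standing in, by assumption, for one SPE) assigns the points of its own chunk to the nearest centroid and returns per-cluster coordinate sums and counts; a reduce step merges these and recomputes the centroids. The function names `nearest`, `partial_sums`, and `parallel_kmeans` and the parameter `n_workers` are hypothetical, the workers run sequentially here, and the encryption/decryption performed by the Cell security features is not modeled at all.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_sums(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Per-worker step: per-cluster coordinate sums and member counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def parallel_kmeans(data: Sequence[Point], centroids: Sequence[Point],
                    n_workers: int = 4, iterations: int = 10) -> List[Point]:
    """Hypothetical data-parallel K-Means; workers are simulated sequentially."""
    centroids = [tuple(c) for c in centroids]
    k, dim = len(centroids), len(centroids[0])
    # Static block partition: one contiguous chunk per worker (ceiling division).
    size = max(1, -(-len(data) // n_workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    for _ in range(iterations):
        # Map: each worker touches only its own chunk.
        results = [partial_sums(chunk, centroids) for chunk in chunks]
        # Reduce: merge the partial results and recompute every centroid.
        new_centroids = []
        for j in range(k):
            count = sum(counts[j] for _, counts in results)
            if count == 0:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
                continue
            new_centroids.append(tuple(
                sum(sums[j][d] for sums, _ in results) / count for d in range(dim)))
        if new_centroids == centroids:
            break  # centroids unchanged: converged
        centroids = new_centroids
    return centroids
```

For example, clustering the points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) from initial centroids (0, 0) and (10, 10) converges after two iterations to roughly (1/3, 1/3) and (31/3, 31/3). In this formulation each worker exchanges only k × dim sums and k counts per iteration, which would keep per-SPE traffic and state small given the 256 KB SPE local store; whether the chapter's implementation is organized this way is not stated in the preview.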
Chapter Preview

Privacy-preserving data mining (Verykios, Bertino, et al., 2004) is a novel research direction in data mining. Its main objective is to develop algorithms that modify the original data in some way, so that the private data and private knowledge of every participant remain private even after the mining process. Typical modification methods include perturbation, blocking, aggregation/merging, swapping, and sampling. A number of algorithms have been designed for different data mining techniques, such as classification, association rule discovery, and clustering. These algorithms can be classified into the following three types:

  • Heuristic-based: relies on selective data modification or sanitization, which has side effects.

  • Cryptography-based: performs data mining on private data held by multiple parties, none of which is willing to disclose its own data or the knowledge discovered from it. This setting is referred to as the Secure Multiparty Computation (SMC) problem.

  • Reconstruction-based: perturbs the data and reconstructs the distribution at an aggregate level.
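To make the cryptography-based category concrete, the following is a minimal sketch of secure sum via additive secret sharing, a classic SMC building block. It is not the approach of this chapter, which relies on the Cell processor's hardware security features instead. Each party splits its private value into random shares that add up to the value modulo a large modulus and sends one share to every party. Each party then publishes only the sum of the shares it received, and combining these partial sums reveals the total but no individual input. The function names `make_shares` and `secure_sum` and the constant `MODULUS` are hypothetical. The sketch assumes semi-honest, non-colluding parties, non-negative integer inputs, and a true total smaller than the modulus. All parties are simulated in one process.

```python
import secrets
from typing import List, Sequence

# Hypothetical modulus; it must exceed any possible true sum of the inputs.
MODULUS = 2 ** 61 - 1


def make_shares(value: int, n_parties: int, modulus: int = MODULUS) -> List[int]:
    """Split `value` into n uniformly random shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values: Sequence[int], modulus: int = MODULUS) -> int:
    """Simulate the protocol: only the total of all private values is revealed."""
    n = len(private_values)
    # Party i splits its value; outgoing[i][j] is the share it sends to party j.
    outgoing = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial = [sum(outgoing[i][j] for i in range(n)) % modulus for j in range(n)]
    # Combining the published partial sums yields the total modulo `modulus`.
    return sum(partial) % modulus
```

For example, `secure_sum([12, 7, 30])` returns 49, while each individual share is a uniformly random number that on its own says nothing about the input it came from.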
