Privacy Preserving Data Mining: How Far Can We Go?

Aris Gkoulalas-Divanis (Vanderbilt University, USA) and Vassilios S. Verykios (University of Thessaly, Greece)
DOI: 10.4018/978-1-60566-906-9.ch007

Since its inception in 2000, privacy preserving data mining has gained increasing popularity in the data mining research community. This line of research can be primarily attributed to the growing concern of individuals, organizations and the government that existing data mining technology violates privacy when their data is mined. As a result, a whole new body of research was introduced to allow for the mining of data while prohibiting the leakage of any private and sensitive information. In this chapter, the authors introduce readers to the field of privacy preserving data mining; they discuss the reasons that led to its inception, the most prominent research directions, and some important methodologies per direction. Following that, the authors focus on very recently investigated methodologies for offering privacy during the mining of user mobility data. At the end of the chapter, they provide a roadmap along with potential future research directions, both for privacy-aware mobility data mining and for privacy preserving data mining at large.
Chapter Preview


Since the pioneering work of Agrawal & Srikant (2000) and Lindell & Pinkas (2000), several approaches have been proposed for offering privacy in data mining. Most existing approaches can be classified into two broad categories: (a) methodologies that protect the sensitive data itself in the mining process, and (b) methodologies that protect the sensitive data mining results (i.e., the extracted knowledge patterns) that were produced by the application of data mining. The first category refers to methodologies that apply perturbation, sampling, generalization/suppression, transformation, and similar techniques to the original datasets in order to generate sanitized counterparts that can be safely disclosed to untrusted third parties. The goal of this category of approaches is to enable the data miner to obtain accurate data mining results even when it is not provided with the real data.
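To make the idea of a sanitized counterpart concrete, the following is a minimal sketch of additive-noise perturbation, one of the techniques named above. It is an illustration under stated assumptions rather than a method taken from the chapter: the function name `perturb`, the uniform noise distribution, the noise scale, and the synthetic `ages` column are all hypothetical choices.

```python
import random
import statistics


def perturb(values, noise_scale, rng):
    """Return a sanitized copy: each value plus independent zero-mean noise.

    Assumption: noise is drawn uniformly from [-noise_scale, +noise_scale].
    """
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]


# Synthetic "sensitive" column (hypothetical data, seeded for repeatability).
rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(10_000)]

# The data holder discloses only the perturbed values.
sanitized = perturb(ages, noise_scale=30.0, rng=rng)

# Individual records are distorted, but an aggregate such as the mean
# remains close to the true value because the noise has zero mean.
true_mean = statistics.fmean(ages)
sanitized_mean = statistics.fmean(sanitized)
avg_distortion = statistics.fmean(abs(a - b) for a, b in zip(ages, sanitized))
```

In this sketch each disclosed value may differ from the real one by up to 30, while the mean computed on the sanitized data stays close to the true mean. This only illustrates the trade-off between record-level distortion and aggregate accuracy; it says nothing about how much privacy a given noise scale actually provides.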

As part of the former category, we highlight methodologies that have been proposed to enable a number of data holders to collectively mine their data without having to reveal their datasets to each other. The second category, on the other hand, deals with distortion and blocking techniques that prohibit the disclosure of sensitive knowledge patterns derived through the application of data mining algorithms, as well as techniques for downgrading the effectiveness of classifiers in classification tasks, such that they do not reveal any sensitive knowledge.
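One common building block for letting several data holders compute a joint result without revealing their inputs is a secure sum based on additive secret sharing. The sketch below simulates it inside a single process and is not a protocol described in the chapter; the names `MODULUS`, `make_shares` and `secure_sum` are hypothetical, and the parties are assumed to follow the protocol honestly.

```python
import secrets

# Hypothetical public modulus; it must exceed any possible true total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate the protocol: no single party sees another party's raw value."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums yields the total of all inputs.
    return sum(partial_sums) % MODULUS


# Example: three data holders each hold a local support count for an itemset.
local_counts = [120, 75, 310]
total = secure_sum(local_counts)
```

Each share on its own is a uniformly random number, so a party that receives one learns nothing about the sender's count, yet the combined result equals the true global count (505 in this example).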
