Preserving Privacy in Time Series Data Mining

Ye Zhu (Cleveland State University, USA), Yongjian Fu (Cleveland State University, USA) and Huirong Fu (Oakland University, USA)
Copyright: © 2013 | Pages: 22
DOI: 10.4018/978-1-4666-2148-0.ch015

Abstract

Time series data mining poses new challenges to privacy. Through extensive experiments, the authors find that existing privacy-preserving techniques such as aggregation and adding random noise are insufficient against privacy attacks such as the data flow separation attack. This paper also presents a general model for publishing and mining time series data and discusses the privacy issues that arise under it. Based on the model, a spectrum of privacy-preserving methods is proposed. For each method, the effects on classification accuracy, aggregation error, and privacy leak are studied. Experiments are conducted to evaluate the performance of the methods. The results show that the methods can effectively preserve privacy without losing much classification accuracy while keeping aggregation error within a specified limit.

Introduction

Privacy has been identified as an important issue in data mining. The challenge is to enable data miners to discover knowledge from data while protecting data privacy. On one hand, data miners want to find interesting global patterns. On the other hand, data providers do not want to reveal the identities associated with individual data records. This tension has led to the study of privacy-preserving data mining (Agrawal & Srikant, 2000).

Two common approaches in privacy-preserving data mining are data perturbation and data partitioning. In data perturbation, the original data is modified by adding noise, aggregating, transforming, obscuring, and so on. Privacy is preserved by mining the modified data instead of the original data. In data partitioning, data is split among multiple parties, who securely compute interesting patterns without sharing data.
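As a rough illustration of the data perturbation approach, the sketch below applies two of the modifications named above, adding noise and aggregating, to toy series. The function names `add_noise` and `aggregate`, the Gaussian noise model, and the sample values are assumptions made for this example only; they are not taken from the chapter.

```python
import random


def add_noise(series, sigma, rng):
    """Return a copy of `series` with zero-mean Gaussian noise added to each point."""
    return [x + rng.gauss(0.0, sigma) for x in series]


def aggregate(series_list):
    """Return the point-wise sum of several equally long series."""
    return [sum(values) for values in zip(*series_list)]


# Hypothetical daily figures from two data providers.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
store_a = [10.0, 12.0, 11.0, 15.0]
store_b = [7.0, 6.0, 9.0, 8.0]

# Perturbation by noise: publish a noisy copy instead of the original series.
published_noisy = add_noise(store_a, sigma=1.0, rng=rng)

# Perturbation by aggregation: publish only the combined series.
published_total = aggregate([store_a, store_b])
```

In both cases the miner sees only the published series, never `store_a` or `store_b` directly.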

However, privacy issues in time series data mining go beyond data identity. In time series data mining, the characteristics of a time series can themselves be regarded as private information. These characteristics include trends, peaks, and troughs in the time domain and periodicity in the frequency domain. For example, a company's sales data may show periodicity that competitors can use to infer promotion periods. Certainly, the company does not want to share such data. Moreover, existing approaches to preserving privacy in data mining may not protect privacy in time series data mining. In particular, aggregation and naively adding noise to time series data are prone to privacy attacks.
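To make the periodicity concern concrete, the following sketch shows that a weekly cycle can remain visible in the frequency domain after naive white noise is added. It is not the data flow separation attack or any method from the chapter. The synthetic sales series, the noise level, and the helper names `dft_magnitudes` and `dominant_period` are assumptions chosen for illustration.

```python
import cmath
import math
import random


def dft_magnitudes(series):
    """Magnitudes of the discrete Fourier transform for frequency indices 1..n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (zero-frequency) part
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        magnitudes.append(abs(coeff))
    return magnitudes  # magnitudes[i] belongs to frequency index i + 1


def dominant_period(series):
    """Period, in samples, of the strongest frequency component."""
    magnitudes = dft_magnitudes(series)
    k = magnitudes.index(max(magnitudes)) + 1
    return len(series) / k


# Hypothetical daily sales with a 7-day cycle over 40 weeks.
rng = random.Random(42)
days = 280
sales = [100.0 + 10.0 * math.sin(2 * math.pi * t / 7) for t in range(days)]

# Naive perturbation: independent Gaussian noise as large as the cycle's amplitude.
noisy_sales = [x + rng.gauss(0.0, 10.0) for x in sales]

recovered = dominant_period(noisy_sales)
```

White noise spreads its energy across all frequencies, while the weekly cycle concentrates its energy in a single frequency, so in this setup `recovered` should still come out as 7 days even though the noise is as large as the cycle itself.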

In this paper, we study privacy issues in time series data mining. The objective of this research is to identify effective privacy-preserving methods for time series data mining. We first present a model for publishing and mining time series data and then discuss potential attacks on privacy. As a countermeasure to these privacy threats, we propose adding noise to the original data. We study the effects of this noise both on privacy preservation and on data mining performance. The data mining task in our study is classification, and its performance is measured by classification accuracy.

We propose a spectrum of methods for adding noise. For each method, we first explain the intuition behind the idea and then present its algorithm. The methods are implemented and evaluated in terms of their impacts on privacy preservation, classification accuracy, and aggregation error in experiments. Our experiments show that these methods can preserve privacy without seriously sacrificing classification accuracy or increasing aggregation error.
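This preview does not define how aggregation error is measured. As a hedged illustration of the kind of quantity involved, the sketch below assumes one possible definition: the mean relative difference between the point-wise sum of the original series and the point-wise sum of the published, noisy series. The function name `relative_aggregation_error`, the Gaussian noise model, and the toy data are all hypothetical.

```python
import random


def relative_aggregation_error(originals, published):
    """Mean relative error between the point-wise sums of original and published series.

    Assumes every original total is nonzero.
    """
    true_total = [sum(values) for values in zip(*originals)]
    noisy_total = [sum(values) for values in zip(*published)]
    errors = [abs(n - t) / abs(t) for t, n in zip(true_total, noisy_total)]
    return sum(errors) / len(errors)


# Five hypothetical providers, 30 time points each.
rng = random.Random(1)
originals = [[50.0 + i + t for t in range(30)] for i in range(5)]

# Each provider independently adds zero-mean Gaussian noise before publishing.
published = [[x + rng.gauss(0.0, 2.0) for x in series] for series in originals]

error = relative_aggregation_error(originals, published)
```

A metric of this kind lets the noise level be tuned against a stated error limit: more noise hides more of each individual series but pushes the aggregate further from its true value.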

The contributions of our paper are: (a) We identify privacy issues in time series data mining and propose a general model for protecting privacy in time series data mining. (b) We propose a set of methods for preserving privacy by adding noise and evaluate their performance on real data sets. (c) We analyze both the effect of noise on privacy preservation and its impact on data mining performance, in order to strike a balance between the two.

The rest of the paper is organized as follows. In Section 2, we discuss related work in privacy preservation and time series data mining. A general model for publishing and mining time series data is proposed in Section 3, along with a discussion of its privacy concerns. Methods for preserving privacy by adding noise are proposed in Section 4. The effects of noise on privacy preservation, classification accuracy, and aggregation error are studied in Section 5. Related issues are discussed in Section 6. Section 7 concludes the study and outlines a few future research directions.
