Differential Privacy Approach for Big Data Privacy in Healthcare

Marmar Moussa (University of Connecticut, USA) and Steven A. Demurjian (University of Connecticut, USA)
Copyright: © 2017 | Pages: 23
DOI: 10.4018/978-1-5225-2486-1.ch009

Abstract

This chapter surveys the most important security and privacy issues related to large-scale data sharing and mining in big data, with a focus on differential privacy as a promising approach for achieving privacy, especially in the statistical databases often used in healthcare. A case study applies differential privacy in the healthcare domain, and the chapter analyzes and compares the major differentially private data release strategies and noise mechanisms, such as the Laplace and exponential mechanisms. The background section discusses several security and privacy approaches in big data, including authentication and encryption protocols and privacy-preserving techniques such as k-anonymity. Next, the chapter introduces the differential privacy concepts used in the interactive and non-interactive data sharing models and the various noise mechanisms used. An instrumental case study is then presented to examine the effect of applying differential privacy in analytics. The chapter then explores future trends and, finally, provides a conclusion.

Introduction

Big Data analysis influences most aspects of modern society, including mobile services, retail, manufacturing, financial services, medicine and the life sciences, and the physical sciences, to name a few (Bertino et al., 2011). Scientific research is being revolutionized by Big Data every day, for instance in bioinformatics, where Next Generation Sequencing is increasing the size and number of experimental data sets exponentially. In healthcare, Big Data is transforming patient care towards prevention: substantial home-based and continuous forms of monitoring are available to patients, personalizing healthcare to their benefit. While the potential benefits of Big Data are real and significant, several considerable technical challenges remain. Across this broad range of application areas, data is being collected at an unprecedented scale. The emergence of, and ever-increasing emphasis on, the big data era means that more and more information on an individual’s health, finances, location, and online activity is continuously being harvested, collected, and processed in the cloud and stored in big data repositories. This raises concerns about the privacy of these large sets of personal data and the loss of an individual’s control over his/her sensitive data (Boyd & Crawford, 2012).

The impact of privacy concerns on big data applications is particularly evident in the healthcare domain, which has a long-established requirement that health information technology comply with the Health Insurance Portability and Accountability Act (HIPAA), most importantly with respect to the release of a patient's medical information, but also with respect to security and availability. HIPAA must also apply to big data applications for healthcare. This is strongly tied to a movement towards patient-controlled access to medical information, in which patients define the privacy settings that determine who can see what information at which times. This is evidenced by work that has emphasized granularity and patient control (Sujansky et al., 2010) and a lifetime electronic health record with complete information available anywhere (Caine, 2013). In healthcare there is a need to distinguish levels of security based on the confidentiality and privacy of the data itself and on the way a patient would seek to make such data available to stakeholders. All of these security and privacy concerns must be addressed within big data applications for healthcare as well as in other domains.

This chapter explores the issues related to security in general and privacy in particular for big data applications, especially given that the use of state-of-the-art analytics has led to growing privacy concerns. Protecting privacy becomes considerably harder as information is processed multiple times and shared among multiple diverse entities in the cloud. One example of this problem involves de-identification and anonymization techniques, which have been used under the false assumption that they allow organizations to reap the benefits of analytics while preserving individuals' privacy. The assumption is that removing certain personal information from a data set ensures that the identities of the users participating in that data set remain anonymous. However, this has proved to be a misconception, as demonstrated by several re-identification and linkage attacks showing that different data sources can leak private information when they are combined and when adversaries are able to use background knowledge. This is discussed further in the section “Big Data Security and Privacy Issues”.
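
To make the linkage problem concrete, the following is a minimal sketch of how combining two data sources can re-identify individuals. It is an illustration only and is not taken from the chapter: the records, the field names, the choice of quasi-identifiers (`zip`, `birth_year`, `sex`), and the `link` helper are all hypothetical assumptions.

```python
# Toy linkage attack. All records, names, and fields are invented
# for illustration; none of them come from the chapter.

# A "de-identified" medical data set: names removed, quasi-identifiers kept.
deidentified_records = [
    {"zip": "06269", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "06269", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "06103", "birth_year": 1975, "sex": "F", "diagnosis": "hypertension"},
]

# A public data set (the adversary's background knowledge) that includes names.
public_records = [
    {"name": "Alice Example", "zip": "06269", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "06269", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "06511", "birth_year": 1990, "sex": "F"},
]

# Assumed quasi-identifiers shared by both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for people whose quasi-identifiers
    match exactly one record in the de-identified data set."""
    index = {}
    for record in deidentified:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)

    matches = []
    for person in public:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            matches.append((person["name"], candidates[0]["diagnosis"]))
    return matches


reidentified = link(deidentified_records, public_records)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy setting, two of the three public individuals are matched to a diagnosis even though the medical data set contains no names, which is the kind of leakage the re-identification and linkage attacks mentioned above exploit.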
