Privacy-Preserving Data Mining and the Need for Confluence of Research and Practice

Lixin Fu (University of North Carolina at Greensboro, USA), Hamid Nemati (University of North Carolina at Greensboro, USA) and Fereidoon Sadri (University of North Carolina at Greensboro, USA)
DOI: 10.4018/978-1-60566-210-7.ch005

Privacy-Preserving Data Mining (PPDM) refers to data mining techniques developed to protect sensitive data while allowing useful information to be discovered from the data. In this chapter, the authors review PPDM and present a broad survey of related issues, techniques, measures, applications, and regulation guidelines. The authors observe that the rapid pace of change in the information technologies available to sustain PPDM has created a gap between theory and practice. They posit that without a clear understanding of the practice, this gap will widen, which will ultimately be detrimental to the field. They conclude by proposing a comprehensive research agenda intended to bridge the gap, to be relevant to practice, and to serve as a reference basis for future related legislative activities.
Chapter Preview

1. Introduction

Technological advances, decreased costs of hardware and software, and the world-wide-web revolution have allowed for vast amounts of data to be generated, collected, stored, processed, analyzed, distributed and used at an ever-increasing rate by organizations and governmental agencies. According to a survey by the U.S. Department of Commerce, an increasing number of Americans are going online and engaging in a range of online activities, including making purchases and banking online. The growth in Internet usage and e-commerce has offered businesses and governmental agencies the opportunity to collect and analyze information in ways never previously imagined. “Enormous amounts of consumer data have long been available through offline sources such as credit card transactions, phone orders, warranty cards, applications and a host of other traditional methods. What the digital revolution has done is increase the efficiency and effectiveness with which such information can be collected and put to use” (Adkinson, Eisenach, & Lenard, 2002).

Simultaneously, there is a growing awareness that by leveraging their data resources to develop and deploy data mining technologies to enhance their decision-making capabilities, organizations can gain and sustain a competitive advantage (Eckerson & Watson, 2001). If correctly deployed, Data Mining (DM) offers organizations an indispensable decision-enhancing process that optimizes resource allocation and exploits new opportunities by transforming data into valuable knowledge (Nemati, Barko, & Christopher, 2001). Correctly deploying data mining has the potential to significantly increase a company’s profits and reduce its costs by helping to identify areas of potential business, areas on which the company needs to focus its attention, or areas that should be discontinued because of poor sales or returns over a period of time. For example, data mining can identify customer buying patterns and preferences, which would allow for better management of inventory and new merchandising opportunities. However, when data contains personally identifiable attributes and data mining is used in the wrong context, it can be very harmful to individuals. Data mining may “pose a threat to privacy” in the sense that sensitive personal data may be exposed directly, or discovered patterns can reveal confidential personal attributes about individuals or classify individuals into categories, revealing in that way confidential personal information with a certain probability. Moreover, such patterns may lead to the generation of stereotypes, raising very sensitive and controversial issues, especially if they involve attributes such as race, gender or religion. An example is the debate about studies of intelligence across different races (Estivill-Castro, Brankovic, & Dowe, 1999). As another example, individual patient medical records are stored in electronic databases by government and private medical providers (Hodge, Gostin, & Jacobson, 1999).
The proliferation of medical databases within the healthcare information infrastructure presents significant benefits for medical providers and patients, including enhanced patient autonomy, improved clinical treatment, and advances in health research and public health surveillance (Hodge et al., 1999). However, the use and mining of this type of data presents a significant risk to privacy. Therefore, not only is protecting the confidentiality of personally identifiable health data critical, but insufficient protection of what could be mined from it can also subject individuals to possible embarrassment, social stigma, and discrimination (Hodge et al., 1999).

The significance of data security and privacy has not been lost on the data mining research community, as was revealed in Nemati and Barko's survey of the major industry predictions that are expected to be key issues in the future (Nemati et al., 2001). Chief among them are concerns over the security of what is collected and the privacy violations of what is discovered (Margulis, 1977; Mason, 1986; Culnan, 1993; Smith, 1993; Milberg, Smith, & Kallman, 1995; Smith, Milberg, & Burke, 1996). About 80 percent of survey respondents expect data mining and consumer privacy to be significant issues (Nemati et al., 2001).
