The Intelligent Data Brokerage: A Utility-Enhancing Architecture for Algorithmic Anonymity Measures


Nolan Hemmatazad (College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA), Robin Gandhi (College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA), Qiuming Zhu (College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA) and Sanjukta Bhowmick (College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA)
DOI: 10.4018/ijphim.2014010102


The anonymization of widely distributed or open data has been a topic of great interest to privacy advocates in recent years. The goal of anonymization in these cases is to make data available to a larger audience, extending the utility of the data to new environments and evolving use cases without compromising the personal information of individuals whose data are being distributed. The central issue with such practices is that, with any anonymity measure, there is a trade-off between privacy and utility, where maximizing one carries a cost to the other. In this paper, the authors propose a framework for the utility-preserving release of anonymized data, based on the idea of intelligent data brokerages. These brokerages act as intermediaries between users requesting access to information resources and an existing database management system (DBMS). Through the use of a formal language for interpreting user information requests, customizable anonymization policies, and optional natural language processing (NLP) capabilities, data brokerages can maximize the utility of data in-context when responding to user inquiries.


Proper anonymization, particularly for protecting highly sensitive or personal data, is a matter of great concern to a number of parties (Aggarwal et al., 2006; Diaz et al., 2002). This concern is amplified in modern times, where new technologies allow such information to flow seamlessly from one side of the globe to the other, with very little delay. To compound matters, a new wave of socially enabled technologies has arisen, setting a standard for the open dissemination of user information, whether it be reviews of a recent meal, purchase details, daily thoughts and activities, demographic information, or just about anything else. It is clear, too, that users are willing to provide this information, as evidenced by the explosive growth of these services in both active user counts and service utilization statistics.

Concerns for anonymity aside, the publishing of these sorts of data can be a fruitful endeavor. It allows information to be explored and utilized in new and interesting ways, leading to further innovations and technological or methodological breakthroughs that help illuminate the true potential of the data being worked with. For organizations choosing to publish such data, however, there is a concern that information they do not want to reveal may be made public, and could potentially be traced back to them or their users. This, in turn, can dissuade organizations from making various user data available to third parties (Kelly et al., 2008).

To address this concern, several methods for anonymizing large sets of data have been described in the academic literature, some rudimentary and others more nuanced. The underlying problem with each of the known approaches, however, is that there is no perfect way to assess what useful information is being (potentially unnecessarily) lost in the anonymization process. This results in a paradigm where, as anonymity increases, the utility of the data decreases.

In this article, we offer a new architecture for deploying existing algorithmic anonymization approaches. We refer to this set of ideas as intelligent data brokerages. These brokerages are intermediary services that operate between the third party requesting information from a provider and the provider itself. In a traditional context, users would simply download aggregate raw data sets that they could then work with to achieve their desired outcome. With data brokerages, users instead model their goals in terms of a predefined formal language, and the brokerage provides them with output relevant to their specific request, bypassing the bulk release of data altogether. The brokerage can further apply various anonymization policies to govern how such results are disclosed to the user, and to control the unintended release of sensitive information. When it is essential to respond to a user’s requests for information with more expansive views of the data, the brokerage may employ traditional anonymization techniques to ensure the privacy of data subsets prior to disclosure. In doing so, intelligent data brokerages represent a general-purpose architecture for enforcing data privacy constraints, in contrast with other privacy-oriented broker models that tend to operate within a specific domain, such as digital advertising (see, for example: Narayanaswami et al., 2008; Guha et al., 2011), and which, themselves, have received limited attention in the existing literature.
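To make the request-and-policy flow concrete, the following is a minimal sketch of an intermediary of this kind. It is not the authors' design: the formal request language and policy model are not specified in this portion of the article, so the `Request`, `Policy`, and `Brokerage` names, the sample records, and the minimum-group-size rule are all hypothetical, chosen only to illustrate answering a specific request instead of releasing raw data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Tuple

# Hypothetical records standing in for the provider's DBMS.
RECORDS = [
    {"name": "P1", "zip": "68101", "spend": 100.0},
    {"name": "P2", "zip": "68101", "spend": 200.0},
    {"name": "P3", "zip": "68101", "spend": 300.0},
    {"name": "P4", "zip": "68102", "spend": 50.0},
    {"name": "P5", "zip": "68102", "spend": 70.0},
]


@dataclass(frozen=True)
class Request:
    """Toy stand-in for a formal request: one aggregate over one field,
    with an optional (field, value) equality filter."""
    aggregate: str                       # "count" or "mean"
    field: str
    where: Optional[Tuple[str, str]] = None


@dataclass(frozen=True)
class Policy:
    """Assumed anonymization policy: block some fields entirely and
    suppress any answer computed from too few records."""
    min_group_size: int = 3
    blocked_fields: frozenset = frozenset({"name"})


class Brokerage:
    """Sits between the requesting user and the data; never returns rows."""

    def __init__(self, records, policy):
        self._records = records
        self._policy = policy

    def answer(self, request: Request) -> Optional[float]:
        policy = self._policy
        # Refuse requests that touch a blocked field, directly or via the filter.
        if request.field in policy.blocked_fields:
            return None
        if request.where is not None and request.where[0] in policy.blocked_fields:
            return None
        rows = [
            r for r in self._records
            if request.where is None or r[request.where[0]] == request.where[1]
        ]
        # Suppress results that would describe a very small group.
        if len(rows) < policy.min_group_size:
            return None
        if request.aggregate == "count":
            return float(len(rows))
        if request.aggregate == "mean":
            return mean(r[request.field] for r in rows)
        raise ValueError(f"unsupported aggregate: {request.aggregate}")


broker = Brokerage(RECORDS, Policy())
allowed = broker.answer(Request("mean", "spend", where=("zip", "68101")))
suppressed = broker.answer(Request("mean", "spend", where=("zip", "68102")))
blocked = broker.answer(Request("count", "name"))
```

In this sketch, the request over the three-record group is answered, while the request over the two-record group and the request on the blocked field both return `None`. A real brokerage would presumably also need to account for combinations of queries that together reveal more than any single answer, which this toy policy does not attempt.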

The remainder of this paper is structured as follows: we begin with a review of current anonymity methods and motivators. We then formally introduce the idea of the intelligent data brokerage, its structure, subsystems, and capabilities. After this, we consider a sample use case for the intelligent data brokerage, as well as provide an experimental demonstration of a rudimentary prototype, before concluding with the contributions and takeaways of our work.
