Policy Enforcement System for Inter-Organizational Data Sharing

Mamoun Awad (UAE University, UAE), Latifur Khan (The University of Texas at Dallas, USA) and Bhavani Thuraisingham (The University of Texas at Dallas, USA)
DOI: 10.4018/978-1-4666-0026-3.ch011

Sharing data among organizations plays an important role in security and data mining. In this paper, the authors describe a Data Sharing Miner and Analyzer (DASMA) system that simulates data sharing among N organizations, each of which enforces its own policy. The organizations share their data through a trusted third party, although an organization may, for example, encode some attributes before releasing them. Each policy is represented in XML format and states which attributes can be released, encoded, or randomized. DASMA collects the released data from each organization, combines it, prepares it for mining, mines it, and analyzes the results. After mining, a statistical report is produced that states the similarities between mining with data sharing and mining without sharing. The authors apply data sharing and policy enforcement to two separate datasets from different domains and analyze the results. The results indicate that the amount of information loss fluctuates across different releasing factors.

1. Introduction

Data sharing among organizations has become a critical research topic. Sharing data among organizations is governed by the sharing policies maintained and enforced under the organizations' rules and by government laws. As a result, the amount of information used in any given sharing scenario is smaller than or equal to the total information maintained by all of the participating organizations.

In our previous paper, we discussed our Policy-based Information Sharing System (Kumar, Khan, & Thuraisingham, 2008). In this paper, we extend that research by examining how much information is lost by enforcing policies. There has always been a dichotomy between information sharing and policy enforcement. However, none of the previous work has focused on information loss. To our knowledge, the work discussed in this paper is the first attempt to compute the information loss. It will give guidance to those who need to share information securely.

In this work, we study the effect of information hiding on the amount of knowledge obtained using standard machine learning techniques. Information hiding is represented by the policies and regulations enforced by each organization. We introduce the releasing factor, a measure that gives the number of attributes an organization releases as a percentage of the total number of attributes that organization has. For mining the shared data, we consider Association Rule Mining.
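As a minimal sketch of the releasing factor described above, the following Python function computes the percentage of an organization's attributes that are released. The function name, the attribute names, and the choice to ignore released names that the organization does not actually hold are assumptions made for illustration; they are not taken from DASMA.

```python
def releasing_factor(released_attributes, all_attributes):
    """Return the percentage of an organization's attributes that it releases."""
    all_set = set(all_attributes)
    if not all_set:
        raise ValueError("the organization has no attributes")
    # Assumption: names that the organization does not hold are ignored.
    released = set(released_attributes) & all_set
    return 100.0 * len(released) / len(all_set)


# Hypothetical hospital holding four attributes, two of which are released.
hospital_attributes = ["blood_pressure", "temperature", "illness", "medication"]
hospital_released = ["blood_pressure", "temperature"]

print(releasing_factor(hospital_released, hospital_attributes))  # prints 50.0
```

A releasing factor of 100 corresponds to an organization that releases every attribute, and 0 to one that releases none.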

It is important to point out that, in this study, we assume that all organizations are trusted parties. However, each organization abides by its own policies and rules when releasing data. For each organization, we develop sharing policies that govern what kind of data the organization can release. For example, a medical organization can release information about the blood pressure and temperature of patients. However, it cannot release the type of illness each patient has.
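To make the idea of a sharing policy concrete, the sketch below parses a small XML policy and applies it to one record. The XML layout (a policy element with attribute children carrying name and action), the three action names, the use of a truncated SHA-256 digest for encoding, and the uniform noise used for randomizing are all assumptions chosen for illustration; the actual DASMA policy schema is not shown in this preview. Attributes that the policy does not mention, such as the type of illness, are withheld.

```python
import hashlib
import random
import xml.etree.ElementTree as ET

# Hypothetical policy document; the schema is an assumption.
POLICY_XML = """
<policy organization="hospital">
  <attribute name="blood_pressure" action="release"/>
  <attribute name="temperature" action="randomize"/>
  <attribute name="patient_id" action="encode"/>
</policy>
"""


def load_policy(xml_text):
    """Map each attribute name in the policy to its action."""
    root = ET.fromstring(xml_text)
    return {a.get("name"): a.get("action") for a in root.findall("attribute")}


def apply_policy(record, policy, rng):
    """Return the policy-compliant view of a single record."""
    compliant = {}
    for name, value in record.items():
        action = policy.get(name)  # None means the attribute is withheld
        if action == "release":
            compliant[name] = value
        elif action == "encode":
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            compliant[name] = digest[:12]
        elif action == "randomize":
            # Assumption: numeric value perturbed by noise in [-1, 1].
            compliant[name] = value + rng.uniform(-1.0, 1.0)
    return compliant


policy = load_policy(POLICY_XML)
patient = {"patient_id": 17, "blood_pressure": 120,
           "temperature": 37.0, "illness": "flu"}
print(apply_policy(patient, policy, random.Random(0)))
```

Passing the random number generator explicitly keeps the sketch reproducible when a fixed seed is used.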

We also try to simulate a realistic scenario of data partitioning. For example, for a specific entity, such as a patient, one organization, such as a hospital, might have attributes/fields about the patient's medications, whereas for another organization, such as an insurance company, those fields are missing. We consider three different partitionings of the attributes, namely, horizontal, vertical, and hybrid partitioning. In horizontal partitioning, we simulate the scenario in which one organization has all fields/attributes about some entities. In vertical partitioning, an organization knows all entities but has only some of the fields/attributes about each. In hybrid partitioning, we assume both horizontal and vertical knowledge about entities and attributes/fields, i.e., some entities are known totally or partially by some organizations. Notice that data partitioning is related to the layout of the data (see Section 2 for details). It is also important to point out that we assume a fixed set of attributes/fields about entities.
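The three partitionings can be sketched in Python as follows. This is one possible reading of the descriptions above, not the partitioning code used in the study: records are plain dictionaries over a fixed set of attributes, entities are dealt out round-robin, and the "id" key that every organization keeps is an assumption introduced so that rows can still be identified.

```python
def horizontal_partition(records, n_orgs):
    """Each organization holds every attribute for a disjoint subset of entities."""
    return [records[i::n_orgs] for i in range(n_orgs)]


def vertical_partition(records, attribute_groups, key="id"):
    """Each organization knows every entity but only its own group of attributes."""
    return [
        [{a: r[a] for a in [key, *group]} for r in records]
        for group in attribute_groups
    ]


def hybrid_partition(records, attribute_groups, key="id"):
    """Split the entities across organizations, then restrict the attributes."""
    row_parts = horizontal_partition(records, len(attribute_groups))
    return [
        [{a: r[a] for a in [key, *group]} for r in rows]
        for rows, group in zip(row_parts, attribute_groups)
    ]


# Hypothetical data set: four patients, three attributes plus the key.
records = [
    {"id": i, "blood_pressure": 110 + i, "temperature": 36.5, "plan": "basic"}
    for i in range(4)
]
groups = [["blood_pressure", "temperature"], ["plan"]]

for part in hybrid_partition(records, groups):
    print(part)
```

In this sketch the hybrid case gives each organization a subset of the entities and a subset of the attributes, which is only one of several layouts that the description of hybrid partitioning allows.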

After partitioning the dataset, we assume a centralized trust broker, which requests the information from the different parties and mines the data. When the broker requests data from an organization x, organization x first applies its policy and then sends data that complies with that policy to the broker (see Figure 1).
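The request flow between the broker and an organization can be sketched as below. The class and method names are hypothetical, and the policy is reduced to a set of releasable attribute names for brevity; the point is only that the policy is applied on the organization's side, so the broker never sees non-compliant data.

```python
class Organization:
    def __init__(self, name, records, releasable):
        self.name = name
        self._records = records            # raw data, never sent as-is
        self._releasable = set(releasable)  # simplified stand-in for a policy

    def handle_request(self):
        """Apply the local policy first, then return only compliant data."""
        return [
            {k: v for k, v in record.items() if k in self._releasable}
            for record in self._records
        ]


class Broker:
    def __init__(self, organizations):
        self.organizations = organizations

    def collect(self):
        """Request data from every organization and pool it for mining."""
        pooled = []
        for org in self.organizations:
            for row in org.handle_request():
                pooled.append({"source": org.name, **row})
        return pooled


hospital = Organization(
    "hospital", [{"blood_pressure": 120, "illness": "flu"}], ["blood_pressure"]
)
insurer = Organization("insurer", [{"plan": "gold"}], ["plan"])

print(Broker([hospital, insurer]).collect())
```

The pooled rows would then be handed to the mining step, which this sketch leaves out.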

Figure 1. Communications between the broker and an organization

The process of mining shared data among organizations poses several challenges related to the automation of data sharing. First, full data disclosure might not be possible because organizations are bound by their sharing policies; for example, an organization might not release all the data it has because of privacy issues. Next, data may need reprocessing because of discrepancies in format, representation, scales, etc. among organizations. Finally, human intervention may be needed to resolve issues such as mapping data from one organization's database to another's, because two attributes may have the same name but different meanings, or different names but the same meaning.
