Proposal for Interactive Anonymization of Electronic Medical Records

Carlos Andrés Moque Millán (Pontificia Universidad Javeriana, Colombia), Alexandra Pomares Quimbaya (Pontificia Universidad Javeriana, Colombia) and Rafael A. Gonzalez (Pontificia Universidad Javeriana, Colombia)
DOI: 10.4018/978-1-4666-3667-5.ch011

Abstract

One of the most important inputs for medical research is the information registered in electronic medical records. This information typically contains sensitive data that must be preserved so that it can be used for research or educational purposes, and protected according to the regulations of each country and institution. To ensure the confidentiality of data, different techniques can be used to remove basic identifiers (e.g., names, IDs); however, these techniques can be easily bypassed by attackers who know information that can act as pseudo-identifiers of patients (e.g., birthdates, gender). Although these pseudo-identifiers can also be removed, the information they contain is valuable for medical research. To address this problem, different methods have been proposed that minimize the risk of sharing confidential information. The main contribution of this chapter is the interactive use of anonymization algorithms for electronic medical records, in a proposal dubbed AnonymousData.co.

Conceptual Background Of Anonymization

Before presenting some of the available anonymization alternatives, it is useful to define some relevant concepts used throughout the chapter. Sensitive data refers to private information which must be protected in order to guarantee confidentiality and which may be misused for harmful purposes; for example, a chronic disease diagnosis for a given person. Quasi-Identifiers (QI) are attributes (from a database table) which, when grouped together, may become identifiers that reveal sensitive data; for example, while a national ID number unequivocally identifies an individual, it may also be possible to identify him or her from the age, gender and approximate address. An attacker is an individual who misuses QIs in order to extract sensitive information about a person or group, with the intent to harm or to benefit from this data; for example, a company may reject a job application after wrongfully obtaining data indicating a chronic disease the applicant may have. Microdata refers to a database that has already been processed or prepared (Ghinita, 2011; Min, 2008).
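
To make the notion of quasi-identifiers more concrete, the following minimal Python sketch counts how many records share each combination of QI values and flags those whose combination is unique, since such records could be singled out by an attacker who knows those attributes. The records, the attribute names (age, gender, district) and the helper functions are hypothetical, chosen only for illustration; they are not taken from the chapter or from AnonymousData.co.

```python
from collections import Counter

# Hypothetical toy records: direct identifiers (name, national ID) are
# assumed to have been removed already; values are invented for illustration.
records = [
    {"age": 34, "gender": "F", "district": "Chapinero", "diagnosis": "diabetes"},
    {"age": 34, "gender": "F", "district": "Chapinero", "diagnosis": "asthma"},
    {"age": 51, "gender": "M", "district": "Suba", "diagnosis": "hypertension"},
    {"age": 29, "gender": "M", "district": "Usaquen", "diagnosis": "HIV"},
]

# Assumed set of quasi-identifier attributes for this example.
QUASI_IDENTIFIERS = ("age", "gender", "district")


def qi_key(record, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[attr] for attr in quasi_identifiers)


def group_sizes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each quasi-identifier combination."""
    return Counter(qi_key(r, quasi_identifiers) for r in records)


def unique_records(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the records whose quasi-identifier combination occurs only once."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[qi_key(r, quasi_identifiers)] == 1]


# Records an attacker could single out by knowing age, gender and district.
exposed = unique_records(records)
for r in exposed:
    print(qi_key(r), "->", r["diagnosis"])
```

In this toy data the first two records share the same QI values, so knowing those values does not reveal which diagnosis belongs to which person, whereas the last two records are unique and their diagnoses would be exposed. Using fewer attributes (for instance, gender alone) leaves no unique records, which illustrates why it is the grouping of attributes, rather than any single one, that acts as an identifier.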

Table 1 gives an example of a subset of data from the electronic patient records, in our case to be used for data mining. The names and addresses have been purposefully altered. It is worth noting that the problem of anonymization starts with those doing the anonymization, so it is critical for them to sign a confidentiality and ethics agreement beforehand, as was done within the context of this specific research project.
