Methodology for Record Linkage: A Medical Domain Case Study

Maria Vargas-Vera
DOI: 10.4018/978-1-5225-1759-7.ch027

Abstract

This paper presents a methodology for linking records from several sources, each of which might contain missing information. The assumption of missing values is made without loss of generality, as the author has observed that missing information is part of the nature of data in the health domain and in other domains such as the social sciences. The author's methodology is an attempt to deal with the linkage of records of the same patient across several databases. The first phase of her methodology is called homogenization. The databases/datasets are homogenized by applying a method that fills in the missing values with predicted values. The second phase of her methodology is called linking of records. It assesses the similarity between records and links the pairs of records with a high level of similarity. Finally, the author presents an evaluation of the methodology. The evaluation of the homogenization phase was carried out using multinomial regression, while the evaluation of the aggregated similarities was performed using the Jaccard, Jaro-Winkler and Monge-Elkan similarity metrics.
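As a rough illustration of the linking phase, the sketch below scores a pair of records with two of the metrics named in the abstract. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the field list, and the plain averaging used to aggregate scores are hypothetical, and the inner character-level similarity uses Python's difflib.SequenceMatcher as a stand-in for Jaro-Winkler.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def char_sim(x: str, y: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: SequenceMatcher.ratio() stands in for Jaro-Winkler here.
    """
    return SequenceMatcher(None, x, y).ratio()


def monge_elkan(a: str, b: str, inner=char_sim) -> float:
    """Monge-Elkan: mean over tokens of a of the best inner match in b."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    return sum(max(inner(x, y) for y in tb) for x in ta) / len(ta)


def aggregated_similarity(rec_a: dict, rec_b: dict, fields: list) -> float:
    """Hypothetical aggregation: average both metrics per field, then
    average across fields. The chapter's own aggregation may differ."""
    per_field = [
        (jaccard(rec_a[f], rec_b[f]) + monge_elkan(rec_a[f], rec_b[f])) / 2
        for f in fields
    ]
    return sum(per_field) / len(per_field)
```

A pair of records would then be linked when `aggregated_similarity` exceeds some threshold chosen for the data at hand. The threshold, like the averaging scheme, is an assumption of this sketch rather than something stated in the abstract.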
Chapter Preview

This section reviews the state of the art from two perspectives, namely missing values and record linkage. First, we describe the state of the art for the problem of missing values in databases and, second, the state of the art in linking records.