Entity Resolution in Healthcare

Copyright: © 2014 | Pages: 14
DOI: 10.4018/978-1-4666-5198-2.ch017


Abbreviations are common in biomedical documents, and many are ambiguous in the sense that they have several potential expansions. Identifying the correct expansion is necessary for language understanding and important for applications such as document retrieval, and it can be viewed as a Word Sense Disambiguation (WSD) problem. Previous approaches to this problem have drawn on various sources of information, including linguistic features of the context in which the ambiguous term is used and domain-specific resources such as the Unified Medical Language System (UMLS). This chapter compares a range of knowledge sources that have been used previously and introduces a novel one: Medical Subject Headings (MeSH) terms. The best performance is obtained using linguistic features in combination with MeSH terms.
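Treating abbreviation expansion as WSD can be illustrated with a simple overlap baseline: score each candidate expansion by how many features it shares with the surrounding text and the document's MeSH terms. This is a Lesk-style heuristic chosen for illustration, not the method evaluated in the chapter. The `SENSES` inventory, its feature profiles, and the sample abbreviation `RA` are hypothetical; a real system would derive candidates and features from resources such as UMLS and from the MeSH terms actually assigned to each document.

```python
import re
from collections.abc import Iterable

# Hypothetical sense inventory: each candidate expansion of an ambiguous
# abbreviation has a profile of context words and MeSH-style terms.
# These profiles are illustrative, not drawn from UMLS or MeSH.
SENSES: dict[str, dict[str, set[str]]] = {
    "RA": {
        "rheumatoid arthritis": {"joint", "swelling", "autoimmune", "arthritis", "rheumatoid"},
        "right atrium": {"heart", "cardiac", "atrium", "ventricle", "echocardiogram"},
    },
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(abbrev: str, context: str, mesh_terms: Iterable[str] = ()) -> str:
    """Return the expansion whose profile shares the most features with the
    surrounding text (linguistic features) plus the document's MeSH terms."""
    features = tokenize(context) | tokenize(" ".join(mesh_terms))
    candidates = SENSES[abbrev]
    # Ties fall back to the first candidate listed in SENSES.
    return max(candidates, key=lambda sense: len(candidates[sense] & features))


print(expand("RA", "Patient reports joint pain and morning swelling.",
             ["Arthritis, Rheumatoid"]))
print(expand("RA", "Echocardiogram showed a dilated atrium and normal ventricle."))
```

Adding the MeSH terms to the feature set is what lets a document-level signal help when the local context is short; in the first call the MeSH heading alone contributes two matching features.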
Chapter Preview


When we talk about an entity, it may refer to a person, a department, a team, a corporation, a cooperative, a partnership, or any other group with which it is possible to conduct business. Formally, an entity is something that exists by itself, although it need not have a material existence. In particular, abstractions and legal fictions are usually regarded as entities, and in general there is no presumption that an entity is animate.

Entity resolution is the process of finding non-identical duplicates in a relation and merging them into a single tuple (record). Record linkage is the process of finding related entries in one or more related relations in a database and creating links among them. Both are important steps in data cleaning, the removal of inaccuracies from databases, and as such they are part of populating a data warehouse. Data warehouses are important repositories for organizational reporting on historical data, and where this information is derived from the entities an organization is concerned with, the underlying data must be as accurate as possible. Because duplicate entries should not be allowed in a database, entity resolution can also establish whether a tuple about to be entered duplicates one already present. It therefore helps maintain the integrity of a traditional database by keeping its data accurate and consistent.

More broadly, entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. It is an important information quality process that must be carried out before entity-related data can be analyzed accurately. In practice, the same real-world entity frequently appears in different representations. For example, CA and California both refer to the same U.S. state, even though they differ in characters and length. When this happens, the records that share a meaning but differ in spelling should be identified and merged so that the database does not contain duplicates.
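The deduplication described above can be sketched in a few steps: normalize field values, compare records pairwise, group matches, and merge each group into a single tuple. This is a minimal illustration under stated assumptions, not the chapter's method. The patient records, the field names, the small `STATE_NAMES` lookup table, the 0.85 similarity threshold, and the "first non-empty value wins" merge rule are all hypothetical; name similarity uses Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical lookup table for canonicalizing state values; a real system
# would load a complete reference vocabulary.
STATE_NAMES = {"ca": "california", "ny": "new york", "tx": "texas"}

# Hypothetical patient records from one relation. Field names are assumptions.
RECORDS = [
    {"id": 1, "name": "John Smith", "state": "CA", "phone": ""},
    {"id": 2, "name": "Jon Smith", "state": "California", "phone": "555-0101"},
    {"id": 3, "name": "Mary Jones", "state": "N.Y.", "phone": "555-0199"},
]


def normalize_state(value: str) -> str:
    """Map an abbreviation or full state name to one canonical lowercase form."""
    key = " ".join(value.lower().replace(".", "").split())
    return STATE_NAMES.get(key, key)


def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Same canonical state and sufficiently similar names (threshold assumed)."""
    if normalize_state(a["state"]) != normalize_state(b["state"]):
        return False
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold


def resolve(records: list[dict]) -> list[dict]:
    """Cluster duplicates with union-find and merge each cluster into one tuple."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair: O(n^2), fine for a sketch.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_duplicate(records[i], records[j]):
                parent[find(j)] = find(i)

    clusters: dict[int, list[dict]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)

    merged = []
    for group in clusters.values():
        row = {
            "ids": [r["id"] for r in group],
            "state": normalize_state(group[0]["state"]),
        }
        # Keep the first non-empty value of each remaining field.
        for field in ("name", "phone"):
            row[field] = next((r[field] for r in group if r[field]), "")
        merged.append(row)
    return merged


for row in resolve(RECORDS):
    print(row)
```

Normalizing the state field first is what lets `CA` and `California` compare as equal; the union-find step then groups transitively matching records so each cluster becomes one merged record. Because pairwise comparison is quadratic, larger systems usually restrict comparisons with blocking keys.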

Health information exchange (HIE) is the electronic mobilization of healthcare information across organizations within a region, a community, or a hospital system. HIE systems support physicians and clinicians in meeting high standards of patient care by letting them participate electronically in a patient's continuity of care across multiple providers. The central implementation challenge for HIE is to create a standardized, interoperable model that is patient-centric, trusted, longitudinal, scalable, sustainable, and reliable. At the same time, many health organizations maintain large databases containing references to patients, physicians, drugs, and other entities that must be matched in real time against a stream of query records that also contain entity references. Different people in different places often supply the same information in different forms and data types, so an HIE system must determine which records are useful and whether they refer to the same entity. Entity resolution is therefore a core process in HIE systems as they evolve to address this problem.

Most existing entity resolution methods are automated; they are not perfect and face a precision-recall trade-off. By contrast, hand-cleaning methods, even with visualization support, can be slow and inefficient at finding duplicates, but they tend to have high precision because a human in the loop makes the final resolution decision. However, inspecting a large data set for duplicates can be like looking for the proverbial needle in a haystack, so while these approaches may have high precision, they tend to have low recall.
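The precision-recall trade-off faced by automated entity resolution can be made concrete by sweeping a match threshold over scored candidate pairs. The scores and ground-truth labels below are invented for illustration, and `precision_recall` is a hypothetical helper; the labels stand in for the decisions a human reviewer would make.

```python
# Hypothetical scored candidate pairs: (similarity score, is_true_match).
SCORED_PAIRS = [
    (0.95, True), (0.91, True), (0.88, False), (0.82, True),
    (0.75, False), (0.70, True), (0.55, False), (0.40, False),
]


def precision_recall(pairs, threshold):
    """Precision and recall when every pair scoring >= threshold is declared a match."""
    predicted = [is_match for score, is_match in pairs if score >= threshold]
    true_positives = sum(predicted)
    total_matches = sum(is_match for _, is_match in pairs)
    # Convention: with no predicted matches, precision is taken as 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / total_matches if total_matches else 1.0
    return precision, recall


for t in (0.9, 0.8, 0.6):
    p, r = precision_recall(SCORED_PAIRS, t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold accepts only the most confident pairs, which raises precision but misses true matches; lowering it recovers more matches at the cost of more false merges. A human-in-the-loop review sits at the high-precision end, which is why it tends to miss duplicates in large data sets.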
