Medical Information Extraction in European Portuguese

Liliana Ferreira (University of Aveiro, Portugal), António Teixeira (University of Aveiro, Portugal) and João Paulo Silva Cunha (University of Aveiro, Portugal)
DOI: 10.4018/978-1-4666-3986-7.ch032

Abstract

The electronic storage of patient medical data is becoming routine in most practices and hospitals worldwide. However, much of the available data is in free-text form, a convenient way of expressing concepts and events but especially challenging for automatic searches, summarization, or statistical analyses. Information Extraction can relieve some of these problems by offering a semantically informed interpretation and abstraction of the texts. MedInX, the Medical Information eXtraction system presented in this chapter, is designed to process textual clinical discharge records in order to map free-text reports automatically and accurately onto a structured representation. MedInX components are based on Natural Language Processing principles and provide several mechanisms to read, process, and use external resources, such as terminologies and ontologies. Current practical applications of MedInX include automatic code assignment and an audit system capable of systematically analyzing the content and completeness of clinical reports. Recent evaluation efforts on a set of authentic patient discharge letters indicate that the system performs with 95% precision and recall.

Introduction

The rapid adoption of Electronic Health Records (EHR) in the clinical domain, with the parallel growth of narrative data in electronic form, is a strong incentive for the development of Medical Language Processing (MLP) systems. However, much of the available clinical data is in narrative form. While narratives are a convenient way of expressing concepts and events, they are difficult to work with when one wants to perform automatic searches, summarization, decision support, or statistical analysis. To overcome this problem, and also to improve quality control and reduce medical errors, structured data is required. This is where Natural Language Processing (NLP) and, more precisely, Information Extraction (IE) are needed.

Electronic access to health information is becoming more frequent worldwide through numerous electronic health information systems, which generate millions of gigabytes of health information annually, more than in many other domains (Cios and Moore, 2002). This type of access to information is critical for improving health care and reducing medical errors. For example, in Portugal, with a population of approximately ten million inhabitants, the latest available statistics (National Institute of Statistics, 2009) report that every year there are almost a million hospital admissions and forty-six million specialized care outpatient visits in the National Health System.

Several efforts are currently under way to generalize the use of Electronic Medical Records (EMRs) in Portuguese health institutions, but most of the information available in these systems is in textual form and, even if electronically available, remains locked up within text. Enriching EMR systems with rich domain knowledge and rules would greatly enhance their performance and their ability to support clinical decisions. However, the content of health records is extremely complex, as a considerable part of a patient's health record is free-form written or spoken text. In particular, these records describe sequences of events as narratives, reflecting the need for a precise and complete explanation when describing the health status of a patient. This type of expressive description also carries substantial ambiguity and personal differences in vocabulary and style (Lovis et al., 2000; Suominen, 2009). Frequently, specialists from the same discipline cannot agree on unambiguous terms to be used when describing a patient's condition.

Searching for information in this type of narrative text is difficult and time consuming. Human Language Technology (HLT) and, particularly, NLP are increasingly gaining the interest of both health care practitioners and academic researchers. They offer the possibility of extracting precise facts from a document set and of finding interesting associations among disparate facts, leading to the discovery of new or unsuspected knowledge (Ananiadou and McNaught, 2005), instead of leaving the health professional with the problem of having to read several tens of thousands of documents.

The present chapter focuses on MedInX, a Medical Information eXtraction system tailored to process textual clinical discharge records in order to map free-text reports automatically and accurately onto a structured representation. MedInX uses IE technology to structure the information present in discharge reports produced by the EHR system in use in the region of Aveiro in Portugal, the Telematic Healthcare Network RTS® (Cunha et al., 2006), and automatically instantiates a knowledge representation model from the free-text patient discharge letters.
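To make the idea of mapping free text onto a structured representation concrete, the following is a minimal Python sketch. It is not MedInX's implementation: it assumes a tiny hand-written lexicon and plain dictionary lookup, whereas the chapter describes a system built on NLP components, terminologies, and ontologies. The names `LEXICON`, `Mention`, and `extract_mentions` are hypothetical, and the sample sentence is in English purely for readability.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon: surface term -> (concept label, category).
# The entries are illustrative only and are not taken from MedInX.
LEXICON = {
    "hypertension": ("Hypertension", "condition"),
    "diabetes mellitus": ("Diabetes mellitus", "condition"),
    "aspirin": ("Aspirin", "medication"),
}


@dataclass(frozen=True)
class Mention:
    """One structured record: a concept found at a character span."""
    concept: str
    category: str
    start: int
    end: int


def extract_mentions(text: str) -> List[Mention]:
    """Find lexicon terms in free text and return them as structured records."""
    mentions = []
    for term, (concept, category) in LEXICON.items():
        # Whole-word, case-insensitive match of the lexicon term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append(Mention(concept, category, match.start(), match.end()))
    # Return mentions in the order in which they appear in the text.
    return sorted(mentions, key=lambda m: m.start)


report = "Patient with hypertension and diabetes mellitus, discharged on aspirin."
for mention in extract_mentions(report):
    print(mention)
```

A lookup of this kind ignores negation, abbreviations, and spelling variation, which is part of why a full IE pipeline relies on linguistic processing and external knowledge resources rather than string matching alone.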

The main goal of MedInX is to improve access to clinical reports and, consequently, to enable faster and more accurate creation and analysis of statistical data. Improving access to quality health information for both patients and health professionals also contributes to reducing medical errors and to increasing safety and efficiency, while the creation of a semantically informed interpretation of texts helps to enable patient mobility and borderless access to health care. Moreover, MedInX pursues these objectives while allowing physicians to continue to practice using their current descriptive language in free-text reports, without a requirement to enter structured data in a complex, time-consuming computer-based system.

Key Terms in this Chapter

Medical Language Processing: Field of Computer Science and Linguistics concerned with the analysis and generation of natural human language in the clinical and medical domains.

Clinical Knowledge Representation: Aims to represent and manage both medical knowledge and clinical processes in a unified way.

Medical Ontologies: Medical ontologies aim to capture consensual medical knowledge in a generic, reusable and sharable way across software applications and groups of people.

Decision Support Systems: Computer-based systems that facilitate the use of information in human decision making.

Information Extraction: Sub-discipline of NLP whose goal is to find information in text without requiring the end user of the information to read the text.

Knowledge Representation: Subfield of Artificial Intelligence whose main goal is to study how knowledge about the world can be represented and what kinds of reasoning can be done with that knowledge.

Human Language Technologies: Computational tools that analyze and generate natural human language.

Automatic Coding: The task of automatically assigning codes, usually belonging to international classification systems, to free text clinical reports.
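
The abstract reports precision and recall figures for the system. As a reminder of what these measures mean for a task such as automatic coding, the sketch below compares a set of assigned codes with a gold-standard set. It is an illustration under stated assumptions, not the chapter's evaluation procedure: the function name `precision_recall` is hypothetical, and the code strings are placeholders rather than real classification codes.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted codes against gold-standard codes."""
    true_positives = len(predicted & gold)
    # Precision: share of assigned codes that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard codes that were assigned.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder identifiers, not real classification codes.
assigned = {"CODE_A", "CODE_B", "CODE_C"}
reference = {"CODE_A", "CODE_B", "CODE_D"}

precision, recall = precision_recall(assigned, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three assigned codes are correct and two of the three reference codes are found, so both measures equal 2/3.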
