Preparing Clinical Text for Use in Biomedical Research
John P. Pestian (Cincinnati Children’s Hospital Medical Center, University of Cincinnati, USA), Lukasz Itert (Nicolaus Copernicus University, Torun, Poland) and Charlotte Andersen (Cincinnati Children’s Hospital Medical Center, University of Cincinnati, USA)
Copyright: © 2009
A patient’s medical record comprises approximately 57 different types of clinical annotations. These annotations include radiology reports, discharge summaries, and surgical and nursing notes. Hospitals typically produce millions of text-based medical records over the course of a year. These records are essential for the delivery of care, but many are underutilized or not utilized at all for clinical research. The textual data found in these annotations are a rich source of insights into aspects of clinical care and the clinical delivery system. Recent regulatory actions, however, require that, in many cases, data not obtained through informed consent or not related to the delivery of care be made anonymous (referred to by regulators as harmless) before they can be used. This article describes the practical approach by which Cincinnati Children’s Hospital Medical Center (CCHMC), a large pediatric academic medical center with more than 761,000 annual patient encounters, developed open source software for making pediatric clinical text harmless without losing its rich meaning. Developing the software meant addressing many of the issues that often arise in natural language processing, such as data collection, disambiguation, and data scrubbing.