Though an unparalleled amount and diversity of imaging and clinical data are now collected as part of routine care, this information is not sufficiently integrated and organized in a way that effectively supports a clinician’s ability to diagnose and treat a patient. The goal of this chapter is to present a framework for organizing, representing, and manipulating patient data to assist in medical decision-making. We first demonstrate how probabilistic graphical models (specifically, Bayesian belief networks) are capable of representing medical knowledge. We then propose a data model that facilitates temporal and investigative organization by structuring and modeling clinical observations at the patient level. Using information aggregated into the data model, we describe the creation of multi-scale, temporal disease models to represent a disease across a population. Finally, we describe visual tools for interacting with these disease models to facilitate the querying and understanding of results. The chapter concludes with a discussion about open problems and future directions.
More patient data is being gathered, given the adoption of the electronic medical record (EMR), the availability of clinical tests, and the growing rates of chronic diseases (Wyatt & Wright, 1998). Modern medical records are composed not only of traditional data (e.g., clinical notes, labs), but also of digital images (e.g., computed tomography, magnetic resonance imaging) and other graphical representations (e.g., pulmonary function graphs). Notably, medical imaging is becoming the predominant in vivo tool for objectively documenting patient presentation and clinical findings. Patient care is largely dependent upon imaging to understand disease processes and to establish tangible evidence of treatment response. However, even within imaging, the scope of data collected can range from the cellular level (e.g., molecular imaging) to tissue level (e.g., histopathology), up to the level of the organism itself (e.g., conventional radiology). As the quantity and diversity of collected data continue to grow, the task of consolidating this information in a way that improves patient care becomes a challenge: clinicians need effective tools to organize, access, and review the data. For instance, current methods for querying image data are limited to a set of keywords (e.g., stored as part of the image header), but much of the clinically useful information about a disease is contained within the image itself (e.g., mass volume, border, shape). Advances in image processing have resulted in sophisticated algorithms for automated content extraction (Pham, Xu, & Prince, 2000), enabling the characterization, segmentation, and classification of pixel data to extract meaningful features from an image (e.g., regions of edema). However, the extraction of meaningful features from patient data alone is insufficient.
While imaging data provides a phenotypic description of disease progression, the combination of imaging and other clinical observations has the potential to better model and predict disease behavior. A collective understanding of how features from different levels relate to one another is needed: a finding observed at the phenotype level can be explained as an emergent manifestation of multiple findings at the genotype level. For instance, the cellular level serves as the basis for describing genetic/proteomic irregularities that lead to larger scale effects that are seen at the tissue, organ, and organism levels. While research in the area of intelligent data analysis (IDA) has explored content extraction and representation, current approaches have significant limitations: 1) they do not capture the context in which the data was collected; 2) they do not represent the data in a way that facilitates a computer’s ability to reason with the information; and 3) they lack tools for facilitating the querying and understanding of the stored data.
This chapter describes efforts, particularly those undertaken by the Medical Imaging Informatics Group at the University of California, Los Angeles (UCLA), to address these issues by transforming clinical observations into a representation enabling a computer to “understand” and reason with the data. Computer understanding, in this context, is defined as being able to determine the relative importance of a given data element (e.g., necrosis size) in the patient record in relation to a phenomenon of interest (e.g., brain tumor). The chapter is organized as follows: Section 2 provides an overview of IDA and recent work in the area towards creating expert systems. While various techniques for representing medical knowledge exist, this chapter focuses on probabilistic graphical models. Section 3 introduces a phenomenon-centric data model (PCDM) that structures clinical observations at the patient level by organizing findings (i.e., collected data) around a given phenomenon (i.e., medical problem). Section 4 describes the process of generating multi-scale, temporal disease models using dynamic Bayesian belief networks to represent a disease across a population: these steps are illustrated in the context of our efforts to develop tools that help assess and manage patients with brain tumors. Subsequently, Section 5 discusses a novel interface for querying these models using a visual paradigm to facilitate the composition of queries related to image features. The chapter concludes by describing open problems in the area and identifying potential directions for future work.
Key Terms in this Chapter
Intelligent Data Analysis: The use of statistical, pattern recognition, machine learning, data abstraction, and visualization tools for analysis of data and discovery of mechanisms that created the data.
Probabilistic Graphical Model: A graph that represents independencies among random variables: each node is a random variable, and missing edges represent conditional independencies.
Data Mining: The process of sorting through large amounts of data and picking out relevant information.
Dynamic Bayesian Network: A directed graphical model of a stochastic process that generalizes hidden Markov models and is typically used to model a time series.
Visual Query Interface: A tool that enables a user to visually interact with the underlying graphical model and guides the user through the query formulation process by adapting the interface based on the structure of the model.
Graphical Metaphor: Unique and identifiable visual representations of variables specified in the disease model.
Bayesian Belief Network: A directed acyclic graph that represents a set of variables and their probabilistic independencies.
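To make the Bayesian belief network definition concrete, the following is a minimal Python sketch of inference by enumeration over a three-node network. The network structure (Tumor -> Necrosis, Tumor -> Edema), the variable names, and every probability value are hypothetical and chosen only for illustration; they are not taken from the disease models described in this chapter. The shift between the prior and the posterior gives one simple reading of how strongly an observed finding bears on a phenomenon of interest.

```python
from itertools import product

# Hypothetical network: Tumor -> Necrosis, Tumor -> Edema.
# All probabilities below are invented for illustration only.
P_TUMOR = {True: 0.1, False: 0.9}                   # P(tumor)
P_NECROSIS_GIVEN_TUMOR = {True: 0.7, False: 0.05}   # P(necrosis=True | tumor)
P_EDEMA_GIVEN_TUMOR = {True: 0.8, False: 0.1}       # P(edema=True | tumor)


def joint(tumor, necrosis, edema):
    """Joint probability, factored along the edges of the directed acyclic graph."""
    p_n = P_NECROSIS_GIVEN_TUMOR[tumor]
    p_e = P_EDEMA_GIVEN_TUMOR[tumor]
    return (P_TUMOR[tumor]
            * (p_n if necrosis else 1.0 - p_n)
            * (p_e if edema else 1.0 - p_e))


def posterior_tumor(evidence):
    """Return P(tumor=True | evidence) by summing the joint over all assignments.

    `evidence` maps 'necrosis' and/or 'edema' to an observed boolean value.
    """
    totals = {True: 0.0, False: 0.0}
    for tumor, necrosis, edema in product([True, False], repeat=3):
        assignment = {"necrosis": necrosis, "edema": edema}
        # Keep only assignments that agree with the observed findings.
        if all(assignment[name] == value for name, value in evidence.items()):
            totals[tumor] += joint(tumor, necrosis, edema)
    return totals[True] / (totals[True] + totals[False])


if __name__ == "__main__":
    print("prior:", posterior_tumor({}))
    print("given necrosis:", posterior_tumor({"necrosis": True}))
    print("given both findings:",
          posterior_tumor({"necrosis": True, "edema": True}))
```

Enumeration is practical only for very small networks; it is used here because it follows the definition directly. The missing edge between Necrosis and Edema encodes the assumption that the two findings are conditionally independent given Tumor.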