Bayesian Networks in the Health Domain

Shyamala G. Nadathur (Monash University, Australia)
DOI: 10.4018/978-1-60566-908-3.ch014

Abstract

Large datasets are regularly collected in biomedicine and healthcare (here referred to as the ‘health domain’). These datasets have some unique characteristics and problems. There is therefore a need for methods that can model such data despite these peculiarities: methods that can deal with missing data, integrate data from various sources, explicitly indicate statistical dependence and independence, and model under uncertainty. These requirements have given rise to an influx of new methods, especially from the fields of machine learning and probabilistic graphical models. Prominent among them are Bayesian Networks (BNs), a type of graphical network model with directed links that offers a general and versatile approach to capturing and reasoning with uncertainty. This chapter gives some background mathematics and statistics, a description of BNs and relevant aspects of building the networks, to help the reader better understand and appreciate the potential of BNs. There are also brief discussions of their applications, and of the unique value and the challenges of this modelling technique for the domain. As will be seen in this chapter, given the additional advantages that BNs can offer, it is not surprising that they are becoming an increasingly popular modelling tool in the health domain.
Chapter Preview

Data Mining In The Health Domain

As information systems become more commonplace, healthcare routinely generates large clinical and administrative datasets in the process of patient care (Bates et al., 1999; Lee & Abbott, 2003; Nadathur, 2009). The collected information includes patients’ histories, diagnostic and therapeutic interventions, and details of care facilities, occupancy, costs, claims and reimbursements (Nadathur, 2009). Clinical trials, electronic patient records and computer-supported disease management increasingly produce large quantities of clinical data (Becker et al., 1998; Hoey & Soehl, 1997; Matchar & Samsa, 1999; Pronovost & Kazandjian, 1999; Van der Lei, 2002).

Data generation capabilities in the health domain are growing faster than data analysis capabilities. Gigabyte-sized datasets are not uncommon. Two examples are the collection of functional magnetic resonance imaging (fMRI) data describing brain activity (T. M. Mitchell, 1999) and the Australian Health Insurance Commission (HIC) datasets (Viveros, Nearhos, & Rothman, 1996). HIC has collected detailed claims information for the Australian population. The on-line claims file alone is said to be over 550 gigabytes, containing five years of history (Viveros et al., 1996). Terabyte-sized datasets also exist. For example, a high-power microscope can rapidly obtain a 10-30 gigabyte image from a tissue sample. Thus, multiple images from a subject in a longitudinal study, or from a study across multiple layers of tissue, can reach hundreds of gigabytes or terabytes (Kumar et al., 2008). Petabyte-sized datasets are on the way.

With the steady increase in electronic capture there has been a trend towards not only more extensive but also more integrated information systems in healthcare (Bates et al., 1999; Nadathur, 2009; Staccini, Joubert, Quaranta, Fieschi, & Fieschi, 2001). Collecting data, including over networks, has become increasingly easy. Linkages of clinical, administrative and external datasets are not uncommon (Stone, Ramsden, Howard, Roberts, & Halliday, 2002; Sundararajan, Henderson, Ackland, & Marshall, 2002; Williams et al., 2006). Such record linkages of routinely collected data have the potential to inform policy (Nadathur, 2009).

The trend towards increased electronic capture and data integration goes hand-in-hand with greater efforts to standardise data capture, and with an increasing obligation and willingness to collect quality data. This is especially seen in the recording of diagnoses; for example, in the development of the International Classification of Diseases (ICD) (Hasan, Meara, & Bhowmick, 1995; Kugler, Freytag, Stillger, Bauer, & Ferbert, 2000; Stühlinger, Hogl, Stoyan, & Müller, 2000), which is the basis for refining the casemix funding found in many countries. There is also increased standardisation of the nursing terminologies used to document diagnoses, interventions, outcomes, and goals in electronic systems (Lee & Abbott, 2003). Nowadays there are more stringent data collection requirements and standardisations in the form of state and national level data dictionaries (Anderson, 1986; Linnarsson & Wigertz, 1989; Moss, 1995).

With the increasing availability of comprehensive data in health databases, data mining is growing in popularity. Data mining tools can go beyond mere description of data and provide knowledge in the form of testable models and predictions. Some of these analyses use techniques from machine learning.
