Heuristics in Medical Data Mining

Susan E. George (University of South Australia, Australia)
DOI: 10.4018/978-1-60566-026-4.ch270

Abstract

Deriving, or discovering, information from data has come to be known as data mining. Within health care, the knowledge gained from medical data mining has been used in tasks as diverse as patient diagnosis (Brameier et al., 2000; Mani et al., 1999; Cao et al., 1998; Henson et al., 1996), inventory stock control (Bansal et al., 2000), and intelligent interfaces for patient record systems (George et al., 2000). It has also been a tool of medical discovery itself (Steven et al., 1996). Yet medicine remains one of the last areas of society to be “automated,” with a relatively recent increase in the volume of electronic data, many paper-based clinical record systems still in use, a lack of standardisation (for example, among coding schemes), and some lingering reluctance among health-care providers to use computer technology. Nevertheless, the rapidly increasing volume of electronic medical data is perhaps one of the domain’s current distinguishing characteristics.

Data mining presents many challenges, as “knowledge” is automatically extracted from data sets, especially when the data are complex in nature, with many hundreds of variables and relationships among those variables that vary in time, space, or both, often with a measure of uncertainty, as is common within medicine. Cios and Moore (2001) identified a number of unique features of medical data mining, including the use of imaging and the need for visualisation techniques; the large amounts of unstructured free text within records; data ownership and the distributed nature of data; the legal implications for medical providers; the privacy and security concerns of patients, which require anonymous data to be used where possible; and the difficulty of making a mathematical characterisation of the domain.

Strictly speaking, many ventures within medical data mining are better described as exercises in “machine learning,” where the main issues are, for example, discovering the complexity of relationships among data items or making predictions in light of uncertainty, rather than as “data mining” in large, possibly distributed, volumes of data that are also highly complex. Large data sets mean not only increased algorithmic complexity but often also the need to employ special-purpose methods to isolate trends and extract “knowledge” from data. However, medical data frequently provide just such a combination of vast (often distributed) and complex data sets.

Historical Perspective

Heuristic methods are one way in which the vastness, complexity, and uncertainty of data may be addressed in the mining process. A heuristic is something that aids the discovery of a solution. Artificial intelligence (AI) popularised the heuristic as something that captures, in a computational way, the knowledge that people use to solve everyday problems. A classic example from AI is the graph search algorithm known as A* (Hart et al., 1968), which is a heuristic search (under the right conditions). Increasingly, the term heuristic refers to techniques that are inspired by nature, biology, and physics. The genetic search algorithm (Holland, 1975) may be regarded as one such heuristic technique. More recent population-based approaches have been demonstrated in the Memetic Algorithm (Moscato, 1989), and specific modifications of such heuristic methods in a medical mining context can be noted (Brameier et al., 2000).
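To make the idea of a heuristic search concrete, the following is a minimal sketch of A* in Python. It is an illustration only and is not taken from the chapter: the function name a_star, the toy graph, and the table of estimates are all hypothetical. The sketch assumes that the heuristic h never overestimates the true remaining cost, which is the usual condition under which A* returns a cheapest path.

```python
import heapq


def a_star(graph, start, goal, h):
    """Return (cost, path) for a cheapest path from start to goal, or None.

    graph: dict mapping node -> list of (neighbour, edge_cost) pairs.
    h: heuristic estimating the remaining cost from a node to the goal
       (assumed here never to overestimate the true cost).
    """
    # Priority queue ordered by f = g + h: cost so far plus estimated remainder.
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}  # cheapest known cost to reach each node
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry: a cheaper route to node is known
        for neighbour, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h(neighbour), new_g, neighbour, path + [neighbour]),
                )
    return None  # goal is unreachable from start


# Hypothetical toy graph and heuristic estimates, for illustration only.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
estimates = {"A": 3, "B": 2, "C": 1, "D": 0}

result = a_star(graph, "A", "D", estimates.get)
print(result)
```

The heuristic steers the search toward nodes that look closer to the goal, so fewer nodes are typically expanded than in an uninformed search. If the estimates were allowed to overestimate, the search would still run but could return a path that is not the cheapest, which matches the general point that heuristics trade guarantees for tractability.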

Key Terms in this Chapter

Heuristic: From the Greek “heuriskein,” meaning “to discover.” A heuristic aids discovery, particularly the search for solutions in domains that are difficult and poorly understood. It is commonly known as a “rule of thumb.” Unlike algorithms, heuristics do not guarantee optimal or even feasible solutions, and they frequently lack any theoretical guarantee.

Data Mining: Analysis of data using methods that look for patterns in the data, frequently operating without knowledge of the meaning of the data. Typically, the term is applied to exploration of large-scale databases in contrast to machine-learning methods that are applied to smaller data sets.

Backward-Looking Responsibility: When backward-looking, we seek to discover who is to blame in the wake of a harmful event. There are frequently connotations of punishment, legal intervention, and determination of guilt.

This work was previously published in Encyclopedia of Information Science and Technology, edited by M. Khosrow-Pour, pp. 1322-1326, copyright 2005 by Information Science Reference, formerly known as Idea Group Reference (an imprint of IGI Global).
