A Methodology for Clustering Transient Biomedical Signals by Variable

Pimwadee Chaovalit (National Electronics and Computer Technology Center, Thailand)
DOI: 10.4018/jcmam.2012010103


Biomedical signals, which help monitor patients’ physical conditions, are a crucial part of the healthcare industry. Healthcare professionals can detect early signs of conditions such as blocked arteries and abnormal heart rhythms by performing data clustering on biomedical signals. More importantly, clustering streams of biomedical signals makes it possible to look for patterns that may indicate developing conditions. While a number of clustering algorithms perform data streams clustering by example, few algorithms exist that perform clustering by variable. This paper presents POD-Clus, a clustering method which uses a model-based clustering principle and, in addition to clustering by example, also clusters data streams by variable. The clustering result from POD-Clus was superior to that of ODAC, a baseline algorithm, both with and without cluster evolutions.
Article Preview


The healthcare industry contains massive amounts of biomedical signal data, for example, ECG (Electrocardiogram), EEG (Electroencephalogram), and PCG (Phonocardiogram) signals. These signals share common characteristics with the definition of data streams. They are collected at a fast pace: for example, brain rhythms collected from a Brain-Computer Interface (BCI) range from 8–12 Hz for mu rhythms to 13–30 Hz for beta rhythms (Bashashati, Fatourechi, Ward, & Birch, 2007). The observation values collected from streams of signals are potentially “unbounded” (Lu & Huang, 2005), meaning that the end of the observation sequence is uncertain. For instance, EEG data can be recorded for days when used to diagnose epilepsy or sleep patterns (Sun & Sclabassi, 1999).

Signals collected at a fast rate, such as those from a digital ECG (an automatic heart monitoring device), can be challenging to process. An ECG can have up to 10 wires collecting the heart’s electrical signal (Cowley, 2006), and these data are continuously gathered at a rate of 100–1,000 samples per second (Bragge et al., 2004). A 2-minute ECG recording can therefore produce 10 signal streams (one per wire), each with 120,000 data samples, for a total of 1.2 million data points. This is an enormous amount of data collected within a short period of time. Analyzing heartbeats can reveal various heart conditions; for example, blocked blood supply causes tissue death, which is reflected in the abnormal height of heartbeat waves. Analyzing these large amounts of data in a timely manner for a quick diagnosis is therefore challenging, as the data may become too large to either deliver over a network (Sun & Sclabassi, 1999) or store in the device’s main memory. For this reason, data streams processing needs to happen in real time as the streams arrive.
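The data-volume figures above follow directly from the cited sampling parameters; a back-of-envelope check (using the upper end of the cited 100–1,000 Hz range):

```python
# Data volume of a 2-minute, 10-lead ECG recording, as described above.
num_leads = 10                  # up to 10 wires (Cowley, 2006)
sampling_rate_hz = 1_000        # upper end of 100-1,000 samples/s (Bragge et al., 2004)
duration_s = 2 * 60             # 2-minute recording

samples_per_lead = sampling_rate_hz * duration_s
total_samples = num_leads * samples_per_lead

print(samples_per_lead)  # 120000 samples in each of the 10 streams
print(total_samples)     # 1200000 data points in total
```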

As various types of biomedical signals can be considered data streams, there exists a need for effective data streams mining techniques that can handle such data streams efficiently. Data streams’ characteristics (Domingos & Hulten, 2000; Gama, Rodrigues, & Aguilar-Ruiz, 2007) can be described as follows:

  •  Data from the streams usually come in at a detailed level, e.g., 1000 Hz.

  •  Streaming data arrives at a fast pace, therefore agile data management and utilization is key.

  •  Observations of data are potentially unbounded.

  •  Storage and memory resources for processing data streams are possibly limited.

Data streams clustering can be incorporated into computer-aided analysis used by physicians to cluster biomedical signals for patient diagnosis. By grouping biomedical signals into homogeneous clusters, we learn about data characteristics which may indicate developing conditions. Clustering results can then be developed into classification or predictive models useful in healthcare diagnoses. Chaovalit (2010) proposed the POD-Clus algorithm (Probability and Distribution-based Clustering) for data streams clustering, but evaluated only its ability to cluster data streams by example. This paper focuses on the ability of POD-Clus to cluster by variable, another clustering perspective for data streams. POD-Clus shows a significant improvement in clustering results over the baseline competing algorithm from the same category.


Let us review the criteria for a capable data streams clustering method. In the literature, employing an “incremental” approach allows data miners to process high-volume, high-speed data streams a small amount at a time, thus avoiding expensive processing of potentially large-sized data at the end of the streams (Babcock, Babu, Datar, Motwani, & Widom, 2002; Barbará, 2002; Domingos & Hulten, 2001; Golab & Özsu, 2003). The incremental approach has the following details:
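The incremental principle just described can be sketched as maintaining running sufficient statistics (count, sum, sum of squares) that are updated one observation at a time, so each arriving value is processed once and discarded. The class below is a hypothetical illustration of that idea, not the actual POD-Clus implementation:

```python
class StreamStats:
    """Running sufficient statistics for one signal stream.

    Each observation updates count, sum, and sum of squares in O(1),
    so the raw stream never needs to be stored. A minimal sketch of
    the incremental idea; POD-Clus's model-based statistics may differ.
    """

    def __init__(self):
        self.n = 0      # number of observations seen so far
        self.s = 0.0    # running sum
        self.ss = 0.0   # running sum of squares

    def update(self, x):
        self.n += 1
        self.s += x
        self.ss += x * x

    @property
    def mean(self):
        return self.s / self.n

    @property
    def variance(self):
        # population variance: E[x^2] - (E[x])^2
        return self.ss / self.n - self.mean ** 2


stats = StreamStats()
for x in [1.0, 2.0, 3.0, 4.0]:   # toy stream, processed one value at a time
    stats.update(x)

print(stats.mean)      # 2.5
print(stats.variance)  # 1.25
```

Because only three numbers are kept per stream, memory use stays constant no matter how long the stream runs, which matches the unbounded-data constraint listed earlier.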
