Class Distribution Curve Based Discretization With Application to Wearable Sensors and Medical Monitoring

Nicholas Skapura (Wright State University, Dayton, USA) and Guozhu Dong (Wright State University, Dayton, USA)
DOI: 10.4018/IJMSTR.2017100102

Abstract

Understanding diseases and human activities, and constructing highly accurate classifiers are two important tasks in bio-medicine, healthcare, and wearable sensor technology. Being able to mine high-quality patterns is useful here, as such patterns can help improve understanding and build accurate classifiers. However, most pattern mining algorithms only operate on discrete data; applying them often requires a binning step to discretize continuous attributes. This article presents a new discretization technique, called Class Distribution Curve based Binning (CDC Binning); the main idea is to use a so-called class distribution curve, which measures the class purity in sliding windows over an attribute's range, to construct binning intervals. Experiments show that (1) CDC Binning outperforms existing binning methods in discovering high-quality patterns, especially when the class distribution curve is complicated (e.g. when the two classes are two fairly similar human activities), and (2) it can outperform other binning methods by 10% in classification accuracy when using discovered patterns as features. CDC Binning is particularly useful for applications where the classes/activities to be distinguished are similar to each other. This is especially important in wearable sensor technology where detection of behavioral or activity changes in a person (e.g. fall detection) could indicate a significant medical event.
Article Preview

The CDC Binning algorithm improves upon Distribution Skew-based Binning (DS Binning), a discretization technique that we introduced in Skapura and Dong (2015) and that was also built on the class distribution curve. CDC Binning provides significant improvements over DS Binning by simplifying the technical aspects of the algorithm (including the bin formation process), reducing the number of parameters, and generalizing the class distribution curve to include other measures of class purity. In Skapura and Dong (2015) we applied the DS method exclusively to EEG/EMG time series datasets; in this paper, we demonstrate that the class distribution curve concept can also be applied to other types of data.

We now provide a high-level comparison of CDC Binning and other well-known binning methods. First, neither Equi-Width nor Equi-Density Binning uses class information in forming the bins. Moreover, while Entropy-based Binning (Fayyad and Irani, 1993) uses class information, it considers only the purity of each candidate interval taken as a whole when forming intervals. In contrast, CDC Binning uses the entire class distribution curve (based on the class ratios over localized sliding windows) to find optimal bin boundaries. In this sense, CDC Binning can be viewed as a generalization of Entropy-based Binning that makes fuller use of class purity information.
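To make the notion of a class distribution curve more concrete, the following is a minimal Python sketch of one way such a curve could be computed for a two-class problem. It is an illustration under stated assumptions, not the procedure from the article: the function name `class_distribution_curve`, the count-based window (a fixed number of consecutive sorted samples), the use of the window midpoint as the x-coordinate, and the choice of the positive-class ratio as the purity measure are all hypothetical choices made here.

```python
from typing import List, Sequence, Tuple


def class_distribution_curve(
    values: Sequence[float],
    labels: Sequence[int],
    window: int,
    positive: int = 1,
) -> List[Tuple[float, float]]:
    """Return (window_midpoint, positive_ratio) pairs along one attribute.

    Assumption: a window is `window` consecutive samples after sorting by
    attribute value; the article may define windows differently.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of samples")

    # Sort samples by attribute value so each window covers a local range.
    pairs = sorted(zip(values, labels), key=lambda p: p[0])

    # Count positive-class samples in the first window.
    pos = sum(1 for _, y in pairs[:window] if y == positive)

    curve = []
    for start in range(len(pairs) - window + 1):
        if start > 0:
            # Slide by one sample: drop the leftmost, add the new rightmost.
            pos -= int(pairs[start - 1][1] == positive)
            pos += int(pairs[start + window - 1][1] == positive)
        midpoint = (pairs[start][0] + pairs[start + window - 1][0]) / 2
        curve.append((midpoint, pos / window))
    return curve


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 0, 1, 1, 1]
    for midpoint, ratio in class_distribution_curve(xs, ys, window=2):
        print(midpoint, ratio)
```

In this toy input the ratio moves from 0 to 1 around the attribute value 3.5, which is the kind of local transition a curve-based method can use when placing a bin boundary. Equi-Width and Equi-Density Binning would never inspect `ys` at all.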

As will be discussed later, CDC Binning outperforms other methods when the class distribution curve is complicated, although traditional methods such as Entropy-based Binning (Fayyad and Irani, 1993) give very good performance when the class distribution curve is simple.
