Cluster Analysis in Fitting Mixtures of Curves


Tom Burr (Los Alamos National Laboratory, USA)
Copyright: © 2005 | Pages: 5
DOI: 10.4018/978-1-59140-557-3.ch030


One data mining activity is cluster analysis, of which there are several types. One type deserving special attention is clustering that arises from a mixture of curves. A mixture distribution is a combination of two or more distributions. For example, a bimodal distribution could be a mixture in which 30% of the values are generated from one unimodal distribution and 70% from a second unimodal distribution. The special type of mixture we consider here is a mixture of curves in a two-dimensional scatter plot. Imagine a collection of hundreds or thousands of scatter plots, each containing a few hundred points. Each plot includes background noise and anywhere from zero to four or five bands of points, each band having a curved shape. In a recent application (Burr et al., 2001), each curved band of points was a potential thunderstorm event (see Figure 1), as observed from a distant satellite, and the goal was to cluster the points into groups associated with thunderstorm events. Each curve has its own shape, length, and location, and the curves vary in their degree of overlap, point density, and noise magnitude. When the noise is small, the points from a curve resemble a smooth curve with very little vertical variation around it. The noise magnitude can vary widely, however, so some events show large vertical variation about the center of the band. In this context, each curve is a cluster, and the challenge is to use only the observations to estimate how many curves make up the mixture, along with their shapes and locations. To achieve that goal, a human analyst could train a classifier by visually assigning cluster labels to all points in example scatter plots. Each point would belong either to a curved region or to a catch-all noise category, and a specialized cluster analysis would then be used to develop an approach for labeling (clustering) the points generated by the same mechanism in future scatter plots.
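The two kinds of mixture described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the method of Burr et al. (2001): the function names, the Gaussian components, the plot range of 0 to 10, the uniform background noise, and the example curve shapes are all hypothetical choices made for illustration.

```python
import random

def sample_bimodal(n, weight_first=0.3, seed=0):
    """Draw n values from a two-component mixture (30%/70% by default).

    The component means (0 and 5) and unit standard deviations are
    assumptions made for this illustration only.
    """
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        # Choose the component first, then draw a value from it.
        if rng.random() < weight_first:
            values.append(rng.gauss(0.0, 1.0))
            labels.append(0)
        else:
            values.append(rng.gauss(5.0, 1.0))
            labels.append(1)
    return values, labels

def sample_curve_mixture(n_points, curves, noise_fraction=0.2, seed=0):
    """Simulate one scatter plot: curved bands plus background noise.

    curves is a list of (func, x_min, x_max, sigma) tuples, where sigma
    is the vertical noise around the curve. Returns (x, y, label)
    tuples; label -1 marks background noise, otherwise the curve index.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        if not curves or rng.random() < noise_fraction:
            # Background noise: uniform over the assumed plot area.
            points.append((rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0), -1))
        else:
            k = rng.randrange(len(curves))
            func, x_min, x_max, sigma = curves[k]
            x = rng.uniform(x_min, x_max)
            points.append((x, func(x) + rng.gauss(0.0, sigma), k))
    return points

# Two hypothetical curved bands: one tight, one with large vertical scatter.
example_curves = [
    (lambda x: 2.0 + 0.05 * x ** 2, 0.0, 8.0, 0.1),
    (lambda x: 9.0 - 0.08 * (x - 5.0) ** 2, 2.0, 10.0, 0.6),
]
plot = sample_curve_mixture(300, example_curves)
```

The labels returned here play the role of the hand-assigned cluster labels in the training plots. A clustering method would see only the (x, y) pairs and would have to recover the number of curves, their shapes, and the noise points.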
