Fuzzy Cluster Analysis of Larger Data Sets

Roland Winkler (German Aerospace Center Braunschweig, Germany), Frank Klawonn (University of Applied Sciences Braunschweig/Wolfenbüttel, Germany), Frank Höppner (University of Applied Sciences Braunschweig/Wolfenbüttel, Germany) and Rudolf Kruse (Otto-von-Guericke University Magdeburg, Germany)
DOI: 10.4018/978-1-60566-858-1.ch012

Applying fuzzy cluster analysis to larger data sets can cause runtime and memory problems. While deterministic (hard) clustering assigns each data object to exactly one cluster, fuzzy clustering distributes the membership of a data object over several clusters. In standard fuzzy clustering, membership degrees (almost) never become zero, so every data object is assigned to every cluster, even if only with a very small membership degree. This not only demands more computation and memory; it also has the undesired effect that every data object always influences every cluster, no matter how far away it is from that cluster. New approaches that modify the idea of the fuzzifier have been developed to avoid nonzero membership degrees for all data and clusters. In this chapter, these ideas are combined with concepts for speeding up fuzzy clustering through a suitable data organization, so that fuzzy clustering can be applied more efficiently to larger data sets.
Chapter Preview

This work is related to two major fields of fuzzy clustering. In the first field, the concern is to increase the clustering quality or to adapt Fuzzy c-Means (FcM) to a specific problem for which standard FcM does not generate the desired results. The first Fuzzy c-Means approach, by Dunn (1973), only considered a fuzzifier value of 2. Bezdek (1981) extended this approach to an adjustable value, which influences the softness of the fuzzy partition. Later, several approaches changed the behaviour of FcM by changing the fuzzifier function, e.g. (Klawonn & Höppner, 2003a; Klawonn & Höppner, 2003b).
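The effect of the fuzzifier on membership degrees can be illustrated with a minimal sketch. It assumes the standard FcM membership update with Euclidean distances, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)); the function name fcm_memberships, the example centers and the example point are hypothetical and are not taken from the chapter.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership degrees of one point under the standard FcM update (assumed form).

    m is the fuzzifier and must be greater than 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them
    # (shared equally if several centers coincide with the point).
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists]

# Hypothetical example: three cluster centers and one point close to the first.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
point = (1.0, 0.0)

u_soft = fcm_memberships(point, centers, m=2.0)   # softer assignment
u_crisp = fcm_memberships(point, centers, m=1.2)  # closer to hard clustering

print("m = 2.0:", u_soft)
print("m = 1.2:", u_crisp)
```

In this sketch, a fuzzifier closer to 1 pushes the assignment towards a hard partition, yet the point keeps a strictly positive membership degree in all three clusters for both values of m. This is the behaviour that the modified fuzzifier functions cited above aim to avoid.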

The second large field this work is related to is the question of how to apply an FcM algorithm to very large data sets, especially if only limited computing resources are available. In the past, this was a much more important issue than it is today. For this work, we assume that the data set can be loaded fully into the local memory of the computer, which provides random access to the data. Our main concern is to adapt FcM in a way that reduces the runtime of the algorithm.
