Fuzzy C-Means in High Dimensional Spaces

Fuzzy C-Means in High Dimensional Spaces

Roland Winkler (German Aerospace Center, Germany), Frank Klawonn (Ostfalia University, Germany) and Rudolf Kruse (Otto-von-Guericke University, Germany)
DOI: 10.4018/978-1-4666-1870-1.ch001
OnDemand PDF Download:
$30.00
List Price: $37.50

Abstract

High dimensions have a devastating effect on the FCM algorithm and similar algorithms. One effect is that the prototypes run into the centre of gravity of the entire data set. The objective function must have a local minimum in the centre of gravity that causes FCM’s behaviour. In this paper, examine this problem. This paper answers the following questions: How many dimensions are necessary to cause an ill behaviour of FCM? How does the number of prototypes influence the behaviour? Why has the objective function a local minimum in the centre of gravity? How must FCM be initialised to avoid the local minima in the centre of gravity? To understand the behaviour of the FCM algorithm and answer the above questions, the authors examine the values of the objective function and develop three test environments that consist of artificially generated data sets to provide a controlled environment. The paper concludes that FCM can only be applied successfully in high dimensions if the prototypes are initialized very close to the cluster centres.
Chapter Preview
Top

Introduction

Clustering high dimensional data has many interesting applications. For example clustering similar music files, semantic web applications, image recognition or biochemical problems. Many tools today are not designed to handle hundreds of dimensions, or in this case, it might be better to call it degrees of freedom. Many clustering approaches work quite well in low dimensions, especially the fuzzy c-means algorithm (FCM) (Dunn, 1973; Bezdek, 1981; Höppner et al., 1999; Kruse et al., 2007) seems to fail in high dimensions. Hence FCM is so useful in cases where data object arrangements that are not crisp, this paper is dedicated to give some insight into the behaviour of FCM in high dimensions.

Figure 1.

Fuzzy c Means in 2 dimensions (left) and 50 dimensions (right)

One of the problems of FCM is, that the prototypes tend to run into the centre of gravity of the complete data set, almost independently of the initialisation of the prototypes. Figure 1 shows on the right hand side a 50 dimensional artificial data set, projected on 2 dimensions, containing 100 clusters, 100 data objects each. The lines represent the way the prototypes took from their initial position. The left hand side shows FCM in 2 dimensions with 4 clusters where it works quite well. It appears that the structural problem of FCM is independent from the data set, this question is addressed later in the paper. The 4 questions presented in the abstract will be answered in this paper:

  • How many dimensions are at least necessary to cause an ill behaviour of FCM?

  • How does the number of prototypes influence ill behaviour?

  • Why has the objective function a local minima in the centre of gravity?

  • How must FCM be initialised to avoid the local minima in the centre of gravity?

These questions are independent of the actual data set FCM is applied to. The idea is to use test environments that are optimal for FCM, so that it is likely that FCM fails on all data sets if it fails on the optimal ones. Four data sets D1 through D4 are used which have different properties and are introduced Q1. is answered using D1, Q2 is answered using D2, Q3 by analysing the objective function and for Q4 the data sets D3 and D4 are used. The answers to the questions are found by observing the objective function of FCM.

Complete Chapter List

Search this Book:
Reset