Article Preview
Top1. Introduction
Clustering, so-called set partitioning, is a basic methodology widely applied in the
fields including statistics, mathematical programming (e.g., location selecting,
network partitioning, routing, scheduling and assignment problems) and computer
science (e.g., pattern recognition, learning theory, image processing and computer
graphics). The core logic of clustering is to classify all objects into several
mutually exclusive groups so as to achieve some optimal conditions. Most
conventional clustering methods rapidly become computationally intractable as the
problem scale gets larger due to the combinatorial nature of the methods. The
mathematical model of clustering for a given number (K) of clusters
can be formulated as:
[CA]
(1)Subject to:
for all i(2)
(3)
for all i, j(4)
for all i, j(5) where
xij denotes the relationship among
objects.
xij=1 indicates that object
i
is assigned to object
j, (also cluster
j).
xij=0 otherwise;
i,
j={1,2,…,N},
N is the number of objects,
K is the number of clusters. Accordingly,
xjj=1 represents that object
j
is assigned to itself, indicating that cluster
j has been formed.
F(X) is the objective function.
X is the
vector of
xij for all
i, j. Equation (2) represents that each object
must be assigned to one and only one object (cluster). Equation (3) represents that a total of exactly
K clusters will be formed. Equation (4) represents that the object
i can
only be assigned to the object which is also assigned to itself,
i.e. the corresponding cluster has been formed.
The total number of feasible solutions of [CA] is:
. That is, the solution space of [CA] is exponential to the problem
size and number of clusters. For instance, there are a total of 511 feasible
solutions for 10 objects (N=10) to be divided into 2 clusters
(K=2). There are a total of 42,525 feasible solutions for 10
objects to be divided into 5 clusters. Brucker
(1978) and Welch (1983) proved
that, for specific objective functions, clustering becomes an NP-hard problem when
the number of clusters exceeds three, if one aims to find the optimal clusters.
Hansen and Jaumard (1987) pointed out
that even though the best algorithms developed for some specific objective
functions, there would exhibit complexities of O(N3logN) or
O(N3), leaving some room for further improvement.