The Self-Organizing Map (Kohonen, 1997) is an effective and a very popular tool for data clustering and visualization. With this method, the input samples are projected into a low dimension space while preserving their topology. The samples are described by a set of features. The input space is generally a high dimensional space Rd. 2D or 3D maps are very often used for visualization in a low dimension space (2 or 3). For many applications, usually in psychology, biology, genetic, image and signal processing, such vector description is not available; only pair-wise dissimilarity data is provided. For instance, applications in Text Mining or ADN exploration are very important in this field and the observations are usually described through their proximities expressed by the “Levenshtein”, or “String Edit” distances (Levenshtein, 1966). The first approach consists of the transformation of a dissimilarity matrix into a true Euclidean distance matrix. A straightforward strategy is to use “Multidimensional Scaling” techniques (Borg & Groenen, 1997) to provide a feature space. So, the initial vector SOM algorithm can be naturally used. If this transformation involves great distortions, the initial vector model for SOM is no longer valid, and the analysis of dissimilarity data requires specific techniques (Jain & Dubes, 1988; Van Cutsem, 1994) and Dissimilarity Self Organizing Map (DSOM) is a new one. Consequently, adaptation of the Self-Organizing Map (SOM) to dissimilarity data is of a growing interest. During this last decade, different propositions emerged to extend the vector SOM model to pair-wise dissimilarity data. The main motivation is to cope with large proximity databases for data mining. In this article, we present a new adaptation of the SOM algorithm which is compared with two existing ones.
Adaptation Of Som For Dissimilarity Data
This article presents a new DSOM algorithm for dissimilarity data. We will first present DSOM algorithms which have been directly derived from the initial SOM framework. In the next parts, we will present in detail our proposed algorithm and some experiments to show its effectiveness in comparison with the other DSOM algorithms.
Key Terms in this Chapter
SOM Batch Algorithm: A version of SOM in which at an iteration all observations are available and used for computation.
Prototype: Referent of a node (neuron) on the map.
Self-Organizing Map (SOM): A subtype of artificial neural networks. It is trained using unsupervised learning to produce low dimensional representation of the training samples while preserving the topological properties of the input space.
Quantization Error: Error which appears when an observation is represented by a prototype.
Affectation Step: A part of the learning iteration where an observation is affected to the nearest prototype according to a predefined distance.
Topology Preservation: Preservation of the neighbourhood relation of the observations in the output space. It means that the observations which are neighbours in the input space should be projected in neighbour nodes.
Dissimilarity SOM: A SOM where all observations are described by a dissimilarity matrix.
Dissimilarity Data: Data in which all we know about the observations are pair-wise dissimilarities.
Representation Step: A part of the learning iteration where the prototype is adapted to well represent its affected observations.