Similarity
Similarity functions are used in many fields, in particular Data Analysis, Pattern Recognition, Symbolic Machine Learning, and Cognitive Science.
In general, a similarity function is defined over a universe U that can be modelled by a quadruple (Ld, Ls, T, FS):
- Ld is the representation language used to describe the data.
- Ls is the representation language of the similarities.
- T is the set of knowledge that we possess about the studied universe.
- FS is the binary similarity function, FS: Ld × Ld → Ls.
When the purpose of the similarity function is to quantify the resemblance between data, the language Ls corresponds to the set of values in the interval [0, 1] or to the set R+, and we then speak of a similarity measure (Bisson, 2000).
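As an illustration of a similarity measure FS: Ld × Ld → Ls with Ls = [0, 1], the following minimal sketch assumes that Ld is the set of finite attribute sets and uses the Jaccard index. The Jaccard index and the name `jaccard_similarity` are choices made here for the example; the source does not prescribe a particular measure.

```python
from typing import FrozenSet


def jaccard_similarity(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Illustrative FS: maps two attribute sets (Ld) to a value in [0, 1] (Ls)."""
    if not x and not y:
        # Convention assumed here: two empty descriptions are fully similar.
        return 1.0
    # Shared attributes divided by all attributes present in either description.
    return len(x & y) / len(x | y)


# Hypothetical object descriptions, used only for this example.
apple = frozenset({"red", "round", "small"})
ball = frozenset({"red", "round", "large"})

print(jaccard_similarity(apple, ball))   # 2 shared attributes out of 4 in total
print(jaccard_similarity(apple, apple))  # identical descriptions
```

In this sketch the knowledge T is left empty; a richer FS could, for instance, use T to weight attributes differently.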
Most work on similarity measures is based on the mathematical concept of distance (the inverse notion of similarity), which has been well studied in Data Analysis (Mahé & Vert, 2007; Bisson, 2000).
It is defined as follows. Let Ω be the set of individuals of the studied domain, and let D be a function from Ω × Ω to R+. For all a, b, c ∈ Ω:
1) D(a, a) = 0 (property of minimality)
2) D(a, b) = D(b, a) (property of symmetry)
When the function D satisfies properties 1 and 2, it is called a dissimilarity index (or, more simply, a dissimilarity).
The following properties are also of interest:
3) D(a, b) = 0 ⇒ a = b (property of identity)
4) D(a, c) ≤ D(a, b) + D(b, c) (triangle inequality)
5) D(a, c) ≤ max[D(a, b), D(b, c)] (ultrametric inequality)
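Properties 1 to 5 can be tested exhaustively when Ω is a small finite set. The sketch below is only an illustration: the function name `check_properties`, the tolerance `tol`, and the two example functions (absolute difference and the discrete 0/1 distance) are assumptions introduced here, not part of the source.

```python
from itertools import product


def check_properties(points, D, tol=1e-12):
    """Report which of properties 1-5 hold for D on the finite set `points`."""
    pts = list(points)
    pairs = list(product(pts, repeat=2))
    triples = list(product(pts, repeat=3))
    return {
        # 1) D(a, a) = 0
        "minimality": all(abs(D(a, a)) <= tol for a in pts),
        # 2) D(a, b) = D(b, a)
        "symmetry": all(abs(D(a, b) - D(b, a)) <= tol for a, b in pairs),
        # 3) D(a, b) = 0 implies a = b
        "identity": all(a == b for a, b in pairs if abs(D(a, b)) <= tol),
        # 4) D(a, c) <= D(a, b) + D(b, c)
        "triangle": all(D(a, c) <= D(a, b) + D(b, c) + tol for a, b, c in triples),
        # 5) D(a, c) <= max(D(a, b), D(b, c))
        "ultrametric": all(D(a, c) <= max(D(a, b), D(b, c)) + tol for a, b, c in triples),
    }


points = [0.0, 1.0, 3.0]
absolute = check_properties(points, lambda a, b: abs(a - b))
discrete = check_properties(points, lambda a, b: 0.0 if a == b else 1.0)

print(absolute)
print(discrete)
```

On these three points the absolute difference satisfies properties 1 to 4 but not property 5, since D(0, 3) = 3 exceeds max(D(0, 1), D(1, 3)) = 2, whereas the discrete distance satisfies all five.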