Molecular Similarity: Combining Neural Networks and Knn Methods

Abdelmalek Amine (Tahar Moulay University & Djillali Liabes University, Algeria), Zakaria Elberrichi (Djillali Liabes University, Algeria), Michel Simonet (Joseph Fourier University, France) and Ali Rahmouni (Tahar Moulay University, Algeria)
DOI: 10.4018/978-1-60960-860-6.ch005


To identify new molecules that could become medicines, pharmaceutical research increasingly relies on new technologies that synthesize large numbers of molecules simultaneously and test their activity on a given therapeutic target. The resulting data can be used to build models that predict the properties of molecules not yet tested, or even not yet synthesized. Such predictive models are important because they make it possible to suggest the synthesis of new molecules and to eliminate, very early in the search process, molecules whose properties would prevent their use as medicines. The authors call this virtual sifting. Similarity searching falls within this framework: it is a practical approach for identifying candidate molecules (potential medicines) in databases or virtual chemical libraries by comparing compounds pairwise. Many statistical models and learning tools have been developed to correlate the structure of molecules with their chemical, physical, or biological properties. The large majority of these methods first transform each molecule into a high-dimensional vector (using molecular descriptors) and then apply a learning algorithm to these vector descriptions. The objective of this chapter is to study molecular similarity using a particular type of neural network, the Kohonen network (also called a Self-Organizing Map, or SOM), by applying the nearest neighbor algorithm to the projections (coordinates) of the molecules in the constructed map.
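The pipeline outlined in the abstract (descriptor vectors, projection onto a Kohonen map, then nearest neighbors on the map coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: the map size, learning-rate and neighborhood schedules, the function names, and the random "descriptor" vectors are all assumptions chosen only to keep the example small.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    best, best_d = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small Kohonen map (hypothetical settings, linear decay schedules)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the map grid
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for j in range(dim):
                        w[j] += lr * h * (x[j] - w[j])
            step += 1
    return weights


def project(weights, data):
    """Map every input vector to the grid coordinates of its best matching unit."""
    return [best_matching_unit(weights, x) for x in data]


def nearest_neighbors(coords, query, k=3):
    """Indices of the k items closest to item `query` on the map."""
    others = [i for i in range(len(coords)) if i != query]
    others.sort(key=lambda i: math.dist(coords[i], coords[query]))
    return others[:k]


# Made-up data standing in for molecular descriptor vectors (assumption).
rng = random.Random(1)
descriptors = [[rng.random() for _ in range(6)] for _ in range(12)]
som = train_som(descriptors)
coords = project(som, descriptors)
neighbors = nearest_neighbors(coords, query=0, k=3)
```

In this sketch, molecules that land on the same or adjacent map units are treated as similar, so the neighbor search runs in the two-dimensional map space rather than in the original descriptor space.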
Chapter Preview


Similarity functions are used in many fields, in particular Data Analysis, Pattern Recognition, Symbolic Machine Learning, and Cognitive Science.

In general, a similarity function is defined over a universe U that can be modelled by a quadruplet: (Ld, Ls, T, FS).

  • Ld is the language of representation used to describe the data.

  • Ls is the language of representation of the similarities.

  • T is the set of knowledge that we possess about the studied universe.

  • FS is the binary similarity function, such that FS: Ld × Ld → Ls

When the purpose of the similarity function is to quantify the resemblance between data, the language Ls corresponds to the set of values in the interval [0, 1] or in R+, and we then speak of a similarity measure (Bisson, 2000).
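As a concrete instance of the quadruplet, the sketch below takes Ld to be sets of feature indices and Ls to be the interval [0, 1]. The Tanimoto coefficient used for FS is an assumption introduced here for illustration; it is not named in the text above, and the two example feature sets are made up. No background knowledge T is used.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """FS: Ld x Ld -> Ls, with Ld = sets of feature indices and Ls = [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty descriptions are identical
    return len(a & b) / len(a | b)


# Hypothetical descriptions of two molecules as sets of present features.
mol_a = frozenset({1, 4, 7, 9})
mol_b = frozenset({1, 4, 8})

similarity = tanimoto(mol_a, mol_b)  # 2 shared features out of 5 distinct ones
```

The value is 1 for identical descriptions and 0 for descriptions with no feature in common, which matches the case where Ls is the interval [0, 1].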

Most work on similarity measures is based on the mathematical concept of distance (the inverse notion of similarity), which has been well studied in Data Analysis (Mahé & Vert, 2007; Bisson, 2000).

It is defined as follows: let Ω be the set of individuals of the studied domain; a metric D is a function from Ω × Ω to R+ such that, ∀ a, b, c ∈ Ω:

  • 1. D(a, a) = 0 (property of minimality)

  • 2. D(a, b) = D(b, a) (property of symmetry)
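The two properties listed above can be checked numerically on a finite sample of points. The sketch below is only an illustration: the helper name, the tolerance, the sample points, and the choice of Euclidean distance are assumptions, and a check on a sample cannot prove that a function is a metric.

```python
import itertools
import math


def check_listed_properties(dist, points, tol=1e-12):
    """Return (minimality, symmetry) for `dist` evaluated on the sample `points`."""
    minimality = all(abs(dist(a, a)) <= tol for a in points)
    symmetry = all(
        abs(dist(a, b) - dist(b, a)) <= tol
        for a, b in itertools.combinations(points, 2)
    )
    return minimality, symmetry


def lopsided(a, b):
    """A deliberately asymmetric function, used only as a counterexample."""
    return max(0.0, a[0] - b[0])


# Made-up sample points in the plane.
sample = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0)]

euclidean_ok = check_listed_properties(math.dist, sample)
lopsided_ok = check_listed_properties(lopsided, sample)
```

Euclidean distance passes both checks on the sample, while the lopsided function satisfies minimality but fails symmetry.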
