Supporting Binding-Sites Discovery via Iterative Database Processing

Supporting Binding-Sites Discovery via Iterative Database Processing

Ran Tel-Nir (Technology and Information Systems Program, Tel Aviv University, Tel Aviv, Israel), Roy Gelbard (Information Systems Program, Bar-Ilan University, Ramat-Gan, Israel) and Israel Spiegler (Technology and Information Systems Program, Tel Aviv University, Tel Aviv, Israel)
DOI: 10.4018/ijsbbt.2013070102
OnDemand PDF Download:


Recognition of binding sites common to a set of protein structures is important for applications such as drug design. Common methods of binding-sites are based on heuristic algorithms that use summarized spatial data and superimposition techniques. However, computational operations generally do not store intermediate data for further calculation and information extraction. The current study presents an alternative approach to binding calculation by introducing a binary representation scheme for three dimensional molecule data and a fast iterative algorithm which obviates the need to calculate and resolve spatial transformations in the binding site extraction process. This is achieved by using relational database indexing methods and an efficient iterative model. This general-purpose iterative algorithm was tested for binding small molecules. The results show that the method can be applied efficiently for binding site extraction, and bio-information extraction. This binary representation improves performance by reducing processing time by 31% compared to typical representations.
Article Preview


The extraction of a potential binding site is a major step in solving drug design problems. Matching polyhedra that represent the spatial coordinates of the pseudo-center representation of molecules, is an important stage in the binding site search process. Possible matches for high order polyhedra (i.e. matches of six points in space or more) can consume hundreds of gigabytes of memory. The number of comparisons used for the extraction of matched polyhedra exceeds the limits of current computational power (the problem is NP hard) or computer memory, when done in real time. To overcome this problem, heuristic algorithms have been developed to extract matched higher order polyhedrons from matched triangles using complex algorithms based on manipulating the appropriate spatial transformations (Shatsky et al. 2006). Among these are the geometric hashing method developed by Wolfson (Wolfson & Lamdan 1988; Wolfson 1990).

These algorithms are based on computational operations done directly in the computer memory which generally does not store intermediate results for further calculation or information extraction.

Relational databases, however, may make it possible to store intermediate results of large data calculations for further use, and manipulate complex computations which use data stored in large databases. This paves the way for extracting possible matched polyhedra by the direct creation of high order polyhedra (n points in space for example). This is done by calculating all the permutations of selecting n points out of m points (where m is the size of the pseudo-centered representation of a molecule). When comparing two molecules for binding site extraction in the direct method we compare all the permutations of selecting n points out of m with the combinations of selecting n points out of m' (where m' is the size of the pseudo-centered representation of the compared molecule).

However, when calculating all the possible permutations and combinations of possible matched polyhedra of order n equals six, a huge number of more than 100 Gigabytes of data storage is required for the match extraction process and more than 1014 comparisons are needed. Thus, the extraction of matched higher order polyhedra is unrealistic through direct polyhedron creation methods. Database methods can extract possible matched polyhedra using large data storage, but an efficient algorithm is also needed to reduce the number of calculations and the size of the intermediate stored data.

The goal of this study is two-fold: (1) design and construct an alternative model for extraction of higher order polyhedra that is not memory bound, and (2) construct a new representation model for bio-data that can be manipulated by standard database technology and thus compatible with data mining methods. The model introduced here implements an iterative algorithm that is based on the creation of possible matched higher order polyhedra from the previously extracted matched polyhedra of lower orders. When using this model of emerging solutions the number of possible matched polyhedra is dependent on the number of extracted matched polyhedra of lower order, which is coherent with the physico-chemical properties of the molecules examined. Moreover, this number is independent of the polyhedron order. The findings show that this iterative recognition model along with the use of advanced binary representation of the bio-data is more efficient than the direct recognition model. The iterative recognition model also extracts better matches and higher polyhedron orders than the direct method. The model and system works faster, performs on a par with existing models and can be a valuable tool in structural bioinformatics research.

Complete Article List

Search this Journal:
Open Access Articles: Forthcoming
Volume 5: 2 Issues (2017): Forthcoming, Available for Pre-Order
Volume 4: 2 Issues (2016): Forthcoming, Available for Pre-Order
Volume 3: 1 Issue (2015)
Volume 2: 4 Issues (2013)
Volume 1: 4 Issues (2012)
View Complete Journal Contents Listing