Structure–Activity Relationships by Autocorrelation Descriptors and Genetic Algorithms

Viviana Consonni (University of Milano-Bicocca, Italy) and Roberto Todeschini (University of Milano-Bicocca, Italy)
DOI: 10.4018/978-1-61520-911-8.ch005


Quantitative Structure-Activity Relationships (QSARs) are models relating variation in molecular properties, such as biological activities, to variation in structural features of chemical compounds. Three main topics are part of the QSAR/QSPR approach to scientific research: the representation of molecular structure, the definition of molecular descriptors, and the chemoinformatics tools. Molecular descriptors are numerical indices encoding information related to the molecular structure. They can be either experimental physico-chemical properties of molecules or theoretical indices calculated by mathematical formulas or computational algorithms. In the last few decades, much effort has been devoted to studying how to capture the information encoded in the molecular structure and convert it into one or more numbers, which are then used to establish quantitative relationships between structures and properties, biological activities or other experimental properties. Autocorrelation descriptors are a class of molecular descriptors based on the statistical concept of spatial autocorrelation applied to the molecular structure. The objective of this chapter is to investigate the chemical information captured by autocorrelation descriptors and to elucidate their role in QSAR and drug design. After a short historical introduction to molecular descriptors, the chapter reviews the different types of autocorrelation descriptors proposed in the literature so far. Then, some methodological topics related to multivariate data analysis are surveyed, with particular attention to the analysis of similarity/diversity of chemical spaces and to feature selection for multiple linear regression. The last part of the chapter deals with the application of autocorrelation descriptors to study similarity relationships in a set of flavonoids and to establish QSARs for predicting affinity constants, Ki, for the GABA-A benzodiazepine receptor site, BzR.
Chapter Preview


Quantitative Structure–Activity Relationships (QSARs) are the final result of a process which starts with a suitable description of molecular structures and ends with some inference, hypothesis, or prediction about the behaviour of molecules in the environmental, biological, and physico-chemical systems under analysis. These models play a relevant role in chemistry, pharmaceutical sciences, environmental protection policy, toxicology, ecotoxicology, health research and quality control.

QSARs are based on two assumptions: that the structure of a molecule (for example, its geometric, steric and electronic properties) contains the features responsible for its physical, chemical, and biological properties, and that these features can be captured in one or more numerical descriptors. According to the congenericity principle, similar compounds have similar activities, and activity changes smoothly across the chemical space.

With QSAR models, the biological activity (or property, reactivity, etc.) of a newly designed or untested chemical can be inferred from the molecular structure of similar compounds whose activities (properties, reactivity, etc.) have already been assessed.

It has been nearly 45 years since QSAR modelling was first used in the practice of agrochemistry, drug design, toxicology, and industrial and environmental chemistry. Its growing power in the following years may be mainly attributed to the rapid and extensive development of methodologies and computational techniques that have made it possible to delineate and refine the many variables and approaches used to model molecular properties (Martin, 1979; Kubinyi, 1993; Hansch & Leo, 1995; van de Waterbeemd, Testa, & Folkers, 1997; Devillers, 1998; Kubinyi, Folkers, & Martin, 1998; Martin, 1998; Charton & Charton, 2002; Gasteiger, 2003; Oprea, 2004). Furthermore, interest in QSARs keeps growing because these tools are now used not only for research purposes but also to produce data on chemicals in a time- and cost-effective way.

The development of QSAR/QSPR models is a rather complex process (Figure 1). Important stages of this process are a) selection of the set of molecules to which the modelling procedure is applied, and of the set of molecular descriptors which will define the model chemical space; b) selection of the training set for model estimation and of the test set for model validation; c) application of the validated model(s) to design new molecules with desirable properties and/or to predict the properties of future molecules.

Figure 1. Role of models in the QSAR/QSPR framework
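
Stages a) and b) above can be illustrated with a minimal Python sketch. It is not the procedure used in the chapter: the descriptor matrix and activities are synthetic, the function names (`split_train_test`, `fit_mlr`, `predict_mlr`, `r_squared`) are hypothetical, the split is a plain random one, and the model is an ordinary least-squares multiple linear regression evaluated with a simple R² on the held-out test set.

```python
import numpy as np

def split_train_test(n_samples, test_fraction=0.25, seed=0):
    """Randomly partition sample indices into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return idx[n_test:], idx[:n_test]

def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [intercept, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses from descriptors with the fitted coefficients."""
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    """Coefficient of determination on the given samples."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (an assumption for illustration only):
# 40 "molecules" described by 3 "descriptors".
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = 0.7 + X @ true_coef + rng.normal(scale=0.1, size=40)

# Stage b): estimate the model on the training set, validate on the test set.
train, test = split_train_test(len(y))
coef = fit_mlr(X[train], y[train])
r2_test = r_squared(y[test], predict_mlr(coef, X[test]))
```

In practice the split is often designed rather than random, and other validation statistics may be preferred; the sketch only shows how the training and test sets play different roles.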

As Brown (1998) put it, “The use of information technology and management has become a critical part of the drug discovery process. Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization.” In fact, chemoinformatics encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information (Gasteiger, 2003; Oprea, 2003); molecular descriptors play a fundamental role in all these processes, being the basic tool for transforming chemical information into the numerical code needed to apply informatics procedures.

Formally, molecular descriptors are mathematical representations of a molecule obtained by a well-specified algorithm applied to a defined molecular representation, or by a well-specified experimental procedure: the molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment (Todeschini & Consonni, 2000, p. 303).
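
To make the idea of "an algorithm applied to a molecular representation" concrete, the following Python sketch computes a simple autocorrelation descriptor on a hydrogen-depleted molecular graph. It rests on several assumptions: the molecule is given as a list of atom weights and a list of bonds, each unordered atom pair at topological distance k contributes the product of its two weights to the lag-k value, lag 0 is the sum of squared weights, and atomic numbers are used as weights purely for illustration. The function names are hypothetical, and other conventions (ordered pairs, normalisation, different atomic properties) exist.

```python
from collections import deque

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for s in range(n_atoms):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

def autocorrelation(weights, bonds, max_lag):
    """Sum of w_i * w_j over unordered atom pairs at each distance 0..max_lag."""
    n = len(weights)
    dist = topological_distances(n, bonds)
    ats = [0.0] * (max_lag + 1)
    for i in range(n):
        for j in range(i, n):  # j == i gives the lag-0 term w_i ** 2
            d = dist[i][j]
            if d is not None and d <= max_lag:
                ats[d] += weights[i] * weights[j]
    return ats

# Ethanol heavy-atom skeleton C-C-O, weighted by atomic number (illustrative choice).
ats_ethanol = autocorrelation([6, 6, 8], [(0, 1), (1, 2)], max_lag=2)
# → [136.0, 84.0, 48.0]

# A four-atom chain with unit weights simply counts atom pairs at each distance.
ats_chain = autocorrelation([1, 1, 1, 1], [(0, 1), (1, 2), (2, 3)], max_lag=3)
# → [4.0, 3.0, 2.0, 1.0]
```

The result is a fixed-length vector for any molecule, whatever its size, which is what makes such descriptors convenient inputs for the regression and similarity analyses described in the abstract.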
