Learning Methodologies for Detection and Classification of Mutagens

Huma Lodhi (Imperial College London, UK)
DOI: 10.4018/978-1-61520-911-8.ch014


Predicting mutagenicity is a complex and challenging problem in chemoinformatics. The Ames test is a biological method for assessing the mutagenicity of molecules. The rapid growth of molecular repositories creates a need for effective and efficient computational techniques that can solve chemoinformatics problems such as the identification and classification of mutagens. Machine learning methods provide effective solutions to chemoinformatics problems. This chapter presents an overview of the learning techniques that have been developed and applied to the identification and classification of mutagens.
Chapter Preview


Mutagenicity is an unfavorable characteristic of drugs that can cause adverse effects. In chemoinformatics, it is crucial to design effective and efficient computational tools that identify toxic and mutagenic molecules. Accurate prediction of mutagenicity will not only accelerate the search for quality lead molecules but also reduce drug attrition. In recent years, considerable effort has been devoted to developing, analyzing and applying statistical and relational learning techniques that identify undesirable biological effects such as mutagenicity.

Mutagens produce mutations in DNA and may or may not cause cancer. However, the use of drugs that are mutagenic but not carcinogenic is still not recommended (Debnath, Compadre, Debnath, Schusterman, & Hansch, 1991). The Ames test (Ames, Lee, & Durston, 1973) is regarded as a biological means of identifying mutagenic molecules. In this test, a bacterium, generally Salmonella typhimurium, is used to separate mutagens from non-mutagens. The novel molecules are exposed to a bacterial strain that cannot produce the amino acid histidine. Growth of the bacterial culture indicates that mutations in DNA have restored this ability, so the molecule is classified as a mutagen. Figure 1 shows a mutagenic molecule. Machine learning methods provide an accurate, useful and efficient means of classifying mutagens. In this chapter we present an overview of a number of techniques that have been developed and applied to predicting mutagenicity. The review is not exhaustive; it outlines recent research and seminal work.

Figure 1. An example of a mutagenic molecule



In machine learning, the recognition and identification of mutagens is generally treated as a classification problem. Methods ranging from Inductive Logic Programming (ILP) techniques to kernel-based methods (KMs) have been developed and applied to mutagenicity classification. The mutagenesis dataset presented by Debnath et al. (1991) is a benchmark on which the efficacy of learning methods has been evaluated. We therefore present an overview of the techniques that have been applied to this dataset.
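Because many different learners are compared on one benchmark, it helps to see what "treating it as a classification problem" means operationally. The following is a minimal sketch, not the protocol of any particular study. It assumes a hypothetical `learner(train_X, train_y)` interface that returns a predictor, uses shuffled k-fold cross-validation, and plugs in a majority-class baseline that any ILP or kernel method should beat. The labels in the usage block mirror the class balance of the 188-molecule regression-friendly subset: 125 positive and 63 zero or negative.

```python
import random
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interfaces: a learner takes training data and returns a predictor.
Predictor = Callable[[object], int]
Learner = Callable[[Sequence[object], Sequence[int]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_class(train_X: Sequence[object], train_y: Sequence[int]) -> Predictor:
    """Baseline learner: ignores the features and predicts the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda _x: label


def cross_val_accuracy(X, y, learner: Learner, k: int = 10, seed: int = 0) -> float:
    """Fraction of examples predicted correctly when each fold is held out once."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = learner([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    # Class balance of the regression-friendly subset; features are irrelevant to this baseline.
    y = [1] * 125 + [0] * 63
    X = [None] * len(y)
    print(round(cross_val_accuracy(X, y, majority_class), 3))  # 0.665
```

With 125 positives, every training split still has a positive majority, so the baseline scores exactly 125/188. Reported accuracies on this benchmark are only informative relative to that floor.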

The mutagenesis dataset comprises 230 molecules trialled for mutagenicity on Salmonella typhimurium. Debnath et al. (1991) showed that a subset of 188 molecules is learnable using linear regression. This subset was later termed the “regression friendly” dataset (hereafter referred to as the mutagenesis dataset). The remaining 42 molecules form the “regression unfriendly” subset. Of the 188 molecules, 125 have positive log mutagenicity, whereas 63 have zero or negative log mutagenicity.

Debnath et al. identified two chemical features, C, and two structural (indicator) variables, I, for predicting mutagenicity. The chemical features are the energy of the lowest unoccupied molecular orbital (LUMO) and the logarithm of the octanol/water partition coefficient (LOGP). The two indicator variables are IN1, based on the number of fused rings, and IN2, which marks acenthrylenes. Both are binary structural variables. IN1 is set to 1 if a molecule has three or more fused rings and to 0 otherwise. Similarly, IN2 is set to 1 for the five acenthrylenes in the dataset and to 0 otherwise. On the basis of a linear regression based quantitative structure-activity relationship (QSAR) analysis, Debnath et al. suggested that the mutagenicity of aromatic nitro compounds is characterized by hydrophobicity, nitro groups in conjunction with electron-attracting elements, and three or more fused rings.
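The indicator definitions and the class split above can be made concrete with a small encoding sketch. The `Molecule` record and its field names are hypothetical and do not reflect the layout of the original dataset files. The sketch also assumes, as is usual when this benchmark is used for classification, that positive log mutagenicity is the mutagen class. It maps each molecule to the four-element vector [LOGP, LUMO, IN1, IN2] and a binary label, and computes the accuracy of always predicting the larger class.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical record for one compound; field names are illustrative only."""
    logp: float              # LOGP: log of the octanol/water partition coefficient
    lumo: float              # LUMO energy
    fused_rings: int         # number of fused rings in the molecule
    is_acenthrylene: bool    # True for the acenthrylenes flagged by IN2
    log_mutagenicity: float  # measured log mutagenicity


def features(m: Molecule) -> list[float]:
    """Return [LOGP, LUMO, IN1, IN2]: two chemical features plus two binary indicators."""
    in1 = 1.0 if m.fused_rings >= 3 else 0.0  # IN1: three or more fused rings
    in2 = 1.0 if m.is_acenthrylene else 0.0   # IN2: acenthrylene indicator
    return [m.logp, m.lumo, in1, in2]


def label(m: Molecule) -> int:
    """Assumed class convention: 1 (mutagen) for positive log mutagenicity, else 0."""
    return 1 if m.log_mutagenicity > 0 else 0


def majority_baseline(n_pos: int, n_neg: int) -> float:
    """Accuracy of always predicting the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Hypothetical values for illustration, not taken from the dataset.
    m = Molecule(logp=2.5, lumo=-1.2, fused_rings=3, is_acenthrylene=False, log_mutagenicity=0.8)
    print(features(m), label(m))                 # [2.5, -1.2, 1.0, 0.0] 1
    print(round(majority_baseline(125, 63), 3))  # 0.665
```

Keeping IN1 and IN2 as 0/1 floats lets the same vector feed directly into a linear regression or any vector-based learner, alongside the two continuous chemical features.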
