Graph Mining in Chemoinformatics

Hiroto Saigo, Koji Tsuda
ISBN13: 9781615209118 | ISBN10: 1615209115 | ISBN13 Softcover: 9781616923686 | EISBN13: 9781615209125
DOI: 10.4018/978-1-61520-911-8.ch006
Cite Chapter

MLA

Saigo, Hiroto, and Koji Tsuda. "Graph Mining in Chemoinformatics." Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques, edited by Huma Lodhi and Yoshihiro Yamanishi, IGI Global, 2011, pp. 95-128. https://doi.org/10.4018/978-1-61520-911-8.ch006

APA

Saigo, H. & Tsuda, K. (2011). Graph Mining in Chemoinformatics. In H. Lodhi & Y. Yamanishi (Eds.), Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques (pp. 95-128). IGI Global. https://doi.org/10.4018/978-1-61520-911-8.ch006

Chicago

Saigo, Hiroto, and Koji Tsuda. "Graph Mining in Chemoinformatics." In Chemoinformatics and Advanced Machine Learning Perspectives: Complex Computational Methods and Collaborative Techniques, edited by Huma Lodhi and Yoshihiro Yamanishi, 95-128. Hershey, PA: IGI Global, 2011. https://doi.org/10.4018/978-1-61520-911-8.ch006

Abstract

In standard QSAR (Quantitative Structure-Activity Relationship) approaches, chemical compounds are represented as a set of physicochemical property descriptors, which are then used as numerical features for classification or regression. However, standard descriptors such as structural keys and fingerprints are not comprehensive enough in many cases. Since chemical compounds are naturally represented as attributed graphs, graph mining techniques make it possible to create subgraph patterns (i.e., structural motifs) that can be used as additional descriptors. In this chapter, the authors present theoretically motivated QSAR algorithms that can automatically identify informative subgraph patterns. A graph mining subroutine is embedded in the mother algorithm and is called repeatedly to collect patterns progressively. The authors present three variations that build on support vector machines (SVM), partial least squares regression (PLS), and least angle regression (LARS). In comparison to graph kernels, the proposed methods are more interpretable, thereby allowing chemists to identify salient subgraph features to improve the drug-likeness of lead compounds.
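
To make the idea of a mining subroutine that is called repeatedly more concrete, here is a minimal Python sketch. It is not the chapter's algorithm: the subgraph miner is replaced by a brute-force enumeration of short labeled paths, and the SVM, PLS, and LARS learners are replaced by a simple greedy residual-fitting step (matching pursuit). The function names (make_graph, path_patterns, mine_best_pattern, fit, predict), the toy molecules, and the activity values are all assumptions invented for illustration.

```python
def make_graph(atoms, bonds):
    """Toy molecule: atom labels plus an undirected bond list (bond types ignored)."""
    adj = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return atoms, adj


def path_patterns(graph, max_atoms=3):
    """Label sequences of all simple paths with up to max_atoms atoms.

    Assumption: paths stand in for general subgraphs; a real miner
    (for example gSpan) would enumerate arbitrary connected subgraphs.
    """
    atoms, adj = graph
    found = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        found.add(min(labels, labels[::-1]))  # same key for both directions
        if len(path) < max_atoms:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in adj:
        extend([start])
    return found


def indicator(pattern, pattern_sets):
    """+1 if the compound contains the pattern, -1 otherwise."""
    return [1.0 if pattern in ps else -1.0 for ps in pattern_sets]


def mine_best_pattern(pattern_sets, residual, exclude):
    """Mining subroutine: pattern whose indicator best matches the residual.

    Assumption: candidates are pre-enumerated and scanned exhaustively;
    a practical miner would search the pattern space with pruning.
    """
    best, best_gain = None, 1e-9
    for p in sorted(set().union(*pattern_sets) - exclude):
        x = indicator(p, pattern_sets)
        gain = abs(sum(r * xi for r, xi in zip(residual, x)))
        if gain > best_gain:
            best, best_gain = p, gain
    return best


def fit(graphs, y, max_rounds=5):
    """Outer loop: call the miner repeatedly, adding one pattern per round."""
    pattern_sets = [path_patterns(g) for g in graphs]
    residual = list(y)
    model = []  # list of (pattern, weight)
    for _ in range(max_rounds):
        pattern = mine_best_pattern(pattern_sets, residual, {p for p, _ in model})
        if pattern is None:  # nothing left that explains the residual
            break
        x = indicator(pattern, pattern_sets)
        w = sum(r * xi for r, xi in zip(residual, x)) / len(x)
        residual = [r - w * xi for r, xi in zip(residual, x)]
        model.append((pattern, w))
    return model


def predict(model, graph):
    present = path_patterns(graph)
    return sum(w * (1.0 if p in present else -1.0) for p, w in model)


# Hypothetical training set: ethane, methanol, methylamine, aminomethanol.
graphs = [
    make_graph(["C", "C"], [(0, 1)]),
    make_graph(["C", "O"], [(0, 1)]),
    make_graph(["C", "N"], [(0, 1)]),
    make_graph(["N", "C", "O"], [(0, 1), (1, 2)]),
]
activity = [-3.0, 1.0, -1.0, 3.0]  # made-up values, not real assay data

model = fit(graphs, activity)
for pattern, weight in model:
    print("-".join(pattern), weight)
```

On this toy data the loop selects the pattern C-O with weight 2.0, then C-N with weight 1.0, after which the residual is zero and the search stops. The fitted model is just a short list of patterns and weights that can be read directly, which is the kind of interpretability the abstract contrasts with graph kernels.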