Prediction of Compound-Protein Interactions with Machine Learning Methods


Yoshihiro Yamanishi (Mines ParisTech, Institut Curie, Inserm U900, France) and Hisashi Kashima (IBM Tokyo Research Laboratory, Japan)
DOI: 10.4018/978-1-61520-911-8.ch016


In silico prediction of compound-protein interactions from heterogeneous biological data is critical to the drug development process. In this chapter the authors review several supervised machine learning methods that predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. These kernel-based algorithms are reviewed from two different viewpoints: binary classification and dimension reduction. In the experimental results, the authors demonstrate the usefulness of the methods for predicting drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.
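To make "chemical structure and genomic sequence information simultaneously" concrete, here is a minimal sketch, not the authors' implementation. It assumes that each compound is a binary substructure fingerprint (the set of "on" bits) compared with the Tanimoto coefficient, and that each protein is an amino-acid sequence compared with a 3-mer spectrum kernel. The two similarities are multiplied to give a kernel on compound-protein pairs (a tensor-product pairwise kernel), and a kernel perceptron is used only as the simplest kernel-based binary classifier. The kernels and classifiers actually reviewed in the chapter may differ; all function names and the toy data are hypothetical.

```python
from collections import Counter

def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def spectrum(seq: str, k: int = 3) -> Counter:
    """Count every length-k substring (k-mer) of an amino-acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Dot product of the k-mer count vectors of two sequences."""
    a, b = spectrum(seq_a, k), spectrum(seq_b, k)
    # Counter returns 0 for k-mers absent from b, so they add nothing.
    return float(sum(count * b[kmer] for kmer, count in a.items()))

def pairwise_kernel(pair_a, pair_b) -> float:
    """Tensor-product kernel on (compound, protein) pairs (hypothetical choice)."""
    (comp_a, prot_a), (comp_b, prot_b) = pair_a, pair_b
    return tanimoto(comp_a, comp_b) * spectrum_kernel(prot_a, prot_b)

def train_perceptron(pairs, labels, epochs=10):
    """Kernel perceptron: alpha[i] counts the mistakes made on training pair i."""
    n = len(pairs)
    gram = [[pairwise_kernel(pairs[i], pairs[j]) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break  # every training pair is classified correctly
    return alpha

def predict(pair, pairs, labels, alpha) -> int:
    """Return +1 (predicted interaction) or -1 (predicted non-interaction)."""
    score = sum(a * y * pairwise_kernel(p, pair) for a, y, p in zip(alpha, labels, pairs))
    return 1 if score > 0 else -1

# Hypothetical toy data: two compounds, two proteins, four labelled pairs.
comp_a, comp_b = frozenset({1, 2, 3}), frozenset({7, 8, 9})
prot_x, prot_y = "MKTAYIAK", "GGSWLLPQ"
train_pairs = [(comp_a, prot_x), (comp_b, prot_y), (comp_a, prot_y), (comp_b, prot_x)]
train_labels = [1, 1, -1, -1]
alpha = train_perceptron(train_pairs, train_labels)

# A new compound that shares three of its four bits with comp_a.
new_comp = frozenset({1, 2, 3, 4})
print(predict((new_comp, prot_x), train_pairs, train_labels, alpha))  # prints 1
print(predict((new_comp, prot_y), train_pairs, train_labels, alpha))  # prints -1
```

Because the product kernel is zero whenever either the compounds or the proteins share nothing, a candidate pair scores highly only when it resembles a known interacting pair on both the compound side and the protein side; this is what lets a single classifier use the two data sources at once.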
Chapter Preview


A variety of computational approaches have been developed to analyze and predict compound-protein interactions. One of the most commonly used is docking simulation (Rarey, Kramer, Lengauer, & Klebe, 1996; Cheng et al., 2007). However, docking cannot be applied to proteins whose 3D structures are unknown, which is a serious limitation for membrane proteins. For example, at the time of writing only two GPCRs have known 3D structures (bovine rhodopsin and the human β2-adrenergic receptor). It is therefore difficult to apply docking simulations on a large scale. Another distinctive approach is text mining, which is usually based on keyword searches over a large body of literature (Zhu, Okuno, Tsujimoto, & Mamitsuka, 2005). However, text mining cannot detect new biological findings, and it suffers from redundancy in the compound and protein names used in the literature. More recently, target proteins have been classified on the basis of their ligand structures (Keiser et al., 2007), and an analysis of the drug-target network has revealed characteristic features of its topology (Yildirim, Goh, Cusick, Barabasi, & Vidal, 2007). However, neither of these studies took protein sequence information and chemical structure information into account simultaneously.
