Data Mining of Chemogenomics Data Using Activity Landscape and Partial Least Squares

Kiyoshi Hasegawa (Chugai Pharmaceutical Company, Kamakura Research Laboratories, Japan) and Kimito Funatsu (University of Tokyo, Japan)
DOI: 10.4018/978-1-4666-5888-2.ch165

Chapter Preview



In this study, we combined activity landscapes (ALs) with partial least squares (PLS) to analyze an aminergic G protein-coupled receptor (GPCR) data set. PLS is a statistical method related to principal components regression: it fits a linear regression model by projecting both the predictor variables and the response variables onto a new latent space. Each AL was created from the inhibitory activity values of molecules against one GPCR (Peltason, 2010, p. 1021). After all ALs were assembled, the inhibitory activity values in the ALs were correlated with the sequence data of the GPCRs by PLS (Hasegawa, 2012, p. 766). The AL for a new GPCR could then be estimated from the established PLS models. Using this approach, we successfully predicted the inhibitory activity values of external molecules not included in the training data set.
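The PLS step described above can be illustrated with a minimal sketch. It assumes a single response column (PLS1) fitted with a NIPALS-style deflation scheme in NumPy; the function name `pls1_fit`, the synthetic descriptor matrix, and the number of components are illustrative assumptions and are not taken from the chapter, which may use a different PLS variant and real GPCR sequence descriptors.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (hypothetical helper, NIPALS-style)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight: direction of max covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                           # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in the original X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic stand-in data (assumption): rows play the role of receptors,
# columns the role of sequence-derived descriptors, y one activity value per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 0.5

coef, intercept = pls1_fit(X, y, n_components=3)
y_hat = X @ coef + intercept                 # predictions for the training rows
```

In practice, the number of latent components would be chosen by cross-validation rather than fixed in advance, and predictions for a new receptor would be obtained by applying `coef` and `intercept` to its descriptor vector.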

Key Terms in this Chapter

Partial Least Squares: A statistical technique that generalizes and combines features of principal component analysis and multiple regression.

Activity Landscape: A representation that integrates pairwise compound similarity and potency relationships to provide direct access to characteristic structure–activity relationship features in compound data sets.

G Protein-Coupled Receptors: A family of seven-transmembrane proteins that bind signal molecules outside the cell, transduce the signal into the cell, and ultimately cause a cellular response.

Data Mining: The application of data analysis techniques, such as statistics, pattern recognition, and artificial intelligence, to large quantities of data in order to extract knowledge.

Chemogenomics: An interdisciplinary research field that aims to understand the whole human system by comprehensively searching, on a genome scale, for active compounds acting on protein families.
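To make the Activity Landscape definition above more concrete, the following minimal sketch computes the two ingredients it names, pairwise compound similarity and potency difference, for a toy data set. The compound names, fingerprint bits, potency values, and both thresholds are invented for illustration; the Tanimoto coefficient on bit sets is one common similarity measure and is an assumption here, not necessarily the formalism used in the chapter.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def landscape_pairs(fingerprints, potencies):
    """Return (id_i, id_j, similarity, potency difference) for every compound pair."""
    pairs = []
    for i, j in combinations(sorted(fingerprints), 2):
        sim = tanimoto(fingerprints[i], fingerprints[j])
        diff = abs(potencies[i] - potencies[j])
        pairs.append((i, j, sim, diff))
    return pairs

def activity_cliffs(pairs, min_similarity=0.6, min_potency_diff=2.0):
    """Pairs that are structurally similar yet differ strongly in potency.

    Both thresholds are illustrative assumptions.
    """
    return [(i, j) for i, j, sim, diff in pairs
            if sim >= min_similarity and diff >= min_potency_diff]

# Toy data (hypothetical): fingerprint bit sets and potencies on a log scale.
fingerprints = {
    "cpd_a": {1, 2, 3, 4, 5},
    "cpd_b": {1, 2, 3, 4, 6},
    "cpd_c": {7, 8, 9},
}
potencies = {"cpd_a": 8.5, "cpd_b": 5.9, "cpd_c": 6.0}

pairs = landscape_pairs(fingerprints, potencies)
cliffs = activity_cliffs(pairs)
```

Similar pairs with large potency differences correspond to the discontinuous regions of a landscape, while similar pairs with small differences correspond to smooth regions.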
