A Bayesian Framework for Improving Clustering Accuracy of Protein Sequences Based on Association Rules
Peng-Yeng Yin (National Chi Nan University, Taiwan), Shyong-Jian Shyu (Ming Chuan University, Taiwan), Guan-Shieng Huang (National Chi Nan University, Taiwan) and Shuang-Te Liao (Ming Chuan University, Taiwan)
Copyright: © 2009
With the advent of new sequencing technology for biological data, the number of sequenced proteins stored in public databases has grown explosively. The structural, functional, and phylogenetic analyses of proteins would benefit from exploring these databases with data mining techniques. Clustering algorithms assign proteins to clusters such that proteins in the same cluster are more similar in homology to one another than to those in different clusters. This procedure not only simplifies the analysis task but also enhances the accuracy of the results. Most existing protein-clustering algorithms compute the similarity between proteins from pairwise sequence alignment rather than multiple sequence alignment, because the latter is computationally prohibitive. As a result, the accuracy of the clustering deteriorates. Further, the traditional clustering methods are ad hoc, and the resulting clustering often converges to a local optimum. This chapter presents a Bayesian framework for improving the clustering accuracy of protein sequences based on association rules. The experimental results show that the proposed framework can significantly improve the performance of traditional clustering methods.