Protein Motif Comparator using PSO K-Means

Gowri R. (Department of Computer Science, Periyar University, Salem, India) and Rathipriya R. (Department of Computer Science, Periyar University, Salem, India)
Copyright: © 2016 | Pages: 13
DOI: 10.4018/IJAMC.2016070104


The main goal of this paper is to compare the motif information extracted from clusters and biclusters of proteins using a Motif Comparator. The clusters and biclusters are obtained with the PSO k-means algorithm. Protein functions are preferably inferred from their motif information. The Motif Comparator is used to analyze the clusters and biclusters, to locate the significant amino acids present in them, and to find the most highly homologous cluster. The motif information acquired is based on the structural homogeneity of the protein sequences, which is evaluated through their secondary structure similarity.
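The preview does not give the exact homogeneity formula. A common way to score the secondary structure similarity of a cluster is to look at each window position, take the fraction of segments that share the most frequent state (H = helix, E = strand, C = coil), and average these fractions over the window. The sketch below assumes that definition and assumes every segment is already labelled with H/E/C strings of equal length. The function name `secondary_structure_similarity` is hypothetical and does not come from the paper.

```python
from collections import Counter
from typing import Sequence


def secondary_structure_similarity(segments: Sequence[str]) -> float:
    """Average, over window positions, of the fraction of segments that share
    the most common secondary-structure state (H, E or C) at that position.

    Assumption: every segment is a secondary-structure string of the same length.
    """
    if not segments:
        raise ValueError("a cluster must contain at least one segment")
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("all segments must have the same window length")

    total = 0.0
    for i in range(width):
        states = Counter(s[i] for s in segments)      # state counts at position i
        majority = states.most_common(1)[0][1]        # size of the dominant state
        total += majority / len(segments)
    return total / width


cluster = ["HHHCC", "HHHCE", "HHECC"]
print(round(secondary_structure_similarity(cluster), 3))  # 0.867
```

A score of 1.0 means every segment in the cluster has the same secondary structure at every position. Lower values indicate a structurally mixed cluster.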

Proteins (Vincent, Bernard, & Sinan Kockara) are present in every cell of an organism and are involved in virtually all cellular activities, including metabolism, nutrient transport, and regulation; they play a vital role in cellular processes. Proteins are built from twenty standard amino acids, each with different characteristics. A major challenge in bioinformatics is to determine which combinations of proteins perform which activities. Motifs help express the function of a protein in terms of its significant amino acids. Protein function can be discovered by various methods, such as sequence-motif-based, homology-based, and structure-based methods; the current work uses the sequence-motif-based method. Because finding such motifs manually is tedious, these patterns are extracted based on either the structural or the functional characteristics of the protein. Since a protein is a long string of amino acids in sequence, this work uses the frequency profiles of the proteins. Motifs can then be extracted from clusters generated by various computational techniques.
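The preview does not describe how the frequency profiles are built. In the related literature they are often derived from multiple sequence alignments. The sketch below uses a deliberately simplified stand-in: it slides a fixed-length window along one sequence and records the relative frequency of each of the 20 standard amino acids in that window. The window length of 9 and the function name `window_frequency_profiles` are hypothetical choices for illustration only.

```python
# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_frequency_profiles(sequence: str, window: int = 9) -> list[list[float]]:
    """Return one 20-element relative-frequency vector per sliding window.

    Hypothetical simplification: real frequency profiles are usually computed
    from multiple sequence alignments, not from a single sequence. Non-standard
    letters (e.g. 'X') are ignored, so such windows sum to less than 1.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    profiles = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        counts = [segment.count(aa) for aa in AMINO_ACIDS]
        profiles.append([c / window for c in counts])
    return profiles


profiles = window_frequency_profiles("MKTAYIAKQRQISFVKSHFSRQ", window=9)
print(len(profiles), len(profiles[0]))  # 14 20
```

Each row of the resulting matrix is one data point that a clustering or biclustering algorithm can group.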

The purpose of the Motif Comparator is to detect motif information (Zhao, 2005) in protein sequences. Clustering is a well-known data mining technique for grouping similar data elements, and it is used to discover similar patterns in large amounts of data. Objects in the same cluster are more similar to one another than to objects in other clusters. Clustering is widely used in research areas such as bioinformatics, pattern recognition, data mining, statistics, image analysis, and machine learning; because these areas deal with unlabeled data, clustering is well suited to them. Clusters can be found using various similarity criteria, such as the intra-cluster and inter-cluster distances, and cluster quality is evaluated against the chosen objective. Clustering protein sequences reveals the patterns present in a given protein dataset. According to Elayaraja, Thangavel, Chitralega and Chandrasekhar (2012), clustering is well suited to motif extraction from protein sequences; they used a rough-set-based k-means algorithm together with a support vector machine to mine protein motifs based on their structural similarity.
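As a baseline for the PSO-optimized variant the paper studies, here is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python. It also includes one possible reading of the "intra distance" and "inter distance" criteria. The specific definitions used below are assumptions, not the paper's. Intra distance is taken as the mean distance from each point to its own centroid. Inter distance is taken as the smallest distance between any two centroids. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def intra_distance(points, centroids, labels):
    """Assumed definition: mean point-to-own-centroid distance (lower = more compact)."""
    return sum(math.dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)


def inter_distance(centroids):
    """Assumed definition: smallest centroid-to-centroid distance (higher = better separated)."""
    return min(math.dist(a, b)
               for i, a in enumerate(centroids) for b in centroids[i + 1:])


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labs = kmeans(pts, 2)
print(labs, intra_distance(pts, cents, labs), inter_distance(cents))
```

A good clustering has a small intra distance relative to its inter distance. Plain k-means is also sensitive to its random initial centroids, which is the weakness that a PSO-based search is meant to address.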

Biclustering (Madeira et al., 2004), also known as co-clustering or two-way clustering, is another data mining technique. It groups data simultaneously by samples (rows) and attributes (columns), producing biclusters of different sizes and characteristics.

The major differences (Madeira et al., 2004; Berkhin, 2002) between clustering and biclustering are as follows:

  • Clustering is applied to either the rows or the columns of the dataset, whereas biclustering is applied to both rows and columns simultaneously;

  • In clustering, all clusters have the same size along one dimension (each row cluster spans every column, or each column cluster spans every row), whereas biclusters can differ in size along both dimensions;

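  • Biclustering can group elements that are more similar to one another than clustering does, because each bicluster only needs to be coherent over a subset of the attributes.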
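The last difference can be made concrete with a homogeneity score. The preview does not say which score the paper uses. The sketch below uses the mean squared residue (MSR) of Cheng and Church, a widely used bicluster measure, as a stand-in. The MSR is 0 for a perfectly coherent (additive) submatrix. The small `data` matrix is invented for illustration. In it, rows 0 to 2 are perfectly coherent over columns 0 to 2 but not over all four columns. A bicluster can therefore capture them exactly, while a row cluster spanning every column cannot.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].
    0 means every entry equals row mean + column mean - overall mean."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    return sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
               for i in range(n_r) for j in range(n_c)) / (n_r * n_c)


# Illustrative data only: rows 0-2 follow an additive pattern on columns 0-2.
data = [
    [1, 2, 3, 9],
    [2, 3, 4, 0],
    [3, 4, 5, 7],
    [8, 1, 6, 2],
]

print(mean_squared_residue(data, [0, 1, 2], [0, 1, 2]))     # bicluster view
print(mean_squared_residue(data, [0, 1, 2], [0, 1, 2, 3]))  # row-cluster view
```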

According to Yip, Chen and Kockara (n.d.), biclustering can be used to discover motifs from the frequency profiles of protein sequences based on their structural similarity, and they obtained better results with the biclustering technique. To optimize the patterns, this work proposes optimized clustering and biclustering approaches, on the premise that optimizing these mining approaches yields better results than the existing methodology. Particle Swarm Optimization (PSO) is one of the standard optimization techniques in the literature, so it is chosen to optimize the mining approaches in this work.
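The preview does not specify the authors' PSO k-means formulation. The sketch below is a standard global-best PSO clustering scheme under stated assumptions. Each particle encodes k centroids as one flat position vector. Its fitness is the k-means objective (sum of squared distances from each point to its nearest centroid). Particles are pulled toward their personal best and the swarm's global best. The parameter values (inertia `w=0.72`, acceleration coefficients `c1=c2=1.49`, 20 particles, 100 iterations) are common textbook choices and are assumptions here. The function names are hypothetical. Many hybrid PSO k-means variants also refine the PSO result with Lloyd iterations, which this sketch omits.

```python
import random


def sse(points, centroids):
    """k-means objective: squared distance from each point to its nearest centroid."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
               for p in points)


def decode(position, k, dim):
    """Split a flat particle position into k centroid vectors."""
    return [position[j * dim:(j + 1) * dim] for j in range(k)]


def pso_kmeans(points, k, n_particles=20, iterations=100,
               w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO searching for k centroids that minimise sse()."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise each particle with k randomly chosen data points as centroids.
    positions = [[x for p in rng.sample(points, k) for x in p]
                 for _ in range(n_particles)]
    velocities = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [list(pos) for pos in positions]
    pbest_fit = [sse(points, decode(pos, k, dim)) for pos in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = list(pbest[g]), pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive pull + social pull.
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            fit = sse(points, decode(positions[i], k, dim))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = list(positions[i]), fit
                if fit < gbest_fit:
                    gbest, gbest_fit = list(positions[i]), fit
    return decode(gbest, k, dim), gbest_fit


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, fitness = pso_kmeans(pts, 2)
print(centroids, fitness)
```

Unlike plain k-means, the result does not hinge on a single random initialization, because the swarm explores many candidate centroid sets at once. The cost is many more evaluations of the objective.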
