Protein Motif Comparator Using Bio-Inspired Two-Way K-Means

R. Gowri (Periyar University, India) and R. Rathipriya (Periyar University, India)
Copyright: © 2018 | Pages: 23
DOI: 10.4018/978-1-5225-4151-6.ch004


Diseases today evolve faster than the medicines developed to treat them, and their diagnosis and prognosis differ from patient to patient. In this scenario, protein motifs are very useful for understanding the functionality and lethality of a disease. Most existing techniques are supervised approaches that require prior knowledge of the data. Because protein sequences are unlabeled data, unsupervised data mining techniques such as clustering and 2-way clustering are chosen to mine homologous protein motifs. In this research work, the quality of the results is refined further using bio-inspired computing models: Particle Swarm Optimization, Genetic Algorithm, and Venus Flytrap Optimization. Existing approaches can mine homologous patterns with a structural similarity of 75 percent, and the proposed approach increases this figure. The results from the three approaches show that bio-inspired 2-way clustering can mine more homologous motifs than the clustering approaches.
Chapter Preview


Proteins (Y.Vincent, Bernard, & SinanKockara) (Structures of life, 2007) are present in every cell of an organism and are involved in virtually all cellular activities. They are responsible for various metabolic activities, nutrient transport, regulation, and so on. They exist as single-chain molecules, as three-dimensional structures, or in bundled or complex forms. Proteins are built from twenty amino acids, which have different characteristics: hydrophobic, hydrophilic, polar, non-polar, etc. A great challenge for bioinformatics researchers is to find which combinations of proteins are responsible for which kinds of activities. Discovering the structure and function of proteins in living organisms is vital to understanding the background of various cellular processes. It helps in treating various diseases and in discovering drugs for particular diseases.

The purpose of this motif comparator is to detect motif information (Zhaoa-Xing-Ming, 2005) (Kunik.V, Solan.Z, Edelman.S, Ruppin.E, & Horn.D, 2005) in protein sequences by clustering and 2-way clustering them. K-means is the benchmark clustering technique, used to group similar data elements (Bapuji Rao, 2017). It discovers similar patterns in vast amounts of data; a toy example is shown in Figure 1. It is widely used in research areas such as bioinformatics, pattern recognition, statistics, image analysis, and machine learning. Because all of these areas deal with unclassified data, clustering suits them well. Clusters can be found based on various similarity measures among the data, such as the intra-cluster and inter-cluster distances (R.Gowri & R.Rathipriya, 2016-c) (Duggirala Raja Kishor, 2016). The quality of the clusters is evaluated according to the objective at hand.

Figure 1.

Clustering Approach: A toy example
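
The K-means grouping and the intra-cluster and inter-cluster distances mentioned above can be illustrated with a minimal sketch in plain Python. This is not the chapter's implementation: the function names (`kmeans`, `intra_distance`, `inter_distance`), the farthest-first initialization, the Euclidean metric, and the toy 2-D points are all assumptions made for illustration. Here the intra-cluster distance is taken as the mean distance of each point to its own centroid, and the inter-cluster distance as the smallest distance between two centroids.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Plain K-means. Farthest-first initialization is an assumption made
    here for determinism; the chapter does not specify one."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def intra_distance(points, labels, centroids):
    """Mean distance of every point to its own centroid (lower is tighter)."""
    total = sum(euclidean(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)

def inter_distance(centroids):
    """Smallest distance between two centroids (higher is better separated)."""
    return min(euclidean(a, b)
               for i, a in enumerate(centroids)
               for b in centroids[i + 1:])

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(intra_distance(points, labels, centroids), inter_distance(centroids))
```

A clustering with a small intra-cluster distance and a large inter-cluster distance is usually considered good, but which measure matters more depends on the objective.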


2-way clustering (Madeira.S.C & OliveiraA.L., 2004) is a data mining technique that groups data based on both the samples and the attributes. It is also known as co-clustering (R.Gowri & R.Rathipriya, 2017) or biclustering; biclustering and 2-way clustering are used synonymously in this chapter. This approach generates biclusters of different sizes and characteristics, as shown in Figure 2. In the proposed approach, K-means is used for 2-way clustering: it is applied to both rows and columns simultaneously, and local patterns (biclusters) are extracted by combining the resulting row and column clusters.

Figure 2.

2-way Clustering: A toy example
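
The idea of applying K-means to rows and columns and then combining the row and column clusters into biclusters can be sketched as follows. This is a simplified illustration, not the chapter's method: the helper names (`kmeans`, `two_way_kmeans`, `submatrix`), the farthest-first initialization, and the small numeric matrix (standing in for an encoded data matrix) are assumptions. Each bicluster is represented as a pair of row indices and column indices.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Plain K-means returning one label per point. Farthest-first
    initialization is an assumption, chosen so the result is deterministic."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def two_way_kmeans(matrix, k_rows, k_cols):
    """Cluster rows and columns separately, then pair every row cluster
    with every column cluster to form candidate biclusters."""
    row_labels = kmeans(matrix, k_rows)
    columns = [list(col) for col in zip(*matrix)]  # transpose: columns become points
    col_labels = kmeans(columns, k_cols)
    biclusters = []
    for r in range(k_rows):
        rows = [i for i, lab in enumerate(row_labels) if lab == r]
        for c in range(k_cols):
            cols = [j for j, lab in enumerate(col_labels) if lab == c]
            if rows and cols:
                biclusters.append((rows, cols))
    return biclusters

def submatrix(matrix, rows, cols):
    """Extract the values of one bicluster from the full matrix."""
    return [[matrix[i][j] for j in cols] for i in rows]

# Hypothetical toy matrix with two blocks of high values on the diagonal.
matrix = [
    [5, 5, 0, 0],
    [5, 4, 0, 1],
    [0, 0, 6, 6],
    [1, 0, 6, 5],
]
for rows, cols in two_way_kmeans(matrix, k_rows=2, k_cols=2):
    print(rows, cols, submatrix(matrix, rows, cols))
```

In this sketch every row cluster is paired with every column cluster, so the biclusters tile the whole matrix. A refinement step, such as the bio-inspired optimizers named in the abstract, would then be needed to keep only the coherent ones.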


The major differences (Madeira.S.C & OliveiraA.L., 2004) (Berkhin, 2002) between clustering and 2-way clustering are as follows.
