A Greedy Clustering Algorithm for Multiple Sequence Alignment

A Greedy Clustering Algorithm for Multiple Sequence Alignment

Rabah Lebsir, Abdesslem Layeb, Tahi Fariza
DOI: 10.4018/IJCINI.20211001.oa41
Article PDF Download
Open access articles are freely available for download


This paper presents a strategy to tackle the Multiple Sequence Alignment (MSA) problem, which is one of the most important tasks in the biological sequence analysis. Its role is to align the sequences in their entirety to derive relationships and common characteristics between a set of protein or nucleotide sequences. The MSA problem was proved to be an NP-Hard problem. The proposed strategy incorporates a new idea based on the well-known divide and conquer paradigm. This paper presents a novel method of clustering sequences as a preliminary step to improve the final alignment; this decomposition can be used as an optimization procedure with any MSA aligner to explore promising alignments of the search space. In their solution, authors proposed to align the clusters in a parallel and distributed way in order to benefit from parallel architectures. The strategy was tested using classical benchmarks like BAliBASE, Sabre, Prefab4 and Oxm, and the experimental results show that it gives good results by comparing to the other aligners.
Article Preview

Msa Problem Formulation

In this section, the formulation of MSA problem is given to show its combinatorial nature. Let IJCINI.20211001.oa41.m01 a set of n sequences withIJCINI.20211001.oa41.m02. Each sequence IJCINI.20211001.oa41.m03 is a string well-defined over an alphabet. The lengths of the sequences are not necessarily the same and in most of time are different. The MSA problem can be well-defined by identifying a pair IJCINI.20211001.oa41.m04) where IJCINI.20211001.oa41.m05 is the set of all potential alignments and IJCINI.20211001.oa41.m06 define a function IJCINI.20211001.oa41.m07 called the alignment score. Each possible alignment is seen as a set IJCINI.20211001.oa41.m08 satisfying the following criteria:

Complete Article List

Search this Journal:
Volume 18: 1 Issue (2024)
Volume 17: 1 Issue (2023)
Volume 16: 1 Issue (2022)
Volume 15: 4 Issues (2021)
Volume 14: 4 Issues (2020)
Volume 13: 4 Issues (2019)
Volume 12: 4 Issues (2018)
Volume 11: 4 Issues (2017)
Volume 10: 4 Issues (2016)
Volume 9: 4 Issues (2015)
Volume 8: 4 Issues (2014)
Volume 7: 4 Issues (2013)
Volume 6: 4 Issues (2012)
Volume 5: 4 Issues (2011)
Volume 4: 4 Issues (2010)
Volume 3: 4 Issues (2009)
Volume 2: 4 Issues (2008)
Volume 1: 4 Issues (2007)
View Complete Journal Contents Listing