Guided Sequence Alignment

Abdullah N. Arslan

Source Title: Encyclopedia of Data Warehousing and Mining, Second Edition

ISBN13: 9781605660103|ISBN10: 1605660108|EISBN13: 9781605660110

DOI: 10.4018/978-1-60566-010-3.ch149

MLA

Arslan, Abdullah N. "Guided Sequence Alignment." Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, IGI Global, 2009, pp. 964-969. https://doi.org/10.4018/978-1-60566-010-3.ch149

APA

Arslan, A. N. (2009). Guided Sequence Alignment. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining, Second Edition (pp. 964-969). IGI Global. https://doi.org/10.4018/978-1-60566-010-3.ch149

Chicago

Arslan, Abdullah N. "Guided Sequence Alignment." In Encyclopedia of Data Warehousing and Mining, Second Edition, edited by John Wang, 964-969. Hershey, PA: IGI Global, 2009. https://doi.org/10.4018/978-1-60566-010-3.ch149

Abstract

Sequence alignment is one of the most fundamental problems in computational biology. Ordinarily, the problem is to align the symbols of given sequences so as to optimize a similarity score. This score is computed using a given scoring matrix that assigns a score to every pair of symbols in an alignment. The expectation is that scoring matrices perform well for alignments of all sequences. However, it has been shown that this is not always true, even though scoring matrices are derived from known similarities. Biological sequences share common sequence structures that are signatures of common functions or evolutionary relatedness. The alignment process should be guided by constraining the desired alignments to contain these structures, even though this does not always yield optimal scores.

Changes in biological sequences occur over the course of millions of years, and in ways and orders we do not completely know. Sequence alignment has become a dynamic area where new knowledge is acquired, new common structures are extracted from sequences, and these yield more sophisticated alignment methods, which in turn yield more knowledge. This feedback loop is essential for this inherently difficult task.

The ordinary definition of sequence alignment does not always reveal biologically accurate similarities. To overcome this, there have been attempts to redefine sequence similarity. Huang (1994) proposed an optimization problem in which close matches are rewarded more favorably than the same number of isolated matches. Zhang, Berman & Miller (1998) proposed an algorithm that finds alignments free of low-scoring regions. Arslan, Egecioglu, & Pevzner (2001) proposed length-normalized local sequence alignment, whose objective is to find subsequences that yield the maximum length-normalized score, where the length-normalized score of a given alignment is its score divided by the sum of the lengths of the subsequences involved in the alignment. This can be considered a context-dependent sequence alignment in which a high degree of local similarity defines a context. Arslan, Egecioglu, & Pevzner (2001) presented a fractional programming algorithm for the resulting problem. Although these attempts are important, some biologically meaningful alignments contain motifs whose inclusion is not guaranteed in the alignments returned by these methods. Our emphasis in this chapter is on methods that guide sequence alignment by requiring desired alignments to contain given common structures identified in sequences (motifs).
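
As an illustration only, the two ideas above can be sketched in Python. This is not the algorithm from the chapter or from the cited papers: the function names, the scoring values (match = 1, mismatch = -1, gap = -2), and the reading of "contain a motif" as "the motif's symbols appear, in order, in matched columns of a global alignment" are all assumptions made for the sketch. The length-normalized helper follows only the wording of the abstract (score divided by the sum of the subsequence lengths); the published formulation may involve further parameters.

```python
NEG_INF = float("-inf")


def length_normalized_score(score, len_a, len_b):
    """Alignment score divided by the summed lengths of the two aligned
    subsequences (hypothetical helper, following the abstract's wording)."""
    total = len_a + len_b
    if total <= 0:
        raise ValueError("aligned subsequences must not both be empty")
    return score / total


def constrained_alignment_score(a, b, motif, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b among alignments whose matched
    columns spell out `motif` in order (assumed meaning of 'contain').

    Returns -inf when no alignment can contain the motif.
    With motif == "" this reduces to ordinary global alignment.
    """
    n, m, r = len(a), len(b), len(motif)
    # S[k][i][j]: best score for a[:i] vs b[:j] with the first k motif
    # symbols already placed in matched columns.
    S = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(r + 1)]
    for k in range(r + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 and j == 0:
                    S[k][i][j] = 0 if k == 0 else NEG_INF
                    continue
                best = NEG_INF
                if i > 0:  # a[i-1] aligned with a gap
                    best = max(best, S[k][i - 1][j] + gap)
                if j > 0:  # b[j-1] aligned with a gap
                    best = max(best, S[k][i][j - 1] + gap)
                if i > 0 and j > 0:
                    same = a[i - 1] == b[j - 1]
                    pair = match if same else mismatch
                    # column that does not advance the motif
                    best = max(best, S[k][i - 1][j - 1] + pair)
                    # matched column that supplies motif symbol k
                    if k > 0 and same and a[i - 1] == motif[k - 1]:
                        best = max(best, S[k - 1][i - 1][j - 1] + match)
                S[k][i][j] = best
    return S[r][n][m]
```

Under these assumed scores, aligning "AGT" with "ATG" gives -1 without a constraint (empty motif) and -2 when the motif "G" is required, a small instance of the point that a guided alignment does not always attain the optimal unconstrained score.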
