Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002
Outline
• Hand out project
• Group assignments
• References for sequence alignment algorithms
• Board example of Needleman-Wunsch
• Discussion of LCS Algorithm and how it can be extended for global alignment (Needleman-Wunsch)
• Extensions: local alignment (Smith-Waterman) and gap penalties
Project Group Members
• Group 1: Bonnie, Eduardo, Sara
• Group 2: Thi, Edain
• Group 3: Michael, Hardik, Daisy
• Group 4: Dennis, Ivonne, Patrick
• Group 5: Chuck, Ronny
Project References
• http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
• http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html
• Lectures: Database search (4/16) and Rationale for DB Searching (5/16)
• Computational Molecular Biology – An Algorithmic Approach, Pavel Pevzner
• Introduction to Computational Biology – Maps, sequences, and genomes, Michael Waterman
• Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield
Classic Papers
• Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf)
• Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf)
• Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf)
Longest Common Subsequence (LCS) Problem
• Can have insertions and deletions but no substitutions
• Ex:
  V: ATCTGAT
  W: TGCATA
  LCS: TCTA
LCS Problem (cont.)
• Similarity score

$$
s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1, & \text{if } v_i = w_j \end{cases}
$$
Indels – insertions and deletions (i.e., gaps)
• Alignment of V (length n) and W (length m)
• Alignment A is a 2 x l matrix (l >= n, m)
• First row of A contains the characters of V interspersed with l-n spaces
• Second row of A contains the characters of W interspersed with l-m spaces
• Space in first row = insertion (LEFT: w_j is aligned with a space)
• Space in second row = deletion (UP: v_i is aligned with a space)
• Match (no mismatch in LCS) (DIAG)
LCS(V,W) Algorithm
for i = 0 to n
    s_{i,0} = 0
for j = 1 to m
    s_{0,j} = 0
for i = 1 to n
    for j = 1 to m
        if v_i = w_j
            s_{i,j} = s_{i-1,j-1} + 1; b_{i,j} = DIAG
        else if s_{i-1,j} >= s_{i,j-1}
            s_{i,j} = s_{i-1,j}; b_{i,j} = UP
        else
            s_{i,j} = s_{i,j-1}; b_{i,j} = LEFT
PRINT-LCS(b, V, i, j)
if i = 0 or j = 0
    return
if b_{i,j} = DIAG
    PRINT-LCS(b, V, i-1, j-1)
    print v_i
else if b_{i,j} = UP
    PRINT-LCS(b, V, i-1, j)
else
    PRINT-LCS(b, V, i, j-1)
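The two procedures above can be sketched in C++ as follows. This is a minimal illustration, not the project solution: the function name `lcs`, the `Direction` enum, and the choice to return the subsequence as a string (using an iterative traceback in place of the recursive PRINT-LCS) are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum Direction { NONE, DIAG, UP, LEFT };  // hypothetical names for the b matrix entries

// Fills the score matrix s and traceback matrix b as in LCS(V,W),
// then follows b back from (n, m) to recover one longest common subsequence.
std::string lcs(const std::string& v, const std::string& w) {
    const std::size_t n = v.size(), m = w.size();
    // Row 0 and column 0 stay at 0, which covers the two initialization loops.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    std::vector<std::vector<Direction>> b(n + 1, std::vector<Direction>(m + 1, NONE));

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (v[i - 1] == w[j - 1]) {            // strings are 0-based, the matrix is 1-based
                s[i][j] = s[i - 1][j - 1] + 1;
                b[i][j] = DIAG;
            } else if (s[i - 1][j] >= s[i][j - 1]) {
                s[i][j] = s[i - 1][j];
                b[i][j] = UP;
            } else {
                s[i][j] = s[i][j - 1];
                b[i][j] = LEFT;
            }
        }
    }

    // Iterative form of PRINT-LCS: characters are collected back to front.
    std::string result;
    std::size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (b[i][j] == DIAG) {
            result += v[i - 1];
            --i;
            --j;
        } else if (b[i][j] == UP) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

For the slide example, `lcs("ATCTGAT", "TGCATA")` returns a common subsequence of length 4. Which of the equally long subsequences is returned depends on how ties between UP and LEFT are broken.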
Extend LCS to Global Alignment

$$
s_{i,j} = \max \begin{cases} s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}
$$

• $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the extend gap penalty
• $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
• Modify LCS and PRINT-LCS algorithms to support global alignment (on-board discussion)
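As a rough sketch of the global alignment recurrence, the function below fills the score matrix and returns the alignment score. The scoring values (`MATCH`, `MISMATCH`, `SIGMA`) and the names `delta` and `globalScore` are hypothetical choices for illustration; a PAM or BLOSUM lookup could replace `delta`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed fixed scoring scheme (illustrative values, not from the slides).
const int MATCH = 1;
const int MISMATCH = -1;
const int SIGMA = 2;  // gap penalty: every indel column scores -SIGMA

// delta(v_i, w_j): score for aligning two characters.
int delta(char a, char b) { return a == b ? MATCH : MISMATCH; }

// Returns the global alignment score s[n][m] of v and w.
int globalScore(const std::string& v, const std::string& w) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));

    // Unlike LCS, the first column and row accumulate gap penalties.
    for (std::size_t i = 1; i <= n; ++i) s[i][0] = s[i - 1][0] - SIGMA;
    for (std::size_t j = 1; j <= m; ++j) s[0][j] = s[0][j - 1] - SIGMA;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            s[i][j] = std::max({s[i - 1][j] - SIGMA,                           // v_i against a space
                                s[i][j - 1] - SIGMA,                           // space against w_j
                                s[i - 1][j - 1] + delta(v[i - 1], w[j - 1])}); // match or mismatch
        }
    }
    return s[n][m];
}
```

With these assumed scores, `globalScore("ACGT", "AGT")` is 1: three matches and one gap column.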
Extend to Local Alignment

$$
s_{i,j} = \max \begin{cases} 0 & \text{(no negative scores)} \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}
$$

• $\delta(v_i, -) = \delta(-, w_j) = -\sigma$, where $\sigma$ is the extend gap penalty
• $\delta(v_i, w_j)$ = score for match or mismatch; can be fixed, or taken from PAM or BLOSUM
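A minimal sketch of the local alignment recurrence, under the same kind of assumed fixed scores (the constants and the names `delta` and `localScore` are hypothetical). The two changes from the global version are the extra 0 in the maximum and the fact that the answer is the best cell anywhere in the matrix, not the bottom-right corner.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed fixed scoring scheme (illustrative values, not from the slides).
const int MATCH = 1;
const int MISMATCH = -1;
const int SIGMA = 2;  // gap penalty: every indel column scores -SIGMA

int delta(char a, char b) { return a == b ? MATCH : MISMATCH; }

// Returns the best local alignment score between any substrings of v and w.
int localScore(const std::string& v, const std::string& w) {
    const std::size_t n = v.size(), m = w.size();
    // Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            s[i][j] = std::max({0,                                             // restart the alignment
                                s[i - 1][j] - SIGMA,
                                s[i][j - 1] - SIGMA,
                                s[i - 1][j - 1] + delta(v[i - 1], w[j - 1])});
            best = std::max(best, s[i][j]);  // the alignment may end anywhere
        }
    }
    return best;
}
```

To print the local alignment itself, a traceback would start at the cell holding `best` and stop at the first cell whose score is 0.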
Discussion on adding affine gap penalties
• Affine gap penalty
• Score for a gap of length x: $-(\rho + \sigma x)$
• Where
  • $\rho > 0$ is the insert gap penalty
  • $\sigma > 0$ is the extend gap penalty
• On-board example from http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
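To make the affine formula concrete, here is a small sketch that scores a single gap under the affine scheme and, for comparison, under a purely linear one. The function names are hypothetical, and a gap of length 0 is assumed to score 0.

```cpp
// Score of one gap of length x under an affine penalty: -(rho + sigma * x).
// rho is charged once for inserting the gap, sigma once per gap position.
int affineGapScore(int x, int rho, int sigma) {
    return x > 0 ? -(rho + sigma * x) : 0;
}

// Score of the same gap under a linear penalty: -sigma * x.
int linearGapScore(int x, int sigma) {
    return -sigma * x;
}
```

With assumed values rho = 4 and sigma = 1, one gap of length 3 scores -7, while three separate gaps of length 1 score -15 in total, so the affine scheme favors fewer, longer gaps.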
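Alignment with Gap Penalties
Can apply to global or local (w/ zero) algorithms

$$
s^{\downarrow}_{i,j} = \max \begin{cases} s^{\downarrow}_{i-1,j} - \sigma \\ s_{i-1,j} - (\rho + \sigma) \end{cases}
$$

$$
s^{\rightarrow}_{i,j} = \max \begin{cases} s^{\rightarrow}_{i,j-1} - \sigma \\ s_{i,j-1} - (\rho + \sigma) \end{cases}
$$

$$
s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) \\ s^{\downarrow}_{i,j} \\ s^{\rightarrow}_{i,j} \end{cases}
$$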
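The three recurrences can be sketched for the global case with three matrices. This is an illustration under assumptions: the matrix names `down` and `right`, the fixed match/mismatch scores in `delta`, the boundary initialization, and the `NEG_INF` sentinel are choices made for the example, not taken from the slides.

```cpp
#include <algorithm>
#include <string>
#include <vector>

const int NEG_INF = -1000000;  // stands in for minus infinity (assumed small enough here)

// Assumed fixed match/mismatch score; a PAM or BLOSUM lookup could replace it.
int delta(char a, char b) { return a == b ? 1 : -1; }

// Global alignment score with affine gap penalty -(rho + sigma * x).
int affineGlobalScore(const std::string& v, const std::string& w, int rho, int sigma) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, NEG_INF));
    std::vector<std::vector<int>> down = s;   // best score ending with v_i against a space
    std::vector<std::vector<int>> right = s;  // best score ending with a space against w_j

    s[0][0] = 0;
    for (std::size_t i = 1; i <= n; ++i) {    // leading gap of length i in the second row
        down[i][0] = -(rho + sigma * static_cast<int>(i));
        s[i][0] = down[i][0];
    }
    for (std::size_t j = 1; j <= m; ++j) {    // leading gap of length j in the first row
        right[0][j] = -(rho + sigma * static_cast<int>(j));
        s[0][j] = right[0][j];
    }

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            down[i][j] = std::max(down[i - 1][j] - sigma,         // extend an open gap
                                  s[i - 1][j] - (rho + sigma));   // open a new gap
            right[i][j] = std::max(right[i][j - 1] - sigma,
                                   s[i][j - 1] - (rho + sigma));
            s[i][j] = std::max({s[i - 1][j - 1] + delta(v[i - 1], w[j - 1]),
                                down[i][j],
                                right[i][j]});
        }
    }
    return s[n][m];
}
```

With assumed penalties rho = 4 and sigma = 1, `affineGlobalScore("ACGT", "AT", 4, 1)` is -4: two matches and one gap of length 2. For the local variant, a 0 would be added to the maximum for `s[i][j]` and the boundaries would be initialized to 0.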
Implementing Global Alignment Program in C++
• Keeping it simple (e.g., without classes or structures)
  • Score matrix
  • Traceback matrix
• Simple algorithm:
  • Read in two sequences
  • Compute score and traceback matrices (modified LCS)
  • Print alignment score = score[n][m]
  • Print each aligned sequence (modified PRINT-LCS) using traceback
• For debugging – can also print the score and traceback matrices
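One possible shape for such a program, using only plain global arrays and free functions, is sketched below. Everything specific here is an assumption for illustration: the `MAX_LEN` bound, the fixed scores, the 'D'/'U'/'L' traceback codes, and the function names. It is not the project solution.

```cpp
#include <iostream>
#include <string>

const int MAX_LEN = 100;                       // assumed upper bound on sequence length
const int MATCH = 1, MISMATCH = -1, GAP = -2;  // assumed fixed scores (GAP plays the role of -sigma)

int score[MAX_LEN + 1][MAX_LEN + 1];   // score matrix
char trace[MAX_LEN + 1][MAX_LEN + 1];  // traceback matrix: 'D' = DIAG, 'U' = UP, 'L' = LEFT

// Modified LCS: fills both matrices for v (rows) and w (columns).
void fillMatrices(const std::string& v, const std::string& w) {
    const int n = static_cast<int>(v.size()), m = static_cast<int>(w.size());
    score[0][0] = 0;
    trace[0][0] = '.';
    for (int i = 1; i <= n; ++i) { score[i][0] = score[i - 1][0] + GAP; trace[i][0] = 'U'; }
    for (int j = 1; j <= m; ++j) { score[0][j] = score[0][j - 1] + GAP; trace[0][j] = 'L'; }
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            const int diag = score[i - 1][j - 1] + (v[i - 1] == w[j - 1] ? MATCH : MISMATCH);
            const int up = score[i - 1][j] + GAP;
            const int left = score[i][j - 1] + GAP;
            if (diag >= up && diag >= left) { score[i][j] = diag; trace[i][j] = 'D'; }
            else if (up >= left)            { score[i][j] = up;   trace[i][j] = 'U'; }
            else                            { score[i][j] = left; trace[i][j] = 'L'; }
        }
    }
}

// Modified PRINT-LCS: rebuilds both rows of the alignment, with '-' for spaces.
void traceback(const std::string& v, const std::string& w, int i, int j,
               std::string& rowV, std::string& rowW) {
    if (i == 0 && j == 0) return;
    if (trace[i][j] == 'D') {
        traceback(v, w, i - 1, j - 1, rowV, rowW);
        rowV += v[i - 1]; rowW += w[j - 1];
    } else if (trace[i][j] == 'U') {
        traceback(v, w, i - 1, j, rowV, rowW);
        rowV += v[i - 1]; rowW += '-';
    } else {
        traceback(v, w, i, j - 1, rowV, rowW);
        rowV += '-'; rowW += w[j - 1];
    }
}

// Debugging aid: prints each cell as score followed by its traceback code.
void printMatrices(int n, int m, std::ostream& out) {
    for (int i = 0; i <= n; ++i) {
        for (int j = 0; j <= m; ++j) out << score[i][j] << trace[i][j] << '\t';
        out << '\n';
    }
}

// Reads two sequences, then prints the score and both aligned rows.
bool runAlignment(std::istream& in, std::ostream& out) {
    std::string v, w;
    if (!(in >> v >> w)) return false;
    if (v.size() > static_cast<std::size_t>(MAX_LEN) ||
        w.size() > static_cast<std::size_t>(MAX_LEN)) return false;
    const int n = static_cast<int>(v.size()), m = static_cast<int>(w.size());
    fillMatrices(v, w);
    out << "Alignment score = " << score[n][m] << '\n';
    std::string rowV, rowW;
    traceback(v, w, n, m, rowV, rowW);
    out << rowV << '\n' << rowW << '\n';
    return true;
}
```

A `main` function would only need to call `runAlignment(std::cin, std::cout)`, and `printMatrices` can be called after `fillMatrices` while debugging. With the assumed scores, aligning ACGT with AGT gives a score of 1 and the rows ACGT and A-GT.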