1 / 15

Developing Sequence Alignment Algorithms in C++

Developing Sequence Alignment Algorithms in C++. Dr. Nancy Warter-Perez May 21, 2002. Outline. Hand out project Group assignments References for sequence alignment algorithms Board example of Needleman-Wunch

lore
Download Presentation

Developing Sequence Alignment Algorithms in C++

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002

  2. Outline • Hand out project • Group assignments • References for sequence alignment algorithms • Board example of Needleman-Wunch • Discussion of LCS Algorithm and how it can be extended for global alignment (Smith-Waterman) • Extensions: local alignment and gap penalties Developing Sequence Alignment Algorithms in C++

  3. Project Group Members • Group 1: • Bonnie, Eduardo, Sara • Group 2: • Thi, Edain • Group 3: • Michael, Hardik, Daisy • Group 4: • Dennis, Ivonne, Patrick • Group 5: • Chuck, Ronny Developing Sequence Alignment Algorithms in C++

  4. Project References • http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html • http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html • Lectures: Database search (4/16) and Rationale for DB Searching (5/16) • Computational Molecular Biology – An Algorithmic Approach, Pavel Pevzner • Introduction to Computational Biology – Maps, sequences, and genomes, Michael Waterman • Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield Developing Sequence Alignment Algorithms in C++

  5. Classic Papers • Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf) • Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf) • Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf) Developing Sequence Alignment Algorithms in C++

  6. Longest Common Subsequence (LCS) Problem • Can have insertion and deletions but no substitutions • Ex: V: ATCTGAT W: TGCATA LCS: TCTA Developing Sequence Alignment Algorithms in C++

  7. LCS Problem (cont.) • Similarity score si-1,j si,j = max { si,j-1 si-1,j-1 + 1, if vi = wj Developing Sequence Alignment Algorithms in C++

  8. Indels – insertions and deletions (e.g., gaps) • alignment is V and W • Alignment A is a 2xl matrix (l >= n,m) • First row of A contains characters of V interspersed with l-n spaces • Second row of A contains characters of W interspersed with l-m spaces • Space in first row = insertion  (UP) • Space in second row = deletion  (LEFT) • Match (no mismatch in LCS) (DIAG) Developing Sequence Alignment Algorithms in C++

  9. LCS(V,W) Algorithm for i = 1 to n si,0 = 0 for j = 1 to n s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT Developing Sequence Alignment Algorithms in C++

  10. Print-LCS(b,V,i,j) if i = 0 or j = 0 return if bi,j = DIAG PRINT-LCS(b, V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(b, V, i-1, j) else PRINT-LCS(b, V, I, j-1) Developing Sequence Alignment Algorithms in C++

  11. Extend LCS to Global Alignment si-1,j + (vi, -) si,j = max { si,j-1 + (-, wj) si-1,j-1 + (vi, wj) (vi, -) = (-, wj) = - = extend gap penalty (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM • Modify LCS and PRINT-LCS algorithms to support global alignment (On board discussion) Developing Sequence Alignment Algorithms in C++

  12. Extend to Local Alignment 0 (no negative scores) si-1,j + (vi, -) si,j = max { si,j-1 + (-, wj) si-1,j-1 + (vi, wj) (vi, -) = (-, wj) = - = extend gap penalty (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM Developing Sequence Alignment Algorithms in C++

  13. Discussion on adding affine gap penalties • Affine gap penalty • Score for a gap of length x -( + x) • Where •  > 0 is the insert gap penalty •  > 0 is the extend gap penalty • On board example fromhttp://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html Developing Sequence Alignment Algorithms in C++

  14. Alignment with Gap PenaltiesCan apply to global or local (w/ zero) algorithms si,j = max { si-1,j -  si-1,j - ( + ) si,j = max { si1,j-1 -  si,j-1 - ( + ) si-1,j-1 + (vi, wj) si,j = max { si,j si,j Developing Sequence Alignment Algorithms in C++

  15. Implementing Global Alignment Program in C++ • Keeping it simple (e.g., without classes or structures) • Score matrix • Traceback matrix • Simple algorithm: • Read in two sequences • Compute score and traceback matrices (modified LCS) • Print alignment score = score[n][m] • Print each aligned sequence (modified PRINT-LCS) using traceback • For debugging – can also print the score and traceback matrices Developing Sequence Alignment Algorithms in C++

More Related