1 / 22

LCS and Extensions to Global and Local Alignment

LCS and Extensions to Global and Local Alignment. Dr. Nancy Warter-Perez June 26, 2003. Overview. Recursion Recursive solution to hydrophobicity sliding window problem LCS Smith-Waterman Algorithm Extensions to LCS Global Alignment Local Alignment Affine Gap Penalties

anevay
Download Presentation

LCS and Extensions to Global and Local Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LCS and Extensions to Global and Local Alignment Dr. Nancy Warter-Perez June 26, 2003

  2. Overview • Recursion • Recursive solution to hydrophobicity sliding window problem • LCS • Smith-Waterman Algorithm • Extensions to LCS • Global Alignment • Local Alignment • Affine Gap Penalties • Programming Workshop 6 LCS and Extensions

  3. Project References • http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html • Computational Molecular Biology – An Algorithmic Approach, Pavel Pevzner • Introduction to Computational Biology – Maps, sequences, and genomes, Michael Waterman • Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield LCS and Extensions

  4. Recursion • Problems can be solved iteratively or recursively • Recursion is useful in cases where you are building upon a partial solution • Consider the hydrophobicity problem LCS and Extensions

  5. Main.cpp #include <iostream> #include <string> using namespace std; #include "hydro.h" double hydro[25] = {1.8,0,2.5,-3.5,-3.5,2.8,-0.4,-3.2,4.5,0,-3.9,3.8,1.9,-3.5,0, -1.6,-3.5,-4.5,-0.8,-0.7,0,4.2,-0.9,0,-1.3}; void main () { string seq; int ws, i; cout << "This program will compute the hydrophobicity of an sequence of amino acids.\n"; cout << "Please enter the sequence: "<< flush; cin >> seq; for(i = 0; i < seq.size(); i++) if((seq.data()[i] >= 'a') && (seq.data()[i] <= 'z')) seq.at(i) = seq.data()[i] - 32; cout << "Please enter the window size: "<< flush; cin >> ws; compute_hydro(seq, ws); } LCS and Extensions

  6. Hydro.cpp #include <iostream> #include <string> using namespace std; #include "hydro.h" void print_hydro(string seq, int ws, int i, double sum); void compute_hydro(string seq, int ws) { cout << "\n\nThe hydrophocity values are:" << endl; print_hydro(seq, ws, seq.size()-1, 0); } void print_hydro(string seq, int ws, int i, double sum) { if(i == -1) return; if(i > seq.size() - ws) sum += hydro[seq.data()[i] - 'A']; else sum = sum - hydro[seq.data()[i+ws] - 'A'] + hydro[seq.data()[i] - 'A']; print_hydro(seq, ws, i-1, sum); if (i <= seq.size() - ws) cout << "Hydrophocity value:\t" << sum/ws << endl; } LCS and Extensions

  7. hydro.h extern double hydro[25]; void compute_hydro(string seq, int ws); LCS and Extensions

  8. Dynamic Programming • Applied to optimization problems • Useful when • Problem can be recursively divided into sub-problems • Sub-problems are not independent LCS and Extensions

  9. Longest Common Subsequence (LCS) Problem • Reference: Pevzner • Can have insertion and deletions but no substitutions (no mismatches) • Ex: V: ATCTGAT W: TGCATA LCS: TCTA LCS and Extensions

  10. LCS Problem (cont.) • Similarity score si-1,j si,j = max { si,j-1 si-1,j-1 + 1, if vi = wj • On board example: Pevzner Fig 6.1 LCS and Extensions

  11. Indels – insertions and deletions (e.g., gaps) • alignment of V and W • V = rows of similarity matrix (vertical axis) • W = columns of similarity matrix (horizontal axis) • Space (gap) in W  (UP) • insertion • Space (gap) in V  (LEFT) • deletion • Match (no mismatch in LCS) (DIAG) LCS and Extensions

  12. LCS(V,W) Algorithm for i = 0 to n si,0 = 0 for j = 1 to m s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT LCS and Extensions

  13. Print-LCS(V,i,j) if i = 0 or j = 0 return if bi,j = DIAG PRINT-LCS(V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(V, i-1, j) else PRINT-LCS(V, I, j-1) LCS and Extensions

  14. Classic Papers • Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf) • Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981.(http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf) • Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf) LCS and Extensions

  15. Smith-Waterman (1 of 3) Algorithm The twomolecular sequences will be A=a1a2. . . an, and B=b1b2. . . bm. A similarity s(a,b) isgiven between sequence elements a and b. Deletions oflength k are given weight Wk. To find pairs of segments with high degrees ofsimilarity, we set up amatrix H . First set Hk0 = Hol= 0 for 0 <= k <= nand 0 <= l <= m. Preliminary values ofH have the interpretation that H i jis the maximum similarity of twosegments ending in aiandbj. respectively. These values are obtained from the relationship Hij=max{Hi-1,j-1+ s(ai,bj), max {Hi-k,j – Wk}, max{Hi,j-l - Wl }, 0}( 1 ) k >= 1 l >= 1 1 <= i <= n and 1 <= j <= m. LCS and Extensions

  16. Smith-Waterman (2 of 3) • The formula for Hijfollows byconsidering the possibilities forending the segments at any ai and bj. • If aiand bj are associated, the similarity is • Hi-l,j-l + s(ai,bj). • (2) If aiis at the end of a deletion of length k, the similarity is • Hi – k, j - Wk . • (3) If bjis at the end of a deletion of length 1, the similarity is • Hi,j-l - Wl. (typo in paper) • (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up toai and bj. LCS and Extensions

  17. Smith-Waterman (3 of 3) The pair of segments with maximum similarity is found by first locating the maximum element of H. The other matrix elements leading to this maximum value are than sequentially determined with a traceback procedure ending with an element of H equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure tothe second largest element of H not associated with the first traceback. LCS and Extensions

  18. Extend LCS to Global Alignment si-1,j + (vi, -) si,j = max { si,j-1 + (-, wj) si-1,j-1 + (vi, wj) (vi, -) = (-, wj) = - = fixed gap penalty (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM LCS and Extensions

  19. Extend to Local Alignment 0 (no negative scores) si-1,j + (vi, -) si,j = max { si,j-1 + (-, wj) si-1,j-1 + (vi, wj) (vi, -) = (-, wj) = - = fixed gap penalty (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM LCS and Extensions

  20. Discussion on adding affine gap penalties • Affine gap penalty • Score for a gap of length x -( + x) • Where •  > 0 is the insert gap penalty •  > 0 is the extend gap penalty LCS and Extensions

  21. Alignment with Gap PenaltiesCan apply to global or local (w/ zero) algorithms si,j = max { si-1,j -  si-1,j - ( + ) si,j = max { si1,j-1 -  si,j-1 - ( + ) si-1,j-1 + (vi, wj) si,j = max { si,j si,j Note: keeping with traversal order in Figure 6.1,  is replaced by , and  is replaced by  LCS and Extensions

  22. Programming Workshop 6 • Implement LCS • LCS(V,W) • b and s are global matrices • Print-LCS(V,i,j) • Write a program that uses LCS and Print-LCS. The program should prompt the user for 2 sequences and print the longest common sequence. LCS and Extensions

More Related