Global Alignment Summary

walden

Presentation Transcript


  1. Global Alignment Summary
  • If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

  $$
  \mathrm{Score}(i,j) = \max
  \begin{cases}
  \mathrm{Score}(i-1,j) + g & \text{align } A[i] \text{ with a gap} \\
  \mathrm{Score}(i,j-1) + g & \text{align } B[j] \text{ with a gap} \\
  \mathrm{Score}(i-1,j-1) + m & \text{if } A[i] = B[j] \\
  \mathrm{Score}(i-1,j-1) + s & \text{if } A[i] \neq B[j]
  \end{cases}
  $$

  with boundary conditions Score(i, 0) = i * g and Score(0, j) = j * g.
  • The actual alignment is identified by tracing back the pointers, starting at the lower-right corner.
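A minimal Python sketch of this recurrence together with the traceback. The function name `global_alignment` and its default parameters are illustrative assumptions, not from the slides. Instead of storing explicit pointers, the traceback re-derives at each cell which option produced the stored value; this recovers an optimal alignment, with ties broken in favour of the diagonal.

```python
def global_alignment(a, b, g=-2, s=-1, m=2):
    """Global alignment with a uniform gap cost; returns (score, aligned_a, aligned_b)."""
    n, k = len(a), len(b)

    def sub(i, j):
        # m for a match, s for a mismatch (1-based i, j as on the slide)
        return m if a[i - 1] == b[j - 1] else s

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * g
    for j in range(1, k + 1):
        score[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(score[i - 1][j] + g,      # A[i] with a gap
                              score[i][j - 1] + g,      # B[j] with a gap
                              score[i - 1][j - 1] + sub(i, j))

    # Trace back from the lower-right corner to (0, 0)
    out_a, out_b = [], []
    i, j = n, k
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + g:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][k], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_alignment("CACTAG", "GATTACA"))
```

The `-` character marks a gap in the returned strings, matching the notation used on the semi-global slide.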

  2. Global Alignment Algorithm
  To compute the GLOBAL ALIGNMENT of two sequences:
    1. create a matrix whose numbers of rows and columns are the lengths of the two sequences plus one (row 0 and column 0 represent the empty prefixes)
       # initialize only the cells of row 0 and column 0
    2. for each column c, set cell(0, c) to c * gap
    3. for each row r, set cell(r, 0) to r * gap
    4. for each row r in the matrix, starting at 1:
    5.   for each column c in the matrix, starting at 1:
    6.     calculate option1 = cell(r-1, c) + gap, option2 = cell(r, c-1) + gap, and option3 = cell(r-1, c-1) + (m if the two characters match, else s)
    7.     set cell(r, c) to the largest of option1, option2, option3
    8. return the matrix (or the score in its lower-right cell)
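The eight steps above map almost line for line onto Python. This is a sketch under the assumption that the matrix is a list of lists and that `gap`, `mismatch` and `match` default to the values used in the examples; the function name is hypothetical.

```python
def global_alignment_matrix(seq1, seq2, gap=-2, mismatch=-1, match=2):
    # Step 1: one extra row and column for the empty prefixes
    rows, cols = len(seq1) + 1, len(seq2) + 1
    cell = [[0] * cols for _ in range(rows)]
    # Steps 2-3: initialize only row 0 and column 0
    for c in range(cols):
        cell[0][c] = c * gap
    for r in range(rows):
        cell[r][0] = r * gap
    # Steps 4-7: fill the remaining cells row by row
    for r in range(1, rows):
        for c in range(1, cols):
            option1 = cell[r - 1][c] + gap        # seq1[r] against a gap
            option2 = cell[r][c - 1] + gap        # seq2[c] against a gap
            same = seq1[r - 1] == seq2[c - 1]
            option3 = cell[r - 1][c - 1] + (match if same else mismatch)
            cell[r][c] = max(option1, option2, option3)
    # Step 8: the global score sits in the lower-right cell
    return cell


matrix = global_alignment_matrix("CACTAG", "GATTACA")
print(matrix[-1][-1])
```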

  3. Global Alignment Example • Align CACTAG and GATTACA using g = -2, s = -1, m = 2
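As a hand-worked check of the recurrence on this example (assuming CACTAG indexes the rows and GATTACA the columns, since the slide's matrix image is not in the transcript; the global score does not depend on this choice), the first interior cell and the final cell come out as:

$$
\begin{aligned}
\mathrm{Score}(1,1) &= \max\{\mathrm{Score}(0,1)+g,\ \mathrm{Score}(1,0)+g,\ \mathrm{Score}(0,0)+s\} = \max\{-4,\,-4,\,-1\} = -1 \\
\mathrm{Score}(6,7) &= 1
\end{aligned}
$$

One alignment achieving the score of 1 is

    CACTA-G
    GATTACA

with column scores (-1) + 2 + (-1) + 2 + 2 + (-2) + (-1) = 1.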

  4. Semi-Global Alignment
  • Motivation:
      CAGCACTTGGATTCTCGG   (global alignment)
      CAGC----G--T----GG

      CAGCA-CTTGGATTCTCGG  (semi-global alignment)
      ---CAGCGTGG--------
  • The second alignment may be preferable despite its lower score
  • Modify the algorithm so that terminal gaps are not penalized (i.e. gaps at both ends)
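The slide gives no scoring scheme for these two alignments; assuming the g = -2, s = -1, m = 2 scheme used in the other examples, the sketch below scores each alignment column by column, either penalizing every gap or treating the leading and trailing runs of gaps in each row as free. The function `alignment_score` and its `free_end_gaps` flag are hypothetical helpers for illustration.

```python
def alignment_score(top, bottom, match=2, mismatch=-1, gap=-2, free_end_gaps=False):
    """Score two gapped rows of equal length column by column ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("rows must have equal length")
    width = len(top)

    def interior_span(row):
        # [lo, hi) lies between the row's leading and trailing runs of '-'
        lo = width - len(row.lstrip("-"))
        hi = len(row.rstrip("-"))
        return lo, hi

    top_span, bottom_span = interior_span(top), interior_span(bottom)
    total = 0
    for col, (x, y) in enumerate(zip(top, bottom)):
        if x == "-" or y == "-":
            lo, hi = top_span if x == "-" else bottom_span
            terminal = col < lo or col >= hi
            if not (free_end_gaps and terminal):
                total += gap
        else:
            total += match if x == y else mismatch
    return total


print(alignment_score("CAGCACTTGGATTCTCGG", "CAGC----G--T----GG"))
print(alignment_score("CAGCA-CTTGGATTCTCGG", "---CAGCGTGG--------"))
print(alignment_score("CAGCA-CTTGGATTCTCGG", "---CAGCGTGG--------", free_end_gaps=True))
```

Under that assumed scheme the global alignment scores -4 either way (it has no terminal gaps), while the second alignment scores -13 when all 11 terminal gaps are charged but 9 once they are free, which is why removing the terminal-gap penalty changes which alignment wins.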

  5. Semi-Global Alignment • Modify the algorithm so that terminal gaps are not penalized

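  6. Semi-Global Alignment Summary
  • If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], the recurrence is the same as for global alignment:

  $$
  \mathrm{Score}(i,j) = \max
  \begin{cases}
  \mathrm{Score}(i-1,j) + g & \text{align } A[i] \text{ with a gap} \\
  \mathrm{Score}(i,j-1) + g & \text{align } B[j] \text{ with a gap} \\
  \mathrm{Score}(i-1,j-1) + m & \text{if } A[i] = B[j] \\
  \mathrm{Score}(i-1,j-1) + s & \text{if } A[i] \neq B[j]
  \end{cases}
  $$

  but with boundary conditions Score(i, 0) = 0 and Score(0, j) = 0, and with the gap cost g set to 0 for the last row and the last column.
  • The actual alignment is identified the same way as for global alignment.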

  7. Semi-Global Alignment Algorithm
  To compute the SEMI-GLOBAL ALIGNMENT of two sequences:
    1. create a matrix whose numbers of rows and columns are the lengths of the two sequences plus one
       # initialize the cells of row 0 and column 0 to 0
    2. for each column c, set cell(0, c) to 0 (no gap penalty)
    3. for each row r, set cell(r, 0) to 0 (no gap penalty)
    4. for each row in the matrix, starting at 1:
    5.   for each column in the matrix, starting at 1:
    6.     calculate option1, option2, option3, using a gap penalty of 0 in the last row and in the last column
    7.     set the current cell to the largest of option1, option2, option3
    8. return the matrix (or the score in its lower-right cell)
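A sketch of these steps in Python. It assumes a specific reading of step 6: a vertical move down the last column, or a horizontal move along the last row, adds a trailing gap and is therefore free, while every other gap still costs `gap`. The function name `semi_global_score` is hypothetical.

```python
def semi_global_score(seq1, seq2, gap=-2, mismatch=-1, match=2):
    """Semi-global alignment score: gaps at either end of either sequence are free."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    last_r, last_c = rows - 1, cols - 1
    # Steps 1-3: row 0 and column 0 are all 0, so leading gaps cost nothing
    cell = [[0] * cols for _ in range(rows)]
    # Steps 4-7
    for r in range(1, rows):
        for c in range(1, cols):
            # Assumed reading of "gap penalty of 0 for last row / last column":
            # moving down the last column or along the last row adds a free trailing gap
            down_gap = 0 if c == last_c else gap
            right_gap = 0 if r == last_r else gap
            option1 = cell[r - 1][c] + down_gap
            option2 = cell[r][c - 1] + right_gap
            same = seq1[r - 1] == seq2[c - 1]
            option3 = cell[r - 1][c - 1] + (match if same else mismatch)
            cell[r][c] = max(option1, option2, option3)
    # Step 8: with free trailing gaps, the lower-right cell holds the best score
    return cell[last_r][last_c]


print(semi_global_score("GACTATGA", "ATTA"))
```

Because the free moves propagate the best value of the last row and last column into the corner, returning the lower-right cell is enough; no separate search for the maximum is needed.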

  8. Semi-Global Alignment Example • Align GACTATGA and ATTA using g = -2, s = -1, m = 2
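Working this example by hand (rows GACTATGA, columns ATTA; a check computed here, not taken from the slide image), the lower-right cell comes out as 5. One optimal semi-global alignment is

    GACTATGA
    -ATTA---

where the leading G and the trailing TGA sit against free end gaps, so only the four interior columns contribute:

$$
2 + (-1) + 2 + 2 = 5
$$

Pairing ATTA with the final ATGA (GACTATGA over ----ATTA) ties at 5.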

  9. Local Alignment
  • Goal: find two substrings (common regions), one from each sequence, whose global alignment score is the highest, e.g. for
      AAAACCCCCGGGGTTA
      TTCCCGGGAACCAACC
  • Similar to the previous two methods, but a sub-alignment is no longer extended once its score would become negative; the cell is reset to 0 instead

  10. Local Alignment • Modify the algorithm to identify a high-scoring common fragment

  11-13. Local Alignment • Align GACTATGA and ATTA using g = -2, s = -1, m = 2
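Filling the local-alignment matrix for this example by hand (a check computed here; the slides' matrix images are not in the transcript) gives a maximum entry of 5. One best local alignment pairs the substring ACTA of GACTATGA with ATTA:

    ACTA
    ATTA

$$
2 + (-1) + 2 + 2 = 5
$$

Pairing ATGA with ATTA also scores 5.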

  14. Local Alignment Example

          T  C  C  C  C  T  G  G  A  A  C  C  A  A  C  C
    ------------------------------------------------------
    |  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 |
  A |  0  0  0  0  0  0  0  0  0  2  2  0  0  2  2  0  0 |
  A |  0  0  0  0  0  0  0  0  0  2  4  2  0  2  4  2  0 |
  A |  0  0  0  0  0  0  0  0  0  2  4  3  1  2  4  3  1 |
  A |  0  0  0  0  0  0  0  0  0  2  4  3  2  3  4  3  2 |
  C |  0  0  2  2  2  2  0  0  0  0  2  6  5  3  2  6  5 |
  C |  0  0  2  4  4  4  2  0  0  0  0  4  8  6  4  4  8 |
  C |  0  0  2  4  6  6  4  2  0  0  0  2  6  7  5  6  6 |
  C |  0  0  2  4  6  8  6  4  2  0  0  2  4  5  6  7  8 |
  C |  0  0  2  4  6  8  7  5  3  1  0  2  4  3  4  8  9 |
  G |  0  0  0  2  4  6  7  9  7  5  3  1  2  3  2  6  7 |
  G |  0  0  0  0  2  4  5  9 11  9  7  5  3  1  2  4  5 |
  G |  0  0  0  0  0  2  3  7 11 10  8  6  4  2  0  2  3 |
  G |  0  0  0  0  0  0  1  5  9 10  9  7  5  3  1  0  1 |
  T |  0  2  0  0  0  0  2  3  7  8  9  8  6  4  2  0  0 |
  T |  0  2  1  0  0  0  2  1  5  6  7  8  7  5  3  1  0 |
  A |  0  0  1  0  0  0  0  1  3  7  8  6  7  9  7  5  3 |
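The matrix above can be regenerated with a short sketch, assuming m = 2, s = -1, g = -2 (the values stated on the neighbouring slides; this slide does not repeat them). Note that the column header reads TCCCCTGGAACCAACC, which differs from the second sequence listed on slide 9 (TTCCCGGGAACCAACC); the matrix values are consistent with the header, so the sketch uses that. The function name `local_alignment_matrix` is hypothetical.

```python
def local_alignment_matrix(seq1, seq2, gap=-2, mismatch=-1, match=2):
    """Fill a local-alignment matrix; return (matrix, best_score, (row, col))."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    cell = [[0] * cols for _ in range(rows)]   # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for r in range(1, rows):
        for c in range(1, cols):
            same = seq1[r - 1] == seq2[c - 1]
            # 0 is a fourth option: a negative sub-alignment is abandoned
            cell[r][c] = max(0,
                             cell[r - 1][c] + gap,
                             cell[r][c - 1] + gap,
                             cell[r - 1][c - 1] + (match if same else mismatch))
            if cell[r][c] > best:   # strict '>' keeps the first maximum in row-major order
                best, best_pos = cell[r][c], (r, c)
    return cell, best, best_pos


matrix, best, pos = local_alignment_matrix("AAAACCCCCGGGGTTA", "TCCCCTGGAACCAACC")
print(best, pos)
```

The highest entry, 11, first appears in row 11 (the second G) and column 8 (the second G of the header), matching the table.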

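  15. Local Alignment Summary
  • If Score(i, j) denotes the best score of a local alignment that ends at A[i] and B[j], then

  $$
  \mathrm{Score}(i,j) = \max
  \begin{cases}
  \mathrm{Score}(i-1,j) + g & \text{align } A[i] \text{ with a gap} \\
  \mathrm{Score}(i,j-1) + g & \text{align } B[j] \text{ with a gap} \\
  \mathrm{Score}(i-1,j-1) + m & \text{if } A[i] = B[j] \\
  \mathrm{Score}(i-1,j-1) + s & \text{if } A[i] \neq B[j] \\
  0
  \end{cases}
  $$

  with boundary conditions Score(i, 0) = 0 and Score(0, j) = 0. Unlike semi-global alignment, the last row and column need no special gap cost, because the traceback may start anywhere; the example matrix on slide 14 uses g = -2 there as well.
  • Recovering the alignment: find the entry with the highest value anywhere in the matrix and use it as the starting point for tracing back until a 0 is reached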
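A sketch of the complete procedure, fill plus traceback, under the same scoring assumptions (g = -2, s = -1, m = 2 as defaults). The function name `local_alignment` is hypothetical; as in the global sketch, the traceback re-derives each step from the stored scores rather than from saved pointers, preferring the diagonal on ties.

```python
def local_alignment(a, b, g=-2, s=-1, m=2):
    """Local alignment; returns (best_score, aligned_a, aligned_b)."""
    n, k = len(a), len(b)

    def sub(i, j):
        return m if a[i - 1] == b[j - 1] else s

    score = [[0] * (k + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(0, score[i - 1][j] + g, score[i][j - 1] + g,
                              score[i - 1][j - 1] + sub(i, j))
            if score[i][j] > best:
                best, bi, bj = score[i][j], i, j

    # Trace back from the highest entry until a 0 is reached
    out_a, out_b = [], []
    i, j = bi, bj
    while score[i][j] > 0:       # row 0 and column 0 are 0, so i, j >= 1 here
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + g:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(local_alignment("AAAACCCCCGGGGTTA", "TCCCCTGGAACCAACC"))
```

On the slide-14 sequences the traceback starts at the first 11 in the matrix and stops at a 0 after seven diagonal steps, pairing CCCCCGG with CCCCTGG (six matches and one C/T mismatch).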

  16. Local Alignment Algorithm
  To compute the LOCAL ALIGNMENT of two sequences:
    1. create a matrix whose numbers of rows and columns are the lengths of the two sequences plus one
       # initialize the cells of row 0 and column 0 to 0
    2. for each column c, set cell(0, c) to 0
    3. for each row r, set cell(r, 0) to 0
    4. for each row in the matrix, starting at 1:
    5.   for each column in the matrix, starting at 1:
    6.     calculate option1, option2, option3 as for global alignment (no special gap cost in the last row or column)
    7.     set the current cell to the largest of 0, option1, option2, option3
    8. return the matrix (or the highest score anywhere in it)

  17. References
  • Global alignment: Needleman SB, Wunsch CD (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". J Mol Biol 48(3): 443–453.
  • Semi-global alignment
  • Local alignment: Smith TF, Waterman MS (1981). "Identification of Common Molecular Subsequences". J Mol Biol 147: 195–197.
  • Images from UMN CS5481

  18. Gap Penalty Revisited
  • So far we have used a uniform gap penalty, i.e. k gaps cost k * g
  • Another possibility is to use two types of gap penalty:
    • gap opening penalty (go), for starting a gapped region
    • gap extension penalty (ge), for continuing a gapped region
  • Typically the gap opening penalty is set higher (biased against opening gaps) and the gap extension penalty lower (once a gapped region has started, extending it is acceptable)
  • The gap penalty G for a run of k gaps now becomes G(k) = go + (k-1) * ge, also called an affine gap penalty
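The affine formula is easy to compare against the uniform one numerically. The values go = -5 and ge = -1 below are hypothetical, chosen only for illustration and compared with the uniform g = -2 from the earlier examples; `affine_gap_cost` is likewise a hypothetical helper.

```python
def affine_gap_cost(k, go, ge):
    """G(k) = go + (k - 1) * ge for a run of k >= 1 consecutive gaps."""
    if k < 1:
        raise ValueError("a gap run has length >= 1")
    return go + (k - 1) * ge


# Hypothetical values: go = -5, ge = -1, compared with a uniform g = -2
one_run = affine_gap_cost(4, go=-5, ge=-1)          # one run of 4 gaps
four_runs = 4 * affine_gap_cost(1, go=-5, ge=-1)    # four separate 1-gap runs
uniform = 4 * -2                                    # the same 4 gaps, uniform cost
print(one_run, four_runs, uniform)                  # prints -8 -20 -8
```

Four gaps cost the same under both schemes when they form one run, but scattering them costs far more under the affine scheme, which is exactly the bias toward fewer, longer gapped regions described above. Setting go = ge recovers the uniform penalty.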

  19. Affine Gap Penalty • Modify the algorithm to support gap open/extension penalty

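  20. Global Alignment, Affine Gap
  • If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

  $$
  \mathrm{Score}(i,j) = \max
  \begin{cases}
  \mathrm{Score}(i-k,j) + G(k) & 1 \le k \le i \\
  \mathrm{Score}(i,j-k) + G(k) & 1 \le k \le j \\
  \mathrm{Score}(i-1,j-1) + m & \text{if } A[i] = B[j] \\
  \mathrm{Score}(i-1,j-1) + s & \text{if } A[i] \neq B[j]
  \end{cases}
  $$

  with Score(0, 0) = 0, Score(i, 0) = G(i) and Score(0, j) = G(j).
  • Vertically and horizontally, every earlier cell in the same column or row must now be tried as the possible start of the gap, so each cell takes O(i + j) work instead of O(1).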
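A direct, deliberately naive transcription of this recurrence, trying every gap length k at each cell, so the whole fill costs O(nm(n + m)). The defaults go = -5 and ge = -1 are hypothetical, and the function name `affine_global_score` is an illustrative assumption.

```python
def affine_global_score(a, b, match=2, mismatch=-1, go=-5, ge=-1):
    """Global alignment score with affine gap penalty G(k) = go + (k-1)*ge."""
    def G(k):
        return go + (k - 1) * ge

    la, lb = len(a), len(b)
    score = [[0] * (lb + 1) for _ in range(la + 1)]   # Score(0, 0) = 0
    for i in range(1, la + 1):
        score[i][0] = G(i)
    for j in range(1, lb + 1):
        score[0][j] = G(j)
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # a run of k gaps ending at A[i] (vertical) or at B[j] (horizontal)
            vertical = max(score[i - k][j] + G(k) for k in range(1, i + 1))
            horizontal = max(score[i][j - k] + G(k) for k in range(1, j + 1))
            score[i][j] = max(diag, vertical, horizontal)
    return score[la][lb]


print(affine_global_score("AAAA", "AA", go=-4, ge=-1))
```

With go = ge = g the formula collapses to the uniform penalty, so the result matches the earlier global-alignment matrix (for example, a score of 1 for CACTAG and GATTACA with go = ge = -2). For AAAA against AA with go = -4 and ge = -1, the best choice is one run of two gaps (G(2) = -5) plus two matches, giving -1.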
