
BNFO 136 Sequence alignment

Usman Roshan. Pairwise alignment and traceback.


Presentation Transcript


  1. BNFO 136: Sequence alignment. Usman Roshan

  2. Pairwise alignment
     • X: ACA, Y: GACAT
     • Match = 8, mismatch = 2, gap = -5

     Alignment   ACA--       -ACA-        --ACA        ACA----
                 GACAT       GACAT        GACAT        G--ACAT
     Columns     2+2+2-5-5   -5+8+8+8-5   -5-5+2+2+2   2-5-5-5-5-5-5
     Score       -4          14           -4           -28
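
As a rough sketch of the column-by-column scoring on this slide, the helper below sums one score per alignment column. The function name `score_alignment` and its keyword arguments are assumptions made for illustration; only the scoring values (match 8, mismatch 2, gap -5) come from the slide.

```python
def score_alignment(row_x, row_y, match=8, mismatch=2, gap=-5):
    """Sum the column scores of two equal-length gapped strings.

    Hypothetical helper: the name and signature are not from the slides.
    """
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # a letter paired with a gap
        elif a == b:
            total += match      # identical letters
        else:
            total += mismatch   # two different letters
    return total


# The four candidate alignments of X = ACA and Y = GACAT from the slide.
candidates = [("ACA--", "GACAT"), ("-ACA-", "GACAT"),
              ("--ACA", "GACAT"), ("ACA----", "G--ACAT")]
for row_x, row_y in candidates:
    print(row_x, row_y, score_alignment(row_x, row_y))
```

Scoring every column independently like this is what makes the dynamic programming approach later in the deck possible.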

  3. Traceback
     • We can compute an alignment of DNA (or protein or RNA) sequences X and Y with a traceback matrix T.
     • Sequence X is aligned along the rows and Y along the columns.
     • Each entry of T is D, L, or U, specifying whether the traceback moves diagonally, left, or up from that cell.

  4. Traceback
     • X: ACA, Y: TACAG

  5. Traceback
     • X: ACA, Y: TACAG

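  6. Traceback code

         aligned_seq1 = ""
         aligned_seq2 = ""
         i = len(seq2)   # seq2 runs along the rows
         j = len(seq1)   # seq1 runs along the columns
         while i != 0 or j != 0:
             if T[i][j] == "L":
                 # move left: letter from seq1 against a gap in seq2
                 aligned_seq1 = seq1[j-1] + aligned_seq1
                 aligned_seq2 = "-" + aligned_seq2
                 j = j - 1
             elif T[i][j] == "U":
                 # move up: letter from seq2 against a gap in seq1
                 aligned_seq1 = "-" + aligned_seq1
                 aligned_seq2 = seq2[i-1] + aligned_seq2
                 i = i - 1
             else:
                 # move diagonally: align one letter from each sequence
                 aligned_seq1 = seq1[j-1] + aligned_seq1
                 aligned_seq2 = seq2[i-1] + aligned_seq2
                 i = i - 1
                 j = j - 1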
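
To see the traceback loop run end to end, here is a minimal sketch that wraps it in a function and applies it to X: ACA, Y: TACAG from the earlier slides. The function name `traceback` and the matrix `T` are assumptions: `T` is written by hand for illustration and is not the matrix shown on the slides. An "L" step pairs a letter of seq1 with a gap in seq2, and a "U" step does the reverse.

```python
def traceback(T, seq1, seq2):
    """Follow T from the bottom-right cell back to T[0][0].

    seq2 indexes the rows of T and seq1 indexes the columns.
    """
    aligned_seq1, aligned_seq2 = "", ""
    i, j = len(seq2), len(seq1)
    while i != 0 or j != 0:
        if T[i][j] == "L":      # left: seq1 letter against a gap
            aligned_seq1 = seq1[j - 1] + aligned_seq1
            aligned_seq2 = "-" + aligned_seq2
            j -= 1
        elif T[i][j] == "U":    # up: seq2 letter against a gap
            aligned_seq1 = "-" + aligned_seq1
            aligned_seq2 = seq2[i - 1] + aligned_seq2
            i -= 1
        else:                   # diagonal: one letter from each
            aligned_seq1 = seq1[j - 1] + aligned_seq1
            aligned_seq2 = seq2[i - 1] + aligned_seq2
            i -= 1
            j -= 1
    return aligned_seq1, aligned_seq2


seq2 = "ACA"     # X, along the rows
seq1 = "TACAG"   # Y, along the columns

# Hand-written traceback matrix (an assumption, for illustration only).
# T[0][0] is never read because the loop stops there.
T = [
    # ""   T    A    C    A    G
    ["-", "L", "L", "L", "L", "L"],   # ""
    ["U", "D", "D", "L", "D", "L"],   # A
    ["U", "D", "D", "D", "L", "L"],   # C
    ["U", "D", "D", "D", "D", "L"],   # A
]

a1, a2 = traceback(T, seq1, seq2)
print(a1)
print(a2)
```

Note that the first row must hold "L" and the first column "U" so that the loop can always reach the top-left cell.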

  7. Optimal alignment
     • An alignment can be specified by the traceback matrix.
     • How do we determine the traceback for the highest scoring alignment?
     • Needleman-Wunsch algorithm for global alignment
        • First proposed in 1970
        • Widely used in genomics/bioinformatics
        • Dynamic programming algorithm

  8. Needleman-Wunsch (NW)
     • Input:
        • X = x_1 x_2 … x_n, Y = y_1 y_2 … y_m
        • (X is seq2 and Y is seq1)
     • Notation:
        • X_{1..i} = x_1 x_2 … x_i
        • Score(X_{1..i}, Y_{1..j}) = optimal alignment score of sequences X_{1..i} and Y_{1..j}
     • Suppose we know the optimal alignment scores of
        • X_{1..i-1} and Y_{1..j-1}
        • X_{1..i} and Y_{1..j-1}
        • X_{1..i-1} and Y_{1..j}

  9. Needleman-Wunsch (NW)
     • Then the optimal alignment score of X_{1..i} and Y_{1..j} is the maximum of
        • Score(X_{1..i-1}, Y_{1..j-1}) + match/mismatch (x_i aligned with y_j)
        • Score(X_{1..i}, Y_{1..j-1}) + gap (y_j aligned with a gap)
        • Score(X_{1..i-1}, Y_{1..j}) + gap (x_i aligned with a gap)
     • We build on this observation to compute Score(X_{1..n}, Y_{1..m})
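
The three-way maximum can be written almost literally as a recursive function. The sketch below is a top-down, memoized version rather than the table-filling form used later in the deck; the name `optimal_score`, its defaults (match 8, mismatch 2, gap -5, taken from the pairwise alignment slide), and the base cases (a prefix aligned against nothing costs one gap per letter) are assumptions chosen for illustration.

```python
from functools import lru_cache


def optimal_score(x, y, match=8, mismatch=2, gap=-5):
    """Optimal global alignment score of x and y via the recurrence."""

    @lru_cache(maxsize=None)
    def score(i, j):
        # Base cases: one sequence prefix is empty, so every letter
        # of the other prefix is paired with a gap.
        if i == 0:
            return j * gap
        if j == 0:
            return i * gap
        substitution = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            score(i - 1, j - 1) + substitution,  # x_i aligned with y_j
            score(i, j - 1) + gap,               # y_j aligned with a gap
            score(i - 1, j) + gap,               # x_i aligned with a gap
        )

    return score(len(x), len(y))


print(optimal_score("ACA", "GACAT"))  # 14
```

Memoization matters here: without it the same subproblems are recomputed many times, which is the cost the dynamic programming table avoids.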

  10. Needleman-Wunsch
      • Define V to be a two-dimensional matrix with len(X)+1 rows and len(Y)+1 columns.
      • Let V[i][j] be the score of the optimal alignment of X_{1..i} and Y_{1..j}.
      • Let m be the match score, mm the mismatch score, and g the gap score.

  11. NW pseudocode

      Initialization:
        V[0][0] = 0
        for i = 1 to len(seq2) { V[i][0] = i*g; T[i][0] = 'U' }
        for j = 1 to len(seq1) { V[0][j] = j*g; T[0][j] = 'L' }
        (the T border entries let the traceback reach the top-left cell)

      Recurrence:
        for i = 1 to len(seq2) {
          for j = 1 to len(seq1) {
            V[i][j] = max { V[i-1][j-1] + m (or mm),
                            V[i-1][j] + g,
                            V[i][j-1] + g }
            if (maximum is V[i-1][j-1] + m (or mm)) then T[i][j] = 'D'
            else if (maximum is V[i-1][j] + g) then T[i][j] = 'U'
            else T[i][j] = 'L'
          }
        }

  12. Example V
      • Input: seq2: ACA, seq1: GACAT
      • m = 5, mm = -4, gap = -20
      • seq2 is lined up along the rows and seq1 along the columns
      • Row labels: A C A; column labels: G A C A T (matrix figure not preserved in this transcript)
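
Since the matrix itself did not survive in this transcript, the following sketch fills V and T for this input so the values can be inspected. It follows the initialization and recurrence of the NW pseudocode; the function name `needleman_wunsch`, the border entries of T ('U' in the first column, 'L' in the first row), and the tie-breaking order (diagonal, then up, then left) are assumptions made here.

```python
def needleman_wunsch(seq1, seq2, m, mm, g):
    """Fill the score matrix V and traceback matrix T.

    seq2 runs along the rows, seq1 along the columns.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    V = [[0] * cols for _ in range(rows)]
    T = [[""] * cols for _ in range(rows)]

    # Initialization: a prefix aligned against nothing is all gaps.
    for i in range(1, rows):
        V[i][0] = i * g
        T[i][0] = "U"
    for j in range(1, cols):
        V[0][j] = j * g
        T[0][j] = "L"

    # Recurrence: best of diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = V[i - 1][j - 1] + (m if seq2[i - 1] == seq1[j - 1] else mm)
            up = V[i - 1][j] + g
            left = V[i][j - 1] + g
            V[i][j] = max(diag, up, left)
            if V[i][j] == diag:
                T[i][j] = "D"
            elif V[i][j] == up:
                T[i][j] = "U"
            else:
                T[i][j] = "L"
    return V, T


V, T = needleman_wunsch("GACAT", "ACA", m=5, mm=-4, g=-20)
for label, row in zip(" ACA", V):
    print(label, " ".join(f"{v:5d}" for v in row))
print("optimal score:", V[-1][-1])  # -25
```

With a gap score this low the optimal alignment uses the fewest gaps possible: three matches and two gaps give 5+5+5-20-20 = -25.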
