120 likes | 228 Views
BNFO 136 Sequence alignment. Usman Roshan. Pairwise alignment. X: ACA, Y: GACAT Match=8, mismatch=2, gap-5 ACA-- -ACA- --ACA ACA---- GACAT GACAT GACAT G--ACAT 8+2+2-5-5 -5+8+8+8-5 -5-5+2+2+2 2-5-5-5-5-5-5 Score = 2 14 -4 -28. Traceback.
E N D
BNFO 136Sequence alignment Usman Roshan
Pairwise alignment • X: ACA, Y: GACAT • Match=8, mismatch=2, gap-5 ACA-- -ACA- --ACA ACA---- GACAT GACAT GACAT G--ACAT 8+2+2-5-5 -5+8+8+8-5 -5-5+2+2+2 2-5-5-5-5-5-5 Score = 2 14 -4 -28
Traceback • We can compute an alignment of DNA (or protein or RNA) sequences X and Y with a traceback matrix T. • Sequence X is aligned along the rows and Y along the columns. • Each entry of the matrix T contains D, L, or U specifying diagonal, left or upper
Traceback • X: ACA, Y=TACAG
Traceback • X: ACA, Y=TACAG
Traceback code aligned_seq1 = "" aligned_seq2 = "" i = len(seq2) j = len(seq1) while(i !=0 or j != 0): if(T[i][j] == “L”): aligned_seq1 = “-” + aligned_seq1 aligned_seq1 = seq1[j-1] + aligned_seq1 j = j - 1 elif(T[i][j] == "U"): aligned_seq1 = "-" + aligned_seq1 aligned_seq2 = seq2[i-1] + aligned_seq2 i = i - 1 else: aligned_seq1 = seq1[j-1] + aligned_seq1 aligned_seq2 = seq2[i-1] + aligned_seq2 i = i - 1 j = j - 1
Optimal alignment • An alignment can be specified by the traceback matrix. • How do we determine the traceback for the highest scoring alignment? • Needleman-Wunsch algorithm for global alignment • First proposed in 1970 • Widely used in genomics/bioinformatics • Dynamic programming algorithm
Needleman-Wunsch (NW) • Input: • X = x1x2…xn, Y=y1y2…ym • (X is seq2 and Y is seq1) • Notation: • X1..i = x1x2…xi • Score(X1..i,Y1..j) = Optimal alignment score of sequences X1..i and Y1..j. • Suppose we know the optimal alignment scores of • X1…i-1 and Y1…j-1 • X1…i and Y1...j-1 • X1...i-1 and Y1…j
Needleman-Wunsch (NW) • Then the optimal alignment score of X1…i and Y1…j is the maximum of • Score(X1…i-1,Y1…j-1) + match/mismatch • Score(X1…i,Y1…j-1) + gap • Score(X1…i-1,Y1…j) + gap • We build on this observation to compute Score(Xn,Ym)
Needleman-Wunsch • Define V to be a two dimensional matrix with len(X)+1 rows and len(Y)+1 columns • Let V[i][j] be the score of the optimal alignment of X1…i and Y1…j. • Let m be the match cost, mm be mismatch, and g be the gap cost.
NW pseudocode Initialization: for i = 1 to len(seq2) { V[i][0] = i*g; } For i = 1 to len(seq1) { V[0][i] = i*g; } Recurrence: for i = 1 to len(seq2){ for j = 1 to len(seq1){ V[i-1][j-1] + m(or mm) V[i][j] = max { V[i-1][j] + g V[i][j-1] + g if(maximum is V[i-1][j-1] + m(or mm)) then T[i][j] = ‘D’ else if (maximum is V[i-1][j] + g) then T[i][j] = ‘U’ else then T[i][j] = ‘L’ } }
Example V Input: seq2: ACA seq1: GACAT m = 5 mm = -4 gap = -20 seq2 is lined along the rows and seq2 is along the columns G A C A T A C A T