Dynamic Programming for Pairwise Alignment 2. Dr Alexei Drummond, Department of Computer Science, alexei@cs.auckland.ac.nz. Semester 2, 2006.
Review
Dynamic programming algorithm for global alignment (Needleman & Wunsch). Given sequences $x = x_1 \dots x_m$ and $y = y_1 \dots y_n$: $F(i,j)$ = score of best alignment between $x_1 \dots x_i$ and $y_1 \dots y_j$.
Principle of Optimality
The optimal alignment of $x_1 \dots x_i$ with $y_1 \dots y_j$ looks like one of three cases: it ends with $x_i$ aligned to $y_j$, or with $x_i$ aligned to a gap, or with $y_j$ aligned to a gap, so
$F(i,j) = \max\{\, F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\}$
(writing $s(a,b)$ for the substitution score and $d$ for the linear gap penalty).
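The three-way choice on this slide can be sketched in Python as a table fill. This is a minimal illustration rather than the lecturer's code: the function name `nw_fill`, the match/mismatch scores, the linear gap penalty `d = 2`, and the boundary values F(i,0) = -i·d and F(0,j) = -j·d are all assumptions made for the example.

```python
def nw_fill(x, y, match=1, mismatch=-1, d=2):
    """Fill the global-alignment table F (rows follow x, columns follow y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Assumed boundary: a prefix aligned entirely to gaps costs d per symbol.
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # x_i aligned to y_j
                F[i - 1][j] - d,      # x_i aligned to a gap
                F[i][j - 1] - d,      # y_j aligned to a gap
            )
    return F


F = nw_fill("GAT", "GT")
print(F[-1][-1])  # optimal global score, stored in cell (m, n)
```

Each cell reads only its three neighbours, which is why every entry costs constant time.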
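Filling up table
[Figure: F matrix with rows 0..m indexed by X and columns 0..n indexed by Y, with the optimal alignment score marked.]
Constructing alignment
[Figure: the same F matrix, used to construct the alignment starting from the optimal alignment score.]
Example
[Figure: worked F matrix for X and Y, with the optimal alignment score and the resulting alignment.]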
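Constructing the alignment amounts to walking back from cell (m, n) to cell (0, 0), at each step choosing a neighbour that could have produced the current score. The sketch below assumes a simple match/mismatch score and a linear gap penalty `d`; the name `global_align` and the tie-breaking order (diagonal first, then gap in Y, then gap in X) are choices made for this example only.

```python
def global_align(x, y, match=1, mismatch=-1, d=2):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    m, n = len(x), len(y)

    def s(a, b):
        return match if a == b else mismatch

    # Fill the table (assumed boundary: F(i,0) = -i*d, F(0,j) = -j*d).
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)

    # Trace back from (m, n), collecting aligned columns in reverse order.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")   # x_i aligned to a gap
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])   # y_j aligned to a gap
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = global_align("GAT", "GT")
print(score)
print(ax)
print(ay)
```

When several neighbours tie, any of them yields an optimal alignment; this sketch reports just one.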
Time and space
[Figure: F matrix with rows 0..m and columns 0..n.]
$(m+1)(n+1)$ table entries: $O(mn)$ space. Each entry is computed in constant time: $O(mn)$ time.
Smith & Waterman algorithm
Computes a local alignment, i.e. looks for the best alignment of subsequences of X and Y, ignoring the scores of the regions on either side.
[Figure: X and Y with the best subsequence alignment marked.]
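Recurrences
Basis: $F(i,0) = F(0,j) = 0$.
$F(i,j) = \max\{\, 0,\; F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\}$
Here $s$ is the substitution score and $d$ the gap penalty; the option 0 starts a new local alignment.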
Example
[Figure: worked local alignment table for X and Y with the resulting alignment.]
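A minimal sketch of the local-alignment fill, assuming the standard Smith & Waterman formulation: the table is clamped at 0 and the best local score is the largest entry anywhere in the table. The function name `local_align_score` and the scoring values (match +2, mismatch -1, gap penalty 2) are illustrative assumptions.

```python
def local_align_score(x, y, match=2, mismatch=-1, d=2):
    """Return (best local score, (i, j) of the cell where it ends)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # basis: row 0 and column 0 are 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                0,                    # start a new local alignment here
                F[i - 1][j - 1] + s,  # x_i aligned to y_j
                F[i - 1][j] - d,      # x_i aligned to a gap
                F[i][j - 1] - d,      # y_j aligned to a gap
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    return best, best_pos


print(local_align_score("TTACGTT", "GGACGAA"))
```

A traceback for the local case would start at the returned cell and stop at the first cell holding 0.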
Repeated (local) matches
For long sequences we are interested in all local alignments with a significant score, i.e. a score greater than a threshold T, e.g. copies of a repeated domain or motif in a protein.
X = sequence containing the motif; Y = target sequence.
[Figure: Y with the parts matching X marked.]
The method is asymmetric.
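Principle of Optimality
Given sequences $x = x_1 \dots x_m$ and $y = y_1 \dots y_n$, define $F(i,j)$ ($i \ge 1$) = best sum of match scores in $x_1 \dots x_i$ and $y_1 \dots y_j$, assuming $y_j$ is in a matched region and the match ends in $x_i$ or $y_j$.
Ends of matches
$F(0,0) = 0$, $F(0,j) = \max\{\, F(0,j-1),\; \max_{1 \le i \le m} F(i,j-1) - T \,\}$
$F(0,j)$ = best sum of completed match scores to $y_1 \dots y_j$, assuming that $y_j$ is not in a matched region. Row 0 therefore marks unmatched regions and ends of matches in Y.
General recurrence
$F(i,j) = \max\{\, F(0,j),\; F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\}$
The first term is the start of a new match; the remaining terms are extensions of the previous match ($s$ = substitution score, $d$ = gap penalty).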
Filling up table
[Figure: animation of the F matrix (rows 0..m indexed by X, columns 0..n indexed by Y) being filled; the final frame marks the optimal sum of alignment scores.]
Example
Extra cell for final total score.
[Figure: worked repeated-match table for X and Y with the resulting alignment.]
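The repeated-match table, including the extra cell for the final total, can be sketched as below. This follows the standard textbook formulation with rows indexed by X and columns by Y. The recurrences used for row 0 and for the interior, the zero-filled column 0, the function name `repeated_match_score`, and the scoring values are assumptions of this sketch. Each completed match contributes its score minus the threshold T to the total.

```python
def repeated_match_score(x, y, T, match=1, mismatch=-1, d=2):
    """Best total of (match score - T) over non-overlapping matches of x in y."""
    m, n = len(x), len(y)
    # Columns 0..n follow y; column n+1 is the extra cell for the final total.
    F = [[0] * (n + 2) for _ in range(m + 1)]
    for j in range(1, n + 2):
        # Row 0: stay unmatched, or close a match that ended at y_{j-1}.
        col_best = max((F[i][j - 1] for i in range(1, m + 1)), default=0)
        F[0][j] = max(F[0][j - 1], col_best - T)
        if j > n:
            break  # the extra column only has a row-0 entry
        for i in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[0][j],              # start of a new match
                F[i - 1][j - 1] + s,  # extend: x_i aligned to y_j
                F[i - 1][j] - d,      # extend: x_i aligned to a gap
                F[i][j - 1] - d,      # extend: y_j aligned to a gap
            )
    return F[0][n + 1]


# Two copies of the motif "AC", each scoring 2, with threshold T = 1.
print(repeated_match_score("AC", "ACTTAC", T=1))
```

Raising T above the score of any single match leaves row 0 at zero, so no match is reported.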
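Overlap matches
[Figure: four overlap configurations of X and Y.]
Don't penalize overhanging ends, i.e. set $F(i,0) = F(0,j) = 0$. Otherwise use the global alignment recurrence: $F(i,j) = \max\{\, F(i-1,j-1) + s(x_i,y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\}$.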
Example
[Figure: worked overlap-match table for X and Y with the resulting alignment.]
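A hedged sketch of overlap matching: the boundary is set to zero as on the slide, the interior uses the global recurrence, and the score is read off as the maximum over the last row and last column. That last step is how the standard formulation avoids penalizing a trailing overhang, and it is assumed here rather than taken from the slide. The name `overlap_score` and the scoring values are also illustrative.

```python
def overlap_score(x, y, match=1, mismatch=-1, d=2):
    """Best overlap-alignment score; overhanging ends are not penalized."""
    m, n = len(x), len(y)
    # F(i,0) = F(0,j) = 0: a leading overhang is free.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # A trailing overhang is free too: take the best cell on the bottom
    # row or in the rightmost column.
    last_row = max(F[m])
    last_col = max(F[i][n] for i in range(m + 1))
    return max(last_row, last_col)


# The suffix "ACG" of x overlaps the prefix "ACG" of y.
print(overlap_score("TTACG", "ACGTT"))
```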
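Affine gap penalties
• Affine score: $\gamma(g) = -d - (g-1)e$, where $d$ is the gap-open penalty and $e$ is the gap-extension penalty.
• Different penalties are associated with extending the alignment with a gap symbol:
Y = C C T W P
X = C S T W -
is different from
Y = C C T W P
X = C S T - -
In the first alignment the final gap symbol opens a gap (cost $d$); in the second it extends an existing gap (cost $e$).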
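General recurrence
$F(i,j) = \max\{\, F(i-1,j-1) + s(x_i,y_j);\;\; F(k,j) + \gamma(i-k),\ k = 0,\dots,i-1;\;\; F(i,k) + \gamma(j-k),\ k = 0,\dots,j-1 \,\}$
where $\gamma(g)$ is the affine gap score. First term: extend by matching $x_i$ to $y_j$. Second term: extend by matching $x_{k+1} \dots x_i$ to a gap of length $i-k$. Third term: extend by matching $y_{k+1} \dots y_j$ to a gap of length $j-k$.
Problem: each entry now takes $O(i+j)$ time, so the procedure runs in worst-case time $O(mn(m+n))$, i.e. cubic in the sequence length.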
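$O(mn)$ version
Extra variables: $M(i,j)$ = best score up to $(i,j)$ given that $x_i$ is aligned to $y_j$; $I_x(i,j)$ = best score given that $x_i$ is aligned to a gap; $I_y(i,j)$ = best score given that $y_j$ is aligned to a gap.
Recurrences
$M(i,j) = \max\{\, M(i-1,j-1),\; I_x(i-1,j-1),\; I_y(i-1,j-1) \,\} + s(x_i,y_j)$
$I_x(i,j) = \max\{\, M(i-1,j) - d,\; I_x(i-1,j) - e \,\}$ ($x_i$ aligned to start of gap; $x_i$ aligned to continuation of gap)
$I_y(i,j) = \max\{\, M(i,j-1) - d,\; I_y(i,j-1) - e \,\}$ ($y_j$ aligned to start of gap; $y_j$ aligned to continuation of gap)
Procedure runs in worst-case time $O(mn)$.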
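The quadratic-time affine-gap scheme can be sketched with three tables. The table names `M`, `Ix`, `Iy`, the function name `affine_global_score`, the boundary values, and the parameters (match +1, mismatch -1, gap-open `d = 3`, gap-extension `e = 1`) are assumptions of this sketch; as in the standard formulation, a gap in one sequence is not allowed to follow directly after a gap in the other.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, d=3, e=1):
    """Optimal global score with gap score gamma(g) = -d - (g - 1) * e."""
    m, n = len(x), len(y)
    # M: x_i aligned to y_j; Ix: x_i aligned to a gap; Iy: y_j aligned to a gap.
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Ix = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Iy = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        Ix[i][0] = -d - (i - 1) * e  # x_1..x_i aligned to one long gap
    for j in range(1, n + 1):
        Iy[0][j] = -d - (j - 1) * e  # y_1..y_j aligned to one long gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1],
                          Ix[i - 1][j - 1],
                          Iy[i - 1][j - 1]) + s
            Ix[i][j] = max(M[i - 1][j] - d,   # start of gap
                           Ix[i - 1][j] - e)  # continuation of gap
            Iy[i][j] = max(M[i][j - 1] - d,   # start of gap
                           Iy[i][j - 1] - e)  # continuation of gap
    return max(M[m][n], Ix[m][n], Iy[m][n])


# The sequences from the affine gap penalty slide.
print(affine_global_score("CSTW", "CCTWP"))
```

Each of the three tables is filled with constant work per cell, which is where the $O(mn)$ bound comes from.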
Linear space alignment
[Figure: animation over the F matrix (rows 0..m indexed by X, columns 0..n indexed by Y).]
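As a hedged illustration of the space saving only, and not necessarily the construction shown in the figures: the optimal global score can be computed while keeping just two rows of F, because each row depends only on the row above it. Recovering the alignment itself in linear space needs more machinery (for example Hirschberg's divide-and-conquer scheme), which this sketch does not attempt. The function name and scoring values are assumptions.

```python
def nw_score_linear_space(x, y, match=1, mismatch=-1, d=2):
    """Optimal global alignment score using O(len(y)) memory."""
    n = len(y)
    prev = [-j * d for j in range(n + 1)]  # row 0 of F
    for i in range(1, len(x) + 1):
        curr = [-i * d] + [0] * n          # column 0 of row i
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s,  # x_i aligned to y_j
                          prev[j] - d,      # x_i aligned to a gap
                          curr[j - 1] - d)  # y_j aligned to a gap
        prev = curr                         # discard the older row
    return prev[n]


print(nw_score_linear_space("GAT", "GT"))
```

Time stays $O(mn)$; only the memory drops from $O(mn)$ to $O(n)$.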