Local Alignments Without Affine Gap Penalties: Smith and Waterman
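The Smith and Waterman Algorithm
$$
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + \mathrm{Mat}[i,j] & \text{(1…i-1 vs 1…j-1, then X over X)} \\
F(i-1,j) + \mathrm{Gep} & \text{(1…i-1 vs 1…j, then X over -)} \\
F(i,j-1) + \mathrm{Gep} & \text{(1…i vs 1…j-1, then - over X)} \\
0
\end{cases}
$$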
The Smith and Waterman Algorithm
The 0 option: ignore the rest of the matrix and terminate a local alignment.
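The recurrence for a single cell can be sketched in Python as follows. This is only an illustration: the function name `sw_cell` and its parameters are assumptions, `F` is taken to be a list of lists whose neighbouring cells are already filled, and `gep` is assumed to be a negative score.

```python
def sw_cell(F, i, j, match_score, gep):
    """Return F(i,j) for Smith-Waterman with a linear gap penalty.

    match_score plays the role of Mat[i,j]; gep is the (negative)
    score added for one gap position.
    """
    return max(
        F[i - 1][j - 1] + match_score,  # X aligned with X
        F[i - 1][j] + gep,              # X aligned with a gap
        F[i][j - 1] + gep,              # gap aligned with X
        0,                              # terminate the local alignment here
    )
```

The final `0` is what separates this from the global recurrence: a cell never carries a negative score forward.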
Filling up a SW matrix: borders
Easy: local alignments NEVER start/end with a gap…

    *  -  A  N  I  C  E  C  A  T
    -  0  0  0  0  0  0  0  0  0
    C  0
    A  0
    T  0
    A  0
    N  0
    D  0
    O  0
    G  0
Filling up a SW matrix
Match=2 MisMatch=-1 Gap=-1

    *  -  A  N  I  C  E  C  A  T
    -  0  0  0  0  0  0  0  0  0
    C  0  0  0  0  2  0  2  0  0
    A  0  2  0  0  0  0  0  4  0
    T  0  0  0  0  0  0  0  2  6
    A  0  2  0  0  0  0  0  0  4
    N  0  0  4  2  0  0  0  0  2
    D  0  0  2  2  0  0  0  0  0
    O  0  0  0  0  0  0  0  0  0
    G  0  0  0  0  0  0  0  0  0

Best local score: 6 (row T, column T). This is the beginning of the trace-back.
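A minimal Python sketch of the fill step, assuming the scoring scheme of this slide (Match=2, MisMatch=-1, Gap=-1) and zero borders. The name `sw_fill` and its return values are illustrative and not taken from the slides.

```python
def sw_fill(seq_rows, seq_cols, match=2, mismatch=-1, gap=-1):
    """Fill a Smith-Waterman matrix and report the best cell."""
    n, m = len(seq_rows), len(seq_cols)
    # Row 0 and column 0 stay at 0: a local alignment never starts with a gap.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_rows[i - 1] == seq_cols[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # substitution
                F[i - 1][j] + gap,    # gap in seq_cols
                F[i][j - 1] + gap,    # gap in seq_rows
                0,                    # start a new local alignment
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    return F, best, best_pos


F, best, best_pos = sw_fill("CATANDOG", "ANICECAT")
print(best, best_pos)  # 6 (3, 8)
```

The best cell is the T/T cell that ends the shared CAT, which is where the trace-back starts.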
    # Fill the matrix; $gep is the (negative) gap score.
    for ($i=1; $i<=$len0; $i++) {
        for ($j=1; $j<=$len1; $j++) {
            if ($res0[0][$i-1] eq $res1[0][$j-1]) {$s=2;} else {$s=-1;}

            # Read from the same matrix that is written below.
            $sub=$smat[$i-1][$j-1]+$s;
            $del=$smat[$i  ][$j-1]+$gep;
            $ins=$smat[$i-1][$j  ]+$gep;

            # Turning NW into SW: a move is kept only if its score is > 0,
            # otherwise the cell is set to zero and the trace-back stops there.
            if    ($sub>$del && $sub>$ins && $sub>0) {$smat[$i][$j]=$sub; $tb[$i][$j]=$subcode;}
            elsif ($del>$ins && $del>0)              {$smat[$i][$j]=$del; $tb[$i][$j]=$delcode;}
            elsif ($ins>0)                           {$smat[$i][$j]=$ins; $tb[$i][$j]=$inscode;}
            else                                     {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

            # Prepare trace-back: remember the best-scoring cell.
            if ($smat[$i][$j]>$best_score) {
                $best_score=$smat[$i][$j];
                $best_i=$i;
                $best_j=$j;
            }
        }
    }
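The trace-back itself is not shown on the slide. A minimal Python sketch, assuming trace-back codes named "sub", "del", "ins" and "stop" (hypothetical labels standing in for the slide's code variables), starts at the best cell and walks back until it meets a stop code:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned part of a, aligned part of b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    tb = [["stop"] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # "stop" comes first so that a tie with 0 ends the alignment.
            options = [
                (0, "stop"),
                (F[i - 1][j - 1] + s, "sub"),
                (F[i][j - 1] + gap, "del"),
                (F[i - 1][j] + gap, "ins"),
            ]
            F[i][j], tb[i][j] = max(options, key=lambda o: o[0])
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Walk back from the best cell until a "stop" cell is reached.
    out_a, out_b = [], []
    i, j = bi, bj
    while tb[i][j] != "stop":
        if tb[i][j] == "sub":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif tb[i][j] == "del":
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
        else:  # "ins"
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("CATANDOG", "ANICECAT"))  # (6, 'CAT', 'CAT')
```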
Sequence Alignment Variants
Two basic variants of sequence alignment:
• Global alignment (Needleman-Wunsch)
• Local alignment (Smith-Waterman)
Further variants:
• Overlap alignment
• Affine cost for gaps
We'll use the ideas of dynamic programming presented in the lecture.
Overlap Alignment
Consider the following problem:
• Find the most significant overlap between two sequences S, T.
• Possible overlap relations (figures a and b): a suffix of one sequence aligned with a prefix of the other, or one sequence aligned within the other.
Difference from local alignment: here we require alignment between the endpoints of the two sequences.
Overlap Alignment
Formally: given S[1..n], T[1..m], find i, j such that
$$
d = \max\{\, D(S[1..i], T[j..m]),\; D(S[i..n], T[1..j]),\; D(S[1..n], T[i..j]),\; D(S[i..j], T[1..m]) \,\}
$$
is maximal.
Solution: same as global alignment, except that we do not penalise overhanging ends.
Overlap Alignment
• Initialization: F[i,0] = 0, F[0,j] = 0
• Recurrence: as in global alignment
• Score: maximum value in the bottom row and the rightmost column
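The three rules above can be sketched in Python as follows. The function name `overlap_score` is an assumption, and the default scores (match +4, mismatch -1, indel -5) are borrowed from the example slides purely for illustration.

```python
def overlap_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap-alignment score of s and t (overhanging ends are free)."""
    n, m = len(s), len(t)
    # Initialization: F[i][0] = 0 and F[0][j] = 0, so leading overhangs cost nothing.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # Recurrence: as in global alignment (no 0 option).
            F[i][j] = max(
                F[i - 1][j - 1] + sub,
                F[i - 1][j] + indel,
                F[i][j - 1] + indel,
            )
    # Score: maximum over the bottom row and the rightmost column,
    # so trailing overhangs cost nothing either.
    return max(max(F[n]), max(row[m] for row in F))
```

Only the initialization and the place where the final score is read differ from Needleman-Wunsch.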
Overlap Alignment (Example)
S = PAWHEAE
T = HEAGAWGHEE
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5
Overlap Alignment (Example)
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5
The best overlap is:
PAWHEAE------
---HEAGAWGHEE
Pay attention! A different scoring scheme (e.g. Indel: -2 instead of -5) could yield a different result, such as:
---PAW-HEAE
HEAGAWGHEE-
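To see why the scoring scheme matters, the two alignments shown can be scored directly. The sketch below is a hypothetical helper, not part of the slides. It assumes that the overhang is the leading and trailing run of columns containing a gap and that only the columns in between are scored. Reading the stray -2 on the slide as an alternative indel score is also an assumption.

```python
def overlap_alignment_score(top, bottom, match=4, mismatch=-1, indel=-5):
    """Score two gapped rows of equal length, ignoring overhanging ends."""
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    cols = list(zip(top, bottom))
    start, end = 0, len(cols)
    while start < end and "-" in cols[start]:    # leading overhang is free
        start += 1
    while end > start and "-" in cols[end - 1]:  # trailing overhang is free
        end -= 1
    score = 0
    for x, y in cols[start:end]:
        if x == "-" or y == "-":
            score += indel
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(overlap_alignment_score("PAWHEAE------", "---HEAGAWGHEE"))         # 11
print(overlap_alignment_score("---PAW-HEAE", "HEAGAWGHEE-"))             # 9
print(overlap_alignment_score("---PAW-HEAE", "HEAGAWGHEE-", indel=-2))   # 12
```

With Indel: -5 the first alignment scores higher (11 against 9). With the assumed Indel: -2 the second one overtakes it (12 against 11).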
Adding Affine Gap Penalties: The Gotoh Algorithm
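Why Affine Gap Penalties are Biologically Better
For a gap of length L:
$$\mathrm{Cost} = \mathrm{gop} + L \cdot \mathrm{gep}$$
or
$$\mathrm{Cost} = \mathrm{gop} + (L-1) \cdot \mathrm{gep}$$
The two forms differ only in whether the first gap position is also charged an extension.
Parsimony: evolution takes the simplest path (so we think…)
[Figure: Affine Gap Penalty. Cost plotted against gap length L, with GOP and GEP marked.]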
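Both cost formulas can be written as one small Python function. The name `gap_cost` and the flag `charge_first_extension` are illustrative assumptions, and gop and gep are taken to be negative scores.

```python
def gap_cost(length, gop, gep, charge_first_extension=True):
    """Affine cost of one gap of `length` positions.

    charge_first_extension=True  -> gop + L * gep
    charge_first_extension=False -> gop + (L - 1) * gep
    """
    if length <= 0:
        return 0
    extensions = length if charge_first_extension else length - 1
    return gop + extensions * gep


# One gap of length 3 versus three separate gaps of length 1.
print(gap_cost(3, gop=-10, gep=-1))      # -13
print(3 * gap_cost(1, gop=-10, gep=-1))  # -33
```

A single long gap is much cheaper than several short ones, which is the parsimony argument expressed in numbers.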
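But Harder to Compute…
Each gap column is either an opening or an extension, and which one it is depends on the column before it.
More than 3 ways to extend an alignment:
• X over X: alignment
• X over -: deletion (opening or extension)
• - over X: insertion (opening or extension)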
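More Questions Need to Be Asked
For instance, what is the cost of an insertion? It depends on the column before it:
• The alignment so far ends with X over X: the insertion opens a gap and costs GOP.
• The alignment so far already ends with - over X: the insertion extends that gap and costs GEP.
In both cases the new alignment ends with - over X.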
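The dependence on the previous column can be made concrete by scoring a finished alignment column by column. This is a minimal sketch under assumptions: the function name, the state labels, and the match/mismatch scores are illustrative. The first position of a gap is charged gop and each further position gep.

```python
def affine_alignment_score(top, bottom, gop, gep, match=2, mismatch=-1):
    """Score a gapped pairwise alignment with affine gap penalties."""
    score = 0
    prev = None  # state of the previous column: "M", "X-" or "-X"
    for x, y in zip(top, bottom):
        if x != "-" and y != "-":
            state = "M"
            score += match if x == y else mismatch
        else:
            state = "X-" if y == "-" else "-X"
            # Same gap state as the previous column: extension, otherwise opening.
            score += gep if state == prev else gop
        prev = state
    return score


print(affine_alignment_score("AC--GT", "ACTTGT", gop=-10, gep=-1))  # -3
```

The two gap columns cost -10 and -1, so the total is 2 + 2 - 10 - 1 + 2 + 2 = -3.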
Solution: Maintain 3 Tables
Ix: table that contains the score of the optimal alignment of 1…i vs 1…j that finishes with an insertion in sequence X.
Iy: table that contains the score of the optimal alignment of 1…i vs 1…j that finishes with an insertion in sequence Y.
M: table that contains the score of the optimal alignment of 1…i vs 1…j that finishes with an alignment between sequence X and Y.
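The Algorithm (Global Alignment)
Here gop and gep are negative scores, so they are added to the running total.
$$
M(i,j) = \max
\begin{cases}
M(i-1,j-1) + \mathrm{Mat}(i,j) \\
I_x(i-1,j-1) + \mathrm{Mat}(i,j) \\
I_y(i-1,j-1) + \mathrm{Mat}(i,j)
\end{cases}
$$
$$
I_x(i,j) = \max
\begin{cases}
M(i-1,j) + \mathrm{gop} & \text{(1…i-1 vs 1…j ends in X over X; open with X over -)} \\
I_x(i-1,j) + \mathrm{gep} & \text{(1…i-1 vs 1…j ends in X over -; extend)}
\end{cases}
$$
$$
I_y(i,j) = \max
\begin{cases}
M(i,j-1) + \mathrm{gop} & \text{(1…i vs 1…j-1 ends in X over X; open with - over X)} \\
I_y(i,j-1) + \mathrm{gep} & \text{(1…i vs 1…j-1 ends in - over X; extend)}
\end{cases}
$$
Initialization:
$$
\begin{aligned}
& M(0,0) = 0, \quad I_x(0,0) = I_y(0,0) = -\infty \\
& M(i,0) = I_x(i,0) = \mathrm{gop} + (i-1)\,\mathrm{gep}, \quad I_y(i,0) = -\infty, \quad i = 1, \ldots, n \\
& M(0,j) = I_y(0,j) = \mathrm{gop} + (j-1)\,\mathrm{gep}, \quad I_x(0,j) = -\infty, \quad j = 1, \ldots, m
\end{aligned}
$$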
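The three-table scheme can be sketched in Python as follows. It assumes that gop and gep are negative scores added to the total and that a simple match/mismatch score stands in for Mat(i,j). The function name `gotoh_global` and the default values are illustrative only.

```python
NEG_INF = float("-inf")


def gotoh_global(x, y, gop=-10, gep=-1, match=2, mismatch=-1):
    """Global alignment score with affine gaps, using tables M, Ix and Iy."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    # Initialization: borders hold the cost of one leading gap.
    M[0][0] = 0
    for i in range(1, n + 1):
        M[i][0] = Ix[i][0] = gop + (i - 1) * gep
    for j in range(1, m + 1):
        M[0][j] = Iy[0][j] = gop + (j - 1) * gep
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # M: last column aligns x[i] with y[j].
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Ix: last column is x[i] over a gap (open or extend).
            Ix[i][j] = max(M[i - 1][j] + gop, Ix[i - 1][j] + gep)
            # Iy: last column is a gap over y[j] (open or extend).
            Iy[i][j] = max(M[i][j - 1] + gop, Iy[i][j - 1] + gep)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(gotoh_global("ACGT", "AT"))  # -7
```

For ACGT against AT the best alignment keeps the two missing residues in a single gap (2 - 10 - 1 + 2 = -7) rather than opening two gaps.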
Linear-Space Alignment Banded Global Alignment, K-band Algorithm, …