CS 451 / 558

CS 451 / 558 Week 4, Tue

Scoring an alignment

Let:
  x_k := the k-th letter of x
  y_k := the k-th letter of y

Input:
  string x, length m
  string y, length n (both over the alphabet Σ = {A, C, G, T})
  scoring matrix s, s.t. s(a,b) := the score of aligning a to b
  gap penalty g

Here x and y are the two rows of the alignment with gap characters already inserted, so m = n.

  S = 0   # the score
  for (i = 0; i < m; i++)
      if (x_i or y_i is a gap character)
          S -= g
      else
          S += s(x_i, y_i)
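The scoring loop above can be sketched in Python. This is a minimal sketch, assuming the two inputs are the rows of the alignment (equal length, gaps written as "-") and that g is a non-negative penalty that is subtracted. The names `score_alignment` and `match_score`, and the +1/-1 scoring, are illustrative assumptions and not part of the slides.

```python
def score_alignment(a, b, s, g, gap="-"):
    """Score two aligned rows of equal length (gap characters already inserted)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for ca, cb in zip(a, b):
        if ca == gap or cb == gap:
            total -= g            # linear gap penalty, g >= 0
        else:
            total += s(ca, cb)    # substitution score from the scoring matrix
    return total


def match_score(a, b):
    # Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


print(score_alignment("AC-GT", "ACGGT", match_score, g=2))  # prints 2
```

Four matching columns contribute +4 and the single gap column contributes -2.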

Finding an optimal alignment: Dynamic Programming
• Looks like a merger of the 2D dotplot array and the alignment scoring
• But it's actually more than that

Finding an optimal alignment: Dynamic Programming
• a recursive definition of the optimal score

A recursive definition of the optimal score

The optimal solution depends on
• optimal solutions to subproblems of the same form
• local calculations based on those solutions

E.g. the score S(m,n) for the alignment of x and y. The alignment can only end in one of three ways:
• x_m aligned to y_n:                          S(m,n) = S(m-1,n-1) + s(x_m, y_n)
• x_m aligned to nothing (y_n already used):   S(m,n) = S(m-1,n) - g
• y_n aligned to nothing (x_m already used):   S(m,n) = S(m,n-1) - g

Generally, for the prefixes x_1..x_i and y_1..y_j, the optimal score is the best of the three cases:
  S(i,j) = max { S(i-1,j-1) + s(x_i, y_j),  S(i-1,j) - g,  S(i,j-1) - g }
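The three-way case analysis can be written directly as a recursive function. The sketch below is one possible top-down version, assuming g is a non-negative penalty that is subtracted and that an empty prefix aligned to j letters costs j gaps. It uses `functools.lru_cache` so each subproblem is solved once; `optimal_score` and `match_score` are hypothetical names.

```python
from functools import lru_cache


def optimal_score(x, y, s, g):
    """Optimal global alignment score of x and y via the recursive definition."""

    @lru_cache(maxsize=None)
    def S(i, j):
        # Base cases: one prefix is empty, so the other is aligned to gaps only.
        if i == 0:
            return -g * j
        if j == 0:
            return -g * i
        return max(
            S(i - 1, j - 1) + s(x[i - 1], y[j - 1]),  # x_i aligned to y_j
            S(i - 1, j) - g,                          # x_i aligned to nothing
            S(i, j - 1) - g,                          # y_j aligned to nothing
        )

    return S(len(x), len(y))


def match_score(a, b):
    # Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


print(optimal_score("ACGT", "AGT", match_score, g=2))
```

Plain recursion like this hits Python's recursion limit on long sequences, which is one practical reason to fill a matrix bottom-up instead.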

Finding an optimal alignment: Dynamic Programming
• a recursive definition of the optimal score
• a dynamic programming matrix for remembering optimal scores of subproblems

a dynamic programming matrix for remembering optimal scores of subproblems

Finding an optimal alignment: Dynamic Programming
• a recursive definition of the optimal score
• a dynamic programming matrix for remembering optimal scores of subproblems
• a bottom-up approach of filling the matrix by solving the smallest subproblems first

a bottom-up approach of filling the matrix by solving the smallest subproblems first

Scoring an optimal alignment

Input:
  strings x & y, lengths m & n
  scoring matrix s, s.t. s(a,b) := the score of aligning a to b
  gap penalty g

  S_{0,0} = 0   # the score
  for (i = 1; i <= m; i++)
      S_{i,0} = S_{i-1,0} - g      # initialize first column: x_1..x_i aligned to gaps
  for (j = 1; j <= n; j++)
      S_{0,j} = S_{0,j-1} - g      # initialize first row: y_1..y_j aligned to gaps
  for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++)
          S_{i,j} = max { S_{i-1,j-1} + s(x_i, y_j),
                          S_{i,j-1} - g,
                          S_{i-1,j} - g }
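The matrix fill can be sketched as runnable Python. This is a minimal sketch, assuming the boundary cells are initialized to multiples of -g (a prefix aligned entirely to gaps) and that indices run from 1 to m and 1 to n, with row 0 and column 0 standing for the empty prefix. `fill_matrix` and `match_score` are hypothetical names.

```python
def fill_matrix(x, y, s, g):
    """Fill the (m+1) x (n+1) matrix of optimal global alignment scores bottom-up."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] - g          # first column: x_1..x_i against gaps
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] - g          # first row: y_1..y_j against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # diagonal: align x_i to y_j
                S[i][j - 1] - g,                          # left: y_j aligned to a gap
                S[i - 1][j] - g,                          # up: x_i aligned to a gap
            )
    return S


def match_score(a, b):
    # Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


for row in fill_matrix("ACGT", "AGT", match_score, g=2):
    print(row)
```

The optimal score of the full alignment is the bottom-right entry, `S[m][n]`. Every cell only reads cells above it or to its left, so the row-by-row order guarantees those values are already filled.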

Finding an optimal alignment: Dynamic Programming
• a recursive definition of the optimal score
• a dynamic programming matrix for remembering optimal scores of subproblems
• a bottom-up approach of filling the matrix by solving the smallest subproblems first
• a traceback of the matrix to recover the structure of the optimal solution that gave the optimal score
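The traceback step is the one the pseudocode in these slides does not spell out, so here is one possible sketch. It assumes a linear gap penalty g that is subtracted and integer scores (the equality tests would need a tolerance with floating-point scores), and it returns a single optimal alignment even when several exist. `global_align` and `match_score` are hypothetical names.

```python
def global_align(x, y, s, g):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    m, n = len(x), len(y)
    # Fill the score matrix bottom-up.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] - g
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] - g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          S[i - 1][j] - g,
                          S[i][j - 1] - g)

    # Trace back from the bottom-right cell, asking which case produced each score.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])   # x_i aligned to y_j
            i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-")        # x_i aligned to nothing
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])        # y_j aligned to nothing
            j -= 1
    # The rows were built back to front, so reverse them.
    return S[m][n], "".join(reversed(ax)), "".join(reversed(ay))


def match_score(a, b):
    # Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


score, top, bottom = global_align("ACGT", "AGT", match_score, g=2)
print(score)
print(top)
print(bottom)
```

Storing a pointer per cell during the fill is a common alternative to re-deriving the choice during traceback; this version trades a little recomputation for less memory.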

Global alignment vs local alignment

[Figure: example dynamic programming matrix of scores. All entries are non-negative, ranging from 0 to 18; the row and column layout is not recoverable.]

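Local alignment

Input:
  strings x & y, lengths m & n
  scoring matrix s, s.t. s(a,b) := the score of aligning a to b
  gap penalty g

  S_{0,0} = 0   # the score
  for (i = 1; i <= m; i++)
      S_{i,0} = 0                  # initialize first column
  for (j = 1; j <= n; j++)
      S_{0,j} = 0                  # initialize first row
  for (i = 1; i <= m; i++)
      for (j = 1; j <= n; j++)
          S_{i,j} = max { S_{i-1,j-1} + s(x_i, y_j),
                          S_{i,j-1} - g,
                          S_{i-1,j} - g,
                          0 }

The extra option 0 lets an alignment start fresh at any cell. The optimal local score is the largest entry anywhere in the matrix, not necessarily the bottom-right one.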
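The local variant differs from the global fill only in the extra 0 and in where the answer is read. The sketch below assumes boundary cells are 0 and that the best local score is the maximum over all cells; `local_alignment_score` and `match_score` are hypothetical names.

```python
def local_alignment_score(x, y, s, g):
    """Return (best_score, (i, j)): the best local score and the cell where it ends."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # extend with x_i aligned to y_j
                S[i][j - 1] - g,                          # y_j aligned to a gap
                S[i - 1][j] - g,                          # x_i aligned to a gap
                0,                                        # start a new local alignment
            )
            if S[i][j] > best:
                best, best_cell = S[i][j], (i, j)
    return best, best_cell


def match_score(a, b):
    # Hypothetical scoring matrix: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1


print(local_alignment_score("TTACGTT", "GGACGCC", match_score, g=2))
```

A traceback for the local case would start at the returned cell and stop at the first cell whose score is 0.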

Score matrix

The score compares two explanations for seeing x_i aligned to y_j:

$$\frac{P(x_i, y_j \mid \text{model of homology})}{P(x_i, y_j \mid \text{model of nonhomology})}$$

Score matrix

Both probabilities are estimated from observed frequencies, and the score is the log of their ratio:

$$s(x_i, y_j) = \log \frac{f(x_i, y_j)}{f(x_i)\, f(y_j)}$$

• Numerator f(x_i, y_j): from alignments of trusted homologs
• Denominator f(x_i) f(y_j): observed frequencies in a large database of representative sequences

Examples: BLOSUM, PAM, VTML, …
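As a small illustration of the log-odds formula, the sketch below turns pair frequencies and background frequencies into scores. All numbers are made-up toy values over a two-letter alphabet, not real BLOSUM or PAM data. The base-2 logarithm and the names `log_odds_scores`, `pair_freq` and `background` are assumptions for the example.

```python
import math


def log_odds_scores(pair_freq, background, base=2.0):
    """s(a, b) = log( f(a, b) / (f(a) * f(b)) ) for every pair in pair_freq."""
    scores = {}
    for (a, b), f_ab in pair_freq.items():
        expected = background[a] * background[b]   # chance of the pair under nonhomology
        scores[(a, b)] = math.log(f_ab / expected, base)
    return scores


# Hypothetical toy frequencies (illustrative only).
background = {"A": 0.5, "B": 0.5}
pair_freq = {("A", "A"): 0.4, ("A", "B"): 0.1,
             ("B", "A"): 0.1, ("B", "B"): 0.4}

for pair, score in sorted(log_odds_scores(pair_freq, background).items()):
    print(pair, round(score, 3))
```

Pairs seen more often in trusted alignments than chance predicts get positive scores, and pairs seen less often get negative scores.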

Gap penalties
• Usually ad hoc
• What works well with the chosen score matrix
• Linear / affine gap penalties
• Affine = geometric length distribution
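To make the linear / affine distinction concrete, here is a minimal sketch of the two cost functions. The affine convention used here (the opening cost covers the first gap position, each further position costs the extension amount) is one common choice and an assumption, since conventions differ between tools. Function and parameter names are hypothetical.

```python
def linear_gap_cost(length, g):
    """Linear penalty: every gap position costs the same amount g."""
    return g * length


def affine_gap_cost(length, gap_open, gap_extend):
    """Affine penalty: gap_open for the first position, gap_extend for each further one."""
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


for length in range(0, 5):
    print(length, linear_gap_cost(length, 2), affine_gap_cost(length, 5, 1))
```

The link to a geometric length distribution: if gap lengths follow P(L) proportional to q^(L-1), then -log P(L) grows linearly in L after a fixed starting cost, which is exactly the affine shape.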
