Pairwise Alignment

Pairwise Alignment Alexei Drummond

Week 1 Learning Outcomes
• Have an appreciation of what Computational Biology is
• Know what DNA, RNA and Protein sequences are :-)
• Understand that sequence evolution can be modeled with a stochastic model of evolution, so that the probability of evolving from one character to another in a certain time can be calculated
• Know what the Jukes Cantor and General time-reversible models of molecular evolution imply in terms of rates and base frequencies.
CS369 2007

Week 2 Learning Outcomes
• Understand the basic principles of dynamic programming
• Be familiar with the application of dynamic programming to a variety of simple examples such as
  • Knapsack problem
  • RNA secondary structure problem

Dynamic Programming
• a method for solving combinatorial optimization problems
• guaranteed to give an optimal solution
• a generalization of “divide-and-conquer”
• relies on the “Principle of Optimality”, i.e. a sub-optimal solution of a sub-problem cannot be part of an optimal solution of the original problem instance.

Principle of Optimality
[Figure: map showing Auckland, Te Kuiti and Wellington]

Key to efficiency
• computation is carried out bottom-up
• store solutions to sub-problems in a table
• all possible sub-problems solved once each, beginning with smallest sub-problems
• work up to original problem instance
• only optimal solutions to sub-problems are used to compute solution to problem at next level
• DO NOT carry out computation in recursive, top-down manner
  • same sub-problems would be solved many times
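The bottom-up, table-filling idea can be illustrated on the knapsack problem named in the Week 2 outcomes. The sketch below assumes the 0/1 variant with integer weights; the function name `knapsack` and the table name `best` are illustrative and do not come from the slides.

```python
def knapsack(values, weights, capacity):
    """Best total value of a 0/1 knapsack, computed bottom-up.

    Assumes non-negative integer weights and an integer capacity.
    """
    n = len(values)
    # best[i][w] = best value using only the first i items with capacity w.
    # Row 0 (no items) is all zeros: the smallest sub-problems.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            # Option 1: leave item i out.
            best[i][w] = best[i - 1][w]
            # Option 2: take item i, if it fits, and reuse the stored
            # optimal solution for the remaining capacity.
            if weights[i - 1] <= w:
                take = best[i - 1][w - weights[i - 1]] + values[i - 1]
                best[i][w] = max(best[i][w], take)
    return best[n][capacity]
```

Each table cell is computed once from cells already filled in, which is what keeps repeated sub-problems from being solved again.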

Pairwise alignment
Sequences
  x = a c g g t s
  y = a w g c c t t
Alignment
  x′ = a – c g g – t s
  y′ = a w – g c c t t

Scoring
• Numeric score associated with each column
• Total score = sum of column scores
• Column types:
  (1) Identical (+ve)
  (2) Conservative (+ve)
  (3) Non-conservative (-ve)
  (4) Gap (-ve)
  x′ = a – c g g – t s
  y′ = a w – g c c t t
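As a minimal sketch of "total score = sum of column scores", the function below scores a gapped alignment column by column. The score values (+1, -1, -2) are placeholders chosen for illustration, not values from the slides, and the conservative/non-conservative distinction is collapsed into a single mismatch score. Gaps are written as ASCII `-`.

```python
def score_alignment(x_aln, y_aln, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores of two equal-length gapped strings.

    The default scores are illustrative assumptions only.
    """
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(x_aln, y_aln):
        if a == "-" or b == "-":
            total += gap        # gap column
        elif a == b:
            total += match      # identical column
        else:
            total += mismatch   # substitution column
    return total


# The alignment from the slide, with '-' standing in for the gap symbol.
print(score_alignment("a-cgg-ts", "aw-gcctt"))  # prints -5
```

With these placeholder scores the slide's alignment has three identical columns, two mismatches and three gaps, giving 3 - 2 - 6 = -5.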

Scoring
• Model-based
  • Log-odds scoring
• Empirical
  • Often used for amino acid alignments
  • PAM matrices
  • BLOSUM matrices
  • JTT
  • WAG
• Different matrices used depending on the level of similarity of the sequences.
  • How do you know the similarity before doing the alignment?

Log-odds matrices
“What we want to know is whether two sequences are homologous (evolutionarily related) or not, so we want an alignment score that reflects that. Theory says that if you want to compare two hypotheses, a good score is the log-odds score: the logarithm of the ratio of the likelihoods of your two hypotheses. If we assume that each aligned residue pair is statistically independent of the others (biologically dubious, but mathematically convenient), the alignment score is the sum of the individual log-odds score for each aligned residue pair.”
Sean R Eddy, 2004

Log-odds matrices
“The numerator (p_ab) is the likelihood of the hypothesis we want to test: that these two residues are correlated because they’re homologous. Thus, p_ab are the target frequencies: the probability that we expect to observe residues a and b aligned in homologous sequence alignments. The denominator is the likelihood of a null hypothesis: that these two residues are uncorrelated and unrelated, occurring independently”
Sean R Eddy, 2004
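In symbols, the score described in the two quotations can be sketched as below. The background-frequency symbols q_a and q_b and the column index i are notation assumed here rather than taken from the slides; p_ab is the target frequency named in the quotation.

```latex
S = \sum_{i} s(a_i, b_i),
\qquad
s(a, b) = \log \frac{p_{ab}}{q_a \, q_b}
```

A pair that is seen together in homologous alignments more often than chance (p_ab > q_a q_b) gets a positive score, and a pair seen less often than chance gets a negative one.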

Evolutionary interpretation of match/mismatch scores
• a, b homologous [Figure: x and y joined by branches of length t/2]
• a, b not homologous [Figure: x and y unrelated]
• d = average number of changes per site (d = 0.1 is roughly 90% similarity)

Jukes Cantor Model
• All mutations are equally likely
  • x → y at the same rate for all x, y
• All nucleotides are equally likely (equal base frequencies):
  • {0.25, 0.25, 0.25, 0.25} for DNA
  • {0.05,…,0.05} for Proteins
[Figure panels: DNA, Proteins]

Evolutionary interpretation of match/mismatch scores (DNA)
• d = average number of changes per site (d = 0.1 is roughly 90% similarity)
[Figure: x and y]

Log-odds match score
score = log( P(ending in the same state after time d) / P(ending in the same state after infinite time) )

Log-odds mismatch score
score = log( P(ending in y, different from x, after time d) / P(ending in y, different from x, after infinite time) )
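Under the Jukes Cantor model for DNA, both log-odds scores have closed forms. The sketch below assumes the standard Jukes Cantor transition probabilities with d measured in expected changes per site, equal base frequencies of 0.25 as the infinite-time limit, and natural logarithms; the function names are illustrative and the choice of log base is an assumption.

```python
import math


def jc_prob_same(d):
    """P(same nucleotide after distance d) under Jukes Cantor."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc_prob_diff(d):
    """P(one specific different nucleotide after distance d)."""
    return 0.25 - 0.25 * math.exp(-4.0 * d / 3.0)


def jc_log_odds(d):
    """Return (match_score, mismatch_score) for distance d > 0."""
    if d <= 0:
        raise ValueError("d must be positive (the mismatch score diverges at 0)")
    stationary = 0.25  # probability of any given base after infinite time
    match = math.log(jc_prob_same(d) / stationary)
    mismatch = math.log(jc_prob_diff(d) / stationary)
    return match, mismatch


match, mismatch = jc_log_odds(0.1)
print(round(jc_prob_same(0.1), 3), match > 0, mismatch < 0)
```

At d = 0.1 the probability of staying in the same state is about 0.906, which agrees with the slides' remark that d = 0.1 is roughly 90% similarity. The match score is positive and the mismatch score negative for any finite d > 0.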

Evolutionary interpretation of match/mismatch scores (DNA)

BLOSUM50 matrix

Gap penalties
• Linear score: γ(g) = -gd, where d is the gap penalty
• Affine score: γ(g) = -d - (g-1)e, where d is the gap-open penalty and e is the gap-extension penalty
[Figure: a gap of length g between x′ and y′]
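The two gap-scoring schemes translate directly into code. This is a minimal sketch; the function names are illustrative, and the penalty values in the example are arbitrary.

```python
def linear_gap(g, d):
    """Linear gap score: every gap position costs d."""
    if g < 1:
        raise ValueError("gap length g must be at least 1")
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: d to open the gap, e for each extra position."""
    if g < 1:
        raise ValueError("gap length g must be at least 1")
    return -d - (g - 1) * e


# Arbitrary example penalties: a gap of length 3.
print(linear_gap(3, 8))      # prints -24
print(affine_gap(3, 12, 2))  # prints -16
```

With e smaller than d, the affine score makes one long gap cheaper than several short gaps of the same total length.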

Needleman & Wunsch algorithm
• Dynamic programming algorithm for global alignment
• Needleman & Wunsch (’70), modified by Gotoh (’82)
• Assumptions:
  • Linear gap score d
  • Symmetric scoring matrix S
    • s(a,b) = s(b,a): score from lining up a and b
    • s(a,-) = s(-,a) = -d: score from lining up a with -

Principle of Optimality
Given sequences: x = x_1 … x_m and y = y_1 … y_n
Define: F(i,j) = score of best alignment between x_1 … x_i and y_1 … y_j

Principle of Optimality
The optimal alignment of x_1 … x_i with y_1 … y_j ends in one of three ways:
• x_i aligned with y_j, or
• x_i aligned with a gap, or
• y_j aligned with a gap,
so
F(i,j) = max{ F(i-1,j-1) + s(x_i,y_j), F(i-1,j) - d, F(i,j-1) - d }

Principle of Optimality
Basis: F(0,0) = 0, F(i,0) = -id, F(0,j) = -jd

Filling up table
• F matrix: rows 0, 1, 2, …, m for X and columns 0, 1, 2, …, n for Y
• Optimal alignment score: F(m,n)

Constructing alignment
• F matrix: rows 0, 1, 2, …, m for X and columns 0, 1, 2, …, n for Y
• Trace back from the optimal alignment score F(m,n) to recover the alignment
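Filling the table and then tracing back can be sketched as follows, under the linear gap assumption stated for the Needleman & Wunsch algorithm. The function names, the `simple_score` scoring function and its +1/-1 values are illustrative assumptions, not taken from the slides. The traceback compares scores with `==`, so the sketch assumes integer scores.

```python
def simple_score(a, b):
    """Hypothetical scoring function: +1 for a match, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(x, y, score, d):
    """Global alignment with linear gap penalty d (integer scores assumed).

    Returns (optimal score, aligned x, aligned y), with '-' for gaps.
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: a prefix aligned entirely against gaps.
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    # Fill row by row; each cell uses three already-computed neighbours.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i with y_j
                F[i - 1][j] - d,                              # x_i with gap
                F[i][j - 1] - d,                              # y_j with gap
            )
    # Traceback from F[m][n], following one move that produced each cell.
    x_aln, y_aln = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            x_aln.append(x[i - 1])
            y_aln.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            x_aln.append(x[i - 1])
            y_aln.append("-")
            i -= 1
        else:
            x_aln.append("-")
            y_aln.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(x_aln)), "".join(reversed(y_aln))


best, x_aln, y_aln = needleman_wunsch("acggts", "awgcctt", simple_score, 2)
print(best)
print(x_aln)
print(y_aln)
```

When several alignments share the optimal score, this traceback returns only one of them: it prefers the diagonal move, then a gap in y, then a gap in x.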

Example
[Figure: worked example showing the F matrix for X and Y, the optimal alignment score, and the resulting alignment of X and Y]
