Pairwise Alignment Alexei Drummond
Week 1 Learning Outcomes
• Have an appreciation of what Computational Biology is
• Know what DNA, RNA and Protein sequences are :-)
• Understand that sequence evolution can be modeled with a stochastic model of evolution, so that the probability of evolving from one character to another in a certain time can be calculated
• Know what the Jukes Cantor and General time-reversible models of molecular evolution imply in terms of rates and base frequencies
CS369 2007
Week 2 Learning Outcomes
• Understand the basic principles of dynamic programming
• Be familiar with the application of dynamic programming to a variety of simple examples such as
  • Knapsack problem
  • RNA secondary structure problem
Dynamic Programming
• Method for solving combinatorial optimization problems
• Guaranteed to give an optimal solution
• Generalization of “divide-and-conquer”
• Relies on the “Principle of Optimality”, i.e. a sub-optimal solution of a sub-problem cannot be part of an optimal solution of the original problem instance
Principle of Optimality
[Figure: Auckland, Te Kuiti, Wellington]
Key to efficiency
• Computation is carried out bottom-up
  • store solutions to sub-problems in a table
  • all possible sub-problems solved once each, beginning with the smallest sub-problems
  • work up to the original problem instance
  • only optimal solutions to sub-problems are used to compute the solution to the problem at the next level
• DO NOT carry out the computation in a naive recursive, top-down manner
  • the same sub-problems would be solved many times
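The bottom-up table filling described above can be sketched on the knapsack problem from the Week 2 outcomes. This is a minimal illustration, assuming the 0/1 variant (each item used at most once) and integer weights; the function name `knapsack` and the example numbers are hypothetical, not taken from the course.

```python
def knapsack(weights, values, capacity):
    """Bottom-up 0/1 knapsack: best total value with total weight <= capacity."""
    n = len(weights)
    # table[i][w] = best value using only the first i items with capacity w.
    # Row 0 (no items) is the smallest sub-problem and is all zeros.
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            # Option 1: leave item i out.
            best = table[i - 1][w]
            # Option 2: put item i in, if it fits, on top of an optimal
            # solution to the smaller sub-problem.
            if weights[i - 1] <= w:
                best = max(best, table[i - 1][w - weights[i - 1]] + values[i - 1])
            table[i][w] = best
    return table[n][capacity]


# Hypothetical instance: items with weights 3 and 4 fill the capacity exactly.
print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # → 9
```

Each table entry is computed once from entries in the previous row, which is what avoids the repeated work of a naive recursion.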
Pairwise alignment
Sequences
x = a c g g t s
y = a w g c c t t
Alignment
x′ = a – c g g – t s
y′ = a w – g c c t t
Scoring
• Numeric score associated with each column
• Total score = sum of column scores
• Column types:
  (1) Identical (+ve)
  (2) Conservative (+ve)
  (3) Non-conservative (-ve)
  (4) Gap (-ve)
x′ = a – c g g – t s
y′ = a w – g c c t t
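As a rough illustration of "total score = sum of column scores", the sketch below scores the example alignment column by column. The numeric values (match +2, mismatch -1, gap -2) are hypothetical, and the sketch does not distinguish conservative from non-conservative substitutions, which would need a substitution matrix.

```python
def alignment_score(x_aln, y_aln, match=2, mismatch=-1, gap=-2):
    """Sum per-column scores of two gapped sequences of equal length."""
    if len(x_aln) != len(y_aln):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(x_aln, y_aln):
        if a == "-" or b == "-":
            total += gap        # gap column
        elif a == b:
            total += match      # identical column
        else:
            total += mismatch   # substitution column
    return total


# The alignment from the slides, with "-" as the gap character.
print(alignment_score("a-cgg-ts", "aw-gcctt"))  # → -2
```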
Scoring
• Model-based
  • Log-odds scoring
• Empirical
  • Often used for amino acid alignments
  • PAM matrices
  • BLOSUM matrices
  • JTT
  • WAG
• Different matrices used depending on the level of similarity of the sequences.
  • How do you know the similarity before doing the alignment?
Log-odds matrices
“What we want to know is whether two sequences are homologous (evolutionarily related) or not, so we want an alignment score that reflects that. Theory says that if you want to compare two hypotheses, a good score is the log-odds score: the logarithm of the ratio of the likelihoods of your two hypotheses. If we assume that each aligned residue pair is statistically independent of the others (biologically dubious, but mathematically convenient), the alignment score is the sum of the individual log-odds score for each aligned residue pair.”
Sean R Eddy, 2004
Log-odds matrices
“The numerator (pab) is the likelihood of the hypothesis we want to test: that these two residues are correlated because they’re homologous. Thus, pab are the target frequencies: the probability that we expect to observe residues a and b aligned in homologous sequence alignments. The denominator is the likelihood of a null hypothesis: that these two residues are uncorrelated and unrelated, occurring independently.”
Sean R Eddy, 2004
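The quoted definition can be sketched as code: the score for a residue pair is the log of the target frequency over the product of the background frequencies, and the alignment score is the sum over aligned pairs. The two-letter alphabet, the frequencies, the name `q` for background frequencies, and the choice of natural logarithm are all assumptions made for illustration.

```python
import math


def log_odds_score(p_ab, q_a, q_b):
    """log(p_ab / (q_a * q_b)): homology hypothesis vs. independence."""
    return math.log(p_ab / (q_a * q_b))


def alignment_log_odds(pairs, target, background):
    """Sum of per-pair log-odds scores, assuming pairs are independent."""
    return sum(
        log_odds_score(target[(a, b)], background[a], background[b])
        for a, b in pairs
    )


# Hypothetical two-letter alphabet; the target frequencies sum to 1.
background = {"a": 0.5, "b": 0.5}
target = {("a", "a"): 0.4, ("b", "b"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1}

print(log_odds_score(target[("a", "a")], 0.5, 0.5) > 0)  # pair seen more often than chance
print(log_odds_score(target[("a", "b")], 0.5, 0.5) < 0)  # pair seen less often than chance
```

A pair that is observed in homologous alignments exactly as often as chance predicts scores zero.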
Evolutionary interpretation of match/mismatch scores
[Figure: two cases for sequences x and y, "a, b homologous" (branches labeled t/2) and "a, b not homologous"]
• d = average number of changes per site (d = 0.1 is roughly 90% similarity)
Jukes Cantor Model
• All mutations are equally likely
  • x → y at the same rate for all x ≠ y
• All nucleotides are equally likely (equal base frequencies):
  • {0.25, 0.25, 0.25, 0.25} for DNA
  • {0.05, …, 0.05} for Proteins
[Figure panels: DNA, Proteins]
Evolutionary interpretation of match/mismatch scores (DNA)
[Figure: sequences x and y]
• d = average number of changes per site (d = 0.1 is roughly 90% similarity)
Log-odds match score
score = log( [probability of ending in the same state after time d] / [probability of ending in the same state after infinite time] )
Log-odds mismatch score
score = log( [probability of ending in y (different from x) after time d] / [probability of ending in y (different from x) after infinite time] )
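Under the Jukes Cantor model both ratios have a closed form, so the match and mismatch scores can be computed directly from d. The sketch below assumes d is the expected number of changes per site, uses the standard k-state Jukes Cantor transition probabilities (k = 4 for DNA), and takes natural logarithms; the function names are hypothetical.

```python
import math


def jc_same(d, k=4):
    """P(ending in the starting state) after d expected changes per site."""
    return 1 / k + (k - 1) / k * math.exp(-k * d / (k - 1))


def jc_diff(d, k=4):
    """P(ending in one particular different state) after d changes per site."""
    return 1 / k - 1 / k * math.exp(-k * d / (k - 1))


def jc_log_odds(d, k=4):
    """(match, mismatch) log-odds scores; the infinite-time probability is 1/k."""
    limit = 1 / k
    return math.log(jc_same(d, k) / limit), math.log(jc_diff(d, k) / limit)


match, mismatch = jc_log_odds(0.1)
print(round(jc_same(0.1), 3))  # probability that a site is unchanged at d = 0.1
print(match > 0, mismatch < 0)
```

At d = 0.1 the probability of ending in the same state is roughly 0.91, in line with the "roughly 90% similarity" note on the slides.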
Evolutionary interpretation of match/mismatch scores (DNA)
BLOSUM50 matrix
Gap penalties
• Linear score: γ(g) = -gd, where d is the gap penalty
• Affine score: γ(g) = -d - (g-1)e, where d is the gap-open penalty and e is the gap-extension penalty
[Figure: a gap of length g in the alignment of x′ and y′]
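The two gap scores can be written out as small functions of the gap length g. This is a sketch of the formulas on the slide; the parameter values in the usage lines are hypothetical.

```python
def linear_gap(g, d):
    """Linear gap score: every gap position costs d."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: opening costs d, each further position costs e."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# With e < d, one long gap is cheaper than under the linear scheme.
print(linear_gap(3, 12))     # → -36
print(affine_gap(3, 12, 2))  # → -16
```

When e equals d the affine score reduces to the linear one.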
Needleman & Wunsch algorithm
• Dynamic programming algorithm for global alignment
• Needleman & Wunsch (‘70), modified by Gotoh (‘82)
• Assumptions:
  • Linear gap score with penalty d
  • Symmetric scoring matrix S
    • s(a,b) = s(b,a): score from lining up a and b
    • s(a,-) = s(-,a) = -d: score from lining up a with -
Principle of Optimality
Given sequences: x = x_1 x_2 … x_m and y = y_1 y_2 … y_n
Define: F(i,j) = score of best alignment between x_1 … x_i and y_1 … y_j
Principle of Optimality
The optimal alignment of x_1 … x_i and y_1 … y_j looks like one of:
• … x_i aligned with y_j
• or … x_i aligned with a gap
• or … y_j aligned with a gap
so F(i,j) = max { F(i-1,j-1) + s(x_i,y_j), F(i-1,j) - d, F(i,j-1) - d }
Principle of Optimality
Basis: F(0,0) = 0, F(i,0) = -id, F(0,j) = -jd
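The basis and the three-way choice can be put together into a table-filling routine. The sketch below assumes the standard Needleman & Wunsch recurrence with linear gap penalty d, F(i,j) = max of F(i-1,j-1) + s(x_i,y_j), F(i-1,j) - d and F(i,j-1) - d; the function names and the match/mismatch values are hypothetical.

```python
def nw_fill(x, y, score, d):
    """Fill the F matrix for global alignment with linear gap penalty d.

    F[i][j] is the best score for aligning x[:i] with y[:j].
    """
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    # Row by row, each cell depends only on cells that are already filled.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i aligned with y_j
                F[i - 1][j] - d,                              # x_i aligned with a gap
                F[i][j - 1] - d,                              # y_j aligned with a gap
            )
    return F


def simple_score(a, b):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


F = nw_fill("ag", "a", simple_score, d=2)
print(F[2][1])  # optimal global alignment score → 0
```

The table has (m+1)(n+1) cells and each takes constant time, so filling it costs time proportional to mn.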
Filling up table
[Figure: F matrix with rows indexed by X (0, 1, 2, …, m) and columns indexed by Y (0, 1, 2, …, n), filled in step by step; the final cell holds the optimal alignment score]
Constructing alignment
[Figure: F matrix with rows indexed by X (0, 1, 2, …, m) and columns indexed by Y (0, 1, 2, …, n), with the optimal alignment score in the final cell]
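One common way to construct the alignment is to trace back from the final cell, at each step following a move that could have produced the current value. The sketch below assumes linear gap penalty d and integer scores (so the equality checks are exact), and returns just one optimal alignment when several exist; the names `global_align` and `simple_score` are hypothetical.

```python
def global_align(x, y, score, d):
    """Return (optimal score, aligned x, aligned y) for global alignment."""
    m, n = len(x), len(y)
    # Fill the F matrix (basis row and column, then the recurrence).
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
    # Trace back from F[m][n] to F[0][0], building the alignment in reverse.
    xa, ya = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            xa.append(x[i - 1]); ya.append(y[j - 1])   # diagonal move
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] - d):
            xa.append(x[i - 1]); ya.append("-")        # x_i against a gap
            i -= 1
        else:
            xa.append("-"); ya.append(y[j - 1])        # y_j against a gap
            j -= 1
    return F[m][n], "".join(reversed(xa)), "".join(reversed(ya))


def simple_score(a, b):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


print(global_align("ag", "a", simple_score, d=2))  # → (0, 'ag', 'a-')
```

Ties are broken in favor of the diagonal move here; a different tie-breaking order may give a different alignment with the same score.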
Example
[Figure: F matrix for a worked example, with rows indexed by X and columns indexed by Y, the optimal alignment score, and the resulting alignment of X and Y]