1 / 24

Maximum likelihood (cont.)

Maximum likelihood (cont.). The maximum likelihood criterion. The optimal tree is that which would be most likely to give rise to the observed data (under a given model of evolution). Stepwise view. Propose a tree with branch lengths Consider the first character

orrin
Download Presentation

Maximum likelihood (cont.)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Maximum likelihood(cont.)

  2. The maximum likelihood criterion • The optimal tree is that which would be most likely to give rise to the observed data (under a given model of evolution)

  3. Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history

  4. Worked example A G Non-change = ¼ +¾e-t Change = ¼ - ¼e-t 0.02 0.01 A A 0.03 0.01 0.02 A G L =0.25(¼ +¾e-0.02) (¼ +¾e-0.02) (¼ +¾e-0.03) (¼ - ¼e-0.01) (¼ - ¼e-0.01)

  5. Sum over all other combinations = 7.09 x 10-3 • AA = 5.87232x10-6 • AG = 7.064163x10-3 • AC = 4.43719x10-8 • AT = 4.43719x10-8 • GA = 1.1204x10-12 • GG = 2.36063x10-5 • GC = 1.1204x10-12 • GT = 1.1204x10-12 • CA = 1.1204x10-12 • CG = 1.78372x10-7 • CC = 1.48277x10-12 • CT = 1.1204x10-12 • TA = 1.1204x10-12 • TG = 1.78372x10-7 • TC = 1.1204x10-12 • TT = 1.48277x10-10

  6. Likelihood scores • Raw likelihood of the data at this site given this tree and branch lengths and model = 0.25(7.09 x 10-3) • Log-likelihood = -6.334787983

  7. What does this number mean? • -6.334787983 = The log-likelihood of this character’s tip values given: • This tree topology • These branch lengths • The model of molecular evolution

  8. Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites

  9. Multiplying across sites To make it easier, we can lump characters with the same “pattern” lnL = [lnL (0000)]N(0000) + [lnL (0001)]N(0001) + [lnL (0010)]N(0010) + [lnL (0100)]N(0100) + [lnL (0111)]N(0111) + [lnL (0011)]N(0011) + [lnL (0101]N(0101)) + [lnL (0110)]N(0110)

  10. Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites • Find the branch lengths that yield the highest likelihood for the entire data set

  11. What branch lengths should we assume? • Under the principle of maximum likelihood, we use the set of branch lengths that maximize the likelihood • Once we find those branch lengths, the likelihood score is taken as being the likelihood of the data given this tree topology

  12. Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites • Find the branch lengths that yield the highest likelihood for the entire data set • Search among trees (!)

  13. More complicated (realistic) models for DNA • Allow deviation from equiprobable base frequencies • HKY85; F81; GTR • Allow two substitution types (ti and tv) • K2P; HKY85 • Allow for six substitution types • GTR

  14. 9 6 6 3 5 4 2 1 Relationship among models

  15. Site-to-site rate heterogeneity • The easiest is to assume that all characters have the same intrinsic rate of evolution – but this is unrealistic • What can we do instead?

  16. Long Branch Attraction 0.1 B True tree 0.1 0.4 0.2 A 0.1 C 0.2 D Data patterns Frequency 0011 0.0682 0101 0.0522 0110 0.1242 Parsimony will be positively misleading

  17. Long Branch Attraction 0.1 B True tree 0.1 0.4 0.2 A 0.1 C 0.2 D Parsimony estimate

  18. Why does parsimony fail? • Parsimony is “blind” to branch lengths • It cannot see cases in which the data are better explained by rapid evolution and unequal branch lengths • ML can (if you have the right model of evolution)

  19. Relationship between MP and ML • One argument - MP is inherently nonparametric  No direct comparison possible • MP is an ML model that makes particular assumptions

  20. Parsimony-like likelihood model(see Lewis 1998 for more) • Estimate branch-length independently for each character (or force lengths to be equal) • Only sum over maximum likelihood ancestral states

  21. Why use MP • The model is less realistic, but: • We can do more thorough searches and data exploration (computational efficiency) • Robust results will usually still be supported

  22. Why use ML • The model is explicit • We can statistically compare alternative models of molecular evolution • We can conduct parametric statistical tests • (Even the most complex model is still unrealistically simple)

  23. Relationship between MP and ML • One argument - MP is inherently nonparametric  No direct comparison possible • MP is an ML model that makes particular assumptions

More Related