240 likes | 407 Views
Maximum likelihood (cont.). The maximum likelihood criterion. The optimal tree is that which would be most likely to give rise to the observed data (under a given model of evolution). Stepwise view. Propose a tree with branch lengths Consider the first character
E N D
The maximum likelihood criterion • The optimal tree is that which would be most likely to give rise to the observed data (under a given model of evolution)
Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history
Worked example A G Non-change = ¼ +¾e-t Change = ¼ - ¼e-t 0.02 0.01 A A 0.03 0.01 0.02 A G L =0.25(¼ +¾e-0.02) (¼ +¾e-0.02) (¼ +¾e-0.03) (¼ - ¼e-0.01) (¼ - ¼e-0.01)
Sum over all other combinations = 7.09 x 10-3 • AA = 5.87232x10-6 • AG = 7.064163x10-3 • AC = 4.43719x10-8 • AT = 4.43719x10-8 • GA = 1.1204x10-12 • GG = 2.36063x10-5 • GC = 1.1204x10-12 • GT = 1.1204x10-12 • CA = 1.1204x10-12 • CG = 1.78372x10-7 • CC = 1.48277x10-12 • CT = 1.1204x10-12 • TA = 1.1204x10-12 • TG = 1.78372x10-7 • TC = 1.1204x10-12 • TT = 1.48277x10-10
Likelihood scores • Raw likelihood of the data at this site given this tree and branch lengths and model = 0.25(7.09 x 10-3) • Log-likelihood = -6.334787983
What does this number mean? • -6.334787983 = The log-likelihood of this character’s tip values given: • This tree topology • These branch lengths • The model of molecular evolution
Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites
Multiplying across sites To make it easier, we can lump characters with the same “pattern” lnL = [lnL (0000)]N(0000) + [lnL (0001)]N(0001) + [lnL (0010)]N(0010) + [lnL (0100)]N(0100) + [lnL (0111)]N(0111) + [lnL (0011)]N(0011) + [lnL (0101]N(0101)) + [lnL (0110)]N(0110)
Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites • Find the branch lengths that yield the highest likelihood for the entire data set
What branch lengths should we assume? • Under the principle of maximum likelihood, we use the set of branch lengths that maximize the likelihood • Once we find those branch lengths, the likelihood score is taken as being the likelihood of the data given this tree topology
Stepwise view • Propose a tree with branch lengths • Consider the first character • Sum the probability of the data arising via each possible history • Multiply across all sites • Find the branch lengths that yield the highest likelihood for the entire data set • Search among trees (!)
More complicated (realistic) models for DNA • Allow deviation from equiprobable base frequencies • HKY85; F81; GTR • Allow two substitution types (ti and tv) • K2P; HKY85 • Allow for six substitution types • GTR
9 6 6 3 5 4 2 1 Relationship among models
Site-to-site rate heterogeneity • The easiest is to assume that all characters have the same intrinsic rate of evolution – but this is unrealistic • What can we do instead?
Long Branch Attraction 0.1 B True tree 0.1 0.4 0.2 A 0.1 C 0.2 D Data patterns Frequency 0011 0.0682 0101 0.0522 0110 0.1242 Parsimony will be positively misleading
Long Branch Attraction 0.1 B True tree 0.1 0.4 0.2 A 0.1 C 0.2 D Parsimony estimate
Why does parsimony fail? • Parsimony is “blind” to branch lengths • It cannot see cases in which the data are better explained by rapid evolution and unequal branch lengths • ML can (if you have the right model of evolution)
Relationship between MP and ML • One argument - MP is inherently nonparametric No direct comparison possible • MP is an ML model that makes particular assumptions
Parsimony-like likelihood model(see Lewis 1998 for more) • Estimate branch-length independently for each character (or force lengths to be equal) • Only sum over maximum likelihood ancestral states
Why use MP • The model is less realistic, but: • We can do more thorough searches and data exploration (computational efficiency) • Robust results will usually still be supported
Why use ML • The model is explicit • We can statistically compare alternative models of molecular evolution • We can conduct parametric statistical tests • (Even the most complex model is still unrealistically simple)
Relationship between MP and ML • One argument - MP is inherently nonparametric No direct comparison possible • MP is an ML model that makes particular assumptions