Likelihood methods
Trees - “What is the probability that a proposed model of sequence evolution and a particular tree would give rise to the observed data?” “What tree and model would maximize the probability of observing the observed data?”
P(data | tree, model)
In practice, the data are “given,” the tree is a hypothesis, and the model of the evolutionary process is usually unknown, but with parameters either “given” based on external knowledge or estimated from the data set. Therefore, we search for the hypothesis (tree) that gives the highest probability of obtaining the observed data.
Potential Benefits of Likelihood
• Improved compensation for superimposed changes using explicit models
• Method is consistent
• Usually minimizes variance of model parameters
• Often robust to violations of assumptions
• Estimation and testing of evolutionary models and hypotheses is a natural outcome
Likelihood of a tree II
Figure labels: fixed; tree-dependent.
4 bases x 4 bases = 16 possible combinations. Some are much more probable than others.
Likelihood of a tree III
If we can assume that nucleotide sites evolve independently, the likelihood of the full tree is the product of the likelihoods at each site. Because these values are vanishingly small, we usually log-transform them, so the log likelihood of the tree is the sum of the log likelihoods of each site.
E.g.:
if L(tree1) = .0000002, ln L = -15.4
if L(tree2) = .0000004, ln L = -14.7
if L(tree3) = .0000008, ln L = -14.0
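The log transform can be checked with a few lines of Python. This is only a minimal sketch: the function name `tree_log_likelihood` and the per-site likelihood values are hypothetical, chosen for illustration and not taken from the slides. It reproduces the three ln L values quoted above and shows that summing per-site logs matches the log of the per-site product.

```python
import math

def tree_log_likelihood(site_likelihoods):
    """Sum the per-site log likelihoods (sites assumed to evolve independently)."""
    return sum(math.log(L) for L in site_likelihoods)

# The three whole-tree likelihoods quoted on the slide, log-transformed.
for name, L in [("tree1", 2e-7), ("tree2", 4e-7), ("tree3", 8e-7)]:
    print(name, round(math.log(L), 1))

# Hypothetical per-site likelihoods for a four-site alignment (illustration only).
site_likelihoods = [0.01, 0.002, 0.05, 0.0004]
product = math.prod(site_likelihoods)            # shrinks quickly as sites are added
log_sum = tree_log_likelihood(site_likelihoods)  # stays in a comfortable numeric range
print(product, log_sum)
```

With thousands of sites the raw product would underflow to zero in floating point, which is the practical reason for working with the sum of logs.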
Likelihood of a tree IV
0. Prior probability of an “A”
1. x P(retaining A)
2. x P(A to C)
3. x P(A to C)
4. x P(retaining A)
5. x P(A to G)
Probabilities are a function of: substitution model, base frequencies, branch lengths
Calculation of probability of substitution or retention
Probabilities are a function of: substitution model, base frequencies, branch lengths
* See example in Mount, p. 277
* Formal analysis uses the model (JC, HKY, etc.) to generate explicit probabilities
E.g., probability of a substitution (figure: tree with branches labeled a through f; state C)
Under Jukes-Cantor: $P_C = (1 + 3e^{-4t})/4$ and $P_{notC} = \frac{3}{4}(1 - e^{-4t})$
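A minimal Python sketch of the Jukes-Cantor probabilities above follows. The function names are hypothetical, and t is assumed to be in the same units as on the slide, so the exponent is -4t. The third function splits the total probability of change evenly among the three other bases, which is what the Jukes-Cantor model implies.

```python
import math

def jc_prob_same(t):
    """Probability that a site is in the same state (e.g. C) after time t."""
    return (1.0 + 3.0 * math.exp(-4.0 * t)) / 4.0

def jc_prob_any_change(t):
    """Probability that the site is in any of the three other states after time t."""
    return 0.75 * (1.0 - math.exp(-4.0 * t))

def jc_prob_specific_change(t):
    """Probability of one particular other state (e.g. C to G) after time t."""
    return jc_prob_any_change(t) / 3.0

for t in (0.0, 0.1, 1.0, 10.0):
    print(t, jc_prob_same(t), jc_prob_any_change(t))
```

At t = 0 the site is certain to be unchanged, and for long branches both functions approach the equilibrium values of 1/4 and 3/4.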
Likelihood of state i at position j in A = (likelihood that i could give rise to the state in B) x (likelihood that i could give rise to the state in C).
For the branch to B: sum over states k of (probability of state i changing to state k, which depends on the branch length) x (likelihood that B has state k). Similar for the outcome in C.
I.e., the conditional likelihood that A has state i is the product of the likelihoods that i could have given rise to the outcomes in B and C.
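The conditional-likelihood calculation described here can be sketched in Python as follows. Several things are assumptions made for illustration rather than content from the slides: the Jukes-Cantor model is used for the change probabilities, base frequencies are taken as equal (1/4 each), tips are given likelihood 1 for the observed base and 0 otherwise, and all function names are hypothetical.

```python
import math

BASES = "ACGT"

def jc_transition(i, k, t):
    """Jukes-Cantor probability of state i becoming state k along a branch of length t."""
    e = math.exp(-4.0 * t)
    return (1.0 + 3.0 * e) / 4.0 if i == k else (1.0 - e) / 4.0

def tip_likelihoods(base):
    """Conditional likelihoods at a tip: 1 for the observed base, 0 for the rest."""
    return {b: 1.0 if b == base else 0.0 for b in BASES}

def conditional_likelihoods(L_B, t_B, L_C, t_C):
    """Conditional likelihood of each state i at node A, given descendants B and C."""
    L_A = {}
    for i in BASES:
        # Likelihood that state i at A could give rise to the outcome in B.
        from_B = sum(jc_transition(i, k, t_B) * L_B[k] for k in BASES)
        # Same calculation for the outcome in C.
        from_C = sum(jc_transition(i, k, t_C) * L_C[k] for k in BASES)
        L_A[i] = from_B * from_C
    return L_A

def site_likelihood(L_root):
    """Weight each root state by its prior probability (equal frequencies assumed)."""
    return sum(0.25 * L_root[i] for i in BASES)

# Node A with two tip descendants: B shows "A", C shows "G"; branch lengths are made up.
L_A = conditional_likelihoods(tip_likelihoods("A"), 0.1, tip_likelihoods("G"), 0.1)
print(L_A, site_likelihood(L_A))
```

Applying `conditional_likelihoods` repeatedly from the tips toward the root gives the likelihood for one site; the per-site values are then combined across sites as described earlier.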
Likelihood Ratio test
• Λ = max[L(null hypothesis | data)] / max[L(alternative hypothesis | data)]
• Huelsenbeck et al (1997) Science. 276:227
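Because likelihoods are usually stored as logs, the ratio is typically computed as a difference of log likelihoods. The sketch below assumes the common form of the test statistic, -2 ln(ratio), which for nested hypotheses is usually compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The function names and the two log-likelihood values are hypothetical, and 3.841 is the standard 5% critical value for one degree of freedom.

```python
import math

def likelihood_ratio(lnL_null, lnL_alt):
    """Ratio of the maximized likelihoods, recovered from their logs."""
    return math.exp(lnL_null - lnL_alt)

def lrt_statistic(lnL_null, lnL_alt):
    """Test statistic -2 ln(ratio) = 2 (ln L_alt - ln L_null)."""
    return -2.0 * (lnL_null - lnL_alt)

# Hypothetical maximized log likelihoods for a null and an alternative hypothesis.
lnL_null, lnL_alt = -1520.3, -1515.1
stat = lrt_statistic(lnL_null, lnL_alt)

# Assumes nested hypotheses differing by one free parameter (df = 1).
CHI2_CRITICAL_DF1_5PCT = 3.841
print(stat, stat > CHI2_CRITICAL_DF1_5PCT)
```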
Potential Benefits of Likelihood
• Improved compensation for superimposed changes using explicit models
• Method is consistent
• Usually minimizes variance of model parameters
• Often robust to violations of assumptions
• Estimation and testing of evolutionary models and hypotheses is a natural outcome
**** Effective likelihood analysis requires a large dataset, and full ML analysis is computationally intensive