Doug Raiford, Lesson 9: Phylogenetics Part II
Review • Three major categories (approaches): • Distance • Parsimony • Maximum likelihood • Have already seen a distance method: UPGMA
Why do we need any other method? • What's wrong with UPGMA? • Let's revisit the example • Can this be right? Doesn't the derived tree imply that B is equidistant from C and D? [Figure: the UPGMA tree for taxa A, B, C, D]
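Molecular clock hypothesis • UPGMA assumes a molecular clock: all lineages evolve at the same rate after they diverge • So it averaged the two distances and placed both (the branches to C and D) at 1.5 • What if rates of evolution are not equal after a divergence? [Figure: tree for A, B, C, D with unequal branch lengths .5, .5, 4, 2.5, 1, 2]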
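A clock-like distance matrix is ultrametric, and that can be tested directly. The sketch below checks the standard three-point condition: in every triple of taxa, the two largest pairwise distances must be equal. The pair-keyed dict representation, the function names, and both example matrices are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point condition: for every triple of taxa, the two largest
    pairwise distances are equal (within tol).
    dist maps frozenset({x, y}) to the distance between x and y."""
    for x, y, z in combinations(taxa, 3):
        _, mid, top = sorted([dist[frozenset((x, y))],
                              dist[frozenset((x, z))],
                              dist[frozenset((y, z))]])
        if abs(top - mid) > tol:
            return False
    return True


def make_dist(pairs):
    """Hypothetical helper: build the pair-keyed dict from {(x, y): distance}."""
    return {frozenset(p): d for p, d in pairs.items()}


# Hypothetical clock-like matrix: A and B diverged recently, C earlier.
clock = make_dist({("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4})
# Hypothetical matrix where B's lineage evolved faster after the A/B split.
no_clock = make_dist({("A", "B"): 4, ("A", "C"): 3, ("B", "C"): 5})

print(is_ultrametric(clock, ["A", "B", "C"]))     # True
print(is_ultrametric(no_clock, ["A", "B", "C"]))  # False
```

UPGMA always outputs an ultrametric tree, so when the input fails this test, some distances must be distorted to fit.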
Very similar taxa • Differing rates of evolution can sometimes cause problems for UPGMA • Especially when taxa are very similar (small distances) [Figure: a true tree for A, B, C (branch lengths 1, 1, 2, 1) yields a distance matrix, which in turn yields a different UPGMA tree]
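To see the failure concretely, here is a minimal UPGMA sketch (average linkage, each merge placed at half the merged distance). The 3-taxon matrix is a hypothetical example, not the one on the slide: it comes from a true rooted tree ((A,B),C) where the A/B ancestor sits 1 below the root, A's branch is 1, B's branch is 3 (fast), and C's branch is 1. That gives d(A,B) = 4, d(A,C) = 3, d(B,C) = 5.

```python
from itertools import combinations


def upgma(dist, taxa):
    """Minimal UPGMA sketch. dist maps frozenset({x, y}) -> distance between
    clusters x and y. Clusters are leaf names or nested tuples.
    Returns (tree, merges), where merges lists (cluster, height)."""
    d = dict(dist)
    size = {t: 1 for t in taxa}
    active = list(taxa)
    merges = []
    while len(active) > 1:
        # Join the closest pair of current clusters.
        x, y = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = (x, y)
        height = d[frozenset((x, y))] / 2  # clock assumption: split the distance in half
        merges.append((new, height))
        size[new] = size[x] + size[y]
        active.remove(x)
        active.remove(y)
        # Distance to the new cluster = size-weighted average of the old distances.
        for z in active:
            d[frozenset((new, z))] = (size[x] * d[frozenset((x, z))] +
                                      size[y] * d[frozenset((y, z))]) / size[new]
        active.append(new)
    return active[0], merges


# Hypothetical distances from a true tree ((A,B),C) in which B evolved fast.
dist = {frozenset(p): v for p, v in
        {("A", "B"): 4, ("A", "C"): 3, ("B", "C"): 5}.items()}
tree, merges = upgma(dist, ["A", "B", "C"])
print(tree)    # ('B', ('A', 'C')): A is grouped with C, not with its true sister B
print(merges)  # [(('A', 'C'), 1.5), (('B', ('A', 'C')), 2.25)]
```

Because B's fast lineage inflates its distances, A's smallest distance is to C, and UPGMA joins them first.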
Next: maximum parsimony • Sometimes also called the minimum evolution method • Definition of parsimony: 1 a : the quality of being careful with money or resources : thrift b : the quality or state of being stingy 2 : economy in the use of means to an end; especially : economy of explanation in conformity with Occam's razor • Occam's razor: the simplest explanation is usually the best
Approach • Look at each column of a multiple sequence alignment (MSA) and try to find the tree that explains it with the fewest changes • Build a consensus tree from the per-column results • Example MSA:
atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag
Which tree? • What do we mean when we say "attempts to find a tree that describes" each column? • Try every possible tree against each column and choose the best • How do we determine all possible trees? • How do we determine which one fits best? • Assume that the majority nucleotide represents the ancestor [Figure: sequences AGCT, AACT, AACT, AACT; for the second column (G, A, A, A), one possible tree with internal nodes labeled A (or G) and each branch marked 0 or 1 change depending on that label] • Total mutations needed to explain this column on this tree = 1 • Pretty darn good
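The slide scores a tree by assuming the majority nucleotide at each ancestor. A standard, more general way to count the minimum number of changes a column needs on a fixed tree is Fitch's small-parsimony algorithm, swapped in here as a sketch: each internal node takes the intersection of its children's state sets if that is non-empty, otherwise their union plus one change. Representing trees as nested tuples of sequence indices is an assumption made for illustration.

```python
def fitch(tree, column):
    """Fitch's algorithm on a rooted binary tree (sketch).
    tree: a leaf index (int) or a (left, right) tuple.
    column: the character of each sequence at one alignment position.
    Returns (candidate ancestral states at this node, minimum number of changes)."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Children disagree: one change is needed on one of the two branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total changes over all columns of an alignment (equal-length strings)."""
    return sum(fitch(tree, column)[1] for column in zip(*seqs))


seqs = ["AGCT", "AACT", "AACT", "AACT"]
tree = ((0, 1), (2, 3))  # hypothetical tree: sequences 0 and 1 are sisters
print(fitch(tree, "GAAA"))          # ({'A'}, 1)
print(parsimony_score(tree, seqs))  # 1
```

As on the slide, the inferred root state is A and only one mutation (G on the first sequence's branch) is needed.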
How do we determine all possible trees? • With two organisms there is only one possible tree [Figure: A and B joined at a root]
How do we determine all possible trees? • What about three? • The third could go on either of the two existing branches, or above the root as a new root: 3 possible trees [Figure: the tree for A and B]
What about 4? • For each of the previous 3 trees, the 4th taxon could be added to any of its branches (or form a new root) • Each of those trees has 4 branches, so there are 4 attachment points plus 1 splice at the top • So the total number of trees with 4 leaves is 3 × 5 = 15 [Figure: "If this were the tree…"]
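The stepwise-addition argument can be sketched directly: attach each new taxon to every branch of every existing tree, or splice it in above the root. Trees here are nested tuples of taxon names, and `add_leaf` and `rooted_trees` are hypothetical names; leaf names are assumed not to be tuples.

```python
def add_leaf(tree, leaf):
    """Every tree obtained by attaching `leaf` to one branch of `tree`,
    or above its root (a new root). Trees are nested tuples."""
    results = [(tree, leaf)]  # splice the new leaf in directly above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        results += [(new_left, right) for new_left in add_leaf(left, leaf)]
        results += [(left, new_right) for new_right in add_leaf(right, leaf)]
    return results


def rooted_trees(taxa):
    """All rooted binary trees on `taxa` (at least 2), by stepwise addition."""
    trees = [(taxa[0], taxa[1])]  # the single tree with two taxa
    for leaf in taxa[2:]:
        trees = [t for tree in trees for t in add_leaf(tree, leaf)]
    return trees


print(len(rooted_trees(["A", "B", "C"])))       # 3
print(len(rooted_trees(["A", "B", "C", "D"])))  # 15
```

Each node of a tree offers exactly one placement (the branch above it, or a new root for the root node), so a tree with 4 branches offers 4 + 1 = 5 placements.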
Number of trees • $N_i$ is the number of rooted trees for $i$ taxa • $B_i$ is the number of branches in a tree with $i$ taxa • $B_i = B_{i-1} + 2$, equivalently $B_i = 2i - 2$ • $N_i = N_{i-1}(B_{i-1} + 1)$, where the $+1$ accounts for a possible new root • $N_2 = 1$, $B_2 = 2$ • Defined by a recurrence relation, so… that's right, as usual, exponential (in fact it grows faster than any exponential) • What does this growth rate look like?
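The recurrence can be evaluated directly to see the growth rate. A minimal sketch; `count_rooted` is a hypothetical name.

```python
def count_rooted(n):
    """Number of rooted binary trees with n >= 2 taxa, from the recurrence
    N_i = N_{i-1} * (B_{i-1} + 1) and B_i = B_{i-1} + 2, with N_2 = 1, B_2 = 2."""
    N, B = 1, 2  # N_2, B_2
    for _ in range(3, n + 1):
        N = N * (B + 1)  # attach to any existing branch, or above the root
        B = B + 2        # the new leaf adds two branches
    return N


for n in range(2, 11):
    print(n, count_rooted(n))
```

This prints 1, 3, 15, 105, 945, … and reaches 34,459,425 rooted trees at just 10 taxa.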
We can save some work by going unrooted • Rooted vs. unrooted • Wherever the root is, "un-kink" it: remove the root and join its two branches into a single branch
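One way to check that moving the root does not change the unrooted tree is to describe each tree by its splits: the bipartitions of the taxa obtained by cutting one internal branch (written AB|CD). A sketch, assuming nested-tuple rooted trees; `clusters` and `splits` are hypothetical names. Rooted trees that differ only in where the root sits yield the same split set.

```python
def clusters(tree):
    """Leaf set below every node of a nested-tuple rooted tree; the last
    entry is the full taxon set (the root's cluster)."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    left, right = clusters(tree[0]), clusters(tree[1])
    return left + right + [left[-1] | right[-1]]


def splits(tree):
    """Non-trivial bipartitions of the unrooted version of `tree`.
    Removing the root merges its two branches, so their identical splits
    collapse into one entry of the set."""
    nodes = clusters(tree)
    taxa = nodes[-1]
    result = set()
    for side in nodes:
        other = taxa - side
        if len(side) >= 2 and len(other) >= 2:  # skip leaf branches and the root
            result.add(frozenset([side, other]))
    return result


rooted_1 = (("A", "B"), ("C", "D"))      # root between AB and CD
rooted_2 = ((("A", "B"), "C"), "D")      # root on D's branch
rooted_3 = (("A", "C"), ("B", "D"))
print(splits(rooted_1) == splits(rooted_2))  # True: same unrooted tree AB|CD
print(splits(rooted_1) == splits(rooted_3))  # False
```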
Rules for unrooted trees • Always bifurcating • A node can never have 3 branches coming "from" it (no multifurcations) • What are the odds of three lineages splitting at exactly the same moment? [Figure: unrooted tree over A, B, C, D]
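With four taxa • Three possible unrooted trees [Figure: the three trees, pairing A with B, C, or D: AB|CD, AC|BD, AD|BC, where AB|CD means A and B sit on one side of the central branch and C and D on the other] • Are there any other combinations?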
With 5 taxa • Each of the three 4-taxon trees has 5 branches, and the 5th taxon could be added to any of them • 3 × 5 = 15 trees [Figure: unrooted tree over A, B, C, D]
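The same stepwise addition works for unrooted trees, minus the "new root" option. In this sketch a tree is a list of edges, each new taxon splits one existing edge with a new internal node, and generation starts from the single 3-taxon star. The edge-list representation and the internal node names `n0`, `n1`, … are assumptions; taxon names must not collide with them.

```python
def add_taxon(edges, taxon, node_id):
    """All trees made by splitting one edge of `edges` with a new internal
    node and hanging `taxon` from it."""
    new_trees = []
    for i, (u, v) in enumerate(edges):
        mid = f"n{node_id}"  # new internal node placed on edge (u, v)
        rest = edges[:i] + edges[i + 1:]
        new_trees.append(rest + [(u, mid), (mid, v), (mid, taxon)])
    return new_trees


def unrooted_trees(taxa):
    """All unrooted binary trees on `taxa` (at least 3), as edge lists."""
    # The only unrooted tree on three taxa: a star around internal node n0.
    trees = [[("n0", taxa[0]), ("n0", taxa[1]), ("n0", taxa[2])]]
    for k, taxon in enumerate(taxa[3:], start=1):
        trees = [t for tree in trees for t in add_taxon(tree, taxon, k)]
    return trees


four = unrooted_trees(["A", "B", "C", "D"])
print(len(four), len(four[0]))                          # 3 5
print(len(unrooted_trees(["A", "B", "C", "D", "E"])))   # 15
```

Each added taxon removes one edge and adds three, so every step adds two branches, matching $B_i = B_{i-1} + 2$.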
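Rooting a tree • Use an outgroup • Include an organism that is known to be more distant from all the taxa than they are from each other • The root goes on the branch where the outgroup attaches [Figure: unrooted tree over A, B, C, D; "If outgroup goes here…", the resulting rooted tree over A, B, C, D with the outgroup branching off at the root]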
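If an unrooted tree is described by its splits (bipartitions such as AB|CDO), rooting on an outgroup amounts to reading each split from the side that excludes the outgroup: those sides become the clades of the rooted tree, and the whole ingroup becomes the clade just below the root. A sketch; the function name, the split representation, and the taxon names (O is the outgroup) are hypothetical.

```python
def root_with_outgroup(splits, taxa, outgroup):
    """Clades of the rooted tree obtained by placing the root on the
    outgroup's branch. `splits` is an iterable of (side1, side2) pairs."""
    clades = set()
    for side1, side2 in splits:
        # The side that does not contain the outgroup is a clade.
        clades.add(frozenset(side2 if outgroup in side1 else side1))
    ingroup = frozenset(taxa) - {outgroup}
    clades.add(ingroup)  # everything except the outgroup shares one ancestor
    return clades


taxa = ["A", "B", "C", "D", "O"]
# Hypothetical unrooted tree with splits AB|CDO and CD|ABO.
splits = [({"A", "B"}, {"C", "D", "O"}), ({"C", "D"}, {"A", "B", "O"})]
rooted = root_with_outgroup(splits, taxa, "O")
print(sorted(sorted(c) for c in rooted))
```

The resulting clades {A, B}, {C, D}, and {A, B, C, D} describe the rooted tree (O, ((A, B), (C, D))).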
Number of unrooted trees • $N_i$ is the number of trees for $i$ taxa • $B_i$ is the number of branches in a tree with $i$ taxa • $B_i = B_{i-1} + 2$, equivalently $B_i = 2i - 3$ • $N_i = N_{i-1} B_{i-1}$ • No "plus 1" is needed for a possible new root, because there are no roots • $N_2 = 1$, $B_2 = 1$
Very bright mathematicians • Noticed that for unrooted trees: $B_i = 2i - 3$ (for $i \ge 2$) • Also noticed $N_i = N_{i-1} B_{i-1}$ • And reduced it to $(2n-5)(2n-7)(2n-9)\cdots(3)(1)$, where $n$ is the number of taxa • Shorthand: $(2n-5)!!$ • For rooted trees: $N_i = N_{i-1}(B_{i-1} + 1)$ • Reduced to $(2n-3)!!$ • Derivation (unrooted): $N_i = B_{i-1} N_{i-1} = (2(i-1)-3) N_{i-1} = (2i-5) N_{i-1} = (2i-5)(2i-7) N_{i-2} = \cdots$, continuing until the $N$ term reaches $N_3 = 1$ • Double factorial: each successive factor is reduced by two
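The closed forms can be checked against the recurrences, and the comparison shows how much going unrooted saves. A sketch with hypothetical function names.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2); 1 for k <= 0."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_count(n):
    return double_factorial(2 * n - 5)  # (2n-5)!! unrooted trees, n >= 3


def rooted_count(n):
    return double_factorial(2 * n - 3)  # (2n-3)!! rooted trees, n >= 2


def unrooted_by_recurrence(n):
    N, B = 1, 1  # N_2 = 1, B_2 = 1 (a single branch joins two taxa)
    for _ in range(3, n + 1):
        N = N * B  # N_i = N_{i-1} * B_{i-1}
        B = B + 2  # B_i = B_{i-1} + 2
    return N


for n in range(3, 11):
    print(n, unrooted_count(n), rooted_count(n))
```

For 10 taxa this gives 2,027,025 unrooted versus 34,459,425 rooted trees, and each unrooted count equals the rooted count for one fewer taxon.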
Compare • A radical reduction in the number of trees • Yet it only buys one additional taxon: the number of unrooted trees with $n$ taxa, $(2n-5)!!$, equals the number of rooted trees with $n-1$ taxa
Rooted and unrooted • Even brighter mathematicians • Can you see why?
How can we reduce the complexity? • Not really a candidate for dynamic programming • We don't solve the same sub-problems over and over • Each sub-problem is a tree, and they are all unique • Still exponential
Branch and bound (pruning) • Discard large subsets of possible solutions without examining each one • Use heuristics or predictions to decide when a subset isn't worth exploring ("don't bother")
Upper bound on tree length • Calculate a reasonable upper bound using a fast algorithm like UPGMA (hierarchical clustering) • Incrementally grow potential trees • Stop investigating any partial tree whose length already exceeds the bound, since adding more taxa can never make a tree shorter [Figure: partial trees over A, B, C, D, with branches marked X: "Don't bother, over threshold"]
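The pruning idea can be sketched by combining stepwise addition with Fitch scoring: grow rooted trees one taxon at a time and abandon any partial tree whose parsimony length already exceeds the best known bound. In this sketch the bound is simply passed in (the slide suggests getting it from a fast method such as UPGMA); the function names and the tiny alignment are hypothetical. If the bound is below the optimum, the function returns the bound and an empty list.

```python
def fitch_cost(tree, column):
    """Fitch's algorithm: (candidate states, minimum changes) for one column."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    ls, lc = fitch_cost(tree[0], column)
    rs, rc = fitch_cost(tree[1], column)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def length(tree, seqs):
    """Parsimony length of a (possibly partial) tree over all columns."""
    return sum(fitch_cost(tree, col)[1] for col in zip(*seqs))


def add_leaf(tree, leaf):
    """Attach `leaf` to every branch of `tree`, or above its root."""
    out = [(tree, leaf)]
    if isinstance(tree, tuple):
        out += [(t, tree[1]) for t in add_leaf(tree[0], leaf)]
        out += [(tree[0], t) for t in add_leaf(tree[1], leaf)]
    return out


def branch_and_bound(seqs, bound):
    """Most parsimonious rooted trees with length <= bound, adding taxa
    0, 1, 2, ... in order and pruning partial trees that exceed the bound."""
    best, best_trees = bound, []

    def grow(tree, next_leaf):
        nonlocal best, best_trees
        score = length(tree, seqs)
        if score > best:
            return  # prune: adding more taxa can never shorten this tree
        if next_leaf == len(seqs):
            if score < best:
                best, best_trees = score, [tree]
            else:
                best_trees.append(tree)
            return
        for bigger in add_leaf(tree, next_leaf):
            grow(bigger, next_leaf + 1)

    grow((0, 1), 2)
    return best, best_trees


seqs = ["AA", "AA", "CC", "CC"]  # hypothetical alignment: 0,1 versus 2,3
score, trees = branch_and_bound(seqs, bound=10)
print(score, len(trees))  # 2 5
```

The 5 optimal rooted trees are the 5 ways of rooting the single unrooted tree that pairs sequences 0 and 1 against 2 and 3.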
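Back to the max parsimony algorithm • Some columns are all the same • They add no meaning • Every tree scores the minimum (0 changes) • Columns that are all different • Also add no meaning • To be informative, a column needs at least two different nucleotides (or amino acids) that each appear at least twice • Invariant columns are useful in one respect • If all the same, we can infer the makeup of the ancestor [Figure: sequences AGCT, AACT, AACT, ACCT; for an invariant column, every ancestral node is labeled A and every branch has 0 changes]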
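Sorting columns before the tree search is cheap. A sketch, assuming gap-free columns; `classify_column` is a hypothetical name. A column is informative only if at least two different states each occur at least twice; otherwise every tree needs the same number of changes for it.

```python
from collections import Counter


def classify_column(column):
    """Label one gap-free alignment column for parsimony purposes."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"      # 0 changes on every tree; ancestor state is known
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"    # the choice of tree changes the number of changes
    return "uninformative"      # every tree needs the same number of changes


seqs = ["AGCT", "AACT", "AACT", "ACCT"]
print([classify_column(col) for col in zip(*seqs)])
# ['invariant', 'uninformative', 'invariant', 'invariant']
print(classify_column("AACC"))  # informative
```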
Now the consensus tree • Each column yields a tree • If they all agree, done • If some differ, use majority rule • If the sample is small, perform bootstrapping • Randomly resample columns of the MSA with replacement to build pseudo-alignments of the original length • Generate a tree from each • Label each branch with the percentage of bootstrap trees in which it appears • Used as a measure of support (repeatability)
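The two bookkeeping steps of bootstrapping can be sketched separately: resampling columns with replacement, and counting how often each clade appears across the resulting trees. Building a tree from each replicate is not shown; the example trees stand in for that step. Clades of rooted nested-tuple trees are compared here for simplicity (many tools compare unrooted splits instead), and all names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_replicate(msa, rng):
    """Pseudo-alignment made by sampling columns with replacement."""
    length = len(msa[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in msa]


def leaves(tree):
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clades(tree):
    """Leaf sets of all internal nodes of a nested-tuple rooted tree."""
    if not isinstance(tree, tuple):
        return []
    return clades(tree[0]) + clades(tree[1]) + [leaves(tree)]


def clade_support(trees):
    """Percentage of trees in which each clade appears."""
    counts = Counter(c for t in trees for c in set(clades(t)))
    return {c: 100 * n / len(trees) for c, n in counts.items()}


msa = ["AGCTA", "AACTA", "AACTG", "ACCTG"]
print(bootstrap_replicate(msa, random.Random(1)))  # same shape, columns resampled

# Hypothetical trees, e.g. one per bootstrap replicate.
trees = [(("A", "B"), ("C", "D")),
         ((("A", "B"), "C"), "D"),
         (("A", "C"), ("B", "D"))]
support = clade_support(trees)
majority = {c for c, s in support.items() if s > 50}
print(sorted(sorted(c) for c in majority))  # [['A', 'B'], ['A', 'B', 'C', 'D']]
```

Here clade {A, B} appears in 2 of 3 trees, so it survives majority rule with about 67% support.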
What's left? • Still have maximum likelihood • Also some inferential stuff, but that's all in the next lecture • Stay tuned for Part III