Molecular phylogenetics 4 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections 6.7-8
Have we got the true tree? • Several approaches developed to answer this question: • Analysis: • In some cases (e.g. UPGMA) the phylogenetic method is simple enough that we can establish mathematically the exact conditions under which it will fail • Parsimony can fail under particular distributions of edge lengths • Known phylogenies: • Best evidence for success of a tree-building method would be if it could accurately reconstruct a known phylogeny • Typically, “known” phylogenies exist only for crop plants and laboratory animals, and even these are often suspect • Growth of bacteriophage T7 in the presence of mutagens allowed comparison of tree-building methods
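One way to see the analytical point about UPGMA is a minimal sketch like the one below. The `upgma` function and both distance matrices are illustrative assumptions, not taken from the slides. The distances come from a hypothetical true tree ((A,B),C): in the first case the rates are clock-like, in the second the branch leading to B is much longer than the others.

```python
from itertools import combinations

def upgma(names, dist):
    """Minimal UPGMA sketch. dist maps frozenset({x, y}) -> distance.
    Returns the tree as nested tuples."""
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

names = ["A", "B", "C"]

# Hypothetical clock-like distances: A and B are each other's closest relatives.
clock = {frozenset("AB"): 2, frozenset("AC"): 4, frozenset("BC"): 4}

# Hypothetical unequal rates on the same true tree ((A,B),C):
# branch lengths A=1, B=5, internal=1, C=1.
unequal = {frozenset("AB"): 6, frozenset("AC"): 3, frozenset("BC"): 7}

tree_clock = upgma(names, clock)
tree_unequal = upgma(names, unequal)
print(tree_clock)
print(tree_unequal)
```

With the clock-like distances A and B are joined first; with unequal rates UPGMA joins A and C, because it always joins the pair with the smallest distance, not necessarily the closest relatives.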
Have we got the true tree? • Several approaches (continued): • Simulation: • Provide software with a tree and “evolve” DNA sequences along its branches according to some model • Supply the resulting sequences to a range of tree-building methods and determine which (if any) recover the original tree • An advantage of this approach is that we can explore the effects of a wide range of parameters on the performance of tree reconstruction methods • A disadvantage is that the models used to generate the sequences may be unrealistic; in particular, the model may be biased towards a particular method
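The simulation approach can be sketched roughly as follows. This assumes the Jukes-Cantor model (the slides do not name a model), and the tree shape, branch lengths, sequence length and function names are all hypothetical choices for illustration.

```python
import math
import random

BASES = "ACGT"

def evolve(seq, branch_length, rng):
    """Evolve seq along one branch under Jukes-Cantor (an assumed model).
    branch_length is in expected substitutions per site."""
    # Probability that a site differs at the end of the branch.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(tree, seq, rng, leaves=None):
    """tree is a leaf name (str) or a list of (subtree, branch_length) pairs.
    Returns a dict mapping each leaf name to its simulated sequence."""
    if leaves is None:
        leaves = {}
    if isinstance(tree, str):
        leaves[tree] = seq
        return leaves
    for subtree, length in tree:
        simulate(subtree, evolve(seq, length, rng), rng, leaves)
    return leaves

rng = random.Random(1)
root = "".join(rng.choice(BASES) for _ in range(200))

# Hypothetical model tree ((A,B),(C,D)) with made-up branch lengths.
model_tree = [
    ([("A", 0.10), ("B", 0.10)], 0.05),
    ([("C", 0.10), ("D", 0.10)], 0.05),
]
sequences = simulate(model_tree, root, rng)
for name, s in sequences.items():
    print(name, s[:40])
```

The simulated sequences would then be passed to each tree-building method in turn, and the inferred tree compared with `model_tree`.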
[Figure: UPGMA and parsimony panels; the “Felsenstein Zone”]
Congruence • Congruence is the agreement between estimates of phylogeny based on different characters: • If data sets are independent, the probability of obtaining similar trees by chance is extremely small • Conversely, if different data sets give similar trees then this suggests that both reflect the same underlying cause, namely the same evolutionary history • Two ways of using congruence: • To validate a method of inference: a method that consistently recovers similar trees from different data sets will be preferred to a method that produces different trees from different data sets • To validate a new source of data: does a newly sequenced gene contain phylogenetic information?
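To give a feel for why chance agreement is so unlikely, the sketch below counts the possible rooted bifurcating trees for n taxa, (2n-3)!!, and the chance that two trees drawn independently and uniformly at random would be identical. Uniform random choice is a simplifying assumption made only for this illustration, and the function name is hypothetical.

```python
def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating labelled trees: (2n - 3)!!"""
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count

for n in (4, 5, 10):
    total = n_rooted_trees(n)
    # Chance that two independent, uniformly random trees are identical.
    print(n, total, 1 / total)
```

For ten taxa there are 34,459,425 rooted trees, so identical trees from two unrelated data sets would be very surprising under this assumption.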
Sampling error • If a data set contains homoplasy then different nucleotide sites support different trees: • Which tree(s) a given data set supports depends on which characters have been sampled • Estimates of phylogeny based on samples will be accompanied by sampling error • Effects of sampling error are evident when comparing trees for different mitochondrial genes: • Since there is no recombination, all mitochondrial genes share the same evolutionary history • Nevertheless, several different trees were obtained • Sampling of taxa is also important
Bootstrapping • Bootstrapping is a way of calculating sampling error without taking repeated samples from the population / species under study: • Mimics the technique of repeated sampling from the original population by resampling from the original sample • Each resampling is a pseudoreplicate • Bootstrapping can be applied to phylogenetics by taking several pseudoreplicates: • Sampling with replacement gives a new data set based on the original sample: • Some sites represented more than once • Some sites not represented at all • Pseudoreplicate can be used to construct a new tree
Bootstrapping
Original data (used to build the original tree)
Site         1 2 3 4 5 6 7 8 9
Human        T C C T T A A A A
Chimp        T T C T A T A A A
Gorilla      T T A C A A T A A
Orang-utan   C C A C A A A T A
Gibbon       C C A C A A A A T
Pseudoreplicate (used to build a bootstrap tree)
Site         2 7 7 3 1 7 4 9 6
Human        C A A C T A T A A
Chimp        C A A C T A T A T
Gorilla      A T T A T T C A A
Orang-utan   A A A A C A C A A
Gibbon       A A A A C A C T A
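The resampling step can be sketched as below, using the five-taxon, nine-site alignment from this slide. The function name and the random seed are illustrative assumptions; building a tree from each pseudoreplicate is left to whichever tree-building method is being assessed.

```python
import random

# Original alignment from the slide (sites 1-9).
alignment = {
    "Human":      "TCCTTAAAA",
    "Chimp":      "TTCTATAAA",
    "Gorilla":    "TTACAATAA",
    "Orang-utan": "CCACAAATA",
    "Gibbon":     "CCACAAAAT",
}

def pseudoreplicate(aln, rng):
    """Resample alignment columns with replacement.
    Returns the chosen 0-based column indices and the new alignment."""
    n_sites = len(next(iter(aln.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    resampled = {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}
    return cols, resampled

rng = random.Random(42)
cols, resampled = pseudoreplicate(alignment, rng)
print("sites:", [c + 1 for c in cols])  # 1-based, as on the slide
for name, seq in resampled.items():
    print(f"{name:<11}{seq}")
```

Because whole columns are drawn, every taxon receives the same resampled sites; some sites appear more than once and others not at all.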
Bootstrapping [Figure: three alternative trees with tips labelled H, C, G, O and B, recovered in 41/100, 28/100 and 31/100 bootstrap pseudoreplicates respectively]
What can go wrong? • Sampling error: • Almost all phylogenies are based on a sample of some sort • Especially true given the vagaries of homoplasy • Incorrect model of sequence evolution: • All methods make implicit or explicit assumptions about evolutionary process • Example is problem of base composition: • An AT rich part of a gene may be more similar to an AT rich part of a different gene purely by chance • Tree structure: • Evolutionary history is not always simple: • Rapid cladogenesis • Widely differing rates of divergence • Horizontal gene transfer
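The base composition problem can be illustrated with a small calculation: for two unrelated random sequences with the same base frequencies, the expected proportion of matching sites is the sum of the squared frequencies. The frequencies below are made-up values chosen for illustration, and the function name is hypothetical.

```python
def expected_identity(freqs):
    """Expected proportion of matching sites between two unrelated random
    sequences that share the given base frequencies."""
    return sum(p * p for p in freqs.values())

# Hypothetical base frequencies.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.40, "C": 0.10, "G": 0.10, "T": 0.40}

print(expected_identity(uniform))  # about 0.25
print(expected_identity(at_rich))  # about 0.34
```

Two AT-rich regions are therefore expected to match at about 34% of sites by chance alone, compared with 25% for sequences of even composition, which a method that ignores base composition could misread as relatedness.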