Chapter 5 Character-Based Methods of Phylogenetics • Department of Computer Science and Information Engineering, National Chi Nan University • HUANG, Guan-Shieng (黃光璿) • 2004/04/05
5.1 Parsimony • Mutations are exceedingly rare events. • The more unlikely events a model invokes, the less likely the model is to be correct. • The explanation that requires the fewest mutations to account for a state is the most likely to be correct.
Ockham's Razor • the philosophic rule that entities should not be multiplied unnecessarily
5.1.1 Informative and Uninformative Sites • informative sites • carry information for constructing a tree • uninformative sites • carry no information under the parsimony principle
To be informative, a position must have • at least two different nucleotides • each of which is present in at least two sequences.
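The rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `is_informative` and the four-sequence alignment are hypothetical.

```python
from collections import Counter

def is_informative(column):
    """Return True if an alignment column is parsimony-informative."""
    counts = Counter(column)
    # Count the nucleotides that occur in at least two sequences.
    shared = sum(1 for n in counts.values() if n >= 2)
    return shared >= 2

# Hypothetical alignment: one string per sequence, one column per site.
alignment = ["GGCA", "GGTA", "GCCA", "GCTT"]
columns = ["".join(col) for col in zip(*alignment)]
flags = [is_informative(col) for col in columns]
print(flags)  # [False, True, True, False]
```

The first column is invariant and the last has a nucleotide that appears only once, so only the two middle columns can favor one tree over another.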
informative sites • synapomorphy: a shared derived character that supports the internal branches (true signal) • homoplasy: a similarity acquired through parallel evolution or convergence (false signal) • e.g., eyes in humans, flies, and mollusks
5.1.2 Unweighted Parsimony • Every possible tree is considered individually for each informative site. • The tree with the minimum overall cost is reported.
There are several problems: • The number of alternative unrooted trees increases dramatically with the number of sequences. • Calculating the number of substitutions invoked by each alternative tree is difficult.
The second problem can be solved by assigning a set of nucleotides to each internal node: • intersection: the intersection of the sets of its two children, if that intersection is not empty • union: the union of the two sets, if the intersection is empty. • The number of unions is the minimum number of substitutions. • For an uninformative site, it is the number of different nucleotides minus one.
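The intersection/union rule can be sketched as a short recursive function. This is an illustrative sketch under assumptions not in the slides: trees are written as nested tuples with taxon names as leaves, and the names `fitch`, `site`, and `flat` are hypothetical. An unrooted tree can be rooted on any branch for this count, since the result does not depend on where the root is placed.

```python
def fitch(tree, states):
    """Return (nucleotide set, substitution count) for one site.

    tree   -- a taxon name (leaf) or a pair (left, right) of subtrees
    states -- dict mapping each taxon name to its nucleotide at this site
    """
    if isinstance(tree, str):              # leaf: its observed nucleotide
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                             # intersection: no extra substitution
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # union: one more substitution

site = {"a": "G", "b": "G", "c": "C", "d": "C"}
print(fitch((("a", "b"), ("c", "d")), site)[1])  # 1
print(fitch((("a", "c"), ("b", "d")), site)[1])  # 2

# Uninformative site: two different nucleotides, so one substitution.
flat = {"a": "A", "b": "A", "c": "A", "d": "G"}
print(fitch((("a", "b"), ("c", "d")), flat)[1])  # 1
```

For the informative site, the tree that groups a with b needs one substitution and the alternative needs two, so this site favors the first tree.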
5.1.4 Weighted Parsimony • Not all mutations are equivalent. • Some sequences (e.g., non-coding sequences) are more prone to indels than others. • Functional importance differs from gene to gene. • Subtle substitution biases usually vary between genes and between species. • Weights (scoring matrices) can be added to reflect these differences.
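One common way to score a tree under a weight matrix is Sankoff-style dynamic programming, which the slides do not name. The sketch below uses it as an assumption. The weights (1 for a transition, 2 for a transversion) and the names `weight`, `sankoff`, and `weighted_score` are illustrative choices only.

```python
NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}

def weight(x, y):
    """Hypothetical scoring matrix: transitions cost 1, transversions 2."""
    if x == y:
        return 0
    same_class = (x in PURINES) == (y in PURINES)
    return 1 if same_class else 2

def sankoff(tree, states):
    """Map each nucleotide to the minimum weighted cost of the subtree,
    given that the subtree's root carries that nucleotide."""
    if isinstance(tree, str):  # leaf: only the observed nucleotide is allowed
        return {s: (0 if s == states[tree] else float("inf"))
                for s in NUCLEOTIDES}
    costs = {s: 0 for s in NUCLEOTIDES}
    for child in tree:
        child_costs = sankoff(child, states)
        for s in NUCLEOTIDES:
            costs[s] += min(weight(s, t) + child_costs[t] for t in NUCLEOTIDES)
    return costs

def weighted_score(tree, states):
    return min(sankoff(tree, states).values())

site = {"a": "A", "b": "G", "c": "C", "d": "T"}
print(weighted_score((("a", "b"), ("c", "d")), site))  # 4
print(weighted_score((("a", "c"), ("b", "d")), site))  # 5
```

Under unweighted parsimony both trees need three substitutions at this site. With the weights, the tree that pairs the two purines and the two pyrimidines is cheaper.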
5.2 Inferred Ancestral Sequences • Can be derived while constructing the tree. • No missing link! • How should samples be chosen? The sampling may be biased.
5.3 Strategies for Faster Searches • The number of different phylogenetic trees grows enormously with the number of sequences. • 10 sequences: about 2 million trees for an exhaustive search
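The figure of about 2 million can be checked with the standard count of unrooted bifurcating trees on n labelled sequences, (2n - 5)!! = 1 · 3 · 5 · ... · (2n - 5). The formula is not stated on the slide, and the function name below is a hypothetical one.

```python
def unrooted_tree_count(n):
    """Number of distinct unrooted bifurcating trees on n labelled sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count

for n in (4, 5, 10):
    print(n, unrooted_tree_count(n))
# 4 3
# 5 15
# 10 2027025
```

Each additional sequence multiplies the count by the next odd number, which is why exhaustive search stops being practical so quickly.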
5.3.1 Branch and Bound • Proposed by Hendy & Penny in 1982. • L: an upper bound on the cost (for a minimization problem) • obtained from a random search or by heuristics (e.g., UPGMA) • Grow a tree incrementally, one sequence at a time. (branch) • Prune any partial tree whose cost is already greater than L. (bound)
Properties • complete search • efficient w.r.t. exhaustive search • 20 sequences are doable.
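The branch and bound steps can be sketched as follows. This is a simplified illustration, not the published algorithm in detail. It assumes trees are nested tuples, scores them with the intersection/union count, and works on rooted trees, so each unrooted tree is visited under several rootings. All function names and the example sequences are hypothetical. Pruning is safe because adding a sequence never lowers the cost of a tree.

```python
def fitch_cost(tree, states):
    """Return (nucleotide set, substitutions) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    lset, lcost = fitch_cost(tree[0], states)
    rset, rcost = fitch_cost(tree[1], states)
    if lset & rset:
        return lset & rset, lcost + rcost
    return lset | rset, lcost + rcost + 1

def tree_length(tree, seqs):
    """Total substitutions over all sites for a (possibly partial) tree."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_cost(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(n_sites))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one branch of tree."""
    yield (tree, leaf)                      # attach above the current root
    if not isinstance(tree, str):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)

def branch_and_bound(seqs, upper_bound):
    """Return (tree, cost); tree is None if nothing costs <= upper_bound."""
    names = sorted(seqs)
    best = {"cost": upper_bound, "tree": None}

    def grow(tree, k):
        cost = tree_length(tree, seqs)
        if cost > best["cost"]:             # bound: prune this partial tree
            return
        if k == len(names):                 # complete tree: tighten the bound
            best["cost"], best["tree"] = cost, tree
            return
        for bigger in insertions(tree, names[k]):   # branch
            grow(bigger, k + 1)

    grow(names[0], 1)
    return best["tree"], best["cost"]

seqs = {"a": "GGCAA", "b": "GGTAA", "c": "GCCAC", "d": "GCTTC"}
tree, cost = branch_and_bound(seqs, upper_bound=10)
print(cost)  # 5
```

A tighter starting bound, for example the cost of a tree from a fast heuristic, prunes more partial trees early.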
5.3.2 Heuristic Searches • local search • Alternative trees are not all independent of each other. • branch swapping (Fig. 5.5) • Properties • not complete; may miss the optimal solution • fast and efficient • may stop at a local minimum
5.4 Consensus Trees • Problem • Parsimony approaches may yield more than one tree. • consensus tree • an agreement or a summary of these trees • where the trees agree: bifurcation • where they do not agree: multifurcation
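One simple reading of "agreement" is a strict consensus: keep only the groupings that appear in every tree. The sketch below assumes rooted trees written as nested tuples and reports the shared groupings rather than drawing the consensus tree. The names `clades` and `strict_consensus_clades` are hypothetical.

```python
def clades(tree):
    """Return the set of groupings (as frozensets of taxon names) in a tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found

def strict_consensus_clades(trees):
    """Groupings present in every one of the given trees."""
    return set.intersection(*(clades(t) for t in trees))

t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
shared = strict_consensus_clades([t1, t2])
print(sorted("".join(sorted(c)) for c in shared))  # ['abc', 'abcde', 'de']
```

Both trees agree on the groups {d, e} and {a, b, c} but disagree on how a, b, and c are arranged, so the consensus shows a, b, and c as a three-way multifurcation.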
5.5 Tree Confidence • How much confidence can be attached to the overall tree and its component parts? • How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?
5.5.1 Bootstrap Tests • (1) Randomly choose columns of the alignment, with replacement, to build a new alignment of the same size. • (2) Reconstruct the tree for the new sample. • Repeat (1) and (2) many times. • Summarize the sampled trees with respect to the tested one: how often does each of its groupings reappear?
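The resampling loop can be sketched as below. This is a minimal sketch under stated assumptions: `build_groupings` is a hypothetical stand-in for whatever tree-building method is being tested and is assumed to return the set of groupings in the tree it builds. The alignment, function names, and seed are likewise illustrative.

```python
import random

def resample_columns(alignment, rng):
    """Draw columns with replacement to build an alignment of the same size.

    alignment -- dict mapping sequence name to an aligned string
    """
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, build_groupings, grouping,
                      iterations=1000, seed=0):
    """Fraction of resampled alignments whose tree contains the grouping."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        sample = resample_columns(alignment, rng)
        if grouping in build_groupings(sample):
            hits += 1
    return hits / iterations

alignment = {"a": "GGCAA", "b": "GGTAA", "c": "GCCAC", "d": "GCTTC"}

# Hypothetical builder that ignores its input and always returns one tree.
def constant_builder(sample):
    return {frozenset("ab"), frozenset("cd")}

print(bootstrap_support(alignment, constant_builder, frozenset("ab")))  # 1.0
```

A builder that ignores the data gets full support for every grouping, so a high bootstrap value is only meaningful when the method can respond to changes in the alignment.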
Caution • Tests based on fewer than several hundred iterations are not reliable. • Bootstrap tests tend to underestimate the confidence level at high values and overestimate it at low values. • Some results may appear to be statistically significant by chance, simply because so many groupings are being considered.
Strategy • doing thousands of iterations • using a correction method to adjust for estimation biases • collapsing branches to multi-furcations • What happens if a tree-building algorithm always produces the same tree?
5.5.2 Parametric Tests (???) • What is the limit of the parsimony principle? • especially for distant sequences • the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)
H. Kishino & M. Hasegawa (1989) • Assume that informative sites within an alignment are both independent and equivalent. • D: the difference between the minimum numbers of substitutions invoked by the two trees
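Under the independence assumption, D can be compared with its standard error, estimated from the site-by-site differences. The sketch below uses one common form of that variance estimate, n/(n - 1) times the sum of squared deviations of the per-site differences from their mean. That formula is not given on the slide, and the function name and per-site costs are hypothetical.

```python
import math

def kh_statistic(site_costs_a, site_costs_b):
    """Return (D, standard error of D) from per-site substitution counts."""
    diffs = [a - b for a, b in zip(site_costs_a, site_costs_b)]
    n = len(diffs)
    total = sum(diffs)                      # D: summed difference over sites
    mean = total / n
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return total, math.sqrt(variance)

# Hypothetical per-site costs of two trees over six informative sites.
costs_a = [1, 1, 2, 1, 1, 2]
costs_b = [2, 1, 1, 2, 2, 2]
d, se = kh_statistic(costs_a, costs_b)
print(d, round(se, 6))  # -2 2.0
```

Here D is only one standard error away from zero, so these six sites would not give strong grounds for preferring the first tree.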
5.6 Comparison of Phylogenetic Methods • If two different methods construct the same tree, that tree is very likely to be correct.
5.7 Molecular Phylogenies • Implications • medicine: drug treatment • agriculture: disease resistance factors • conservation: designation of extinct species
5.7.1 The Tree of Life • Carl Woese and his colleagues (1970s) • 16S rRNA (possessed by all organisms)
5.7.2 Human Origins • mtDNA • The mean difference between two human populations is about 0.33%. • The greatest differences are found in Africa, not across the different continents! • out-of-Africa theory • mtDNA & Y chromosome are consistent with this hypothesis
They concluded • mitochondrial Eve & Y-chromosome Adam • 200,000 years ago