Uncorrelated and Autocorrelated relaxed phylogenetics

Michaël Defoin-Platel and Alexei Drummond

Presentation Transcript


  1. Uncorrelated and Autocorrelated relaxed phylogenetics
     Michaël Defoin-Platel and Alexei Drummond

  2. (Bayesian) RELAXED PHYLOGENETICS
     [Figure: rooted tree on a time axis, with node times t0, t1, t2 and branches b1 to b5]
     • Relaxed phylogenetics allows the co-estimation of divergence times together with a phylogenetic reconstruction
     • It should be compared with
       • Unrooted trees (2n-3 parameters)
       • Rooted trees with a strict clock (n-1 divergence times)
     Relaxed Phylogenetics

  3. TIME, SUBSTITUTIONS, and RATES
     [Equation or figure not preserved in the transcript; surviving labels: 0, time, T, i]
     • Time, substitutions and rates
       • Expected number of substitutions per site on a particular branch i
       • The substitution rate R(t) cannot be directly observed!
       • Only the product of rate and time is identifiable
       • Without information external to the data, rate and time cannot be separated
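
     The slide's own formula did not survive in this transcript. As a hedged sketch of the quantity it refers to, the expected number of substitutions per site on branch i is commonly written as the integral of the rate over the duration of the branch. The symbols b_i, T_i and \bar{r}_i are assumed notation, not taken from the slide:

```latex
b_i \;=\; \int_{0}^{T_i} R(t)\,\mathrm{d}t \;=\; \bar{r}_i\,T_i,
\qquad
\bar{r}_i \;=\; \frac{1}{T_i}\int_{0}^{T_i} R(t)\,\mathrm{d}t .
% Non-identifiability: for any constant c > 0,
% (c\,\bar{r}_i)\,(T_i / c) = b_i, so the data constrain only the product.
```

     Under this notation, doubling every rate while halving every duration leaves each b_i unchanged, which is why a calibration is needed to separate rate from time.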

  4. MOLECULAR CLOCK HYPOTHESIS
     • Molecular Clock Hypothesis (MCH) (Zuckerkandl and Pauling 1965)
       • DNA and protein sequences change at a rate that is constant over time
       • The substitution rate is estimated first; time is then the sequence divergence divided by the rate
       • Estimation of relative rates and relative divergence times
     • Calibration
       • Time reference, scaling
       • Bayesian phylogenetics: priors on node heights or on tips
       • Transforms relative rates into absolute rates

  5. MOLECULAR CLOCK HYPOTHESIS
     • The substitution rate depends on
       • Natural selection, population size, body mass, generation time, mutation rate, mutation pattern, …
     • The MCH is often violated!
     • How to deal with non-clock-like data
       • Keep them!
       • Remove them!
       • Relax the MCH
         • Allow the rate of evolution to vary
         • Make assumptions about the variations

  6. RELAXING THE MCH
     • Modeling the “rate of evolution of the rate of evolution”
       • Sanderson “nonparametric” model
       • (Random) local clock model
       • Uncorrelated relaxed clock model
       • Autocorrelated relaxed clock model
       • Compound Poisson process
     • The implementation of relaxed clock models in BEAST allows the co-estimation of
       • the substitution parameters
       • the clock parameters
       • the ancestral phylogenies
       • the demography
       • …
     • Relaxed phylogenetics

  7. UNCORRELATED RELAXED CLOCK (UC), Drummond et al 2006
     • Hypothesis
       • The rate of evolution is probably never exactly the same for all evolutionary lineages
       • Rates follow a given distribution
     • Prior on rates
       • The distribution of the rates is given by the hyperparameters μ and σ², or λ

  8. UNCORRELATED RELAXED CLOCK (UC), Drummond et al 2006
     [Figure: rooted tree on a time axis, with node times t0 to t2, branch rates r0 to r5 and labels 1 to 4; plot with x-axis "relative rate r" from 0 to 10]
     • Implementation
       • Different rates in a tree
       • But a constant rate per branch
       • On a given rooted tree of n species
         • 2n-2 rates
         • n-1 divergence times
       • The distribution is discretized
       • Each branch of the tree is assigned a given rate category
       • Category mixing:
         • swapped
         • drawn (uniform)
         • random walk
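
     The discretization described on this slide can be sketched roughly as follows. This is a minimal illustration, not the BEAST implementation: it assumes a lognormal rate distribution, one rate category per branch, and category rates taken at the midpoint quantile of equal-probability bins. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lognormal_rate_categories(n_categories, sigma, mean_rate=1.0):
    """Return one rate per category: the lognormal quantile at the
    midpoint of each of n_categories equal-probability bins."""
    # Assumption: mu is chosen so the continuous lognormal has mean mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    std_normal = NormalDist()
    rates = []
    for k in range(n_categories):
        p = (k + 0.5) / n_categories
        rates.append(math.exp(mu + sigma * std_normal.inv_cdf(p)))
    return rates


def assign_categories(n_branches, rng):
    """Give every branch its own category (a random permutation)."""
    categories = list(range(n_branches))
    rng.shuffle(categories)
    return categories


def swap_move(categories, rng):
    """'Swapped' mixing: exchange the categories of two random branches."""
    i, j = rng.sample(range(len(categories)), 2)
    proposal = list(categories)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    return proposal


n_species = 4
n_branches = 2 * n_species - 2  # 2n-2 branch rates on a rooted tree
rng = random.Random(1)
rates = lognormal_rate_categories(n_branches, sigma=0.5)
categories = assign_categories(n_branches, rng)
branch_rates = [rates[c] for c in categories]  # constant rate per branch
proposal = swap_move(categories, rng)
```

     Only the "swapped" move is sketched here; the "drawn (uniform)" and "random walk" moves would propose new category indices in other ways.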

  9. AUTOCORRELATED RELAXED CLOCK (AC), Thorne and Kishino 1998, 2001, 2002
     [Figure: branch of duration t leading from the ancestor rate rA to the rate r]
     • Hypothesis
       • The rate is probably never exactly the same for all evolutionary lineages
       • For closely related lineages the rates should be similar
     • Prior on rates
       • The logs of the rates follow a Normal distribution
       • The expectation of a rate r is its ancestor rate rA
       • The rate at the root node is given by the hyperparameter μ
       • The amount of variation is given by the hyperparameter σ²

  10. AUTOCORRELATED RELAXED CLOCK (AC), Thorne and Kishino 1998, 2001, 2002
      [Figure: rooted tree on a time axis, with node times t0 to t2, branch rates r0 to r5 and labels 1 to 4]
      • Implementation
        • Different rates in a tree
        • But a constant rate per branch
        • On a given rooted tree of n species
          • 2n-2 rates
          • n-1 divergence times
        • Episodic vs time dependent
          • Episodic variance = σ²
          • Time-dependent variance = tσ²

  11. GOALS of this TALK
      • Validation of the implementation of the models
      • Comparison of models
        • Fit the data
        • Deal with calibrations
        • Estimate of divergence times
        • Estimate of rates
        • Reconstruct the tree topology

  12. PHYLOGENETIC ANALYSIS
      • Dataset 1: Lemurs (Yoder et al 2000)
        • 36 species (lemurs + mammal outgroup)
        • alignment of 1812 nucleotides (2 genes)
        • 7 calibration points
      • Settings
        • HKY substitution model + gamma rate heterogeneity
        • Yule tree prior
        • 4 independent runs of 20 M steps of MCMC for each setting

  13. PHYLOGENETIC ANALYSIS
      • Dataset 2: Primates (Peter Waddell)
        • 7 species of primates: human, chimp, gorilla, orangutan, gibbon, macaque and marmoset
        • alignment of 1,362,261 nucleotides
        • non-coding regions
        • calibration: 16 MYA divergence time of human and orangutan
      • Settings
        • GTR substitution model + gamma rate heterogeneity + invariant sites
        • Coalescent or Yule tree prior
        • 4 independent runs of 50 M steps of MCMC for each setting

  14. PHYLOGENETIC ANALYSIS
      • Dataset 3: Yeast (Rokas et al 2003)
        • 8 species of yeast
        • alignment of 127,026 nucleotides (106 genes)
        • calibration: Normal prior on the root height, N(1, 0.025)
      • Settings
        • GTR substitution model + gamma rate heterogeneity + invariant sites
        • Yule tree prior
        • 4 independent runs of 50 M steps of MCMC for each setting

  15. PHYLOGENETIC ANALYSIS
      • Dataset 4: Dengue (Rambaut 2000)
        • 17 serotype 4 sequences
        • alignment of 1,485 nucleotides
        • serial sampling (1956-1994)
      • Settings
        • HKY substitution model
        • Coalescent tree prior
        • 4 independent runs of 10 M steps of MCMC for each setting

  16. PHYLOGENETIC ANALYSIS
      • Dataset 5: Influenza A virus (Drummond et al 2006)
        • 69 sequences
        • each sequence represents a consensus of the viral population
        • alignment of 98 nucleotides
        • serial sampling (1981-1998)
      • Settings
        • HKY substitution model + gamma rate heterogeneity
        • Coalescent tree prior
        • Constant population size
        • 4 independent runs of 20 M steps of MCMC for each setting

  17. MODEL COMPARISON
      • Bayes Factor (Kass and Raftery 1995, Marc Suchard 2005)
        • Quantifies the relative support for two competing hypotheses given the observed data
        • Ratio of the marginal likelihoods of two models M1 and M2
        • Bayesian analogue of the likelihood ratio test (LRT)
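
      As a small sketch of how the ratio of marginal likelihoods is typically used in practice: marginal likelihoods are usually reported on the log scale, so the log Bayes factor is a difference. The interpretation thresholds below follow the 2 ln(BF) scale from Kass and Raftery (1995); the function names and the two log marginal likelihood values are hypothetical and are not results from this talk.

```python
def log_bayes_factor(log_marginal_m1, log_marginal_m2):
    """ln BF(M1 vs M2) = ln p(D | M1) - ln p(D | M2)."""
    return log_marginal_m1 - log_marginal_m2


def kass_raftery_label(log_bf):
    """Verbal strength of evidence for the favoured model, based on
    2 * |ln BF| as tabulated by Kass and Raftery (1995)."""
    two_ln_bf = 2.0 * abs(log_bf)
    if two_ln_bf < 2.0:
        return "not worth more than a bare mention"
    if two_ln_bf < 6.0:
        return "positive"
    if two_ln_bf < 10.0:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for two clock models.
log_ml_uncorrelated = -5230.4
log_ml_autocorrelated = -5236.9

log_bf = log_bayes_factor(log_ml_uncorrelated, log_ml_autocorrelated)
favoured = "M1" if log_bf > 0 else "M2"
strength = kass_raftery_label(log_bf)
```

      A positive log Bayes factor favours M1 and a negative one favours M2; the label describes only the strength of the evidence, not its direction.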

  18. MARGINAL LOG LIKELIHOOD

  19. Influenza dataset: consensus trees
      [Figure panels: Uncorrelated, Autocorrelated]

  20. DIVERGENCE TIMES

  21. DIVERGENCE TIMES
      BEAST: mean of the posterior distributions; error bars are the lower and upper bounds of the 95% HPD
      Glazko et al: error bars are +/- standard error

  22. DIVERGENCE TIMES
      [Figure panels: Uncorrelated Relaxed Clock, Autocorrelated Relaxed Clock; taxa: Human, Chimp, Gorilla, Orang, Gibbon, Macaque, Marmoset]

  23. RATE OF EVOLUTION

  24. RATE OF EVOLUTION

  25. RATE OF EVOLUTION

  26. RATE OF EVOLUTION

  27. GENES RATE VS SPECIES RATE
      • Mean rate per “locus”
      [Figure panels: Primates, Yeast]

  28. NAÏVE MULTIPLE LOCUS APPROACH
      • Super matrix
        • Genes share the same divergence times
      • Multiple locus
        • Perform a relaxed phylogenetic analysis for each “gene”

  29. GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES

  30. GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES
      • Root height in the primates dataset

  31. GENES RATE VS SPECIES RATE

  32. GENES TREE VS SPECIES TREE

  33. GENES TREE VS SPECIES TREE

  34. Conclusions
      • Validation of the implementation in BEAST
      • Model comparison
        • Fit the data
        • Uncorrelated vs Autocorrelated: prior knowledge
        • Calibrations
        • Estimate of rates
        • Disagree in the multiple locus approach
        • Reconstruct the tree topology

  35. THANKS
