Uncorrelated and Autocorrelated relaxed phylogenetics
Michaël Defoin-Platel and Alexei Drummond
(Bayesian) RELAXED PHYLOGENETICS
[Figure: rooted tree with divergence times t0, t1, t2 and branches b1 to b5 along a time axis]
• Relaxed Phylogenetics allows
  • the co-estimation of divergence times together with a phylogenetic reconstruction
• should be compared with
  • Unrooted
    • (2n-3 parameters)
  • Rooted with a strict clock
    • (n-1 divergence times)
Relaxed Phylogenetics
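The parameter counts quoted on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `parameter_counts` is hypothetical, and the relaxed-clock count (2n-2 branch rates plus n-1 divergence times) is taken from the implementation slides later in the talk.

```python
def parameter_counts(n_taxa):
    """Branch-related parameter counts for a binary tree of n_taxa tips."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return {
        # One branch length per branch of an unrooted binary tree.
        "unrooted": 2 * n_taxa - 3,
        # One divergence time per internal node of a rooted tree.
        "strict_clock": n_taxa - 1,
        # Assumption: one rate per branch plus one time per internal node.
        "relaxed_clock": (2 * n_taxa - 2) + (n_taxa - 1),
    }


counts = parameter_counts(8)
print(counts)
```

The relaxed model has more free quantities than either alternative, which is why it needs priors on the rates.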
TIME, SUBSTITUTIONS, and RATES
• Time, substitutions and rates
  • Expected number of substitutions per site on a particular branch i
  • Substitution rate R(t) cannot be directly observed!
  • Only the product of rate and time is identifiable
  • Without information external to the data, rate and time cannot be separated…
[Figure: time axis from 0 to T, with branch i marked]
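The relationship described on this slide can be written out explicitly. This is a reconstruction under assumed notation, since the slide's own formula did not survive: branch i is taken to span the interval from t_i^start to t_i^end, with duration Δt_i, and b_i denotes its expected number of substitutions per site.

```latex
b_i = \int_{t_i^{\mathrm{start}}}^{t_i^{\mathrm{end}}} R(t)\,dt
    = \bar{r}_i \,\Delta t_i ,
\qquad
\bar{r}_i = \frac{1}{\Delta t_i}\int_{t_i^{\mathrm{start}}}^{t_i^{\mathrm{end}}} R(t)\,dt ,
\qquad
\Delta t_i = t_i^{\mathrm{end}} - t_i^{\mathrm{start}}
```

Replacing the mean rate by c times itself and the duration by itself divided by c, for any c > 0, leaves b_i unchanged, which is why only the product of rate and time is identifiable.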
MOLECULAR CLOCK HYPOTHESIS
• Molecular Clock Hypothesis (MCH)
  • (Zuckerkandl and Pauling 1965)
  • DNA and protein sequences change at a rate that is constant over time
  • First the substitution rate is estimated, then time corresponds to sequence divergence divided by the rate
  • Estimation of relative rates and relative divergence times
• Calibration
  • Time reference, scaling
  • Bayesian Phylogenetics: priors on node heights or on tips
  • Transforms relative rates into absolute rates
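Dating under a strict clock with one calibration can be sketched as follows. The sketch assumes that the pairwise distance d between two lineages that split t time units ago is d = 2rt (both lineages accumulate substitutions); the function names and all numbers are made up for illustration.

```python
import math


def calibrate_rate(distance, known_age):
    """Strict-clock rate from one calibrated pair, assuming d = 2 * r * t."""
    return distance / (2.0 * known_age)


def divergence_time(distance, rate):
    """Divergence time of another pair under the same constant rate."""
    return distance / (2.0 * rate)


# Hypothetical inputs: a pair known to have split 16 time units ago
# with distance 0.064, and a second pair with distance 0.024.
rate = calibrate_rate(0.064, 16.0)
age = divergence_time(0.024, rate)
print(f"rate = {rate:.4f} per unit time, age = {age:.2f}")
```

Without the calibrated pair, only the ratio of the two ages (here 0.024 / 0.064) could be recovered.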
MOLECULAR CLOCK HYPOTHESIS
• Substitution rate depends on
  • Natural selection, population size, body mass, generation time, mutation rate, mutation pattern, …
• MCH is often violated!
• How to deal with non-clock-like data
  • Keep them!
  • Remove them!
  • Relax the MCH
    • Allow the rate of evolution to vary
    • Make assumptions about the variations
RELAXING THE MCH
• Modeling the “Rate of evolution of the rate of evolution”
  • Sanderson “nonparametric” model
  • (Random) Local Clock model
  • Uncorrelated relaxed clock model
  • Autocorrelated relaxed clock model
  • Compound Poisson process
• Implementation of relaxed clock models in BEAST allows co-estimation of
  • the substitution parameters
  • the clock parameters
  • the ancestral phylogenies
  • the demography
  • …
• Relaxed phylogenetics
UNCORRELATED RELAXED CLOCK (UC)
Drummond et al 2006
• Hypothesis
  • The rate of evolution is probably never exactly the same for all evolutionary lineages
  • Rates follow a given distribution
• Prior on rates
  • Distribution of the rates given by the hyperparameters μ and σ², or λ
UNCORRELATED RELAXED CLOCK (UC)
Drummond et al 2006
[Figure: rooted tree of four tips (1 to 4) with divergence times t0, t1, t2 and branch rates r0 to r5 along a time axis]
• Implementation
  • Different rates in a tree
  • But a constant rate per branch
  • On a given rooted tree of n species
    • 2n-2 rates
    • n-1 divergence times
  • The distribution is discretized
  • Each branch of the tree is assigned a given rate category
  • Category mixing:
    • swapped
    • drawn (uniform)
    • random walk
[Figure: rate distribution plotted against relative rate r, axis from 0 to 10]
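The discretization and category assignment described on this slide can be sketched in Python. Several choices here are assumptions rather than details from the talk: the rate distribution is taken to be lognormal, there are as many categories as branches, each category's rate is the quantile at the category midpoint, and the helper names are hypothetical. Only the "swapped" mixing move is shown.

```python
import math
import random
from statistics import NormalDist


def discretized_lognormal_rates(n_categories, mu, sigma):
    """Rate of category k = lognormal quantile at probability (k + 0.5) / K."""
    std_normal = NormalDist()
    return [
        math.exp(mu + sigma * std_normal.inv_cdf((k + 0.5) / n_categories))
        for k in range(n_categories)
    ]


def assign_categories(n_taxa, rng):
    """Give each of the 2n-2 branches its own category (random permutation)."""
    categories = list(range(2 * n_taxa - 2))
    rng.shuffle(categories)
    return categories


def swap_move(categories, rng):
    """'Swapped' mixing move: exchange the categories of two random branches."""
    i, j = rng.sample(range(len(categories)), 2)
    proposal = list(categories)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    return proposal


rng = random.Random(1)
categories = assign_categories(5, rng)  # 5 taxa, hence 8 branches
rates = discretized_lognormal_rates(len(categories), mu=0.0, sigma=0.5)
branch_rates = [rates[c] for c in categories]  # constant rate per branch
proposal = swap_move(categories, rng)
print(branch_rates)
```

Because the move only permutes categories, the set of rates in the tree always matches the discretized distribution; the MCMC explores which branch gets which rate.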
AUTOCORRELATED RELAXED CLOCK (AC)
Thorne and Kishino 1998, 2001, 2002
• Hypothesis
  • The rate is probably never exactly the same for all evolutionary lineages
  • For closely related lineages the rates should be similar
• Prior on rates
  • The logs of the rates follow a Normal distribution
  • Expectation of a rate r is its ancestor rate rA
  • Rate at the root node is given by a hyperparameter
  • Amount of variation is given by the hyperparameter σ²
[Figure: ancestor rate rA and descendant rate r separated by time t]
AUTOCORRELATED RELAXED CLOCK (AC)
Thorne and Kishino 1998, 2001, 2002
[Figure: rooted tree of four tips (1 to 4) with divergence times t0, t1, t2 and branch rates r0 to r5 along a time axis]
• Implementation
  • Different rates in a tree
  • But a constant rate per branch
  • On a given rooted tree of n species
    • 2n-2 rates
    • n-1 divergence times
  • Episodic vs Time dependent
    • Episodic variance = σ²
    • Time dependent variance = t σ²
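Drawing a descendant rate from its ancestor rate under the autocorrelated prior can be sketched as follows. The sketch assumes the variance symbol is sigma squared and that the log-mean is shifted by minus half the variance, which is one way to reconcile "log of the rates is Normal" with "expectation of r is rA"; the function names and numeric values are hypothetical.

```python
import math
import random


def rate_variance(sigma2, branch_time, time_dependent):
    """Episodic: sigma2. Time dependent: branch_time * sigma2."""
    return branch_time * sigma2 if time_dependent else sigma2


def sample_child_rate(parent_rate, sigma2, branch_time, time_dependent, rng):
    """Draw a lognormal child rate whose expectation is parent_rate."""
    var = rate_variance(sigma2, branch_time, time_dependent)
    # Assumed correction: E[exp(X)] = exp(m + var / 2) for X ~ Normal(m, var),
    # so m = log(parent_rate) - var / 2 gives E[child rate] = parent_rate.
    mean_log = math.log(parent_rate) - var / 2.0
    return math.exp(rng.gauss(mean_log, math.sqrt(var)))


rng = random.Random(42)
draws = [
    sample_child_rate(1.0, sigma2=0.1, branch_time=2.0, time_dependent=True, rng=rng)
    for _ in range(20000)
]
mean_rate = sum(draws) / len(draws)
print(f"empirical mean of child rates: {mean_rate:.3f}")
```

In the time-dependent variant, long branches let the rate drift further from its ancestor; in the episodic variant every branch allows the same amount of change.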
GOALS of this TALK
• Validation of the model implementations
• Comparison of models
  • Fit the data
  • Deal with calibrations
  • Estimate of divergence times
  • Estimate of rates
  • Reconstruct the tree topology
PHYLOGENETIC ANALYSIS
• Dataset 1: Lemurs (Yoder et al 2000)
  • 36 species (lemurs + mammals outgroup)
  • alignment of 1812 nucleotides (2 genes)
  • 7 calibration points
• Settings
  • HKY substitution model + gamma rate heterogeneity
  • Yule tree prior
  • 4 independent runs of 20 M steps of MCMC for each setting
PHYLOGENETIC ANALYSIS
• Dataset 2: Primates (Peter Waddell)
  • 7 species of primates: human, chimp, gorilla, orangutan, gibbon, macaque and marmoset
  • alignment of 1,362,261 nucleotides
  • Non-coding regions
  • calibration: 16 MYA divergence time of human – orangutan
• Settings
  • GTR substitution model + gamma rate heterogeneity + Invariant
  • Coalescent or Yule tree prior
  • 4 independent runs of 50 M steps of MCMC for each setting
PHYLOGENETIC ANALYSIS
• Dataset 3: Yeast (Rokas et al 2003)
  • 8 species of yeast
  • alignment of 127,026 nucleotides (106 genes)
  • calibration: Normal prior on the root height, N(1, 0.025)
• Settings
  • GTR substitution model + gamma rate heterogeneity + Invariant
  • Yule tree prior
  • 4 independent runs of 50 M steps of MCMC for each setting
PHYLOGENETIC ANALYSIS
• Dataset 4: Dengue (Rambaut 2000)
  • 17 serotype 4 sequences
  • alignment of 1,485 nucleotides
  • serial sampling (1956-1994)
• Settings
  • HKY substitution model
  • Coalescent tree prior
  • 4 independent runs of 10 M steps of MCMC for each setting
PHYLOGENETIC ANALYSIS
• Dataset 5: Influenza A virus (Drummond et al 2006)
  • 69 sequences
  • each sequence represents a consensus of the viral population
  • alignment of 98 nucleotides
  • serial sampling (1981-1998)
• Settings
  • HKY substitution model + gamma rate heterogeneity
  • Coalescent tree prior
  • Constant population size
  • 4 independent runs of 20 M steps of MCMC for each setting
MODEL COMPARISON
• Bayes Factor (Kass and Raftery 1995, Marc Suchard 2005)
  • Quantifies the relative support for two competing hypotheses given the observed data
  • Ratio of the marginal likelihoods of two models M1 and M2
  • Bayesian analogue of the likelihood ratio test (LRT)
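On the log scale the Bayes factor is simply a difference of marginal log likelihoods, ln BF12 = ln p(D | M1) - ln p(D | M2). The sketch below computes it and maps 2 ln BF onto the evidence categories of Kass and Raftery (1995). The function names and the two marginal log likelihood values are hypothetical and are not results from this talk; how the marginal likelihoods themselves are estimated is outside the sketch.

```python
def log_bayes_factor(log_ml_1, log_ml_2):
    """ln BF12 from the marginal log likelihoods of models M1 and M2."""
    return log_ml_1 - log_ml_2


def kass_raftery_label(log_bf):
    """Evidence for M1 over M2 on the Kass and Raftery (1995) 2 ln BF scale."""
    two_ln_bf = 2.0 * log_bf
    if two_ln_bf < 0:
        return "favours M2"
    if two_ln_bf <= 2:
        return "not worth more than a bare mention"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Hypothetical marginal log likelihoods for two clock models.
log_bf = log_bayes_factor(-5120.4, -5125.9)
print(f"ln BF = {log_bf:.1f}: {kass_raftery_label(log_bf)}")
```

Working with log values avoids underflow, since marginal likelihoods of real alignments are far too small to represent directly.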
MARGINAL LOG LIKELIHOOD
Influenza dataset: Consensus trees
[Figure: consensus trees; panels: Uncorrelated, Autocorrelated]
DIVERGENCE TIMES
DIVERGENCE TIMES
• BEAST: mean of the posterior distributions, error bars are 95% lower and upper HPDs
• Glazko et al: error bars are +/- standard error
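The posterior mean and 95% HPD interval used for the BEAST error bars can be computed from MCMC samples. This is a minimal sketch, assuming a unimodal posterior and a plain list of samples; `hpd_interval` is a hypothetical helper, not BEAST's own routine, and the samples are simulated.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # number of samples to cover
    # Slide a window of that many samples and keep the narrowest one.
    start = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[start], ordered[start + window - 1]


# Simulated, skewed "posterior" of a node age (illustration only).
rng = random.Random(0)
posterior = [rng.gammavariate(4.0, 2.0) for _ in range(5000)]
mean_age = sum(posterior) / len(posterior)
low, high = hpd_interval(posterior)
print(f"mean = {mean_age:.2f}, 95% HPD = [{low:.2f}, {high:.2f}]")
```

Unlike a mean +/- standard error bar, the HPD interval need not be symmetric around the mean, which matters for skewed posteriors such as node ages.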
DIVERGENCE TIMES
[Figure: dated primate trees (Human, Chimp, Gorilla, Orang, Gibbon, Macaque, Marmoset); panels: Uncorrelated Relaxed Clock, Autocorrelated Relaxed Clock]
RATE OF EVOLUTION
GENES RATE VS SPECIES RATE
• Mean rate per “locus”
[Figure: panels: Primates, Yeast]
NAÏVE MULTIPLE LOCUS APPROACH
• Super Matrix
  • Genes share the same divergence times
• Multiple Locus
  • Perform a relaxed phylogenetic analysis for each “gene”
GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES
GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES
• Root Height in the primates dataset
GENES RATE VS SPECIES RATE
GENES TREE VS SPECIES TREE
Conclusions
• Validation of the implementation in BEAST
• Model comparison
  • Fit the data
    • Uncorrelated vs Autocorrelated: prior knowledge
  • Calibrations
  • Estimate of rates
    • Disagree in the multiple locus approach
  • Reconstruct the tree topology
THANKS