Molecular Clocks: Prediction of time from molecular divergence
Outline • What is the molecular clock hypothesis? • How do you detect deviations from the molecular clock hypothesis? • Assuming a perfect molecular clock, what are the potential pitfalls in using it for dating? • Dating with “relaxed” clocks • Cautionary notes
Molecular Clock • Molecular divergence is ROUGHLY correlated with time since divergence
Evidence for Rate Constancy in Hemoglobin from Zuckerkandl and Pauling (1965)
[Figure: phylogenetic tree with tips H, C, M, R, and D; one node is dated at 110 MYA] • Given • a phylogenetic tree • branch lengths • a time estimate for one (or more) node(s) • Can we date other nodes in the tree? • Yes... if the rate of molecular change is constant across all branches
The Molecular Clock Hypothesis • Amount of genetic difference between sequences is a function of time since separation • Rate of molecular change is constant (enough) to predict times of divergence (within the bounds of particular genes and taxa)
Rate Constancy? Page & Holmes p240
Rate Heterogeneity • Rate of molecular evolution can differ between • nucleotide positions • genes • genomic regions • genomes within species (nuclear vs organelle) • species • over time • If not considered, introduces bias into time estimates
Local Clocks? • Closely related species often share similar properties, so they are likely to have similar rates • For example • murid rodents evolve on average 2-6 times faster than apes and humans (Graur & Li p150) • mouse and rat rates are nearly equal (Graur & Li p146)
Identifying rate heterogeneity • Tests of molecular clock: • Likelihood ratio test • detects deviation from the clock but not which sequences deviate • Relative rates tests • compare rates of sister lineages using an outgroup • Tajima test • the number of sites at which the outgroup shares its character with only one of the two ingroup taxa should be equal for both ingroup taxa • Branch length test • compares each root-to-leaf distance with the average root-to-leaf distance
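The Tajima test listed above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `tajima_relative_rate`, the choice to skip gapped sites, and the toy sequences are all assumptions made for the example. It counts sites where only ingroup 1 differs from the outgroup (m1) and sites where only ingroup 2 differs (m2), then compares (m1 - m2)^2 / (m1 + m2) to a chi-square distribution with 1 degree of freedom.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for three aligned sequences of equal length."""
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip gapped sites (a simplifying assumption)
        if b == o and a != o:
            m1 += 1   # character shared by outgroup and ingroup 2 only
        elif a == o and b != o:
            m2 += 1   # character shared by outgroup and ingroup 1 only
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0  # no informative sites, nothing to reject
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p

# Toy sequences, invented for illustration only.
print(tajima_relative_rate("CCCCAAAAAA", "AAAAAAAAGG", "AAAAAAAAAA"))
```

With counts this small the chi-square approximation is rough, which echoes the point that such tests lack power on short sequences.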
Likelihood Ratio Test • estimate a phylogeny with the molecular clock enforced and without it • under the clock, root-to-tip distances must be equal • twice the difference in log-likelihood ~ Chi^2 with n-2 degrees of freedom (n = # taxa in tree) • asymptotically • when models are nested
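As a rough sketch of the arithmetic, the usual form of the test takes twice the difference in log-likelihood between the unconstrained and clock-constrained trees and compares it to a chi-square distribution with n-2 degrees of freedom. The function names and the log-likelihood values below are hypothetical; in practice the two log-likelihoods would come from a phylogenetics program. The chi-square tail is computed with the standard library only, using the recurrence for the regularized incomplete gamma function at integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def clock_lrt(lnl_clock, lnl_free, n_taxa):
    """Return (statistic, df, p) for the molecular clock likelihood ratio test."""
    stat = 2.0 * (lnl_free - lnl_clock)
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a 6-taxon tree.
print(clock_lrt(lnl_clock=-1010.0, lnl_free=-1000.0, n_taxa=6))
```

A small p-value rejects the clock for the data set as a whole; it does not say which lineages are responsible.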
Relative Rates Tests (Sarich & Wilson 1973, Wu and Li 1985) • Test whether the distances from two taxa to an outgroup are equal (or the average rates of two clades vs an outgroup) • need to compute expected variance • many triples to consider, and not independent (although modifications such as Li & Bousquet 1992 correct for this) • Lack power, especially with • short sequences • low rates of change • Given the length and number of variable sites in typical sequences used for dating, Bromham et al (2000) conclude that these tests are: • unlikely to detect moderate variation between lineages (1.5-4x) • likely to result in substantial error in date estimates
Relative Rates Tests (Sarich & Wilson 1973, Wu and Li 1985) [Figure: Taxon 1 and Taxon 2 join at node 0; Taxon 3 is the outgroup]
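Relative Rates Tests (Sarich & Wilson 1973, Wu and Li 1985)
[Figure: Taxon 1 and Taxon 2 join at node 0 through branches $K_{01}$ and $K_{02}$; the outgroup, Taxon 3, joins node 0 through branch $K_{03}$]
H0: $K_{01} = K_{02}$, or $K_{01} - K_{02} = 0$
$K_{13} = K_{01} + K_{03}$ (1)
$K_{23} = K_{02} + K_{03}$ (2)
$K_{12} = K_{01} + K_{02}$ (3)
$K_{01} = (K_{13} + K_{12} - K_{23})/2$ (4)
$K_{02} = (K_{12} + K_{23} - K_{13})/2$ (5)
$K_{03} = (K_{13} + K_{23} - K_{12})/2$ (6)
$K_{01} - K_{02} = K_{13} - K_{23}$
$z = (K_{13} - K_{23}) / \sqrt{\mathrm{Var}(K_{13} - K_{23})}$
Compare z to the normal distribution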
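Equations (4) to (6) and the z statistic translate directly into code. The sketch below is illustrative only: the function names and input distances are made up, and the variance of K13 - K23 is taken as a given input because its estimation depends on the substitution model, which this slide does not specify.

```python
import math

def branch_lengths(k12, k13, k23):
    """Solve equations (4)-(6) for the three branches meeting at node 0."""
    k01 = (k13 + k12 - k23) / 2.0
    k02 = (k12 + k23 - k13) / 2.0
    k03 = (k13 + k23 - k12) / 2.0
    return k01, k02, k03

def relative_rate_z(k13, k23, var_diff):
    """Return (z, two-sided p) for H0: K01 = K02.

    var_diff is the variance of (K13 - K23), assumed to be supplied by the caller.
    """
    if var_diff <= 0:
        raise ValueError("variance must be positive")
    z = (k13 - k23) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return z, p

# Hypothetical pairwise distances and variance.
print(branch_lengths(k12=0.10, k13=0.30, k23=0.26))
print(relative_rate_z(k13=0.30, k23=0.26, var_diff=0.0004))
```

Note that K01 - K02 recovered from the branch lengths equals K13 - K23, so the test never needs the position of node 0 explicitly.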
Bayesian Relative Rates test (Wilcox et al. 2004) • MrBayes in conjunction with Cadence; variation is estimated from the posterior distribution • Cadence summarizes, for all tree samples, the distance between specific taxa and their most recent common ancestor (MRCA)
Measuring Evolutionary Time with a Molecular Clock • Estimate genetic distance d = number of amino acid replacements • Use paleontological data to determine date of common ancestor T = time since divergence • Estimate calibration rate (number of genetic changes expected per unit time) r = d / 2T • Calculate time of divergence for novel sequences T_ij = d_ij / 2r
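The two formulas on this slide amount to a calibration step followed by a prediction step. A minimal sketch, with hypothetical function names and invented numbers, and with the factor of 2 reflecting that change accumulates along both lineages since the split:

```python
def calibration_rate(d, t):
    """Rate r = d / 2T from a calibrated pair (d: distance, t: divergence time)."""
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return d / (2.0 * t)

def divergence_time(d_ij, r):
    """Divergence time T_ij = d_ij / 2r for a new pair, assuming the same rate."""
    if r <= 0:
        raise ValueError("rate must be positive")
    return d_ij / (2.0 * r)

# Hypothetical calibration: distance 0.2 for a pair that split 100 MYA.
r = calibration_rate(0.2, 100.0)
print(r)                        # changes per site per MY along one lineage
print(divergence_time(0.1, r))  # estimated split time (MYA) for a new pair
```

The result is only as good as the assumption that r holds for the new pair, which is exactly what the clock tests are meant to check.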
Perfect Molecular Clock • Change is a linear function of time (substitutions ~ Poisson; variation is due only to stochastic error) • Rates constant (across positions and lineages) • Tree perfect • Molecular distance estimated perfectly • Calibration dates without error • Regression (time vs substitutions) without error
Poisson Variance (Assuming a Perfect Molecular Clock) • If one substitution occurs every MY on average • Poisson variance • 95% of lineages 15 MY old have 8-22 substitutions • a lineage with 8 substitutions could also be only 5 MY old (Molecular Systematics p532)
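The figures on this slide can be checked with a short Poisson calculation. The sketch assumes a rate of one substitution per MY, so a 15 MY old lineage has an expected count of 15 and a 5 MY old lineage an expected count of 5; the function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_prob_between(lo, hi, lam):
    """Probability that the count falls in lo..hi inclusive."""
    return sum(poisson_pmf(k, lam) for k in range(lo, hi + 1))

def poisson_tail_at_least(k, lam):
    """Probability of observing k or more events."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Share of 15 MY old lineages showing 8 to 22 substitutions (close to 95%).
print(poisson_prob_between(8, 22, 15.0))
# Chance that a lineage only 5 MY old shows 8 or more substitutions.
print(poisson_tail_at_least(8, 5.0))
```

The second probability is well above conventional rejection thresholds, which is why a count of 8 cannot distinguish a 5 MY lineage from a 15 MY one even under a perfect clock.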
Estimating Substitution Rate • Calculate separate rate for each data set (species/genes) using known date of divergence (from fossil, biogeography) • One calibration point • Rate = d/2T • More than one calibration point • use regression
Calibration Complexities • Cannot date fossils perfectly • Fossils usually not direct ancestors • branched off tree before (after?) splitting event. • Impossible to pinpoint the age of last common ancestor of a group of living species
Linear Regression • Fix intercept at (0,0) • Fit line between divergence estimates and calibration times • Calculate regression and prediction confidence limits • A = regression line • B1-B2 = 95% CI of regression line • C1-C2 = 95% CI for predicted time values Molecular Systematics p536
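A regression forced through (0,0) has a closed-form slope. The sketch below assumes divergence is regressed on calibration time (divergence = slope x time) and that new dates are obtained by inverting the fitted line; that orientation, the function names, and the data are assumptions for illustration. It does not compute the confidence limits B1-B2 or C1-C2, which additionally require the residual variance and a t quantile.

```python
def slope_through_origin(times, divergences):
    """Least-squares slope of divergence on time with the intercept fixed at 0."""
    if len(times) != len(divergences):
        raise ValueError("need one divergence per calibration time")
    sxx = sum(t * t for t in times)
    if sxx == 0:
        raise ValueError("at least one calibration time must be non-zero")
    return sum(t * d for t, d in zip(times, divergences)) / sxx

def predict_time(divergence, slope):
    """Invert the fitted line to estimate a divergence time."""
    return divergence / slope

# Hypothetical calibration points: times in MY, pairwise divergences per site.
times = [10.0, 20.0, 40.0]
divs = [0.02, 0.04, 0.08]
b = slope_through_origin(times, divs)
print(b, predict_time(0.05, b))
```

With pairwise divergences the slope corresponds to 2r in the single-calibration formula Rate = d/2T.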
Molecular Dating: Sources of Error (assuming constant rates) • Both X and Y values are only estimates • substitution model could be incorrect • tree could be incorrect • errors in orthology assignment • Poisson variance is large • Pairwise divergences are correlated (Molecular Systematics p534) • inflates correlation between divergence & time • Sometimes calibrations are correlated • if using derived calibration points • Error in inferring slope • Confidence interval for predictions is much larger than confidence interval for slope
Working Around Rate Heterogeneity • Identify lineages that deviate and remove them • Quantify degree of rate variation to put limits on possible divergence dates • requires several calibration dates, not always available • gives very conservative estimates of molecular dates • Explicitly model rate variation (relaxed clocks)
Relaxing the Molecular Clock (Rutschmann 2006, review) • Likelihood analysis • Assign each branch a rate parameter • explosion of parameters, not realistic • User can partition branches based on domain knowledge • Rates of partitions are independent • Nonparametric rate smoothing and penalized likelihood smooth rates along the tree (program r8s) • Bayesian approach • stochastic model of evolutionary change • prior distribution of rates: • Autocorrelation: BEAST and Multidivtime • Non-autocorrelation: BEAST (can also incorporate uncertainty in topology)
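The difference between autocorrelated and non-autocorrelated rate priors can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the model used by BEAST or Multidivtime: rates are lognormal, the tree is given as a hypothetical parent-index list, and the variance is not scaled by branch duration as real autocorrelated models typically do.

```python
import math
import random

def uncorrelated_rates(n_branches, mean_log_rate, sigma, rng):
    """Each branch draws its rate independently from one lognormal distribution."""
    return [math.exp(rng.gauss(mean_log_rate, sigma)) for _ in range(n_branches)]

def autocorrelated_rates(parent, root_rate, sigma, rng):
    """Each branch rate is a lognormal perturbation of its parent branch's rate.

    parent[i] is the index of branch i's parent branch, or -1 if the branch
    attaches to the root; parents must be listed before their children.
    """
    rates = []
    for p in parent:
        base = root_rate if p < 0 else rates[p]
        rates.append(base * math.exp(rng.gauss(0.0, sigma)))
    return rates

rng = random.Random(1)   # fixed seed so the sketch is repeatable
parent = [-1, -1, 0, 0, 1, 1]  # hypothetical 6-branch tree
print(uncorrelated_rates(len(parent), math.log(0.01), 0.5, rng))
print(autocorrelated_rates(parent, 0.01, 0.5, rng))
```

Under the autocorrelated scheme closely related branches tend to have similar rates, which matches the local clock intuition; under the uncorrelated scheme a branch's rate says nothing about its neighbours.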
Multiple Gene Loci • “Trying to estimate time of divergence from one protein is like trying to estimate the average height of humans by measuring one human” --Molecular Systematics p539 • Ideally: • use multiple genes • use multiple calibration points
Even so, be Very cautious about divergence time inferences • Point estimates are absurd • Sample errors often based only on the difference between estimates in the same study • Even estimates with confidence intervals unlikely to really capture all sources of variance
General References • Reviews/Critiques: • Bromham and Penny. The modern molecular clock. Nature Reviews Genetics, 2003. • Graur and Martin. Reading the entrails of chickens...the illusion of precision. Trends in Genetics, 2004. • Rutschmann. Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times. Diversity and Distributions, 2006. • Textbooks: • Molecular Systematics. 2nd edition. Edited by Hillis, Moritz, and Mable. • Inferring Phylogenies. Felsenstein. • Molecular Evolution, a phylogenetic approach. Page and Holmes. • The Phylogenetic Handbook, Chapter 11.
Rate Heterogeneity References • Dealing with Rate Heterogeneity: • Yang and Yoder. Comparison of likelihood and Bayesian methods for estimating divergence times. Syst. Biol., 2003. • Kishino, Thorne, and Bruno. Performance of a divergence time estimation method under a probabilistic model of rate evolution. Mol. Biol. Evol., 2001. • Huelsenbeck, Larget, and Swofford. A compound Poisson process for relaxing the molecular clock. Genetics, 2000. • Testing for Rate Heterogeneity: • Takezaki, Rzhetsky, and Nei. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol., 1995. • Bromham, Penny, Rambaut, and Hendy. The power of relative rates test depends on the data. J. Mol. Evol., 2000. • Wilcox, Garcia de Leon, Hendrickson, and Hillis. Convergence among cave catfishes: long-branch attraction and a Bayesian relative rates test. Mol. Phylogenet. Evol. 31:1101-1113, 2004.