
Juan Daza UCF Fall 2008



Presentation Transcript


  1. Juan Daza UCF Fall 2008

  2. Reconstructing the evolutionary process Geography + Paleontology + Evolutionary theory

  3. Reconstructing the evolutionary process Geography + Paleontology + Evolutionary theory + Molecular evolution

  4. Evolutionary process implies TIME We are interested in determining how, where, why, and WHEN evolution occurs or has occurred. Genetic data + Molecular evolution theory = Molecular dating

  5. The general procedure of molecular dating Phylogram Ultrametric tree

  6. The evolution of molecular dating Hemoglobin example Fitch’s test The term is introduced Statistical properties of clocks Neutral theory

  7. The evolution of molecular dating Autocorrelation of rates branch pruning Penalized likelihood Local clocks NPRS Bayesian Uncorrelated rates

  8. The evolution of molecular dating

  9. The evolution of molecular dating • Amino acids • Nucleotides • Pruning branches • Local clocks (PAML, Pathd8 packages) • Relaxed clocks • Correlated rates (r8s, Multidivtime) • Uncorrelated rates (Beast)

  10. Applications • Species divergence • Explosive radiations • Gene evolution • Rates estimation • Virus epidemiology • Historical demography [Figure: log(# lineages) vs. time; label: bursts]

  11. The molecular clock hypothesis The hypothesis of the molecular clock proposes that molecular evolution occurs at rates that persist through time and across lineages. “The discovery of the molecular clock stands out as the most significant result of research in molecular evolution.” (Wilson et al., 1977)

  12. Emile Zuckerkandl and Linus Pauling “…It is possible to evaluate very roughly and tentatively the time that has elapsed since any of the hemoglobin chains present in a given species and controlled by non-allelic genes diverged from a common chain ancestor. . . . From paleontological evidence it may be estimated that the common ancestor of man and horse lived in the Cretaceous or possibly the Jurassic period, say between 100 and 160 million years ago. . . . The presence of 18 differences between human and horse α-chains would indicate that each chain had 9 evolutionary effective mutations in 100 to 160 millions of years. This yields a figure of 11 to 18 million years per amino acid substitution in a chain of about 150 amino acids, with a medium [sic] figure of 14.5 million years…” (Zuckerkandl and Pauling, 1962)
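The arithmetic in the quotation can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
# Hemoglobin estimate, using only the figures given in the quotation.
differences = 18                        # differences between the human and horse chains
per_chain = differences / 2             # substitutions along each lineage since the split
split_min_my, split_max_my = 100, 160   # paleontological age range, in million years

# Million years elapsed per amino acid substitution in one chain
my_per_sub_min = split_min_my / per_chain
my_per_sub_max = split_max_my / per_chain

# The quoted 14.5 is the midpoint of the rounded bounds (11 and 18)
midpoint = (round(my_per_sub_min) + round(my_per_sub_max)) / 2

print(round(my_per_sub_min), round(my_per_sub_max), midpoint)  # prints: 11 18 14.5
```

Note that the midpoint of the unrounded bounds (about 11.1 and 17.8) is closer to 14.4, so the quoted 14.5 follows from rounding first.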

  13. Emile Zuckerkandl and Linus Pauling

  14. The molecular clock hypothesis [Equation labels: number of substitutions per site; divergence time between species i and j; rate = number of substitutions per site per year; confidence interval]
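The slide's own equation did not survive extraction. Under a strict clock, the three labelled quantities are commonly related by T = K / (2r), where the factor of 2 reflects that substitutions accumulate along both lineages after the split. The sketch below assumes that form; the function name and the example values are hypothetical.

```python
def divergence_time(subs_per_site, rate_per_site_per_year):
    """Divergence time T = K / (2r) under a strict molecular clock (assumed form)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return subs_per_site / (2.0 * rate_per_site_per_year)

# Hypothetical values: K = 0.02 substitutions/site between species i and j,
# r = 1e-9 substitutions/site/year
t_years = divergence_time(0.02, 1e-9)
print(f"{t_years:.3g} years")
```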

  15. The molecular clock hypothesis • Increase in genetic data • Quantification of rates • Understanding of molecular evolution • Framework for hypothesis testing

  16. The molecular clock hypothesis Some biological attributes might be responsible: • Differences in generation times • Differences in population size • Natural selection and its intensity

  17. Log likelihood ratio test Null hypothesis: the phylogeny is rooted and the branch lengths are constrained such that all of the tips can be drawn on a single time plane. Alternative hypothesis: each branch is allowed to vary independently. Chi-square distribution with 3 d.f.
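The test can be sketched as follows. The statistic is twice the difference in log-likelihood between the unconstrained and the clock-constrained tree, compared against a chi-square distribution. The degrees of freedom are normally the number of taxa minus 2, so the slide's 3 d.f. presumably corresponds to a five-taxon example. The log-likelihood values below are hypothetical, and the closed-form tail probability only applies to 3 d.f.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 d.f. (closed form)."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def clock_lrt(lnl_clock, lnl_free):
    """Return (test statistic, p-value) for the clock vs. no-clock comparison."""
    stat = 2.0 * (lnl_free - lnl_clock)
    return stat, chi2_sf_3df(stat)

# Hypothetical log-likelihoods for the constrained and unconstrained trees
stat, p_value = clock_lrt(lnl_clock=-2350.4, lnl_free=-2344.1)
verdict = "rejected" if p_value < 0.05 else "not rejected"
print(f"2*delta lnL = {stat:.2f}, clock {verdict} at the 5% level")
```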

  18. What to do if the clock is rejected? Branch lengths Amount of evolution BL = R*T

  19. What to do if the clock is rejected? Phylogram Ultrametric tree Error in topology Error in branch lengths Error in rates optimization Error in calibration

  20. What to do if the clock is rejected? …Go simple Eliminate branches (lineages) that are causing the clock to be rejected

  21. What to do if the clock is rejected? Statistical modeling Objective functions need to be developed to reduce dimensionality

  22. Global clock to Local clocks Assign specific rates to specific parts of the tree and calculate divergence times. Packages: PAML, Pathd8 [Figure labels: r1, r2]

  23. …what if it still doesn’t work? We need to find the function that explains the data better. “Relaxed clock methods” Maximum Likelihood and Bayesian Inference • Correlated relaxed clocks • Uncorrelated relaxed clocks

  24. Penalized Likelihood Method (Sanderson, 2002) • A likelihood method to generate an ultrametric chronogram from a non-ultrametric tree • Finds the best-fitting model of rate evolution considering both (1) how well the modeled changes explain the branch lengths and (2) the amount of rate change across the tree (less change = better) • Rates correlation

  25. Penalized Likelihood Method (Sanderson, 2002) • A topology with branch lengths is required. • Absolute or relative dates can be obtained. • Bootstrap method is used for confidence intervals (time consuming!!!) • Fossil cross validation

  26. Penalized Likelihood Method (Sanderson, 2002) Maximizes the likelihood of the sequence data (X) given a combination of average rates (R) and times (T), with a penalty function to discourage rate change. [Equation labels: likelihood; penalty function]
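A simplified sketch of such an objective is shown below, assuming a Poisson likelihood for the substitutions on each branch and a penalty equal to the squared rate difference between each branch and its parent branch. The published method's penalty has additional terms, and the data structure, the parameter name `smoothing`, and the example values are all hypothetical.

```python
import math

def poisson_log_lik(count, expected):
    """Log probability of `count` substitutions when `expected` are predicted."""
    return count * math.log(expected) - expected - math.lgamma(count + 1)

def penalized_log_lik(branches, smoothing):
    """Return (objective, log-likelihood, penalty) for a set of branches.

    Each branch is a dict with 'subs' (observed substitutions), 'duration',
    'rate', and 'parent' (index of the parent branch, or None at the root).
    """
    log_lik = sum(poisson_log_lik(b["subs"], b["rate"] * b["duration"]) for b in branches)
    penalty = sum(
        (b["rate"] - branches[b["parent"]]["rate"]) ** 2
        for b in branches
        if b["parent"] is not None
    )
    return log_lik - smoothing * penalty, log_lik, penalty

# Hypothetical three-branch example: one parent branch and two daughters
branches = [
    {"subs": 20, "duration": 10.0, "rate": 2.0, "parent": None},
    {"subs": 12, "duration": 5.0, "rate": 2.4, "parent": 0},
    {"subs": 8, "duration": 5.0, "rate": 1.6, "parent": 0},
]
for smoothing in (0.0, 1.0, 100.0):
    objective, log_lik, penalty = penalized_log_lik(branches, smoothing)
    print(f"smoothing={smoothing:g}: objective={objective:.3f}")
```

With a large smoothing value the penalty dominates, so an optimizer would be pushed toward equal rates (a clock); with a value near zero each branch is free to take its own rate.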

  27. Confidence intervals for Penalized Likelihood (Burbrink and Pyron, 2008) Standard error of a bootstrap distribution. [Equation labels: estimate of time for a single node from a single bootstrap pseudoreplicate; mean date for the same node from all bootstrap pseudoreplicates; number of pseudoreplicates]
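The labelled quantities suggest the usual standard deviation of the bootstrap estimates for a node. The sketch below assumes an n - 1 denominator and a normal-approximation interval of plus or minus 1.96 standard errors; both are assumptions rather than the slide's confirmed formula, and the node ages are hypothetical.

```python
import math

def bootstrap_summary(node_ages):
    """Mean and standard error of one node's age across bootstrap pseudoreplicates."""
    n = len(node_ages)
    if n < 2:
        raise ValueError("need at least two pseudoreplicates")
    mean_age = sum(node_ages) / n
    se = math.sqrt(sum((age - mean_age) ** 2 for age in node_ages) / (n - 1))
    return mean_age, se

# Hypothetical ages (million years) for one node from six pseudoreplicates
ages = [22.1, 24.3, 23.0, 25.2, 21.9, 23.5]
mean_age, se = bootstrap_summary(ages)
interval = (mean_age - 1.96 * se, mean_age + 1.96 * se)
print(f"mean = {mean_age:.2f}, SE = {se:.2f}, interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```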

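  28. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) Uses Bayes’ rule to estimate rates and dates:
      $$p(T, R, v \mid X, C) = \frac{p(X \mid B)\, p(R \mid T, v)\, p(T \mid C)\, p(v)}{p(X \mid C)}$$
      Posterior: $p(T, R, v \mid X, C)$. Likelihood: $p(X \mid B)$. Prior: $p(R \mid T, v)\, p(T \mid C)\, p(v)$. Marginal probability of the data: $p(X \mid C)$. [Figure labels: ages, tree parameters, constraints]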

  29. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) BL=R*T BL=0.065 subs/site

  30. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) r=0.1 t=0.65

  31. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) BL=0.065 subs/site

  32. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) BL=0.065 subs/site Prior

  33. Bayesian Inference (Thorne and Kishino, 2000; Drummond et al., 2006) BL=0.065 subs/site Prior Posterior
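The idea behind these slides can be illustrated on a grid: a branch length of 0.065 substitutions/site is compatible with many rate and time pairs (for example r = 0.1 and t = 0.65), and a prior narrows them down. This is a toy sketch, not the MCMC procedure used by the cited methods. The Gaussian likelihood, its standard deviation, the prior on time, and the grid ranges are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

BL_OBS = 0.065   # branch length from the slides, substitutions/site
BL_SD = 0.005    # assumed uncertainty on the branch length (hypothetical)

# Grids of candidate rates and times (arbitrary units, as on the slides)
rates = [0.02 + 0.002 * i for i in range(150)]
times = [0.10 + 0.02 * j for j in range(60)]

def prior(rate, time):
    # Hypothetical prior: time calibrated near 0.65, rate left flat over the grid
    return normal_pdf(time, 0.65, 0.10)

def likelihood(rate, time):
    # How well the product rate * time reproduces the observed branch length
    return normal_pdf(BL_OBS, rate * time, BL_SD)

weights = [(r, t, likelihood(r, t) * prior(r, t)) for r in rates for t in times]
marginal = sum(w for _, _, w in weights)             # grid analogue of p(X | C)
posterior = [(r, t, w / marginal) for r, t, w in weights]

post_mean_rate = sum(r * p for r, _, p in posterior)
post_mean_time = sum(t * p for _, t, p in posterior)
print(f"posterior mean rate = {post_mean_rate:.3f}, time = {post_mean_time:.3f}")
```

Without the prior on time, every pair with rate * time close to 0.065 would be equally supported, which is why calibration information is needed to separate rate from time.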

  34. Thorne and Kishino, 1998 • A topology is required. • Branch lengths are estimated using the F84 model. • The variance-covariance matrix of the branch lengths is also estimated. • Several priors (e.g., time constraints, rates) can be included. • MCMC methods are implemented to sample from the posterior. BL=0.065 subs/site

  35. Drummond et al., 2006 • A topology is not required. Phylogeny and dates are estimated simultaneously. • More complex models can be applied. • Several priors (e.g., time constraints, rates) can be included. Distributions do not need to be normal. • MCMC methods are implemented to sample from the posterior BL=0.065 subs/site

  36. Coalescent theory and molecular dating Coalescent: a stochastic process that describes how population genetic processes determine the shape of the genealogy of sampled gene sequences. Coalescent + Molecular dating = Test hypotheses about historical demography

  37. Coalescent theory and molecular dating Coalescent: a stochastic process that describes how population genetic processes determine the shape of the genealogy of sampled gene sequences. Coalescent + Molecular dating = Test hypotheses about historical demography [Figure labels: E, O]

  38. Coalescent theory and molecular dating Coalescent: a stochastic process that describes how population genetic processes determine the shape of the genealogy of sampled gene sequences. Coalescent + Molecular dating = Test hypotheses about historical demography [Figure labels: Bison, HCV]
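To make the link between population size and genealogy shape concrete, the sketch below computes expected coalescent waiting times under the standard constant-size model, which is an assumption since the slides do not specify a model. Time is measured in units of 2N generations for a diploid population of size N, and the function names are illustrative.

```python
def expected_coalescent_times(n_samples):
    """Expected waiting time while k lineages remain, for k = n down to 2.

    With k lineages, pairs coalesce at rate k(k-1)/2 in units of 2N generations,
    so the expected waiting time is 2 / (k(k-1)).
    """
    return {k: 2.0 / (k * (k - 1)) for k in range(n_samples, 1, -1)}

def expected_tmrca(n_samples):
    """Expected time to the most recent common ancestor of the whole sample."""
    return sum(expected_coalescent_times(n_samples).values())

for n in (2, 5, 10):
    print(n, round(expected_tmrca(n), 3))
```

Most of the expected depth comes from the last two lineages, so genealogies from a constant-size population have short branches near the tips and long branches near the root. Departures from that shape are what demographic tests look for.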

  39. The methods seem to be more “realistic” but… Are they more accurate in the real world? How do we know if a method is appropriate?

  40. There are many factors that can affect divergence times • Uncertainty of phylogenetic relationships. • Rates of evolution are unknown for many organisms. • Rate heterogeneity → no molecular clock. • Lack of calibration points (fossils, biogeographic events). BL = R*T

  41. Gene tree vs. species tree [Figure labels: TMRCA; time of cladogenetic event; divergence times = or ≠ coalescent times]

  42. Crotalinae 0.84 0.54 0.66 0.91 0.74 New World

  43. Calibration Error includes several components: • Fossil misidentified (belongs elsewhere and calibrates a different node) • Fossil mis-dated (uncertainty in determining absolute age of fossil) • Non-preservation (fossil never gives true origin - impossible to avoid)

  44. Fossil cross-validation (Near et al., 2005) Test the effect of each fossil on the time estimates. We left one fossil and re-estimated the dates of the remaining fossils using r8s. [Figure labels: Consistent, Inconsistent]

  45. Parameters: • Average difference between molecular ages and fossil ages • Fossil inconsistency • Sum of squares of differences • Effect of removing inconsistent fossils • Standard deviation
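The first parameters listed above can be sketched as follows: for each fossil used as the sole calibration, compare the molecular ages it implies for the other fossil-dated nodes with their fossil ages. This is an illustration of the general idea, not a reproduction of the published statistics, and the fossil names and ages are invented.

```python
# Hypothetical fossil ages (million years) for four calibrated nodes
fossil_ages = {"A": 30.0, "B": 20.0, "C": 30.0, "D": 40.0}

# molecular_ages[x][i]: age estimated for node i when only fossil x calibrates the tree
molecular_ages = {
    "A": {"A": 30.0, "B": 60.0, "C": 90.0, "D": 120.0},
    "B": {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0},
    "C": {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0},
    "D": {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0},
}

def cross_validate(fossil_ages, molecular_ages):
    """Mean difference and sum of squared differences for each calibrating fossil."""
    results = {}
    for x, estimates in molecular_ages.items():
        diffs = [estimates[i] - fossil_ages[i] for i in fossil_ages if i != x]
        results[x] = {
            "mean_diff": sum(diffs) / len(diffs),
            "sum_squares": sum(d * d for d in diffs),
        }
    return results

results = cross_validate(fossil_ages, molecular_ages)
# Fossils ordered from most to least inconsistent with the others
ranked = sorted(results, key=lambda x: results[x]["sum_squares"], reverse=True)
print(ranked, results[ranked[0]])
```

In this invented example fossil A is assigned an age three times too old, so calibrating with it alone overestimates every other node and it ranks as the most inconsistent.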

  46. Fossil calibration [Figure: inconsistency (s) vs. number of fossils removed (1 to 5); labels: Fossil 1, overestimation, best?, underestimation]

  47. Use of all fossils. Different values of λ (the parameter that relaxes the molecular clock in Penalized Likelihood): 0.01, 0.1, 1, 10, 100, 1000, 10000. Estimation of divergence time using r8s

  48. Clock behavior [Figure panels: substitution rate ratio vs. log(λ); cross-validation score vs. log(λ)]

  49. 5 different outgroups, depending on their distance to the ingroup (number of internal branches). Optimization of branch lengths using likelihood and the GTR+Γ+I model. Estimation of divergence time using the Mean Path Length method (Pathd8). [Figure labels: outgroup 3, outgroup 2, outgroup 1, ingroup]
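The mean path length idea can be sketched on a toy tree: the relative age of a node is taken as the average branch-length distance from that node to its descendant tips, and a single calibration converts relative ages to absolute ones. This is a simplified illustration rather than the Pathd8 algorithm itself, and the tree, its branch lengths, and the root calibration are hypothetical.

```python
# A node is (name, branch_length_to_parent, children); tips have no children.
tree = ("root", 0.0, [
    ("AB", 0.02, [("A", 0.03, []), ("B", 0.05, [])]),
    ("C", 0.07, []),
])

def tip_path_lengths(node):
    """Path lengths from this node down to each of its descendant tips."""
    _, _, children = node
    if not children:
        return [0.0]
    return [child[1] + p for child in children for p in tip_path_lengths(child)]

def mean_path_lengths(node, out=None):
    """Mean path length for every internal node, keyed by node name."""
    out = {} if out is None else out
    name, _, children = node
    if children:
        paths = tip_path_lengths(node)
        out[name] = sum(paths) / len(paths)
        for child in children:
            mean_path_lengths(child, out)
    return out

mpl = mean_path_lengths(tree)
ROOT_AGE = 10.0                      # hypothetical calibration, million years
rate = mpl["root"] / ROOT_AGE        # substitutions/site per million years
ages = {name: length / rate for name, length in mpl.items()}
print({name: round(age, 2) for name, age in ages.items()})
```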

  50. Estimated age Node
