Lab3: Bayesian phylogenetic Inference and MCMC

Lab3: Bayesian phylogenetic Inference and MCMC. Department of Bioinformatics & Biostatistics, SJTU. Topics: phylogenetics; Bayesian inference and MCMC (overview); Bayesian model testing; MrBayes tutorial and application; Nexus file; configuration of the process; how to execute the process.

Presentation Transcript


  1. Lab3: Bayesian phylogenetic Inference and MCMC • Department of Bioinformatics & Biostatistics, SJTU

  2. Topics • Phylogenetics • Bayesian inference and MCMC: overview • Bayesian model testing • MrBayes tutorial and application • Nexus file • Configuration of the process • How to execute the process • Analyzing the results

  3. Phylogenetics • Greek: phylon (tribe) + genesis (origin) • Broad definition: a historical term, covering how species arise and go extinct • Narrow definition: inferring the relationships among extant species • We prefer the narrow one

  4. Infer relationships among three species: • [Figure: the three species together with an outgroup]

  5. Three possible trees (topologies): • [Figure: the three possible trees for species A, B, and C]
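
The count of three trees for three species is the smallest case of a general formula. As a minimal sketch (the function names are hypothetical, and fully bifurcating trees are assumed), the number of rooted topologies for n taxa is (2n - 3)!! and the number of unrooted topologies is (2n - 5)!!:

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 (1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_rooted_topologies(n_taxa: int) -> int:
    """Number of rooted, fully bifurcating topologies: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


def count_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


for n in (3, 4, 5, 10):
    print(n, count_rooted_topologies(n), count_unrooted_topologies(n))
```

The rapid growth of these counts is one reason the posterior over trees cannot be evaluated exhaustively for realistic numbers of taxa.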

  6. [Figure: a prior distribution over the trees for A, B, and C (probability axis up to 1.0) is combined with the data (observations) to give the posterior distribution (probability axis up to 1.0)]

  7. What is needed for inference? • A probabilistic model of evolution • Prior distribution on the parameters of the model • Data • A method for calculating the posterior distribution for the model, prior distribution and data

  8. What is needed for inference? • A probabilistic model of evolution • Prior distribution on the parameters of the model • Data • A method for calculating the posterior distribution for the model, prior distribution and data

  9. Parameters • Model: topology + branch lengths • [Figure: four-taxon tree (A, B, C, D) annotated with "topology" and "branch lengths (expected amount of change)"]

  10. Parameters • Model: molecular evolution • Instantaneous rate matrix (Jukes-Cantor)
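
The rate matrix itself did not survive in the transcript. As a reference sketch, the standard Jukes-Cantor matrix is written below, with states ordered A, C, G, T; the symbol \alpha (the rate of change to each other nucleotide) is an assumed notation, not necessarily the one used on the slide:

```latex
Q =
\begin{pmatrix}
-3\alpha & \alpha   & \alpha   & \alpha   \\
\alpha   & -3\alpha & \alpha   & \alpha   \\
\alpha   & \alpha   & -3\alpha & \alpha   \\
\alpha   & \alpha   & \alpha   & -3\alpha
\end{pmatrix},
\qquad
P(t) = e^{Qt},
\qquad
P_{ij}(t) =
\begin{cases}
\tfrac{1}{4} + \tfrac{3}{4}\, e^{-4\alpha t}, & i = j,\\[4pt]
\tfrac{1}{4} - \tfrac{1}{4}\, e^{-4\alpha t}, & i \neq j.
\end{cases}
```

Under this parameterization the branch length, measured as the expected amount of change per site, is v = 3\alpha t.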

  11. What is needed for inference? • A probabilistic model of evolution • Prior distribution on the parameters of the model • Data • A method for calculating the posterior distribution for the model, prior distribution and data

  12. Priors on parameters • Topology • All unique topologies have equal probability • Branch lengths • Exponential prior puts more weight on small branch lengths; approximately uniform on transition probabilities

  13. What is needed for inference? • A probabilistic model of evolution • Prior distribution on the parameters of the model • Data • A method for calculating the posterior distribution for the model, prior distribution and data

  14. Data • The data (alignment)

      Taxon   Characters
      A       ACG TTA TTA AAT TGT CCT CTT TTC AGA
      B       ACG TGT TTC GAT CGT CCT CTT TTC AGA
      C       ACG TGT TTA GAC CGA CCT CGG TTA AGG
      D       ACA GGA TTA GAT CGT CCG CTT TTC AGA

  15. What is needed for inference? • A probabilistic model of evolution • Prior distribution on the parameters of the model • Data • A method for calculating the posterior distribution for the model, prior distribution and data

  16. Bayes' Theorem • Posterior distribution = (prior distribution × likelihood function) / normalizing constant
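
Bayes' theorem can be worked through on a discrete case with three candidate trees. The sketch below is illustrative only: the function name and the likelihood values are hypothetical, and the prior is uniform over the three topologies.

```python
def posterior_from_prior(prior: dict, likelihood: dict) -> dict:
    """Apply Bayes' theorem: posterior = prior * likelihood / normalizing constant."""
    unnormalized = {tree: prior[tree] * likelihood[tree] for tree in prior}
    normalizing_constant = sum(unnormalized.values())
    return {tree: value / normalizing_constant for tree, value in unnormalized.items()}


# Uniform prior over the three topologies.
prior = {"tree 1": 1 / 3, "tree 2": 1 / 3, "tree 3": 1 / 3}

# Hypothetical likelihoods f(X | tree); these are not values from the slides.
likelihood = {"tree 1": 1e-5, "tree 2": 3e-5, "tree 3": 6e-5}

posterior = posterior_from_prior(prior, likelihood)
for tree, probability in posterior.items():
    print(tree, round(probability, 3))
```

With a uniform prior the posterior is simply the likelihoods rescaled to sum to one; a non-uniform prior would shift the result accordingly.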

  17. Posterior probability distribution • [Figure: posterior probability plotted over parameter space (high-dimensional, collapsed to 1D), with regions labelled tree 1, tree 2, and tree 3]

  18. We can focus on any parameter of interest (there are no nuisance parameters) by marginalizing the posterior over the other parameters (integrating out the uncertainty in the other parameters) • [Figure: the regions for tree 1, tree 2, and tree 3 carry the values 48%, 32%, and 20%; the percentages denote the marginal probability distribution on trees]

  19. Marginal probabilities • [Table: joint probabilities, with trees along one axis and branch length vectors along the other; the marginal probabilities appear in the margins]
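
Marginalizing a joint table amounts to summing over the dimension that is not of interest. A minimal sketch, assuming a hypothetical 3 x 3 joint table (rows are trees, columns are branch length vectors; none of the numbers come from the slides):

```python
# Hypothetical joint posterior probabilities: joint[tree][k] is the probability
# of that tree together with the k-th branch length vector.
joint = {
    "tree 1": [0.10, 0.20, 0.10],
    "tree 2": [0.05, 0.15, 0.10],
    "tree 3": [0.05, 0.05, 0.20],
}

# Marginal probability of each tree: sum over branch length vectors (row sums).
tree_marginals = {tree: sum(row) for tree, row in joint.items()}

# Marginal probability of each branch length vector: sum over trees (column sums).
branch_marginals = [sum(column) for column in zip(*joint.values())]

print(tree_marginals)
print(branch_marginals)
```

In an MCMC analysis the same effect is obtained without an explicit table: counting how often each tree appears in the sample ignores, and therefore integrates out, the sampled branch lengths.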

  20. How to estimate the posterior? • Analytical calculation? Impossible, except for very simple examples • Random sampling of parameter space? Also impossible: computationally infeasible • Dependent sampling using the MCMC technique? Yes, you got it!

  21. Metropolis-Hastings Sampling • Assume that the current state has parameter values θ • Consider a move to a state with parameter values θ* according to proposal density q • Accept the move with probability r = min(1, prior ratio × likelihood ratio × proposal ratio), where prior ratio = f(θ*)/f(θ), likelihood ratio = f(X|θ*)/f(X|θ), and proposal ratio = q(θ|θ*)/q(θ*|θ)
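
The acceptance rule can be illustrated on a toy one-dimensional problem. This is a sketch under stated assumptions, not the sampler used by MrBayes: `log_target` is a hypothetical stand-in for log prior plus log likelihood (here a standard normal), and the proposal is a symmetric sliding window, so the proposal ratio equals 1.

```python
import math
import random


def log_target(theta: float) -> float:
    """Hypothetical unnormalized log posterior (log prior + log likelihood)."""
    return -0.5 * theta * theta


def metropolis_hastings(n_steps: int, step_size: float = 1.0,
                        theta0: float = 0.0, seed: int = 1) -> list:
    """Sample from exp(log_target) with a symmetric uniform-window proposal."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        # Propose theta* near the current state; the window is symmetric,
        # so q(theta | theta*) / q(theta* | theta) = 1.
        proposal = theta + rng.uniform(-step_size, step_size)
        # Log of (prior ratio x likelihood ratio x proposal ratio).
        log_r = log_target(proposal) - log_target(theta)
        # Accept with probability min(1, r); otherwise stay at the current state.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            theta = proposal
        samples.append(theta)
    return samples


samples = metropolis_hastings(20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

Working with log ratios avoids numerical underflow, which matters in phylogenetics because likelihoods of real alignments are extremely small numbers.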

  22. Sampling Principles • For a complex model, you typically have many “proposal” or “update” mechanisms (“moves”) • Each mechanism changes one or a few parameters • At each step (generation of the chain) one mechanism is chosen randomly according to some predetermined probability distribution • It makes sense to try changing ‘more difficult’ parameters (such as topology in a phylogenetic analysis) more often
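
Choosing one update mechanism per generation according to fixed probabilities can be sketched as a weighted random draw. The move names and weights below are hypothetical and chosen only to show topology moves being tried most often:

```python
import random
from collections import Counter

# Hypothetical relative weights for each move type.
MOVE_WEIGHTS = {
    "topology": 5.0,
    "branch_length": 3.0,
    "substitution_rates": 1.0,
    "gamma_shape": 1.0,
}


def pick_move(rng: random.Random) -> str:
    """Pick one move for this generation, proportionally to its weight."""
    names = list(MOVE_WEIGHTS)
    weights = list(MOVE_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)
move_counts = Counter(pick_move(rng) for _ in range(10000))
print(move_counts.most_common())
```

Each chosen move then proposes new values for only the parameters it controls, and the proposal is accepted or rejected with the Metropolis-Hastings rule.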

  23. Application example • Analysis of 85 insect taxa based on 18S rDNA

  24. Model parameters 1 • General Time Reversible (GTR) substitution model • [Figure: four-taxon tree (A, B, C, D) annotated with "topology" and "branch lengths"]

  25. Model parameters 2 • Gamma-shaped rate variation across sites

  26. Priors on parameters • Topology • all unique topologies have equal probability • Branch lengths • exponential prior (Exp(10) has rate 10, so the expected branch length is 1/10 = 0.1) • State frequencies • Dirichlet prior: Dir(1,1,1,1) • Rates (revmat) • Dirichlet prior: Dir(1,1,1,1,1,1) • Shape of gamma distribution of rates • Uniform prior: Uni(0,100)
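
To get a feel for these priors, one can draw values from them. The sketch below uses only the Python standard library; the helper names are hypothetical, and the Dirichlet draw uses the standard construction of normalizing independent gamma variates.

```python
import random


def sample_branch_length(rng: random.Random, rate: float = 10.0) -> float:
    """Draw from Exp(rate); the mean is 1 / rate (0.1 for rate 10)."""
    return rng.expovariate(rate)


def sample_dirichlet(rng: random.Random, alphas: list) -> list:
    """Draw from a Dirichlet by normalizing independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [draw / total for draw in draws]


def sample_gamma_shape(rng: random.Random, low: float = 0.0, high: float = 100.0) -> float:
    """Draw the gamma shape parameter from Uni(low, high)."""
    return rng.uniform(low, high)


rng = random.Random(7)
branch_lengths = [sample_branch_length(rng) for _ in range(20000)]
mean_branch_length = sum(branch_lengths) / len(branch_lengths)
state_frequencies = sample_dirichlet(rng, [1, 1, 1, 1])      # Dir(1,1,1,1)
rate_proportions = sample_dirichlet(rng, [1, 1, 1, 1, 1, 1])  # Dir(1,1,1,1,1,1)
shape = sample_gamma_shape(rng)
print(mean_branch_length, state_frequencies, rate_proportions, shape)
```

Dir(1,1,1,1) is flat over all frequency vectors that sum to one, so it expresses no preference among base compositions.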

  27. [Figure: MCMC trace showing the burn-in followed by the stationary phase, which is sampled with thinning (rapid mixing essential)]
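
Discarding the burn-in and thinning the remainder can be expressed as a single slice. A minimal sketch, assuming the chain is stored as a list with one entry per generation; the function name and the default burn-in fraction and thinning interval are hypothetical choices:

```python
def discard_burnin_and_thin(chain: list, burnin_fraction: float = 0.25,
                            thin: int = 10) -> list:
    """Drop the first burnin_fraction of the chain, then keep every thin-th state."""
    n_burnin = int(len(chain) * burnin_fraction)
    return chain[n_burnin::thin]


chain = list(range(1000))  # stand-in for 1000 generations of sampled states
kept = discard_burnin_and_thin(chain)
print(len(kept), kept[0], kept[-1])
```

Thinning reduces storage and autocorrelation between retained samples, but it does not rescue a chain that mixes poorly.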

  28. Majority rule consensus tree from sampled trees • Frequencies represent the posterior probability of the clades • Probability of clade being true given data, model, and prior (and given that the MCMC sample is OK)
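
Clade posterior probabilities are estimated as the fraction of sampled trees that contain each clade. The sketch below is an illustration rather than MrBayes' implementation: each sampled tree is represented, as an assumption, by the set of clades it contains, and the ten sampled trees are hypothetical.

```python
from collections import Counter


def clade_frequencies(sampled_trees: list) -> dict:
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))
    n_trees = len(sampled_trees)
    return {clade: count / n_trees for clade, count in counts.items()}


def majority_rule_clades(sampled_trees: list, threshold: float = 0.5) -> dict:
    """Clades whose frequency exceeds the threshold (majority rule)."""
    frequencies = clade_frequencies(sampled_trees)
    return {clade: f for clade, f in frequencies.items() if f > threshold}


AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
ABC, BCD = frozenset("ABC"), frozenset("BCD")

# Hypothetical post-burn-in sample of ten rooted trees on taxa A, B, C, D.
sampled_trees = [{AB, ABC}] * 7 + [{AC, ABC}] * 2 + [{BC, BCD}]

frequencies = clade_frequencies(sampled_trees)
consensus = majority_rule_clades(sampled_trees)
for clade, frequency in consensus.items():
    print(sorted(clade), frequency)
```

Clades with frequency above one half are always mutually compatible, which is why they can be combined into a single consensus tree.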

  29. Mean and 95% credibility interval for model parameters
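
A posterior mean and a 95% credibility interval can be read directly off the sampled values of a parameter. This sketch computes an equal-tailed interval from order statistics; the function name is hypothetical, and other interval definitions (for example, highest posterior density) would give different bounds.

```python
import statistics


def summarize(samples: list, level: float = 0.95) -> tuple:
    """Return (mean, lower, upper) with an equal-tailed credibility interval."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * (n - 1))]
    upper = ordered[round((1.0 - tail) * (n - 1))]
    return statistics.fmean(ordered), lower, upper


# Stand-in for post-burn-in samples of one model parameter.
parameter_samples = list(range(1001))
summary = summarize(parameter_samples)
print(summary)
```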

  30. MrBayes tutorial • Introduction/examples

  31. Nexus format input file • Input: Nexus format; more precisely, Nexus(ish)

  32. Running MrBayes • Use execute to bring data in a Nexus file into MrBayes • Set the model and priors using lset and prset • Run the chain using mcmc • Summarize the parameter samples using sump • Summarize the tree samples using sumt • Note that MrBayes 3.1 runs two independent analyses by default

  33. Convergence Diagnostics • By default, MrBayes performs two independent analyses starting from different random trees (mcmc nruns=2) • The average standard deviation of clade frequencies is calculated and presented during the run (mcmc mcmcdiagn=yes diagnfreq=1000) and written to file (.mcmc) • The standard deviation of each clade frequency and the potential scale reduction for branch lengths are calculated with sumt • The potential scale reduction is calculated for all substitution model parameters with sump
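
The average standard deviation of clade frequencies compares how often each clade appears in the independent runs. The sketch below shows the idea only and is not MrBayes' code: the input format (one dictionary of clade frequencies per run), the use of the sample standard deviation, and the `min_frequency` cutoff of 0.1 are all assumptions made for illustration.

```python
import statistics


def average_sd_of_clade_frequencies(run_frequencies: list,
                                    min_frequency: float = 0.1) -> float:
    """Average, over clades, of the standard deviation of frequencies across runs."""
    clades = set().union(*run_frequencies)
    deviations = []
    for clade in clades:
        frequencies = [run.get(clade, 0.0) for run in run_frequencies]
        # Skip clades that are rare in every run (assumed cutoff).
        if max(frequencies) < min_frequency:
            continue
        deviations.append(statistics.stdev(frequencies))
    return sum(deviations) / len(deviations) if deviations else 0.0


# Hypothetical clade frequencies from two independent runs.
run_1 = {"AB": 0.70, "ABC": 0.90, "AC": 0.20}
run_2 = {"AB": 0.74, "ABC": 0.90, "AC": 0.16}

asd = average_sd_of_clade_frequencies([run_1, run_2])
print(asd)
```

Values approaching zero indicate that the runs are sampling clades at similar frequencies, which is evidence (though not proof) of convergence.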

  34. Marginal likelihood (of the model) • Bayes' theorem • We have implicitly conditioned on a model M, so the normalizing constant f(X) is really f(X|M)
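
The equation for this slide is not preserved in the transcript. Written out with the conditioning on the model made explicit (the symbols \theta for the parameters, X for the data, and M for the model are assumed notation), Bayes' theorem and the marginal likelihood are:

```latex
f(\theta \mid X, M) = \frac{f(\theta \mid M)\, f(X \mid \theta, M)}{f(X \mid M)},
\qquad
f(X \mid M) = \int f(\theta \mid M)\, f(X \mid \theta, M)\, d\theta .
```

The marginal likelihood averages the likelihood over the prior, which is why it depends on the priors as well as on the model structure.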

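  35. Bayesian Model Choice • Posterior model odds = prior model odds × Bayes factor • Bayes factor = ratio of the marginal likelihoods of the two models, f(X|M1) / f(X|M0)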
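
In symbols, for two models labelled M_0 and M_1 (assumed labels), the posterior model odds and the Bayes factor are related as follows:

```latex
\frac{f(M_1 \mid X)}{f(M_0 \mid X)}
  = \frac{f(M_1)}{f(M_0)} \times \frac{f(X \mid M_1)}{f(X \mid M_0)},
\qquad
B_{10} = \frac{f(X \mid M_1)}{f(X \mid M_0)} .
```

When the two models are given equal prior probability, the posterior odds equal the Bayes factor.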

  36. Bayesian Model Choice • The normalizing constant in Bayes' theorem, the marginal probability of the model, f(X) or f(X|M), can be used for model choice • f(X|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run (MrBayes will do this automatically with 'sump') • Any models can be compared: nested, non-nested, data-derived • No correction for the number of parameters is needed • The method can prefer a simpler model over a more complex model
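
The harmonic mean estimate has to be computed on the log scale, because the likelihoods themselves underflow. A minimal sketch, assuming the input is a list of post-burn-in log likelihood values; the function name is hypothetical and this is not MrBayes' own code:

```python
import math


def log_harmonic_mean(log_likelihoods: list) -> float:
    """Log of the harmonic mean of the likelihoods, computed stably in log space."""
    n = len(log_likelihoods)
    negated = [-value for value in log_likelihoods]
    largest = max(negated)
    # log(sum(1 / L_i)) via the log-sum-exp trick.
    log_sum_inverse = largest + math.log(sum(math.exp(v - largest) for v in negated))
    # Harmonic mean = n / sum(1 / L_i).
    return math.log(n) - log_sum_inverse


# Hypothetical log likelihood samples from two models.
log_ml_model_1 = log_harmonic_mean([-1000.2, -1001.5, -999.8, -1000.9])
log_ml_model_0 = log_harmonic_mean([-1004.1, -1003.7, -1005.0, -1004.4])
two_ln_bayes_factor = 2.0 * (log_ml_model_1 - log_ml_model_0)
print(log_ml_model_1, log_ml_model_0, two_ln_bayes_factor)
```

The estimate is dominated by the lowest likelihood values in the sample, so it can vary considerably between runs and is best treated as a rough guide.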

  37. Bayes Factor Comparisons • Interpretation of the Bayes factor
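
The interpretation table from this slide is not preserved in the transcript. One widely used convention is the Kass and Raftery (1995) scale for 2 ln(B10); the sketch below assumes that scale, which may differ from the table shown in the lab, and the function name is hypothetical.

```python
def interpret_two_ln_bayes_factor(two_ln_b10: float) -> str:
    """Map 2 ln(B10) to the Kass and Raftery (1995) evidence categories (assumed scale)."""
    if two_ln_b10 < 0:
        return "favors M0 (swap the models and re-evaluate)"
    if two_ln_b10 < 2:
        return "not worth more than a bare mention"
    if two_ln_b10 < 6:
        return "positive"
    if two_ln_b10 <= 10:
        return "strong"
    return "very strong"


for value in (1.0, 4.0, 8.0, 12.0):
    print(value, interpret_two_ln_bayes_factor(value))
```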
