Approximate Bayesian Computation

Approximate Bayesian Computation: studying demographic parameters. Joao Lopes, Mark Beaumont, University of Reading, joao.lopes@rdg.ac.uk.

Presentation Transcript


  1. Approximate Bayesian Computation Studying demographic parameters Joao Lopes, Mark Beaumont University of Reading joao.lopes@rdg.ac.uk

  2. ABC algorithm:
     • Assumptions:
       • Discordance between gene and species trees is not expected
       • Mutation rate is variable in space, but not in time
     • Features:
       • Based on construction of gene trees using the coalescent model
       • Easily applied to 4 or 5 populations/species
       • Some tweaks are necessary for use with more populations
     • But most importantly:
       • Handles large datasets (typically hundreds of samples per population/species)
       • Complex population/species models can be used (e.g. presence of gene flow)
       • Assumptions can be greatly relaxed (e.g. variable mutation rate over time)

  3. ABC algorithm, with parameters F = {Ne1, Ne2, NeA, m1, m2, t}:
     1. Sample from the prior(s): Fi ~ p(F)
     2. Simulate data given Fi: Di ~ p(D | Fi)
     3. Summarize Di with a set of summary statistics, obtaining Si; go to 1 until N points (S, F) have been created.
     4. Accept the points whose S is within a distance d of s', the real data summarized by the same set of statistics.
     5. Correct the values of F according to their distance from the real data by performing a local linear regression.
     [Figure: the population model, showing Popanc, Pop1 and Pop2 with parameters NeA, t, Ne1, Ne2, m1 and m2.]
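The five steps on slide 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The coalescent simulator is replaced by a hypothetical toy model (a normal sample with unknown mean). `tol` is treated as the fraction of simulations accepted, distances are Euclidean on standardized statistics, and the regression uses Epanechnikov weights. The function and argument names are invented for this example.

```python
import numpy as np

def abc_local_linear(observed, sample_prior, simulate_stats,
                     n_sims=5000, tol=0.05, rng=None):
    """Rejection ABC followed by a weighted local linear regression adjustment."""
    rng = np.random.default_rng() if rng is None else rng
    # Steps 1-3: draw F_i from the prior, simulate D_i, keep its summaries S_i.
    params = np.array([sample_prior(rng) for _ in range(n_sims)])   # (n_sims, p)
    stats = np.array([simulate_stats(f, rng) for f in params])      # (n_sims, k)
    # Standardize each statistic so none dominates the distance (assumption).
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.linalg.norm((stats - observed) / scale, axis=1)
    # Step 4: accept the fraction `tol` of points closest to the observed s'.
    n_keep = max(2, int(round(tol * n_sims)))
    keep = np.argsort(dist)[:n_keep]
    d = dist[keep].max()
    weights = 1.0 - (dist[keep] / d) ** 2 if d > 0 else np.ones(n_keep)
    # Step 5: regress accepted F on (S - s') and project each point to S = s'.
    diffs = stats[keep] - observed
    design = np.column_stack([np.ones(n_keep), diffs])
    root_w = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * root_w, params[keep] * root_w, rcond=None)
    adjusted = params[keep] - diffs @ coef[1:]
    return adjusted, weights

# Hypothetical toy model standing in for the coalescent simulation.
def sample_prior(rng):
    return np.array([rng.uniform(0.0, 10.0)])

def simulate_stats(f, rng):
    data = rng.normal(f[0], 1.0, size=50)
    return np.array([data.mean(), data.std()])

rng = np.random.default_rng(1)
observed = simulate_stats(np.array([4.0]), rng)   # pseudo-observed data, true mean 4
adjusted, weights = abc_local_linear(observed, sample_prior, simulate_stats, rng=rng)
estimate = np.average(adjusted[:, 0], weights=weights)
```

The weighted average of the adjusted values serves as a point estimate. The adjusted values together with their weights approximate the posterior sample.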

  4. Simulated data
     • DNA sequence data (1 locus); Pop1: 45 samples; Pop2: 55 samples
     • ABC: 200 data sets; comparison with MCMC: 10 data sets
     • Summary statistics used (each computed in each population and in both populations joined together):
       • mean of pairwise differences
       • number of segregating sites
       • number of haplotypes
     • Relative Mean Integrated Square Error (relMISE): [formula not preserved in the transcript], where n is the number of accepted points, fi is the value of a given parameter for the ith point and f' is the true value of the parameter.
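The quantities on slide 4 can be illustrated as follows. The transcript lost the relMISE formula, so `rel_mise` below assumes a common form that matches the symbols on the slide: the mean over accepted points of (fi - f')^2, divided by f'^2. That form is an assumption, not a recovered definition. Sequences are assumed to be coded as 0/1 arrays with one row per sample, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def summary_stats(haplotypes):
    """Three statistics for one sample set (rows = sequences, columns = sites)."""
    h = np.asarray(haplotypes)
    # Mean number of differing sites over all pairs of sequences.
    mean_pairwise = float(np.mean([np.sum(a != b) for a, b in combinations(h, 2)]))
    # A site is segregating if it is not identical across all sequences.
    segregating = int(np.sum(h.min(axis=0) != h.max(axis=0)))
    # Number of distinct sequences.
    n_haplotypes = len({tuple(row) for row in h})
    return mean_pairwise, segregating, n_haplotypes

def nine_stats(pop1, pop2):
    """The three statistics for Pop1, Pop2 and both joined (9 values)."""
    joined = np.vstack([pop1, pop2])
    return np.array([s for pop in (pop1, pop2, joined) for s in summary_stats(pop)],
                    dtype=float)

def rel_mise(accepted_values, true_value):
    """Assumed form: (1/n) * sum_i (f_i - f')^2 / f'^2."""
    v = np.asarray(accepted_values, dtype=float)
    return float(np.mean((v - true_value) ** 2) / true_value ** 2)

pop1 = np.array([[0, 0, 1], [0, 1, 1]])
pop2 = np.array([[1, 0, 0], [1, 0, 0]])
stats = nine_stats(pop1, pop2)
error = rel_mise([1.0, 3.0], 2.0)
```

Computing three statistics over three groupings gives nine values, matching the `sstats=9` setting quoted on later slides.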

  5. Simulated data: 'real' data and prior information
     [Figure: one panel per parameter (Ne1, Ne2, NeA, m1, m2, t); legend entries: "real" data, ABC, prior distribution, MCMC. Axis values not preserved.]

  6. Simulated data. ABC (500 000 iter, tol=0.02, logit transf, sstats=9), simulation 8. [Figure: panels Mig1, Mig2, Tev, Ne1, Ne2, Neanc; average relMISE values (10 data sets) not preserved.]

  7. Simulated data: optimized ABC method. ABC (2 500 000 iter, tol=0.004, log transf, sstats=9), simulation 8. [Figure: panels Mig1, Mig2, Tev, Ne1, Ne2, Neanc; average relMISE values (10 data sets) not preserved.]

  8. Simulated data: adding summary stats. ABC (2 500 000 iter, tol=0.004, log transf, sstats=21), simulation 8. [Figure: panels Mig1, Mig2, Tev, Ne1, Ne2, Neanc; average relMISE values (10 data sets) not preserved.]

  9. Model choice: migration present/absent. ABC (1 000 000 iter, tol=0.004, log transf, sstats=21). [Figure: Population model 1 (M = M1) or Population model 2 (M = M2), each showing Popanc, Pop1 and Pop2.] pM1 = 2%, pM2 = 98% (10 data sets)
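A simple way to obtain model probabilities such as pM1 and pM2 with ABC is to treat the model index as one more parameter: draw it from a prior, simulate under the drawn model, and report the share of each model among the accepted points. The sketch below shows that rejection-based estimate only. The slides do not say which estimator was actually used, the two toy simulators are hypothetical stand-ins for the population models, and `tol` is again treated as the accepted fraction.

```python
import numpy as np

def abc_model_choice(observed, simulators, n_sims=5000, tol=0.01, rng=None):
    """Estimate posterior model probabilities as shares among accepted points."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform prior over models (assumption); each simulator draws its own
    # parameters from its prior and returns summary statistics.
    models = rng.integers(len(simulators), size=n_sims)
    stats = np.array([simulators[m](rng) for m in models])
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.linalg.norm((stats - observed) / scale, axis=1)
    n_keep = max(1, int(round(tol * n_sims)))
    accepted = models[np.argsort(dist)[:n_keep]]
    return np.bincount(accepted, minlength=len(simulators)) / n_keep

# Hypothetical toy models: mean fixed at 0 versus mean drawn from U(1, 3).
def model_a(rng):
    x = rng.normal(0.0, 1.0, size=30)
    return np.array([x.mean(), x.std()])

def model_b(rng):
    x = rng.normal(rng.uniform(1.0, 3.0), 1.0, size=30)
    return np.array([x.mean(), x.std()])

observed = np.array([2.0, 1.0])
probs = abc_model_choice(observed, [model_a, model_b], rng=np.random.default_rng(0))
```

Here `probs[0]` and `probs[1]` play the role of pM1 and pM2 for the toy models.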

  10. Simulated data: using model-choice step. ABC (2 500 000 iter, tol=0.004, log transf, sstats=21), simulation 8. [Figure: panels Mig1, Mig2, Tev, Ne1, Ne2, Neanc; average relMISE values (10 data sets) not preserved.]

  11. Simulated data: 10 vs 200 datasets. ABC (2 500 000 iter, tol=0.004, log transf, sstats=21), simulation 8. [Figure: panels Mig1, Mig2, Tev, Ne1, Ne2, Neanc; average relMISE values for 10 data sets and for 200 data sets not preserved.]

  12. Conclusions:
     • Comparison between ABC and MCMC methods:
       • ABC is up to 2 orders of magnitude faster than the MCMC method for a single locus
       • ABC modes are similar to those from MCMC (full-likelihood method)
       • ABC can easily incorporate more complex population models with relaxed assumptions
       • A model-choice framework follows naturally from the ABC approach
       • ABC easily handles multi-modal posterior distributions
       • ABC does not have the problems associated with local maxima in likelihood distributions
     • ABC improves with:
       • parameter transformation
       • more iterations
       • more summary statistics
       • a model-choice framework

  13. Take-home message:
     • Phylogenetic methods based on gene trees using the coalescent are being actively explored.
     • These methods will be available in the near future.

  14. Acknowledgements: I would like to thank David Balding for frequent meetings on the subject, and give special thanks to Mark Beaumont for advice and comments on the work. Support for this work was provided by EPSRC. joao.lopes@rdg.ac.uk http://www.rdg.ac.uk/~sar05sal
