
A Bayesian method for DNA barcoding

Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen, University of Copenhagen.


Presentation Transcript


  1. A Bayesian method for DNA barcoding Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen, University of Copenhagen

  2. Varieties of barcoding • Assignment to existing species. • Identification of new species. • Assignment to taxonomic levels in general.

  3. Motivation • Environmental aDNA samples. • Putative Neandertal DNA. • Often short query sequences. • Little information. • Permissive PCR conditions. • Not always from the intended locus.

  4. Given a set of database reference sequences from different species, according to which criteria should we assign new query sequences to taxonomic levels?

  5. True species assignment • Requires proper population genetic analyses quantifying variability within species. • Often not possible: • small database sample size for each species. • short query PCR products.

  6. Phylogenetic alternative • Purely phylogenetic criteria that ignore population genetic problems. • Taxonomic annotation of database sequences is used to map phylogenetic groups to taxonomic levels. • The simpler approach has its own advantages: • Less data required. • Fewer assumptions.

  7. Ingroup or outgroup? [Figure: a query sequence placed next to a monophyletic taxonomic group.]
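The ingroup/outgroup question can be made concrete on a single rooted tree. The sketch below is not the authors' code: it assumes trees are rooted and written as nested Python tuples with sequence IDs as leaves, and the helper names (`clade_sets`, `classify_query`) are hypothetical. It reports whether the query is nested inside the group, sits as sister to a monophyletic group (the ambiguous ingroup-or-outgroup case), or falls outside it.

```python
from typing import FrozenSet, List, Tuple, Union

# Assumed representation: a rooted tree as nested tuples; leaves are sequence IDs.
Tree = Union[str, Tuple["Tree", ...]]


def clade_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node (clade) in the tree, bottom-up."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(walk(child) for child in node))
        found.append(below)
        return below

    walk(tree)
    return found


def classify_query(tree: Tree, group: FrozenSet[str], query: str) -> str:
    """Place the query relative to a taxonomic group on one rooted tree."""
    clades = clade_sets(tree)
    if frozenset(group) | {query} not in clades:
        return "outside"  # no clade holds exactly the group plus the query
    if frozenset(group) in clades:
        return "sister"   # group is monophyletic and the query is its sister
    return "inside"       # the query is nested within the group


if __name__ == "__main__":
    group = frozenset({"a", "b"})
    print(classify_query(((("a", "q"), "b"), "c"), group, "q"))  # inside
    print(classify_query(((("a", "b"), "q"), "c"), group, "q"))  # sister
```

The "sister" case is exactly where a single tree cannot tell ingroup from outgroup, which is one reason to look at many sampled trees rather than one.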

  8. Estimating trees • Estimating a single tree is not sufficient because the phylogeny itself is uncertain. • We suggest instead a Bayesian approach, which quantifies this uncertainty.

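  9. Bayesian approach • Let Q be the query sequence, X the database data, G a gene tree, and F a desired taxonomic group; then

$$P(Q \in F \mid X) = \int P(Q \in F \mid G)\, p(G \mid X)\, dG \approx \frac{1}{N} \sum_{i=1}^{N} I(Q \in F \mid G_i),$$

where G_i is the ith of N gene trees sampled from p(G | X), and I(·) is 1 if the query falls in F on that tree and 0 otherwise.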
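With trees G_i sampled from p(G | X), the posterior probability that the query belongs to F can be approximated by the fraction of sampled trees in which Q falls in F. A minimal sketch, assuming the sampled trees (for example, from MrBayes output) have already been parsed and rooted into nested tuples; the membership criterion used for the indicator (the members of F together with Q form a clade) is an assumption for illustration, and the function names are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence, Tuple, Union

# Assumed representation: a rooted tree as nested tuples; leaves are sequence IDs.
Tree = Union[str, Tuple["Tree", ...]]


def has_clade(tree: Tree, target: FrozenSet[str]) -> bool:
    """True if some node of the rooted tree has exactly `target` as its leaves."""
    hit = False

    def walk(node: Tree) -> FrozenSet[str]:
        nonlocal hit
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(walk(child) for child in node))
        if below == target:
            hit = True
        return below

    walk(tree)
    return hit


def in_group(tree: Tree, group: FrozenSet[str], query: str) -> bool:
    """Assumed indicator I(Q in F | G): the group plus the query forms a clade."""
    return has_clade(tree, frozenset(group) | {query})


def posterior_membership(sampled_trees: Sequence[Tree],
                         indicator: Callable[[Tree], bool]) -> float:
    """Monte Carlo estimate: fraction of sampled trees where the indicator is 1."""
    if not sampled_trees:
        raise ValueError("need at least one sampled tree")
    return sum(1 for g in sampled_trees if indicator(g)) / len(sampled_trees)
```

Passing the indicator as a function keeps the averaging step separate from the membership criterion, so a stricter ingroup rule can be swapped in without touching the estimator.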

  10. Assignment pipeline • Query sequence → NCBI BLAST search of the database (GenBank) → homology set (retrieval of sequences and taxonomy annotation). • Homology set → ClustalW → alignment. • Alignment → MrBayes → sampled trees. • Sampled trees → summary statistics → taxonomy summary.

  11. Summary statistics • For each tree: • Find the sister group to the query. • Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy). [Figure: a tree with the query and its sister group marked.]
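The two per-tree steps can be sketched as follows. This is an illustration, not the authors' implementation: it assumes a rooted tree stored as nested tuples, takes the sister group to be the other children of the query's parent node, and represents each database sequence's taxonomy as an ordered lineage list (root first), so the consensus taxonomy is the longest shared prefix. The function names and the simplified lineages in the example are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple, Union

# Assumed representation: a rooted tree as nested tuples; leaves are sequence IDs.
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All sequence IDs below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def sister_group(tree: Tree, query: str) -> Optional[FrozenSet[str]]:
    """Leaves of the query's sister group: the other children of its parent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == query:
            siblings = [c for j, c in enumerate(tree) if j != i]
            return frozenset().union(*(leaf_set(c) for c in siblings))
        found = sister_group(child, query)
        if found is not None:
            return found
    return None  # query not present in this subtree


def consensus_taxonomy(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Taxonomic names shared by all lineages, read from the root downwards."""
    shared: List[str] = []
    for names in zip(*lineages):
        if any(name != names[0] for name in names):
            break
        shared.append(names[0])
    return shared


if __name__ == "__main__":
    # Hypothetical, simplified lineages for illustration only.
    taxonomy: Dict[str, List[str]] = {
        "s1": ["Viridiplantae", "Fagales", "Betulaceae", "Betula"],
        "s2": ["Viridiplantae", "Fagales", "Betulaceae", "Alnus"],
        "s3": ["Viridiplantae", "Fagales", "Fagaceae", "Quercus"],
    }
    tree = ((("s1", "s2"), "q"), "s3")
    sisters = sister_group(tree, "q")
    print(consensus_taxonomy([taxonomy[s] for s in sorted(sisters)]))
    # ['Viridiplantae', 'Fagales', 'Betulaceae']
```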

  12. Summary statistics • For each tree: • Find the sister group to the query. • Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy). • For each name at each taxonomic level: • Find the fraction of sampled trees in which the consensus taxonomy includes that name.
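The last step turns one consensus taxonomy per sampled tree into a posterior-style summary for each name. A minimal sketch, assuming the per-tree consensus taxonomies have already been computed as lists of names; the function name `name_frequencies` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def name_frequencies(consensus_per_tree: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Fraction of sampled trees whose consensus taxonomy includes each name."""
    if not consensus_per_tree:
        raise ValueError("need at least one sampled tree")
    counts: Counter = Counter()
    for names in consensus_per_tree:
        counts.update(set(names))  # count each name at most once per tree
    n_trees = len(consensus_per_tree)
    return {name: count / n_trees for name, count in counts.items()}
```

Because a consensus taxonomy only lists names shared by the whole sister group, a higher-level name (for example an order) can never score lower than a name nested under it, so confidence naturally drops off toward finer taxonomic levels.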

  13. Example taxonomy summary

  14. Environmental samples • 379 environmental samples (aDNA). • rbcL and trnL markers. • The aim is to identify the environmental flora.

  15. Orders >90%

  16. Families >90%

  17. Genera >90%
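Reading the three result slides as lists of names assigned with posterior probability above 90% at each rank (an interpretation, since the slides themselves are figures), the filtering step could look like this. The `(rank, name) → posterior` input format and the function name are assumptions; the values in any example input are toy numbers, not results from the study.

```python
from typing import Dict, List, Tuple


def confident_names(summary: Dict[Tuple[str, str], float],
                    threshold: float = 0.9) -> Dict[str, List[str]]:
    """Group names whose posterior exceeds `threshold` by taxonomic rank."""
    by_rank: Dict[str, List[str]] = {}
    for (rank, name), posterior in sorted(summary.items()):
        if posterior > threshold:
            by_rank.setdefault(rank, []).append(name)
    return by_rank
```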

  18. Botanical evaluation • Temperate climate similar to central Sweden.

  19. Testing putative Neandertal DNA • Needless to say, we have had several negative examples. • One positive example: • Posterior probability of 91%.

  20. Problems • No population genetic modelling: • Outgroup problem. • Species issues are not addressed. • Lineage sorting: species are not always reciprocally monophyletic. • Incomplete database.

  21. Advantages • Phylogenetic uncertainty and statistical uncertainty of assignment are addressed. • Posterior probability of assignment. • Alternative to single-tree assignment. • Can be used on any database.

  22. Conclusions • Phylogenetic barcoding does not model the coalescent process. • It is the appropriate method for assignment with little data, or when assigning to higher taxonomic levels. • The Bayesian approach offers a measure of confidence in assignment.
