
DNA Barcode sequence identification incorporating taxonomic hierarchy and within taxon variability

Damon P. Little, Cullman Program for Molecular Systematics Studies, The New York Botanical Garden, Bronx, New York


Presentation Transcript


  1. DNA Barcode sequence identification incorporating taxonomic hierarchy and within taxon variability • Damon P. Little • Cullman Program for Molecular Systematics Studies • The New York Botanical Garden, Bronx, New York

  2. test data sets (Little and Stevenson 2007) • gymnosperm nuclear ribosomal internal transcribed spacer 2 (nrITS 2) • 1,037 sequences • 413 species • 71 genera • gymnosperm plastid-encoded maturase K (matK) • 522 sequences • 334 species • 75 genera

  3. …alignment

  4. pairwise divergence

  5. measuring precision and accuracy

  6. precision

  7. accuracy to species

  8. lessons learned

  9. “global” alignments do not work

  10. precision

  11. accuracy to species

  12. “fuzzy” matches are not precise

  13. precision

  14. accuracy to species

  15. autoapomorphies (unique characters) work... but are not always present
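To make "unique characters" concrete: in an aligned matrix, an autoapomorphy can be read as a character state observed in exactly one terminal. The Python sketch below finds such states column by column. It is an illustration under assumptions, not the method used in these analyses: the function name `autoapomorphies`, the treatment of gaps and `N` as missing data, and the representation of a composite terminal as a list of sequences are all hypothetical. A stricter diagnostic test might also require the state to be fixed across every sequence of the terminal (in the example, column 3 `A` occurs in only one of taxon_A's two sequences).

```python
from collections import defaultdict

MISSING = {"-", "N"}  # assumption: gaps and N are treated as missing data


def autoapomorphies(alignment):
    """Return {terminal: [(column, state), ...]} for states seen in one terminal only.

    alignment maps each terminal to a list of aligned sequences of equal
    length; a terminal with several sequences is a composite.
    """
    n_cols = len(next(iter(alignment.values()))[0])
    unique = defaultdict(list)
    for col in range(n_cols):
        owners = defaultdict(set)  # state -> terminals showing it in this column
        for terminal, seqs in alignment.items():
            for seq in seqs:
                state = seq[col].upper()
                if state not in MISSING:
                    owners[state].add(terminal)
        for state, terminals in owners.items():
            if len(terminals) == 1:  # state unique to a single terminal
                (terminal,) = terminals
                unique[terminal].append((col, state))
    return dict(unique)


alignment = {
    "taxon_A": ["ACGT", "ACGA"],
    "taxon_B": ["ACTT"],
    "taxon_C": ["AC-T"],
}
print(autoapomorphies(alignment))
# → {'taxon_A': [(2, 'G'), (3, 'A')], 'taxon_B': [(2, 'T')]}
```

taxon_C has no unique state at all, which is the situation the slide warns about: when no autoapomorphy exists, other characters have to carry the identification.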

  16. precision

  17. accuracy to species

  18. some sequences are simply unidentifiable

  19. ...remaining (insoluble) problems • identical sequences for multiple terminals • shared alleles between terminals • use allele frequency as a predictor?

  20. desirable methodologies and properties of Sequence IDentification Engines (SIDEs)

  21. Sequence IDentification Engines (SIDEs) • avoid global alignment by comparing short segments: pseudo-alignment • use exact matches • use autoapomorphies where possible • ...but allow the use of other characters too

  22. context/text DNA recoding • characters are defined by flanking context • => pretext and postext • permit “alignment-free” comparisons • the size of, and separation between, pretext and postext must be delimited arbitrarily • states (text) are limited by the proximity of the context • terminals can be individual sequences or composites representing taxa

  23. context/text DNA recoding

  24. context/text DNA recoding • characters are defined by flanking context • => pretext and postext • permit “alignment-free” comparisons • the size of, and separation between, pretext and postext are arbitrary • the possible states (text) are limited by the length of the text • terminals can be individual sequences or composites representing taxa
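The recoding can be illustrated with a minimal Python sketch. It assumes a fixed length for both pretext and postext (`context_len`) and a fixed text length (`text_len`); these names, their defaults, and the functions `recode` and `build_database` are hypothetical, chosen only because the slides say the sizes are arbitrary. Each sequence is scanned with a sliding window, and every (pretext, postext) pair becomes a character whose state is the intervening text. Repeating a terminal label merges several sequences into one composite terminal. This is not the BRONX implementation.

```python
from collections import defaultdict


def recode(seq, context_len=2, text_len=1):
    """Return the set of (pretext, text, postext) triples in seq.

    context_len (length of pretext and postext) and text_len are
    hypothetical, arbitrarily chosen sizes.
    """
    seq = seq.upper()
    window = 2 * context_len + text_len
    triples = set()
    for i in range(len(seq) - window + 1):
        pretext = seq[i:i + context_len]
        text = seq[i + context_len:i + context_len + text_len]
        postext = seq[i + context_len + text_len:i + window]
        triples.add((pretext, text, postext))
    return triples


def build_database(references, context_len=2, text_len=1):
    """Map each character (pretext, postext) to {text: set of terminals}.

    references is an iterable of (terminal, sequence) pairs; repeating a
    terminal label makes that terminal a composite of several sequences.
    """
    db = defaultdict(lambda: defaultdict(set))
    for terminal, seq in references:
        for pretext, text, postext in recode(seq, context_len, text_len):
            db[(pretext, postext)][text].add(terminal)
    return db


references = [
    ("taxon_A", "ACGTACGGTTCA"),
    ("taxon_B", "ACGTACTGTTCA"),
]
db = build_database(references)
print(dict(db[("AC", "GT")]))  # one character, two states
# → {'G': {'taxon_A'}, 'T': {'taxon_B'}}
```

Because the text length is fixed here, an indel between pretext and postext produces characters that simply fail to match the references; letting the text length vary is one possible extension, in line with the "dynamically size context/text" future direction.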

  25. querying text/context database • find pretext/text/postext in the query sequence and match to references

  26. querying text/context database

  27. querying text/context database • find pretext/text/postext in the query sequence and match to references • score terminals based on the number of matches • final score can be raw or based on a weighting function
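The query step can be sketched in Python under the same assumptions (fixed-length pretext, text, and postext; hypothetical function names): recode the query, look up each (pretext, postext) character in the reference database, and give one point to every terminal whose recorded text exactly matches the query's text. This yields the raw, equally weighted score; it is an illustration, not BRONX's actual scoring code. The recoding helpers are repeated so the example runs on its own.

```python
from collections import Counter, defaultdict


def recode(seq, context_len=2, text_len=1):
    """Set of (pretext, text, postext) triples; sizes are hypothetical."""
    seq = seq.upper()
    window = 2 * context_len + text_len
    return {
        (seq[i:i + context_len],
         seq[i + context_len:i + context_len + text_len],
         seq[i + context_len + text_len:i + window])
        for i in range(len(seq) - window + 1)
    }


def build_database(references, context_len=2, text_len=1):
    """{(pretext, postext): {text: set of terminals}} from (terminal, seq) pairs."""
    db = defaultdict(lambda: defaultdict(set))
    for terminal, seq in references:
        for pretext, text, postext in recode(seq, context_len, text_len):
            db[(pretext, postext)][text].add(terminal)
    return db


def score_query(query, db, context_len=2, text_len=1):
    """Raw score: one point per character whose text exactly matches a terminal."""
    scores = Counter()
    for pretext, text, postext in recode(query, context_len, text_len):
        states = db.get((pretext, postext))
        if states is None or text not in states:
            continue  # character absent from the references, or no exact text match
        for terminal in states[text]:
            scores[terminal] += 1
    return scores


db = build_database([
    ("taxon_A", "ACGTACGGTTCA"),
    ("taxon_B", "ACGTACTGTTCA"),
])
scores = score_query("ACGTACGGTTCA", db)
print(scores.most_common())  # → [('taxon_A', 8), ('taxon_B', 3)]
```

The top-scoring terminal would be the identification; ties are expected when several terminals share identical sequences.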

  28. possible weighting functions • equal weights (raw score) • number of distinct texts • => upweights more variable characters • 1/(number of distinct texts) • => downweights more variable characters • (number of texts)/(number of scores)
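One reading of the first three weighting functions is a per-character weight that depends on how many distinct texts the references show for that (pretext, postext) character. The Python sketch below assumes that reading; the `WEIGHTS` table, the scheme names, and `weighted_score` are hypothetical, and the fourth option, (number of texts)/(number of scores), is left out because its exact definition is not given here. The small database is written out by hand in the shape {(pretext, postext): {text: terminals}} so the example stands alone.

```python
from collections import Counter

# Hypothetical per-character weights, keyed by the number of distinct texts
# the references show for that (pretext, postext) character.
WEIGHTS = {
    "equal": lambda n_texts: 1.0,                # raw score
    "distinct": lambda n_texts: float(n_texts),  # upweights variable characters
    "inverse": lambda n_texts: 1.0 / n_texts,    # downweights variable characters
}


def weighted_score(query_triples, db, scheme="equal"):
    """Sum per-character weights over exact text matches for each terminal."""
    weight = WEIGHTS[scheme]
    scores = Counter()
    for pretext, text, postext in query_triples:
        states = db.get((pretext, postext), {})
        if text not in states:
            continue  # no reference shares this character and text
        w = weight(len(states))  # len(states) = number of distinct texts
        for terminal in states[text]:
            scores[terminal] += w
    return scores


# Hand-built reference database: {(pretext, postext): {text: terminals}}.
db = {
    ("AC", "TA"): {"G": {"taxon_A", "taxon_B"}},         # invariant character
    ("AC", "GT"): {"G": {"taxon_A"}, "T": {"taxon_B"}},  # variable character
}
query = [("AC", "G", "TA"), ("AC", "G", "GT")]
for scheme in WEIGHTS:
    print(scheme, sorted(weighted_score(query, db, scheme).items()))
```

With this toy database taxon_A scores 2, 3, and 1.5 under the three schemes while taxon_B scores 1 each time, so the variable character separates the two terminals most strongly when it is upweighted.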

  29. precision

  30. accuracy to species

  31. BRONX conclusions • BRONX is more precise than existing algorithms • BRONX is sometimes more accurate than existing algorithms • BRONX is an incremental improvement

  32. future directions • improve the scoring function in BRONX • dynamically size context/text • benchmark additional datasets for all methods • incorporate context/text recoding into a scalable version of the ATIM algorithm

  33. acknowledgments • Kenneth Cameron • Santiago Madriñán • Christian Schulz • Dennis Stevenson
