
Robert Hanner, Ph.D. Centre for Biodiversity Genomics University of Guelph, Canada

Informatics Workshop, Adelaide, 28 November 2011. The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity. Robert Hanner, Ph.D. Centre for Biodiversity Genomics University of Guelph, Canada. The Infrastructure of Taxonomy.


Presentation Transcript


  1. Informatics Workshop, Adelaide, 28 November 2011 The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity Robert Hanner, Ph.D. Centre for Biodiversity Genomics University of Guelph, Canada

  2. The Infrastructure of Taxonomy • Collections and databases of specimens • Codes of Taxonomic Nomenclature • Compilations of taxonomic names • Monographs • Floristic and faunistic surveys/inventories • Revisions • The (undigitized) Taxonomic Literature

  3. New tools for taxonomy DNA Barcoding The ability to compare genotype information across a huge range of organisms is a powerful tool

  4. Emerging Applications

  5. Couplets consisting of: “Species Name - DNA Sequence” • Basis of a “look-up table” enabling molecular diagnostic applications • However, both elements are assertions • Underlying specimens and associated raw sequence data are not typically available for secondary inspection
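
The "look-up table" idea on this slide can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy sequences, the placeholder species names, the function names `p_distance` and `identify`, and the 2% distance cutoff are not taken from the presentation, and real barcode matching works on properly aligned sequences of several hundred bases.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, reference, max_distance=0.02):
    """Return (name, distance) for the closest reference couplet.

    `reference` is a list of (species_name, sequence) couplets.
    `max_distance` is a hypothetical cutoff; if the closest match is
    farther than that, the name is reported as None.
    """
    best_name, best_dist = None, None
    for name, seq in reference:
        d = p_distance(query, seq)
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    if best_dist is not None and best_dist <= max_distance:
        return best_name, best_dist
    return None, best_dist


# Toy couplets: both the name and the sequence are assertions,
# which is exactly the weakness the slide points out.
reference = [
    ("Species A", "ACGTACGTAC"),
    ("Species B", "TTGTACCTAA"),
]

print(identify("ACGTACGTAC", reference))
print(identify("GGGGGGGGGG", reference))
```

The sketch makes the slide's caveat concrete: the table returns whatever name was asserted for the closest sequence, with no way to inspect the voucher specimen or raw traces behind it.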

  6. Manual Assembly Subjective interpretation?

  7. “Only [27%] of papers had a legitimate specimens examined section, with museum numbers for each voucher, and names of the museums where the specimens used in the study could be examined”

  8. Problem Areas TRANSPARENCY AND TRACEABILITY • Genetic Data Quality • Specimen Data Quality • Taxonomy • Information Access

  9. First International Barcode of Life Conference

  10. Barcoders began calling for a Paradigm Shift

  11. Barcoding: Integrating Best Practices

  12. Data Standards for BARCODE Records in INSDC* • Community-based standards for COI • Creation of a reserved keyword BARCODE - Required & recommended data elements - Sequence quality and coverage • Recommended for identifying unknowns • Process to propose non-COI gene regions *http://barcoding.si.edu/pdf/dwg_data_standards-final.pdf

  13. Second International Barcode of Life Conference 17-21 Sept 2007

  14. Validation demonstrates that a procedure is robust, reliable and reproducible. PCR amplification and DNA sequencing: • Are robust methods that produce successful results a high percentage of the time. • Are reliable methods that produce accurate results. • Are reproducible methods that produce similar results each time a sample is tested.

  15. Third International Barcode of Life Conference

  16. 2009: Barcode Markers for Plants 52 authors from 24 institutions in 9 nations proposed a pair of short sequences (totaling about 1,450 base pairs) from rbcL and matK as the foundation for a DNA barcode library for plants. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106:12794–12797.

  17. Fourth International Barcode of Life Conference

  18. 2011: Barcode Marker for Fungi 149 authors from 71 institutions propose ITS as fungal barcode target. It also has demonstrated utility in some plants*. Fungal Barcoding Consortium (2011) The nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci USA (Submitted). *Hollingsworth (2011) Refining the DNA barcode for land plants. www.pnas.org/cgi/doi/10.1073/pnas.1116812108

  19. Move toward rapid data release: • In 2009 the community acknowledged the value of the “Ft Lauderdale Accord” • Raw sequence data and high-level taxonomy (e.g., order) deposited in INSDC prior to publication • Gave rise to “dark taxa” in INSDC and subsequent arguments pro and con

  20. Issues that need to be addressed: • Legacy BARCODE records lack trace files • Many recent BARCODE records lack valid names • Not all potential BARCODE data is in the public domain

  21. Question: What is barcoding? • A method for species identification and discovery through the analysis of short, standardized DNA sequences • Should BARCODE be applied only to known species as an ID tag, or should it be used to designate a sequence entry conforming to a meta-data standard?

  22. DNA Barcodes: a tool of integrative taxonomy • Barcoding: DNA Identification (low ambiguity, species well-known) • DNA Taxonomy (high ambiguity, species unknown)

  23. Evolution of Standards Even among well-studied vertebrates: • Serious discrepancies exist in the application of names across labs • Identification accuracy of reference collections is highly variable • Perhaps BARCODE is a better process tag unless reserved for published data

  24. 2011: BOLD 3.0 • Supports assembly of BARCODE compliant data records for all markers • Includes specimen images and introduces BINs to aid data validation • Introduces features for 3rd party annotation of data records to facilitate library curation

  25. What other issues remain? • Barcode annotation of plants and fungi? • Registration of institutions/collections • Synchronization of databases

  26. www.biorepositories.org

  27. Structured Reference to Vouchers?

  28. LinkOut to Collection Catalogs

  29. Accomplishments: • Integration of genomics and biodiversity science via creation of a robust molecular diagnostic interface between them • Increased community awareness of taxonomy and collections

  30. Acknowledgments: • All Participants of the CBOL Database Work Group and many, many others!

  31. Rationale for Defining “BARCODE” keyword in GenBank • Provides the community with reference records with verifiable and retrievable data: • Associated with retrievable voucher specimens (liberally defined: tissue, DNA, etc.) • Linked to on-line metadata • Meeting an agreed-upon standard of taxonomic identification • Providing an assured level of data completeness • On an agreed-upon gene region • Recommended for use in identifying unknowns

  32. The Barcode Data Standard • Establishing a new data standard for “BARCODE” keyword records in DDBJ/EMBL/GenBank: • Minimum 500 bp, <1% ambiguous base calls • Double stranded sequence • Trace files and associated quality scores • Primers used to generate sequence • Linkages to: • A morphological voucher specimen • Structured reference to collections • Geospatial reference information • Valid species name • Who performed the identification • Literature citations
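
The requirements listed on this slide lend themselves to a simple automated check. The Python sketch below is illustrative only: the `BarcodeRecord` fields and the `check_barcode_record` function are hypothetical names, not part of any INSDC or BOLD interface, and only a subset of the listed elements is covered. It assumes that an "ambiguous base call" is any character other than A, C, G or T.

```python
from dataclasses import dataclass, field

UNAMBIGUOUS = set("ACGT")  # assumption: everything else (N, R, Y, ...) is ambiguous


@dataclass
class BarcodeRecord:
    """Hypothetical container for a subset of the slide's data elements."""
    sequence: str
    species_name: str = ""
    voucher_id: str = ""
    has_trace_files: bool = False
    primers: list = field(default_factory=list)


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of base calls that are not plain A, C, G or T."""
    seq = sequence.upper()
    if not seq:
        return 1.0
    return sum(1 for base in seq if base not in UNAMBIGUOUS) / len(seq)


def check_barcode_record(record, min_length=500, max_ambiguous=0.01):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(record.sequence) < min_length:
        problems.append(f"sequence shorter than {min_length} bp")
    if ambiguous_fraction(record.sequence) >= max_ambiguous:
        problems.append("1% or more ambiguous base calls")
    if not record.has_trace_files:
        problems.append("no trace files")
    if not record.primers:
        problems.append("no primers recorded")
    if not record.voucher_id:
        problems.append("no voucher specimen reference")
    if not record.species_name:
        problems.append("no species name")
    return problems


# A legacy-style record: long enough, but missing traces and primers.
legacy = BarcodeRecord(
    sequence="ACGT" * 150,
    species_name="Genus species",       # placeholder name
    voucher_id="HYPOTHETICAL:0001",     # placeholder voucher reference
)
print(check_barcode_record(legacy))
```

Returning a list of problems rather than a single pass/fail flag makes it easy to report why a record falls short, for example the legacy BARCODE records without trace files mentioned elsewhere in the talk.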

  33. BARCODE Records (without trace files)
