Informatics Workshop, Adelaide, 28 November 2011. The BARCODE Data Standard: Enabling Molecular Diagnostics for Biodiversity. Robert Hanner, Ph.D., Centre for Biodiversity Genomics, University of Guelph, Canada.
The Infrastructure of Taxonomy
• Collections and databases of specimens
• Codes of Taxonomic Nomenclature
• Compilations of taxonomic names
• Monographs
• Floristic and faunistic surveys/inventories
• Revisions
• The (undigitized) Taxonomic Literature
New tools for taxonomy: DNA Barcoding
The ability to compare genotype information across a huge range of organisms is a powerful tool.
Couplets Consisting of: “Species Name - DNA Sequence”
Basis of a “look-up table” enabling molecular diagnostic applications.
However, both elements are assertions.
Underlying specimens and associated raw sequence data are not typically available for secondary inspection.
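The look-up table idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the species names and sequences are invented placeholders, the sequences are assumed to be pre-aligned and of equal length, and the 2% distance cutoff is an arbitrary illustrative value rather than anything specified in these slides.

```python
# Toy reference library: each couplet pairs a species name with a DNA sequence.
# Names and sequences are invented placeholders, not real barcode records.
REFERENCE_COUPLETS = [
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species B", "ACGTTCGTACGAACGTACGA"),
    ("Species C", "TTGTACGAACGTTCGTACCT"),
]


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    return mismatches / len(seq1)


def identify(query, couplets, max_distance=0.02):
    """Return (name, distance) for the closest reference sequence.

    The name is None when even the closest reference is farther away than
    max_distance (an assumed, illustrative cutoff).
    """
    best_name, best_dist = None, float("inf")
    for name, reference in couplets:
        dist = p_distance(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_name, best_dist
```

Calling `identify("ACGTACGTACGTACGTACGT", REFERENCE_COUPLETS)` returns the name of the exact match with a distance of zero. Note that the answer is only as trustworthy as the couplets themselves: if a reference name or sequence is a wrong assertion, the look-up silently propagates the error.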
Manual Assembly Subjective interpretation?
“Only [27%] of papers had a legitimate specimens examined section, with museum numbers for each voucher, and names of the museums where the specimens used in the study could be examined”
Problem Areas: TRANSPARENCY AND TRACEABILITY
• Genetic Data Quality
• Specimen Data Quality
• Taxonomy
• Information Access
First International Barcode of Life Conference
Barcoding: Integrating Best Practices
Data Standards for BARCODE Records in INSDC*
• Community-based standards for COI
• Creation of a reserved keyword BARCODE
  - Required & recommended data elements
  - Sequence quality and coverage
• Recommended for identifying unknowns
• Process to propose non-COI gene regions
*http://barcoding.si.edu/pdf/dwg_data_standards-final.pdf
Second International Barcode of Life Conference 17-21 Sept 2007
Validation demonstrates that a procedure is robust, reliable and reproducible.
PCR amplification and DNA sequencing:
• Are robust methods that produce successful results a high percentage of the time.
• Are reliable methods that produce accurate results.
• Are reproducible methods that produce similar results each time a sample is tested.
2009: Barcode Markers for Plants
52 authors from 24 institutions in 9 nations proposed a pair of short sequences (totaling about 1,450 base pairs) from rbcL and matK as the foundation for a DNA barcode library for plants.
CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106:12794–12797.
2011: Barcode Marker for Fungi
149 authors from 71 institutions propose ITS as the fungal barcode target. It has also demonstrated utility in some plants*.
Fungal Barcoding Consortium (2011) The nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc Natl Acad Sci USA (Submitted).
*Hollingsworth (2011) Refining the DNA barcode for land plants. www.pnas.org/cgi/doi/10.1073/pnas.1116812108
Move toward rapid data release:
• In 2009 the community acknowledged the value of the “Ft Lauderdale Accord”
• Raw sequence data and high-level taxonomy (e.g., order) deposited in INSDC prior to publication
• Gave rise to “dark taxa” in INSDC and subsequent arguments pro & con
Issues that need to be addressed:
• Legacy BARCODE records lack trace files
• Many recent BARCODE records lack valid names
• Not all potential BARCODE data is in the public domain
Question: What is barcoding?
• A method for species identification and discovery through the analysis of short, standardized DNA sequences
• Should BARCODE be applied only to known species as an ID tag, or should it be used to designate a sequence entry conforming to a meta-data standard?
DNA Barcodes: a tool of integrative taxonomy
[Diagram: Barcoding spans DNA Identification (low ambiguity, species well-known) and DNA Taxonomy (high ambiguity, species unknown)]
Evolution of Standards
Even among well-studied vertebrates:
• Serious discrepancies exist in the application of names across labs
• Identification accuracy of reference collections is highly variable
• Perhaps BARCODE is a better process tag unless reserved for published data
2011: BOLD 3.0
• Supports assembly of BARCODE-compliant data records for all markers
• Includes specimen images and introduces BINs to aid data validation
• Introduces features for third-party annotation of data records to facilitate library curation
What other issues remain?
• Barcode annotation of plants and fungi?
• Registration of institutions/collections
• Synchronization of databases
Accomplishments:
• Integration of genomics and biodiversity science via creation of a robust molecular diagnostic interface between them
• Increased community awareness of taxonomy and collections
Acknowledgments: • All Participants of the CBOL Database Work Group and many, many others!
Rationale for Defining “BARCODE” keyword in GenBank
• Provides the community with reference records with verifiable and retrievable data:
• Associated with retrievable voucher specimens (liberally defined: tissue, DNA, etc.)
• Linked to on-line metadata
• Meet an agreed-upon standard of taxonomic identification
• Provide an assured level of data completeness
• On an agreed-upon gene region
• Recommended for use in identifying unknowns
The Barcode Data Standard
• Establishing a new data standard for “BARCODE” keyword records in DDBJ/EMBL/GenBank:
• Minimum 500 bp, <1% ambiguous base calls
• Double stranded sequence
• Trace files and associated quality scores
• Primers used to generate sequence
• Linkages to:
• A morphological voucher specimen
• Structured reference to collections
• Geospatial reference information
• Valid species name
• Who performed the identification
• Literature citations
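A subset of the requirements listed above can be expressed as a simple record check. The sketch below is illustrative only: the `BarcodeRecord` class, its field names, and the problem messages are hypothetical and do not correspond to any INSDC or BOLD schema. It assumes that any base other than A, C, G or T counts as an ambiguous call, and it leaves out the double-stranded, geospatial and literature elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_LENGTH_BP = 500            # "Minimum 500 bp"
MAX_AMBIGUOUS_FRACTION = 0.01  # "<1% ambiguous base calls"


@dataclass
class BarcodeRecord:
    """Hypothetical record layout; the field names are illustrative assumptions."""
    sequence: str
    species_name: Optional[str] = None
    voucher_id: Optional[str] = None
    collection_code: Optional[str] = None
    identified_by: Optional[str] = None
    primers: List[str] = field(default_factory=list)
    has_trace_files: bool = False


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of bases that are not A, C, G or T (assumed definition of ambiguous)."""
    if not sequence:
        return 1.0
    ambiguous = sum(base not in "ACGT" for base in sequence.upper())
    return ambiguous / len(sequence)


def check_record(record: BarcodeRecord) -> List[str]:
    """Return a list of problems; an empty list means every check here passed."""
    problems = []
    if len(record.sequence) < MIN_LENGTH_BP:
        problems.append("sequence shorter than 500 bp")
    if ambiguous_fraction(record.sequence) >= MAX_AMBIGUOUS_FRACTION:
        problems.append("1% or more ambiguous base calls")
    if not record.has_trace_files:
        problems.append("missing trace files")
    if not record.primers:
        problems.append("missing primers")
    if not record.voucher_id:
        problems.append("missing voucher specimen")
    if not record.collection_code:
        problems.append("missing collection reference")
    if not record.species_name:
        problems.append("missing species name")
    if not record.identified_by:
        problems.append("missing identifier name")
    return problems
```

Returning a list of problems rather than a single pass/fail flag makes it easy to report which elements a legacy record lacks, for example trace files or a valid name.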