The BARCODE Data Standard
David E. Schindel, Executive Secretary
National Museum of Natural History, Smithsonian Institution
SchindelD@si.edu; http://www.barcoding.si.edu
202/633-0812; fax 202/633-2938
BARCODE Data Standard is:
• A set of required elements for a reserved Keyword (‘BARCODE’) in GenBank
• A set of sequence quality requirements
• Required or recommended formats for data interoperability with:
  • Voucher specimens in biorepositories
  • Georeferenced data
  • Taxonomic literature
An Internal ID System for All Animals
[Figure: The Mitochondrial Genome. mtDNA within the mitochondrion of a typical animal cell. Labeled features: D-loop, small ribosomal RNA, cytochrome b, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, COI, COII, COIII, ATPase subunits 6 and 8, L-strand, H-strand.]
Non-COI regions for other taxa
• Land plants:
  • Chloroplast matK and rbcL approved Nov 09
  • 70-75% resolving ability, higher in angiosperms
  • Non-coding plastid and nuclear regions being explored
• Fungi:
  • CBOL Working Group met this week in Amsterdam
  • Agreed to recommend ITS; 72% effective
• Protists:
  • CBOL Working Group July meeting, Berlin
BARCODE Record Flow Chart
[Flow chart. Key: mirroring, update channel, private records. Labeled nodes include USER and GenBank.]
Required Elements for BARCODE
• Taxonomic identification to species
• Voucher specimen ID in standard format
• Name of barcode region
• Length, quality, 2 trace files
• Forward/reverse primer sequences, names
• Country/Ocean/Sea of origin
Highly Recommended Elements
• Latitude/longitude
• Name of Collector
• Collection date
• Name of identifier
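The two element lists above lend themselves to a simple completeness check. The following is a minimal sketch, not the GenBank or BOLD validation logic: the field names (`species_name`, `voucher_id`, and so on) and the `check_record` helper are hypothetical, since the slides name the elements but do not define a schema. Sequence length and quality thresholds are left out because the slides do not state them.

```python
# Hypothetical field names; the slides list the elements but give no schema.
REQUIRED = (
    "species_name",    # taxonomic identification to species
    "voucher_id",      # voucher specimen ID in standard format
    "barcode_region",  # name of barcode region, e.g. "COI"
    "sequence",        # the barcode sequence itself
    "trace_files",     # list of trace file names
    "forward_primer",  # assumed to hold primer name and sequence
    "reverse_primer",
    "origin",          # country, ocean, or sea of origin
)
RECOMMENDED = ("lat_lon", "collector", "collection_date", "identifier")


def check_record(record):
    """Return (missing_required, missing_recommended) for a dict record."""
    missing_required = [f for f in REQUIRED if not record.get(f)]
    # The slide asks for 2 trace files, so one file alone is flagged.
    traces = record.get("trace_files")
    if traces and len(traces) < 2:
        missing_required.append("trace_files (need 2)")
    missing_recommended = [f for f in RECOMMENDED if not record.get(f)]
    return missing_required, missing_recommended
```

A record with an empty first list would meet the required elements as modeled here; the second list only flags metadata that is recommended rather than mandatory.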
BARCODE Records in INSDC
[Diagram: components linked to a BARCODE record]
• Barcode Sequence: Trace files, Primers
• Voucher Specimen
• Specimen Metadata: Georeference, Habitat, Character sets, Images, Behavior, Other genes
• Species Name
  • Indices: Catalogue of Life, GBIF/ECAT
  • Nomenclators: Zoo Record, IPNI, NameBank
  • Publication links: New species
  • Databases: Provisional sp.
• Literature citation
• Record in BOLD
Compliance with Standard (1)
• 1.37 million records in BOLD
• 514,390 BARCODE records in INSDC
• 395,774 have ordinal name plus Barcode Index Number for taxonomic ID
• Rapid data release versus time for annotation
• Exposure to data theft, risk of misidentification
• Added value of Linnean name
• Incidence of misidentifications in GenBank
• Danger of circular reasoning
Taxonomic Identification
• The genus and species combination that can be found in:
  • a taxonomic index such as Catalogue of Life, Zoological Record, or IPNI;
  • a taxonomic treatment of a previously published species name; or
  • a published description of the species; or
• A provisional label for a potential new species
Rod Page’s ‘Dark Taxa’
R. Page, iPhyloblogspot (iphylo.blogspot.com), 12 April 2011
Taxonomic Content in iBOL Data
iBOL ‘Phase 1’
• Org name: Order + BIN
• Tentative Name: blank
GenBank ‘Phase 0’
• Tentative name is in BOLD, unreleased
GenBank ‘Phase 1’
• Org name = Order + BIN plus
• Tentative name
iBOL ‘Phase 2’
• Org name: Order + BIN
• Tentative Name: blank
GenBank ‘Phase 2’
• Org name = sp. name
Unique identifier for the voucher specimen
In standardized format based on Darwin Core:
Institutional acronym:Collection code:Specimen number
Institutional acronym:Specimen number
personal:Collection code:Specimen number
GTI/CBOL/iBOL Workshop, 7 November 2009
Compliance with Standard (2)
• 514,390 BARCODE records in INSDC
• Traces, primers, length, country, and presence of voucher ID checked by GenBank
• 99.9% have entry for /specimen_voucher
• 13,151 have formatted voucher from 38 institutions
  • 20 confirmed in biorepositories
  • 11 unconfirmed
  • 7 unlisted
Darwin Core Triplet: Structured Link to Vouchers
Institutional Acronym : Collection Code : Catalog ID
NHM : LEP : 123456
personal : DHJanzen : SRNP12345
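The colon-separated voucher identifier can be split mechanically into its parts. The sketch below is illustrative only: the `Voucher` type and `parse_voucher` function are hypothetical names, not part of any GenBank or Darwin Core tooling. It accepts the three-part form shown on this slide and, as an assumption based on the formats listed in the earlier voucher slide, the two-part form with no collection code.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution: str            # institutional acronym, or "personal"
    collection: Optional[str]   # collection code; None in the two-part form
    specimen: str               # specimen number / catalog ID


def parse_voucher(text: str) -> Voucher:
    """Split a colon-separated voucher ID into its components."""
    parts = [p.strip() for p in text.split(":")]
    if any(not p for p in parts):
        raise ValueError(f"empty component in voucher ID: {text!r}")
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    raise ValueError(f"expected 2 or 3 colon-separated parts: {text!r}")
```

For example, `parse_voucher("personal:DHJanzen:SRNP12345")` yields a `Voucher` whose `institution` is `"personal"`. Whether the acronym is actually registered with a biorepository would need a separate lookup, which this sketch does not attempt.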
CBOL/GBIF/NCBI Registry of Biorepositories www.biorepositories.org