Connect.barcodeoflife.org
Barcode of Life Community Networks, Projects, Organizations • Promote barcoding as a global standard • Build participation • Working Groups • BARCODE standard • International Conferences • Increase production of public BARCODE records
Principles and Goals • Free and open access • Standardization and scalability • Specimen-centered • Rapid data release following primary QA/QC • Ongoing crowd-sourced data curation • Enable accelerated modern taxonomy • Navigate across data types (DNA, specimens, species, publications, georeferences) • Locate, aggregate, display and analyze data, resources
How Barcoding Works
• Building the reference library:
  • Well-identified specimen
  • Tissue subsample
  • DNA extraction, PCR amplification
  • DNA sequencing
  • Data submission to GenBank
• Using the reference library:
  • Unidentified specimen
  • Tissue, DNA, sequencing
  • Comparison with reference sequences
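The "comparison with reference sequences" step can be pictured with a small sketch. It assumes the sequences are already aligned to equal length and uses plain percent identity as the similarity measure; the species names, sequences, and function names are hypothetical and are not taken from the slides or from any barcoding tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def identify(query: str, reference_library: dict) -> tuple:
    """Return (species name, percent identity) of the closest reference."""
    if not reference_library:
        raise ValueError("reference library is empty")
    scores = {
        name: percent_identity(query, sequence)
        for name, sequence in reference_library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy reference library (real barcodes are 500+ bp).
REFERENCE = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAA",
    "Species C": "TTGTACGAAC",
}

if __name__ == "__main__":
    print(identify("ACGTACGTAG", REFERENCE))
```

A real workflow would align sequences first and apply a similarity threshold before accepting a match; both are left out here to keep the idea visible.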
How Barcoding is Done: From specimen to sequence to species
[Diagram labels: Collecting, Voucher Specimen, DNA extraction, CO1 gene (shown alongside ND1, ND2, ND3, COIII), DNA sequencing, Trace file, Public Databases of Barcode Records]
BOLD Workbench for Barcode Data Assembly/Analysis NBII, 25 February 2009
GenBank, EMBL, and DDBJ: Official Archival Repositories of Barcode Data http://www.insdc.org/
Current Norm: High throughput
• Large labs, hundreds of samples per day
• Large capacity PCR and sequencing reactions
• ABI 3100 capillary automated sequencer
• US$100-150K purchase • 2-3 hours processing time • 150-500 samples per day • US$3-5 per sample
Technology Development Partnership Goal: The DNA Sequencing Lab of 2013?
Producing Barcode Data: 201? Barcode data anywhere, instantly • Data in seconds to minutes • Pennies per sample • Link to reference database • A taxonomic GPS • Usable by non-specialists
Status of Barcode Data
• BOLD records (public and private):
  • 956,000 records, 78,000 named species
• BARCODE records in GenBank:
  • 194,000 records
  • Insects: 150,000 records
  • Fish: 23,500 records
  • Birds: 6,000 records
  • Mammals: 2,500 records
BARCODE Data Standard: Required Elements for COI
• Species designation
• Voucher ID in standard Darwin Core format
• Minimum 500 bp, <1% ambiguous sites
• Bidirectional overlapping reads, 2 trace files
• Primer name and sequences
• Country/ocean region
• Strongly recommended:
  • Collection date and collector
  • Identifier
  • Latitude/longitude
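The required elements lend themselves to an automated check. The sketch below is illustrative only: the `BarcodeRecord` class, its field names, and the problem messages are hypothetical, the 1% figure is treated as an upper bound on ambiguous sites, and "Darwin Core format" is reduced to a three-part colon-separated voucher ID.

```python
from dataclasses import dataclass, field

MIN_LENGTH_BP = 500            # "Minimum 500 bp"
MAX_AMBIGUOUS_FRACTION = 0.01  # assumption: 1% is an upper bound
REQUIRED_TRACE_FILES = 2       # bidirectional reads, 2 trace files


@dataclass
class BarcodeRecord:
    """Hypothetical container for the required COI elements."""
    species: str
    voucher_id: str
    sequence: str
    trace_files: list = field(default_factory=list)
    primers: dict = field(default_factory=dict)  # primer name -> sequence
    country_or_ocean: str = ""


def ambiguous_fraction(sequence: str) -> float:
    """Share of sites that are not an unambiguous A, C, G, or T."""
    if not sequence:
        return 1.0
    ambiguous = sum(1 for base in sequence.upper() if base not in "ACGT")
    return ambiguous / len(sequence)


def check_record(record: BarcodeRecord) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not record.species.strip():
        problems.append("missing species designation")
    parts = record.voucher_id.split(":")
    if len(parts) != 3 or not all(p.strip() for p in parts):
        problems.append("voucher ID is not a three-part triplet")
    if len(record.sequence) < MIN_LENGTH_BP:
        problems.append("sequence shorter than 500 bp")
    if ambiguous_fraction(record.sequence) >= MAX_AMBIGUOUS_FRACTION:
        problems.append("1% or more ambiguous sites")
    if len(record.trace_files) < REQUIRED_TRACE_FILES:
        problems.append("fewer than 2 trace files")
    if not record.primers:
        problems.append("missing primer name and sequence")
    if not record.country_or_ocean.strip():
        problems.append("missing country/ocean region")
    return problems
```

The strongly recommended fields (collection date, collector, identifier, latitude/longitude) are not checked; they could be reported as warnings rather than failures.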
Non-COI regions for other taxa
• Land plants:
  • Chloroplast matK and rbcL approved Nov 2009
  • Non-coding plastid and nuclear regions being explored
• Fungi and protists:
  • CBOL Working Groups convened
  • Recommendations expected in 2010
BARCODE Records in INSDC
[Diagram of a BARCODE record and its linked data; labels as they appear:]
• Specimen Metadata; Voucher Specimen; Species Name
• Georeference; Habitat; Character sets; Images; Behavior; Other genes
• Indices: Catalogue of Life, GBIF/ECAT
• Nomenclators: Zoo Record, IPNI, NameBank
• Publication links: New species
• Barcode Sequence; Trace files; Primers
• Other Databases
• Literature (link to content or citation)
• Phylogenetic; Pop’n Genetics; Ecological
• Databases: Provisional sp.
Linkout from GenBank to Taxonomy ISBER: 13 May 2009
Link from GenBank to Museums
Washington Airport Gate 3
• Dulles, National, or Baltimore-Washington?
• 2 concourses at BWI: concourse A or B?
• 3 concourses at National
• 4 concourses at Dulles
Darwin Core Triplet: Structured Link to Vouchers
Institutional Acronym : Collection Code : Catalog ID
Structured Link to Vouchers
NHM : LEP : 123456
personal : DHJanzen : SRNP12345
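A triplet of this kind is easy to split back into its parts. The sketch below assumes the three fields are joined by colons, with optional surrounding spaces, and that no field itself contains a colon; `VoucherTriplet` and `parse_triplet` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class VoucherTriplet(NamedTuple):
    institution: str   # institutional acronym, e.g. "NHM"
    collection: str    # collection code, e.g. "LEP"
    catalog_id: str    # catalog ID within that collection


def parse_triplet(text: str) -> VoucherTriplet:
    """Split 'institution:collection:catalog' into its three named fields."""
    parts = [part.strip() for part in text.split(":")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty fields, got {text!r}")
    return VoucherTriplet(*parts)


if __name__ == "__main__":
    print(parse_triplet("NHM : LEP : 123456"))
    print(parse_triplet("personal:DHJanzen:SRNP12345"))
```

Keeping the institution and collection as separate fields is what lets a voucher be traced to one specific collection, which is the ambiguity the airport-gate comparison points at.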
NCBI’s Biorepository List
• Compiled from Index Herbariorum, literature sources, GenBank submissions
• 6,936 records
• 1,177 records with non-unique acronyms
• 517 homonymous acronyms
  • 374 shared by two records
  • 143 shared by three records
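The counts on this slide are internally consistent: 374 + 143 = 517 homonymous acronyms, and 374 × 2 + 143 × 3 = 1,177 records. A tally of this kind could be produced from a plain list of acronyms, as in the sketch below. The function name and the toy acronyms are hypothetical; this is not how NCBI compiled its list.

```python
from collections import Counter


def homonym_summary(acronyms: list) -> dict:
    """Summarize acronyms that are shared by more than one record."""
    counts = Counter(acronyms)
    shared = {acronym: n for acronym, n in counts.items() if n > 1}
    return {
        "homonymous_acronyms": len(shared),
        "records_with_non_unique_acronyms": sum(shared.values()),
        # group size -> number of acronyms shared by that many records
        "by_group_size": dict(Counter(shared.values())),
    }


# Consistency check on the figures quoted on the slide.
assert 374 + 143 == 517
assert 374 * 2 + 143 * 3 == 1177

if __name__ == "__main__":
    # Toy input: "AAA" is shared by two records, "CCC" by three.
    print(homonym_summary(["AAA", "AAA", "BBB", "CCC", "CCC", "CCC"]))
```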
CBOL/GBIF/NCBI Registry of Biorepositories www.biorepositories.org
Mixture of:
• Single collections
• Repository institutions
• Networks/consortia
• Databases
• NGOs
Does NOT include:
• GenBank
• EMBL
• DDBJ
• BOLD
What Should We Do?
CBOL will invest a year to populate institution and collection data in biorepositories.org
• Hope to build synchronization with:
  • Institution database at GenBank
  • Index Herbariorum
  • Authority files in BOLD
• Hope to install web services
• How can we accelerate the registration process?
• Where should the data reside long-term?
  • GenBank?
  • GBIF?