GLOBAL BIODIVERSITY INFORMATION FACILITY

ECAT Programme Update. David Remsen & Markus Döring.

Presentation Transcript


  1. GLOBAL BIODIVERSITY INFORMATION FACILITY: ECAT Programme Update. David Remsen & Markus Döring

  2. ECAT Goals • GBIF provides a simple and extensible solution for publishing taxonomic checklists • Published data used to improve access and data interoperability within the portal • Published data supports taxonomic name services • Name services support development of tools that meet national and regional needs.

  3. SCOPE of ECAT publishing • Taxonomic Catalogues • Monographs/Flora/Fauna • Annotated Species Checklists • Regional • Thematic • Nomenclators • Name Dictionaries • No taxonomy

  4. Darwin Core Archive Format
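The slide names the Darwin Core Archive format but gives no detail. An archive is usually a zip holding a `meta.xml` descriptor and delimited text files, and extension rows point back to core rows by id. The Python below is a rough sketch of that shape: it builds a tiny archive in memory. The file names, the columns, and the single vernacular-name extension are illustrative assumptions. Real archives normally also carry an EML metadata document (`eml.xml`), which is omitted here for brevity.

```python
import csv
import io
import zipfile

# Illustrative meta.xml: a Taxon core file plus a vernacular-name extension.
# File names and the chosen columns are assumptions; "\\t" in this Python string
# becomes the literal \t that meta.xml uses to denote a tab.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxa.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
             ignoreHeaderLines="1" rowType="http://rs.gbif.org/terms/1.0/VernacularName">
    <files><location>vernacular.txt</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
    <field index="2" term="http://purl.org/dc/terms/language"/>
  </extension>
</archive>
"""


def _tsv(rows):
    """Serialise rows as tab-separated text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def build_archive(taxa, vernaculars):
    """Zip meta.xml and both data files; taxa rows are (taxonID, name, parentID)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("taxa.txt", _tsv([("taxonID", "scientificName", "parentNameUsageID"), *taxa]))
        zf.writestr("vernacular.txt", _tsv([("taxonID", "vernacularName", "language"), *vernaculars]))
    return out.getvalue()


archive = build_archive(
    taxa=[("1", "Fabaceae", ""), ("2", "Faboideae", "1")],
    vernaculars=[("1", "legume family", "en")],
)
```

Because each extension row carries the core id in its first column, a consumer can attach vernacular names to taxa without the two files sharing any other structure.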

  5. Vocabularies.gbif.org • Community-driven • Internationalised • Vocabularies • Extensions • Tested • Ready for release See Spanish Page

  6. Extensions • Extend the DwC • For Occurrence-level • For Species-level • Draft • Add relevant vocabs. • Review • Publish!

  7. Terms of Bionomenclature • Taxonomic Std Reference • Print Publication • Online Reference • Semantic • Supports vocabulary building • April Go to website

  8. Publishing Checklists to GBIF • Integrated Publishing Toolkit (next version) • Full & “lite” • Direct DwC Output from Sources • HIT Adapters for existing sources • Spreadsheets • Desktop Applications • Refactoring existing online Tools (ITIS, EDIT)

  9. HIT Adapters

  10. HIT Adapters • View the project wiki page with links to all source scripts • See example DwC Archive output

  11. Publishing by Spreadsheet • Simple • Validated • Developing countries • Conforms to existing workflow

  12. Publishing by Spreadsheet • Forms and auto-complete • Metadata and data • Occurrence data • Species Checklists • Embedded vocabularies
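Spreadsheet publishing depends on validating rows against embedded vocabularies. The Python below is a minimal sketch of that check. It assumes the sheet is exported as CSV and that the template embeds a vocabulary for a `taxonRank` column. The column set and the vocabulary are hypothetical and do not come from the slides.

```python
import csv
import io

# Hypothetical embedded vocabulary and required columns for a checklist template.
RANK_VOCABULARY = {"kingdom", "phylum", "class", "order", "family",
                   "genus", "species", "subspecies", "variety"}
REQUIRED_COLUMNS = ("taxonID", "scientificName", "taxonRank")


def validate_sheet(text):
    """Return (row_number, message) problems for a sheet exported as CSV.

    Row numbers count records (header = 1), not physical lines.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [(1, "missing column(s): " + ", ".join(missing))]
    problems = []
    for row_number, row in enumerate(reader, start=2):
        if not (row["scientificName"] or "").strip():
            problems.append((row_number, "empty scientificName"))
        rank = (row["taxonRank"] or "").strip()
        if rank.lower() not in RANK_VOCABULARY:
            problems.append((row_number, f"unknown taxonRank {rank!r}"))
    return problems
```

With this check, a value such as «art» (used for rank species in some sources) is flagged at entry time instead of being imported silently.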

  13. Desktop Application • Publishes DwCA • Currently used at GBIFS • ~100 sources • 600,000 records • 90 languages • Could be deployed

  14. DwCA Validating Tool View the DwCAValidator

  15. Published DwC Archive files • Current Status • Manually Curated • 82 ECAT sources: • 14 Taxonomic authority files • 64 Vernacular Name Lists • 2 Nomenclatural Lists • 2 Thematic Lists • 5,800 occurrence classifications • 15M different usages • 11,454,896 unique names assigned to 4.8M name groups • 4,612,444 canonical names

  16. Importing Data

  17. ChecklistBank Command Line Tool • Bundles many tasks into one executable jar: adding/deleting/exporting resources, (pre)importing, lexical grouping, nub build • To be used by the HIT module • Importing in 3 steps: 1) pre-import terms, 2) import into an isolated db schema, 3) accept the import into the public schema
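The three-step import works like a staged load. The sketch below uses Python's `sqlite3`, with a staging table standing in for the isolated database schema. The table names, the record layout, and the choice of ranks as the only pre-imported terms are assumptions for illustration. They are not the actual design of the ChecklistBank tool.

```python
import sqlite3


def open_db():
    """In-memory stand-in: a staging table plays the 'isolated schema'."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE term (value TEXT PRIMARY KEY);
        CREATE TABLE staging_usage (resource TEXT, taxon_id TEXT, name TEXT, rank TEXT);
        CREATE TABLE public_usage (resource TEXT, taxon_id TEXT, name TEXT, rank TEXT);
    """)
    return db


def preimport_terms(db, records):
    """Step 1: register distinct term values (here only ranks) up front."""
    with db:
        db.executemany("INSERT OR IGNORE INTO term VALUES (?)",
                       sorted({(rank,) for _, _, rank in records}))


def import_isolated(db, resource, records):
    """Step 2: load (taxon_id, name, rank) records where public reads never look."""
    with db:
        db.execute("DELETE FROM staging_usage WHERE resource = ?", (resource,))
        db.executemany("INSERT INTO staging_usage VALUES (?, ?, ?, ?)",
                       [(resource, *r) for r in records])


def accept_import(db, resource):
    """Step 3: replace the public rows for this resource in one transaction."""
    with db:  # commits on success, rolls back if any statement fails
        db.execute("DELETE FROM public_usage WHERE resource = ?", (resource,))
        db.execute("INSERT INTO public_usage "
                   "SELECT * FROM staging_usage WHERE resource = ?", (resource,))
        db.execute("DELETE FROM staging_usage WHERE resource = ?", (resource,))
```

The public swap runs inside a single transaction. Readers therefore see either the old or the new version of a resource, never a half-imported one.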

  18. Checklist Data Qualities • Highly relational taxonomic data: almost all records are linked in a tree hierarchy, plus basionym links • Wrong or missing records destroy the integrity of the whole dataset, not just a single record! • Different from flat, unrelated occurrence records. Syntactically damaged sources: • wrong mappings • wrong character encodings • end-of-line characters or tabs within data. Data quality: • broken referential integrity • bad names (e.g. «Unallocated Family») • missing or unused controlled vocabularies, e.g. «art» for rank species. Names can be published in several ways: • ScientificName • ScientificName + Authorship • Genus + Authorship • Genus + SpecificEpithet (+ Rank + InfraspecificEpithet) + Authorship. Classifications can be published in several ways: • Normalised via parentNameUsageID • Normalised via parentNameUsage • Denormalised via Kingdom, Phylum, Class, Order, Family, Genus
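Two of these points lend themselves to a short sketch. The first is turning a denormalised classification (one column per higher rank) into parent-linked records. The second is detecting broken referential integrity in `parentNameUsageID` links. The Python below is a hypothetical illustration: the generated ids and the path-based identity rule are assumptions, not ChecklistBank's algorithm.

```python
# Darwin Core higher-rank columns of the denormalised layout, top to bottom.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def normalise(rows):
    """Convert denormalised rows into parent-linked usages.

    Returns {usage_id: (name, rank, parent_id)}. A taxon is identified by its
    whole path from the kingdom down, so identical paths collapse into one
    usage while homonyms under different parents stay separate.
    """
    ids = {}  # path tuple -> generated usage id
    usages = {}
    for row in rows:
        parent, path = None, ()
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                continue  # tolerate gaps, e.g. a missing order
            path += ((rank, name),)
            if path not in ids:
                ids[path] = str(len(ids) + 1)
                usages[ids[path]] = (name, rank, parent)
            parent = ids[path]
    return usages


def broken_parent_links(usages):
    """Ids whose parent id points at a usage that is not in the dataset."""
    return sorted(uid for uid, (_, _, parent) in usages.items()
                  if parent is not None and parent not in usages)
```

A single missing parent leaves every descendant detached from the tree. This is why the slide stresses dataset-level rather than record-level integrity.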

  19. Checklist Bank Model • Lexical Group • Gerardia paupercula var. borealis (Pennell) Deam • Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam • Gerardia paupercula (A.Gray) Britton var. borealis (Pennell) Deam • Gerardia paupercula borealis • Gerardia paupercula borealis (Pennell) Deam • Nomenclatural Group • Gerardia paupercula var. borealis (Pennell) Deam • Agalinis paupercula var. borealis Pennell
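Lexical grouping treats the first five strings as one name. The Python below is a deliberately simplified canonicaliser. It is an assumption for illustration, not the GBIF Name Parser's logic. It drops bracketed authors, rank markers, and capitalised author tokens, after which all five strings reduce to the same genus and epithets. The Agalinis combination gets a different canonical form, so only nomenclatural (basionym) links can join it to the group.

```python
import re

# Rank markers removed during canonicalisation (a hypothetical, incomplete list).
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f.", "forma", "subvar."}
EPITHET = re.compile(r"^[a-z][a-z-]*$")


def canonical_name(name):
    """Reduce a scientific name string to genus + epithets (simplified sketch).

    Parenthesised (basionym) authors are removed, the first token is kept as
    the genus, and only all-lowercase tokens are kept as epithets, which drops
    capitalised author names, abbreviations ending in dots, and years.
    """
    tokens = re.sub(r"\([^)]*\)", " ", name).split()
    if not tokens:
        return ""
    kept = [tokens[0]]
    kept += [t for t in tokens[1:] if t not in RANK_MARKERS and EPITHET.match(t)]
    return " ".join(kept)


def lexical_groups(names):
    """Group raw name strings by their canonical form."""
    groups = {}
    for n in names:
        groups.setdefault(canonical_name(n), []).append(n)
    return groups
```

Lowercase author particles such as "de" or "van" would wrongly survive as epithets, which is one reason a real parser is needed.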

  20. Taxonomic Backbone (Nub) • What it is • How it is built

  21. Composite Taxonomic Backbone • Largest integrated taxonomy in the world • 200 million occurrences • One taxonomic hierarchy

  22. Nub Relevance • The Nub management classification is used to: • provide a hierarchy of names • crosswalk between taxonomies • All biodiversity data is aligned via names • Considerable variation in higher taxa • => Maps & Statistics • External linkages, e.g. EOL maps • More details: http://livelink.gbif.org/gbif/livelink/overview/3233870 • Example of variation in higher taxa: Cronquist classification: • Mimosaceae: 3,200 species • Caesalpiniaceae: 2,000 species • Fabaceae: 14,000 species. “Modern” classification: • Fabaceae: 19,200 species, divided into • Mimosoideae: 3,200 species • Cæsalpinioideae: 2,000 species • Faboideae: 14,000 species
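The legume example is a crosswalk problem: the same name, Fabaceae, covers 14,000 or 19,200 species depending on the classification. The Python sketch below takes the counts from the slide and a hypothetical family-to-subfamily mapping. Re-aggregating the counts under the modern concept reproduces the total 3,200 + 2,000 + 14,000 = 19,200.

```python
# Species counts as given on the slide, keyed by Cronquist family.
CRONQUIST = {"Mimosaceae": 3200, "Caesalpiniaceae": 2000, "Fabaceae": 14000}

# Hypothetical crosswalk: Cronquist family -> (modern family, modern subfamily).
CROSSWALK = {
    "Mimosaceae": ("Fabaceae", "Mimosoideae"),
    "Caesalpiniaceae": ("Fabaceae", "Caesalpinioideae"),
    "Fabaceae": ("Fabaceae", "Faboideae"),
}


def regroup(counts, crosswalk):
    """Re-aggregate per-family counts under the target classification's families."""
    totals = {}
    for family, n in counts.items():
        target_family, _subfamily = crosswalk[family]
        totals[target_family] = totals.get(target_family, 0) + n
    return totals
```

Statistics that key on the bare name "Fabaceae" without recording which classification was used would therefore differ by 5,200 species.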

  23. Nub Components

  24. Nub Building • Regular Checklist Resource • Lexical grouping: • Canonical homonyms • Authorship matching is difficult • => canonical name + kingdom • Ignore noisy names derived only from occurrences? • Nub assembling: • 8 CoL kingdoms • Each lexical group becomes a nub usage • Contradicting classifications • Intermediate rank synonyms • Select the preferred, well-formed name • Stable IDs • Rated sources: nomenclatural resources for names, taxonomic resources for classification. Canonical homonym example: Subphylum in ANIMALIA: • Vertebrata • Vertebrate • Vertebrata Cuvier, 1812. Algal genus in PLANTAE: • Vertebrata • Vertebrata Gray • Vertebrata S.F. Gray, 1821
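The Vertebrata case shows why grouping uses canonical name plus kingdom. The Python sketch below assumes canonical names have already been computed. It picks one name per group with a made-up heuristic: prefer a string carrying a year, then the longest string. The real nub build also weighs rated sources, which this sketch ignores. The misspelling «Vertebrate» is left out because exact canonical matching would not group it.

```python
import re
from collections import defaultdict

YEAR = re.compile(r"\b(1[5-9]|20)\d\d\b")


def group_key(record):
    """Canonical name plus kingdom, so canonical homonyms stay apart."""
    return (record["canonical"], record["kingdom"])


def preferred_name(names):
    """Hypothetical rule: prefer a name with a year, then the longest string."""
    return max(names, key=lambda n: (YEAR.search(n) is not None, len(n)))


def assemble(records):
    """Map each (canonical, kingdom) group to one preferred name string."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record["name"])
    return {key: preferred_name(names) for key, names in groups.items()}


records = [
    {"name": "Vertebrata", "canonical": "Vertebrata", "kingdom": "ANIMALIA"},
    {"name": "Vertebrata Cuvier, 1812", "canonical": "Vertebrata", "kingdom": "ANIMALIA"},
    {"name": "Vertebrata", "canonical": "Vertebrata", "kingdom": "PLANTAE"},
    {"name": "Vertebrata Gray", "canonical": "Vertebrata", "kingdom": "PLANTAE"},
    {"name": "Vertebrata S.F. Gray, 1821", "canonical": "Vertebrata", "kingdom": "PLANTAE"},
]
nub = assemble(records)
```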

  25. Nub Building

  26. Admin Console View the Admin Console

  27. Discovery: Portal and Services

  28. Checklist Bank Portal • 82 ECAT Resources • 14 Taxonomic Catalogues • 64 Vernacular Name lists • 2 Thematic Lists • 2 Nomenclators Go to Portal

  29. Checklist Bank Web Services • Checklist Service • Name Usage Resolver • Name Usage Service • Name Usage Navigation Service • Name String Service • Image Service Go to API Page

  30. Name Parser • Uses • Comparing • Matching • GBIF Backbone • “Did you mean” Try GBIF Name Parser
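One common approach to "Did you mean" suggestions is fuzzy string similarity. The sketch below uses Python's standard `difflib.get_close_matches` for this. It is not the GBIF name matcher, and the `cutoff` value and the small list of backbone names are arbitrary assumptions.

```python
import difflib


def did_you_mean(query, known_names, limit=3, cutoff=0.8):
    """Return an exact hit, or up to `limit` known names similar to `query`."""
    if query in known_names:
        return [query]
    # Similarity is difflib's SequenceMatcher ratio (0..1); cutoff is arbitrary.
    return difflib.get_close_matches(query, known_names, n=limit, cutoff=cutoff)


backbone = ["Vertebrata", "Gerardia", "Agalinis", "Fabaceae"]
```

Matching on parsed canonical names rather than raw strings keeps author variants from lowering the similarity score.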

  31. Name Recognition Services • Updated Service • March 2010 • DWC API • Uses • IAIA parsing • Adding names to metadata • Checklists from documents View GBIF Name Recognition Tools

  32. “TaxonTagger” tools View TaxonTagger Sample document (Butterfly list)

  33. Using Name Services: Data Entry Google Docs: Live Example

  34. Taxonomic Indexing • Mining names from publishing RSS feeds • IAIA reports • KNB Knowledge network • Mapping to Species lists • “Any red-listed species in this set of IAIA reports.” • Name Parser API • TaxonFinder API • Checklist Bank API
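The red-list question reduces to two steps: find candidate names in the text, then intersect them with a species list. The Python below uses a naive pattern (a capitalised word followed by a lowercase word) as a stand-in name finder. It is not how TaxonFinder works. It also misses abbreviated genera such as "P. leo" and higher taxa, and the species list is hypothetical.

```python
import re

# Naive candidate pattern: a capitalised word followed by a lowercase word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_listed_species(text, species_list):
    """Return names from species_list that occur as binomials in text."""
    candidates = {f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)}
    return candidates & set(species_list)
```

Non-names such as "Surveys recorded" also match the pattern, but the intersection with the list discards them.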

  35. Other 2010/11 • Mapping Services • Linking a data collection to a specific taxonomic authority • Taxonomic Validation and Annotation of Occurrence data. • Linking to Community Species Pages
