GLOBAL BIODIVERSITY INFORMATION FACILITY Designing a Global Network to Accommodate Contributions from all Sources and Technical Abilities Tim Robertson GBIF Secretariat
Content • How the GBIF index is built • Joining the GBIF network • Technical requirements • Documentation on services and standards • The use of current protocols for data harvesting • Simplified full dataset harvesting • The new GBIF integrated publishing toolkit • Extending the model – Simple Transfer Schema task group
Basis of Record: Data served (Source: GBIF Data Portal October 2008)
Comparison: International Organization for Standardization (ISO) • 2-letter country codes (ISO 3166) • Multilingual (English, French + external translations) • Simple tab-delimited file format • Loads straight into a database for reuse • As simple as it needs to be… For controlled vocabularies, could this approach be adopted? Could removing complex technical schemas allow for easier contribution?
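To illustrate the "loads straight into database" point, here is a minimal sketch of reading a tab-delimited vocabulary into SQLite. The three-column layout (code, English name, French name), the `country` table and the `load_country_codes` helper are assumptions made for illustration; they are not the layout of any official ISO 3166 file.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows: code, English name, French name (tab-separated).
SAMPLE = "DK\tDenmark\tDanemark\nFR\tFrance\tFrance\nDE\tGermany\tAllemagne\n"


def load_country_codes(conn, text):
    """Load tab-delimited rows into a simple lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS country "
        "(code TEXT PRIMARY KEY, name_en TEXT, name_fr TEXT)"
    )
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    conn.executemany("INSERT INTO country VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_country_codes(conn, SAMPLE)

# The vocabulary is immediately queryable, with no schema parsing step.
french_name = conn.execute(
    "SELECT name_fr FROM country WHERE code = ?", ("DK",)
).fetchone()[0]
```

No XML schema or custom parser is involved: a standard delimited-file reader is enough, which is the property the slide suggests could carry over to controlled vocabularies.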
Harvesting: Using existing protocols • Provider has TAPIR wrapper • Wrapper allows for 200 records per request • 260,000 records to harvest • 1,300 requests and responses • 9 hours total • 500MB XML transferred • Extracted to a 32MB delimited file for the index • Compressed to 3MB • Why not produce this on the provider?
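The figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the `harvest_cost` function and its parameter names are illustrative, and the per-record size assumes decimal megabytes.

```python
import math


def harvest_cost(total_records, page_size, hours, xml_mb, delimited_mb, compressed_mb):
    """Derive per-request and per-record costs from the slide's totals."""
    requests = math.ceil(total_records / page_size)
    return {
        "requests": requests,
        "seconds_per_request": hours * 3600 / requests,
        "xml_kb_per_record": xml_mb * 1000 / total_records,
        "xml_to_delimited": xml_mb / delimited_mb,
        "xml_to_compressed": xml_mb / compressed_mb,
    }


stats = harvest_cost(
    total_records=260_000,
    page_size=200,
    hours=9,
    xml_mb=500,
    delimited_mb=32,
    compressed_mb=3,
)
```

With these inputs the request count comes out at 1,300, each request and response takes roughly 25 seconds on average, and the XML is about 16 times larger than the delimited file and over 160 times larger than the compressed file.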
Harvesting: Streamlining the process • Benefits • Indexes can be more up-to-date • better for the user • benefits provider • Provider systems can be left to answer specific real queries • the original purpose for the wrapper software • Easy for small data publishers to produce • Already done in an ad-hoc manner for very large providers • Not dissimilar to Sitemaps protocol
Harvesting: Streamlining the process If this is already being done in an ad-hoc manner, should it be defined as a standard?
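As a rough illustration of what "producing this on the provider" could look like, the sketch below writes records to a gzip-compressed, tab-delimited file and reads it back. The column names are Darwin Core terms chosen for illustration, and the sample records, file name, and `write_dump` / `read_dump` helpers are hypothetical; the slides do not define a dump format.

```python
import csv
import gzip
import os
import tempfile

# Illustrative column set; a real dump would carry whatever fields are agreed.
FIELDS = ["occurrenceID", "scientificName", "countryCode",
          "decimalLatitude", "decimalLongitude"]


def write_dump(records, path):
    """Write records as a gzip-compressed, tab-delimited file with a header row."""
    with gzip.open(path, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(FIELDS)
        for rec in records:
            # Missing fields are written as empty strings.
            writer.writerow([rec.get(field, "") for field in FIELDS])


def read_dump(path):
    """Read the dump back, as a harvester might, in a single pass."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


# Hypothetical sample records.
records = [
    {"occurrenceID": "occ-1", "scientificName": "Puma concolor",
     "countryCode": "US", "decimalLatitude": "37.1", "decimalLongitude": "-119.4"},
    {"occurrenceID": "occ-2", "scientificName": "Parus major", "countryCode": "DK"},
]

with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "occurrences.txt.gz")
    write_dump(records, dump_path)
    rows = read_dump(dump_path)
```

The harvester then fetches one small file instead of issuing many paged requests, and the wrapper is left free to answer specific queries.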
GBIF: The integrated publishing toolkit (IPT) • Publishing of • Occurrence data • Checklist data • Taxonomic data • Dataset descriptive data (metadata) • Key features • Embedded data cache • takes load off “LIVE” system • allows for file-based importing • Web application to search and browse data • TAPIR, WFS, WMS, TCS, EML, RSS, “Local DwC Index” • Simple extensions – the “star schema” • Can be used in a hosting environment
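One way to read the "star schema" bullet is a single core table with any number of extension tables that each point back at the core record. The sketch below models that reading in SQLite; the table names, columns, sample data and the `extensions_for` helper are assumptions for illustration and are not taken from the IPT itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core entity: one row per occurrence record.
    CREATE TABLE occurrence (id TEXT PRIMARY KEY, scientific_name TEXT);
    -- Extensions: zero or more rows per core record, linked by core_id.
    CREATE TABLE ext_image (core_id TEXT REFERENCES occurrence(id), url TEXT);
    CREATE TABLE ext_vernacular (core_id TEXT REFERENCES occurrence(id),
                                 name TEXT, lang TEXT);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO occurrence VALUES ('occ-1', 'Puma concolor')")
conn.executemany("INSERT INTO ext_image VALUES (?, ?)", [
    ("occ-1", "http://example.org/img/1.jpg"),
    ("occ-1", "http://example.org/img/2.jpg"),
])
conn.execute("INSERT INTO ext_vernacular VALUES ('occ-1', 'Cougar', 'en')")


def extensions_for(conn, core_id):
    """Collect every extension row attached to one core record."""
    images = [row[0] for row in conn.execute(
        "SELECT url FROM ext_image WHERE core_id = ? ORDER BY url", (core_id,))]
    names = [row[0] for row in conn.execute(
        "SELECT name FROM ext_vernacular WHERE core_id = ? ORDER BY name", (core_id,))]
    return {"images": images, "vernacular_names": names}


result = extensions_for(conn, "occ-1")
```

Adding a new kind of extension means adding a table rather than changing the core, which is what keeps the extensions simple.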
GBIF: The integrated publishing toolkit (IPT) • Ready for “alpha” testing (please enquire!) • Demonstrations by Markus Döring and Tim Robertson all week • Poster • Lunchtime session Tuesday
Extending the model: More data types • The data being mobilised is largely “single core entity” • the “Occurrence Record” • Integrating with other areas? • Earth observation networks • Ecological networks • Task group to investigate specific use cases to determine a Common Transfer Schema: • Primarily data modeling experience • Technical implementation • Presentation to TDWG community • Perhaps multiple core entities, each extensible?
Contact Tim Robertson GBIF Secretariat Universitetsparken 15 2100 Copenhagen Denmark trobertson@gbif.org