WWW.GBIF.ORG GLOBAL BIODIVERSITY INFORMATION FACILITY The GBIF Data Repository Tool (new, updated version 3.0) Hannu Saarenmaa EC CHM & GBIF European Regional Nodes Meeting Copenhagen, 2005-09-15/18
Outline • Objectives and background • Design and installation • Use • Demonstration
Challenges in data sharing • Eventually, all data sets become orphans: archiving services are a necessity. • The concept "share once, use many" requires available data repositories. • Data from archives must be available to portals such as GBIF through standard mechanisms. • IPR, confidentiality, and benefit sharing must be respected at all times.
Goals of the GBIF Data Repository Tool • Enable data custodians to manage their data and control its publishing. • Provide a mechanism so that spreadsheets and similar files can be used directly for sharing data. • Hide database complexities from users. • Make a simple data-warehouse tool available to those who want to host datasets for the community. • In short, lower the threshold for data sharing as far as possible.
Functionalities • Data must be formatted according to the Darwin Core standard and its extensions, in a flat spreadsheet format. • In fact, any flat (rows-and-columns) format will work. • The system checks and parses the data into an embedded MySQL database, which becomes available to the public as a DiGIR/TAPIR resource (see the sketch after this list). • Owners can control the level of detail released: • Fuzzying of geographic coordinates is available • Collector names and time periods can be hidden • Approval of terms and conditions for data use can be required • Owners can revoke a release and update the data. • Metadata values can be inherited by data records to fill in missing values, as defined by the owner. • Includes an embedded image server.
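To make the check-and-parse step concrete, the sketch below validates a flat, tab-delimited Darwin Core file before loading it. It is only an illustration: the required column names, the table layout, and the use of SQLite as a stand-in for the embedded MySQL database are assumptions, not the tool's actual implementation.

```python
# Minimal sketch of the check-and-parse step, assuming a tab-delimited
# upload. Column names follow the Darwin Core standard but exact names
# vary by version; SQLite stands in for the embedded MySQL database.
import csv
import sqlite3

REQUIRED = ["InstitutionCode", "CollectionCode", "CatalogNumber", "ScientificName"]

def load_flat_file(path, db_path="repository.db"):
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"Missing required Darwin Core columns: {missing}")
        rows = [tuple(r.get(c, "") for c in REQUIRED) for r in reader]

    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS occurrence "
        "(institution TEXT, collection TEXT, catalog TEXT, scientific_name TEXT)"
    )
    con.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()
    return len(rows)  # number of records now queryable as a DiGIR/TAPIR resource
```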
Installation • For Linux and Windows • Based on Python, Zope 2.10, and MySQL • Supports the DiGIR and TAPIR protocols of TDWG • Turn-key installation • Fits directly into the EC CHM software package
Steps for data owners • Prepare the data files • Create a nested folder structure on the Repository for the collection • Enter a default metadata scope (to cover missing values in the data, etc.; see the sketch after this list) • Decide on access policies • Upload the files • Publish the data files
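Step 3, the default metadata scope, can be pictured as a simple inheritance rule: wherever a record leaves a field empty, the folder's default value fills it in. A minimal sketch, with hypothetical field names and an assumed merge rule:

```python
# Illustrative metadata inheritance: folder-level defaults fill in
# values that individual records leave blank. Field names and the
# merge rule are assumptions, not the tool's documented behaviour.
DEFAULTS = {"Country": "Denmark", "Collector": "unknown"}

def apply_defaults(record, defaults=DEFAULTS):
    merged = dict(record)
    for field, value in defaults.items():
        if not merged.get(field):  # missing or empty -> inherit the default
            merged[field] = value
    return merged

record = {"ScientificName": "Parus major", "Country": ""}
print(apply_defaults(record))
# {'ScientificName': 'Parus major', 'Country': 'Denmark', 'Collector': 'unknown'}
```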
Create the resources (databases) for the collection and its folders
Access policy options: • Fully open • Standard GBIF policy of acknowledgements • No direct download, and fuzzying for web-service access (see the sketch below)
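The fuzzying option can be pictured as snapping coordinates to a coarser grid before they are released over the web service. A minimal sketch; the 0.1-degree precision is an assumed example value, not the tool's documented setting:

```python
# Illustrative coordinate fuzzying: round latitude/longitude to a
# coarser grid before release. The precision is an assumed example.
def fuzz_coordinates(lat, lon, precision=0.1):
    def snap(value):
        return round(round(value / precision) * precision, 6)
    return snap(lat), snap(lon)

print(fuzz_coordinates(55.67594, 12.56553))  # -> (55.7, 12.6)
```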
Data is now searchable locally and through the DiGIR/TAPIR protocols
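Once published, a resource answers standard protocol requests. The sketch below issues a TAPIR capabilities request over the protocol's key-value-pair (KVP) interface; the endpoint URL is hypothetical, and the operations a resource actually offers should be read from the capabilities response itself.

```python
# Sketch of querying a published resource over TAPIR's key-value-pair
# (KVP) interface. The endpoint URL is hypothetical.
from urllib.request import urlopen
from urllib.parse import urlencode

ENDPOINT = "http://example.org/repository/tapir/my_collection"  # hypothetical

params = urlencode({"op": "capabilities"})
with urlopen(f"{ENDPOINT}?{params}") as resp:
    print(resp.read().decode("utf-8"))  # XML describing supported operations
```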