
SERNEC Image/Metadata Database Goals and Components

Steve Baskauf, 2009-11-04

Presentation Transcript


  1. SERNEC Image/Metadata Database Goals and Components Steve Baskauf 2009-11-04

  2. Overall goals
     • To create a flexible metadata database structure that can handle specimen data, specimen images, and live-plant images. The database will be designed to export easily to consumers, including Morphbank, GBIF, and a SERNEC web portal.
     • To create contributor interface(s) that allow rapid data entry or transfer with minimal effort from contributors.

  3. Conceptual scheme: players
     [Diagram. Contributors: contributors without institutional infrastructure, and institutional databases connected through a conversion utility. Both feed the SERNEC database. Consumers: SERNEC web portal, Morphbank, GBIF.]

  4. General Principles
     • SERNEC acts as a facilitator.
     • Participation in the SERNEC database does not prevent contributors from continuing anything they were already doing.
     • SERNEC does not “own” anything.
     • SERNEC sets minimum standards for participation that allow the system to operate and ensure the quality of the metadata served.
     • Components in the system are “black boxes” that do not require participants to understand other parts.
     • Interactions among components are governed by generally recognized standards for communication: XML, LSIDs or LSID-based HTTP URIs, Darwin Core, MRTG.
     • The system should not collapse if any component disappears.

  5. Facts About Persistent Identifiers
     • Persistent identifiers (universally unique identifiers, or UUIDs; also called globally unique identifiers, or GUIDs) are coming.
     • In a complex system, unique identifiers are needed to determine whether a resource already exists (to prevent the creation of duplicate records).
     • Use comes with responsibilities:
        • Guarantee uniqueness
        • Guarantee persistence
        • Be actionable (provide metadata to users)

  6. LSID (or HTTP URI) assignment
     • Form: urn:lsid:<authority>:<namespace>:<objectID> or http://authority.org/urn:lsid:<authority>:<namespace>:<objectID>
     • It appears likely that a resolution service will be provided centrally by a big player like GBIF, i.e., GBIF will be the authority: gbif.org.
     • Individual users will be responsible for making sure that their resources have unique string identifiers.
     • SERNEC will probably have to be the party ensuring that the namespace is unique (by negotiation with the authority).
     • Some users may generate their own persistent identifiers, and SERNEC will have to accept that.
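A minimal Python sketch of the two identifier forms on this slide. It assumes the HTTP host is the LSID authority itself, as in the slide's example; the function names and sample values (gbif.org, ABC_HERB, 000123) are hypothetical illustrations, not assigned identifiers.

```python
def make_lsid(authority: str, namespace: str, object_id: str) -> str:
    """Build urn:lsid:<authority>:<namespace>:<objectID>."""
    for part in (authority, namespace, object_id):
        # Empty parts or embedded colons would make the URN ambiguous.
        if not part or ":" in part:
            raise ValueError(f"invalid LSID component: {part!r}")
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


def make_http_lsid(authority: str, namespace: str, object_id: str) -> str:
    """Build http://<authority>/urn:lsid:<authority>:<namespace>:<objectID>.

    Assumption: the HTTP host is the same as the LSID authority.
    """
    return f"http://{authority}/" + make_lsid(authority, namespace, object_id)


# Hypothetical example values, not real assignments.
print(make_lsid("gbif.org", "ABC_HERB", "000123"))
# urn:lsid:gbif.org:ABC_HERB:000123
print(make_http_lsid("gbif.org", "ABC_HERB", "000123"))
# http://gbif.org/urn:lsid:gbif.org:ABC_HERB:000123
```

Rejecting colons inside components keeps the URN unambiguous, since the colon is the LSID field separator.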

  7. Strategy for Generating Internal Unique IDs
     • Each participating institution MUST have unique IDs within each of its collections (this is the <objectID>).
     • SERNEC keeps a list of institution codes checked against biocol.org for uniqueness.
     • If IDs are unique within the institution, <namespace> is institutioncode.
     • If IDs are unique within a collection but not within the institution, <namespace> is institutioncode_collectioncode.
     • Internal Unique ID = <namespace>:<objectID>
     • When an authority is willing to handle our GUIDs, we check that each SERNEC namespace is unique within that authority, then concatenate the internal unique ID to the authority part of the LSID.
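The namespace rule and concatenation steps above can be sketched in a few Python functions. This is an illustration under stated assumptions: the codes "ABC" and "HERB" are hypothetical, and the authority's existing namespaces are modeled as a plain set standing in for the negotiation with the authority (the biocol.org check is not modeled).

```python
def choose_namespace(institution_code: str, collection_code: str,
                     ids_unique_across_institution: bool) -> str:
    """Apply the slide's rule for picking <namespace>."""
    if ids_unique_across_institution:
        return institution_code
    # IDs are only unique within a collection, so qualify by collection.
    return f"{institution_code}_{collection_code}"


def internal_unique_id(namespace: str, object_id: str) -> str:
    """Internal Unique ID = <namespace>:<objectID>."""
    return f"{namespace}:{object_id}"


def namespace_collisions(sernec_namespaces, authority_namespaces) -> set:
    """SERNEC namespaces already used at the authority (need renegotiation)."""
    return set(sernec_namespaces) & set(authority_namespaces)


def to_lsid(authority: str, internal_id: str) -> str:
    """Concatenate the authority part of the LSID with an internal unique ID."""
    return f"urn:lsid:{authority}:{internal_id}"


ns = choose_namespace("ABC", "HERB", ids_unique_across_institution=False)
iid = internal_unique_id(ns, "000123")
print(iid)                       # ABC_HERB:000123
print(to_lsid("gbif.org", iid))  # urn:lsid:gbif.org:ABC_HERB:000123
```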

  8. System component: the database
     • Structure needs to handle both specimen and live-plant images.
     • Must keep track of the status of resources:
        • Are they new, with non-redundant IDs?
        • Have they been updated?
        • Has the data/metadata been passed on to the consumers?
     • Should be simple enough, or exportable enough, to outlive SERNEC if necessary.
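A minimal in-memory sketch of the three status questions on this slide (new? updated? passed on to consumers?). It is not the SERNEC schema: the class names are hypothetical, consumers are plain strings such as "Morphbank" or "GBIF", and "updated" is assumed to mean the submitted metadata dictionary differs from the stored one.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEW = "new"
    UPDATED = "updated"


@dataclass
class ResourceRecord:
    internal_id: str
    metadata: dict
    status: Status = Status.NEW
    # Consumers that already hold the current version of this record.
    sent_to: set = field(default_factory=set)


class ResourceTracker:
    """Hypothetical stand-in for the status-tracking part of the database."""

    def __init__(self) -> None:
        self._records: dict[str, ResourceRecord] = {}

    def __len__(self) -> int:
        return len(self._records)

    def submit(self, internal_id: str, metadata: dict) -> ResourceRecord:
        record = self._records.get(internal_id)
        if record is None:
            # Unseen ID: create one record rather than a duplicate.
            record = ResourceRecord(internal_id, dict(metadata))
            self._records[internal_id] = record
        elif record.metadata != metadata:
            # Known ID with changed metadata: mark it updated and
            # require re-sending to every consumer.
            record.metadata = dict(metadata)
            record.status = Status.UPDATED
            record.sent_to.clear()
        return record

    def mark_sent(self, internal_id: str, consumer: str) -> None:
        self._records[internal_id].sent_to.add(consumer)

    def pending_for(self, consumer: str) -> list[str]:
        """IDs whose current version has not yet been passed to `consumer`."""
        return sorted(i for i, r in self._records.items()
                      if consumer not in r.sent_to)
```

Keying records by the internal unique ID is what lets a resubmission be recognized as an existing resource instead of creating a duplicate.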

  9. [Diagram: two relationship types. An individual documented by a herbarium specimen, which has specimen images; an individual documented directly by live-plant images.]
     • Relevant occurrence types are specimens and images.
     • Record fields governed by:
        • Darwin Core (general specimen and live-plant image metadata)
        • MRTG (image-specific metadata for specimen and live-plant images)
     • An individual may be represented by a composite of the relationship types shown if the plant is both imaged directly and collected.
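One way to sketch these relationship types in Python, as a hypothetical data model rather than the actual SERNEC structure: Darwin Core and MRTG fields are kept as plain dictionaries, and all IDs and class names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    image_id: str
    kind: str  # "specimen" or "live_plant"
    mrtg: dict = field(default_factory=dict)  # image-specific metadata


@dataclass
class Specimen:
    specimen_id: str
    dwc: dict = field(default_factory=dict)  # Darwin Core terms
    images: list[Image] = field(default_factory=list)


@dataclass
class Individual:
    individual_id: str
    specimens: list[Specimen] = field(default_factory=list)
    live_images: list[Image] = field(default_factory=list)

    def all_images(self) -> list[Image]:
        """Live-plant images followed by images of this plant's specimens."""
        images = list(self.live_images)
        for specimen in self.specimens:
            images.extend(specimen.images)
        return images


# Composite case: a plant photographed in the field and also collected.
plant = Individual(
    "ABC:ind-1",
    specimens=[Specimen("ABC:000123",
                        dwc={"scientificName": "Acer rubrum"},
                        images=[Image("ABC:img-1", "specimen")])],
    live_images=[Image("ABC:img-2", "live_plant"),
                 Image("ABC:img-3", "live_plant")],
)
print([img.kind for img in plant.all_images()])
# ['live_plant', 'live_plant', 'specimen']
```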

  10. [Diagram: an individual (I) with two determinations (D1, D2), each linked to a taxon (T1, T2) and to the resources used to make it.]
     • Determination structure compatible with annotations
     • Determination structure compatible with taxonomic concept mapping (multiple possible names)
     • Determination structure capable of tracking resources used to make a determination
     • Determinations linked to standardized taxon units (ITIS TSNs and/or LSIDs)
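The determination structure in the diagram (I linked to D1 and D2, each pointing to a taxon and to resources) can be sketched as follows. This is a hypothetical model: class and field names are illustrative, dates are assumed to be ISO 8601 strings so that string order matches date order, and "current" is assumed to mean the most recent determination.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Taxon:
    name: str
    itis_tsn: Optional[int] = None  # ITIS Taxonomic Serial Number, if known
    lsid: Optional[str] = None      # taxon LSID, if known


@dataclass
class Determination:
    taxon: Taxon
    determiner: str
    date: str  # ISO 8601 (YYYY-MM-DD), so string order is date order
    resources: list[str] = field(default_factory=list)  # keys, floras, etc.


@dataclass
class Individual:
    individual_id: str
    determinations: list[Determination] = field(default_factory=list)

    def annotate(self, det: Determination) -> None:
        # Annotations are appended; earlier determinations stay on record.
        self.determinations.append(det)

    def current(self) -> Optional[Determination]:
        """Most recent determination, or None if the plant is undetermined."""
        if not self.determinations:
            return None
        return max(self.determinations, key=lambda d: d.date)


plant = Individual("ABC:ind-1")
plant.annotate(Determination(Taxon("Quercus prinus"), "Collector",
                             "1998-05-01", ["key A"]))
plant.annotate(Determination(Taxon("Quercus montana"), "Annotator",
                             "2009-10-15", ["key B"]))
print(plant.current().taxon.name)  # Quercus montana
```

Keeping every determination, rather than overwriting a single name field, is what makes the structure compatible with both annotations and mapping between alternative names.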

  11. SERNEC database/consumer relationships
     [Diagram: the SERNEC database feeding three consumers: the SERNEC web portal, Morphbank, and GBIF.]
     • SERNEC web portal: regional data, end-user educational resources, facilitation of collaboration
     • Morphbank: permanent image repository, provider to downstream secondary consumers (e.g., EOL)
     • GBIF: primary biodiversity database, possible future resolution service for persistent identifiers

  12. SERNEC database/web portal
     • Support Flora of the Southeast or successor web documentation efforts.
     • Provide user-friendly mechanisms for searching for data and images, and organize “courtesy requests” for non-commercial use of large numbers of images.
     • Provide access to data-driven educational and research applications, e.g., visual keys, iPhone data apps, and teacher lesson plans.

  13. SERNEC database/Morphbank
     • Capable of generating the XML needed by Morphbank for image submission.
     • Query Morphbank services to determine whether the contributor has already uploaded the image to Morphbank.
     • Update Morphbank image records if the contributor changes metadata.
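A hedged sketch of the two Morphbank tasks on this slide: serializing records to XML, and deciding per image whether to upload, update, or skip. The element names below are hypothetical placeholders, not Morphbank's actual submission schema, and the "already uploaded" IDs are passed in as a set because the Morphbank query service itself is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; not Morphbank's real schema.
FIELDS = ("scientificName", "catalogNumber", "fileName")


def submission_xml(records: list[dict]) -> str:
    """Serialize image records (each with an 'internal_id') to an XML string."""
    root = ET.Element("imageSubmission")
    for rec in records:
        image = ET.SubElement(root, "image", {"id": rec["internal_id"]})
        for name in FIELDS:
            if name in rec:
                ET.SubElement(image, name).text = rec[name]
    return ET.tostring(root, encoding="unicode")


def plan_action(internal_id: str, uploaded: set, changed: set) -> str:
    """Choose an action for one image.

    `uploaded`: IDs the consumer already holds (from a query, not modeled).
    `changed`: IDs whose metadata the contributor has changed.
    """
    if internal_id not in uploaded:
        return "upload"
    if internal_id in changed:
        return "update"
    return "skip"
```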

  14. SERNEC database/GBIF
     • Provide primary biodiversity records to GBIF using IPT/TAPIR for institutions not capable of maintaining their own services.
     • If, at some point in the future, GBIF or another organization provides resolution services for organizations not capable of acting as LSID authorities, data from the SERNEC database would be passed to the resolution provider for use in LSID resolution.

  15. SERNEC database/provider relationships
     [Diagram: contributors without institutional infrastructure submit directly to the SERNEC database; institutional databases submit through a conversion utility.]
     • Contributors without institutional infrastructure: SERNEC-created web-based tools would allow users with limited record-keeping capabilities and IT infrastructure to submit metadata and images.
     • Contributors with institutional infrastructure: SERNEC would create customized conversion utilities that accept database output in various formats and convert it to a form the SERNEC database can recognize.

  16. SERNEC/Contributors without IT infrastructure
     • Users would be responsible for:
        • Collecting and organizing their own metadata using software (e.g., Specify or Excel) capable of simple text (CSV or tab-delimited) or Excel output.
        • Maintaining identifiers (strings) that are unique within their institution.
     • SERNEC-provided software would generate LSIDs and convert metadata to fit the SERNEC database data model, as well as facilitate the association of images with metadata.
     • It is assumed that contributors will have little or no interaction with consumers (GBIF, Morphbank) outside of that facilitated by SERNEC.
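A minimal sketch of what SERNEC-provided software might do with a contributor's CSV or tab-delimited export: check that local identifiers are unique, attach internal IDs, and link image files to records. The column name "catalogNumber" and the file-naming convention (<id>.<ext> or <id>_<suffix>.<ext>) are assumptions for illustration; the function names are hypothetical.

```python
import csv
import io


def load_table(text: str, namespace: str, id_column: str = "catalogNumber",
               delimiter: str = ",") -> tuple[list[dict], list[str]]:
    """Read CSV/tab-delimited text, attach internal IDs, report duplicate IDs."""
    records, seen, duplicates = [], set(), []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        local_id = (row.get(id_column) or "").strip()
        if not local_id:
            raise ValueError(f"row without a {id_column!r} value: {row}")
        if local_id in seen:
            duplicates.append(local_id)  # violates the uniqueness requirement
            continue
        seen.add(local_id)
        row["internalId"] = f"{namespace}:{local_id}"
        records.append(row)
    return records, duplicates


def attach_images(filenames: list[str], records: list[dict],
                  id_column: str = "catalogNumber") -> list[str]:
    """Link image files to records by file-name prefix; return unmatched files.

    Assumes files are named <id>.<ext> or <id>_<suffix>.<ext>.
    """
    by_id = {r[id_column].strip(): r for r in records}
    unmatched = []
    for name in filenames:
        stem = name.rsplit(".", 1)[0].split("_", 1)[0]
        record = by_id.get(stem)
        if record is None:
            unmatched.append(name)
        else:
            record.setdefault("images", []).append(name)
    return unmatched
```

Returning duplicates and unmatched files, instead of silently discarding them, gives the contributor something concrete to fix before the records enter the database.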

  17. SERNEC/contributors with IT infrastructure
     • Contributors may have their own systems for:
        • maintaining a complex database for metadata
        • generating LSIDs and either maintaining their own authority or transmitting metadata directly to another institution acting as the authority (e.g., GBIF)
        • managing specimen and live-plant images and associating them with the appropriate metadata in their database
     • A conversion utility enables the SERNEC database to “talk” to the contributor’s system and update the SERNEC database.
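A conversion utility of the kind described here could, at its simplest, rename each contributor's export columns to Darwin Core terms. The sketch below assumes a per-contributor mapping table; the source column names in EXAMPLE_FIELD_MAP are hypothetical, while the target names are standard Darwin Core terms. Unrecognized columns are returned separately so they are reported rather than lost.

```python
# Hypothetical column names from one contributor's export; each
# contributor would get its own map.
EXAMPLE_FIELD_MAP = {
    "Cat No": "catalogNumber",
    "Taxon": "scientificName",
    "Collector": "recordedBy",
    "Date Collected": "eventDate",
}


def convert_record(row: dict, field_map: dict, institution_code: str,
                   collection_code: str) -> tuple[dict, dict]:
    """Rename a contributor's columns to Darwin Core terms.

    Returns (converted record, unmapped columns). Mapped columns with
    blank values are omitted from the converted record.
    """
    out = {"institutionCode": institution_code,
           "collectionCode": collection_code}
    unmapped = {}
    for column, value in row.items():
        term = field_map.get(column)
        if term is None:
            unmapped[column] = value
        elif value and value.strip():
            out[term] = value.strip()
    return out, unmapped
```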

  18. Main points
     • All the necessary components (standards, contributors, consumer organizations) exist or will exist within the next year.
     • SERNEC has established relationships with all of the required players.
     • Players are willing to participate and have a vested interest in seeing it succeed.
     • SERNEC has the human, financial, and IT resources to pull this off.
     • Participants take care of themselves to the maximum extent possible; SERNEC “helps” smaller institutions participate on the same level as bigger players.
