
LSIDs and RDF in TDWG

Roger Hyam (TDWG, RBGE) and Donald Hobern (GBIF). June 7-9, 2006, Edinburgh, UK.

Presentation Transcript


  1. LSIDs and RDF in TDWG • Roger Hyam, TDWG, RBGE • Donald Hobern, GBIF • June 7-9, 2006, Edinburgh, UK

  2. Paradigm • Starting assumption is that standards are about sharing data. • Sharing data also implies sharing data through time (archiving).

  3. What is Shared? • Sharing raw literals isn’t much use. • They need to be gathered together into ‘semantic’ units or objects. [Diagram: the literals "perennis", "1234" and "Bellis" gathered into the object TaxonName:1234, Bellis perennis]
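A minimal sketch of the idea on this slide, assuming a hypothetical `TaxonName` record type. The field names (`identifier`, `genus`, `specific_epithet`) are illustrative assumptions and do not come from the presentation or from any TDWG schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical semantic unit; field names are illustrative assumptions."""
    identifier: str        # e.g. "TaxonName:1234"
    genus: str
    specific_epithet: str

    @property
    def full_name(self) -> str:
        # Combine the two name parts into the binomial
        return f"{self.genus} {self.specific_epithet}"


# Raw literals on their own say nothing about how they relate to each other
literals = ["perennis", "1234", "Bellis"]

# Gathered into one object, the same values become a shareable unit
name = TaxonName(identifier="TaxonName:1234", genus="Bellis", specific_epithet="perennis")
print(name.identifier, name.full_name)
```

The object carries both an identifier and a type, which is what the following slides on semantics and identity build on.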

  4. Semantics of Objects • Objects need to be based on some shared semantics. • There needs to be somewhere to look up what they mean: an ontology. [Diagram labels: Ontology; TaxonName: Bellis perennis; TaxonName?]

  5. Identity of Objects • How do I refer to this object? • Who should I credit? • Who should I send corrections to? • Is it the same record as I already have or is it a new one? • What is the official version of this data? Has someone altered it before I received it?

  6. TDWG TAG-1 Meeting • There was consensus that: • Architecture is concerned with shared data • Biodiversity data will be modeled as a graph of identifiable objects • The semantics of these objects will be encoded in a series of shared ontologies • Ontologies will be related to each other on the basis of shared Base and Core ontologies as a minimum • Discussion continues on how this is done

  7. Implications • We need an ontology to define and relate the objects we exchange. • Ontology governance/management is paramount. • We need a system of GUIDs to identify the objects. • We need a roadmap for the protocols to exchange these objects.

  8. Structure of the Ontology • Base Ontology: BaseThing, BaseActor • Core Ontology: CoreTaxonName, CoreInstitution • Domain Ontology: TaxonName, Herbarium, NomenclaturalType, NomenclaturalNote • Application Ontologies: ABCD, DarwinCore, ???
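One way to picture the layering is as a class hierarchy. The sketch below is only an illustration: the slide names the concepts in each layer but does not say which concept specialises which, so every parent/child link here is an assumption.

```python
# Base layer (names from the slide)
class BaseThing:
    pass


class BaseActor(BaseThing):        # assumption: an actor is a kind of thing
    pass


# Core layer (names from the slide; parents are assumptions)
class CoreTaxonName(BaseThing):
    pass


class CoreInstitution(BaseActor):
    pass


# Domain layer (names from the slide; parents are assumptions)
class TaxonName(CoreTaxonName):
    pass


class Herbarium(CoreInstitution):
    pass


def lineage(cls: type) -> list:
    """Return the class names from the domain concept down to the base layer."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(lineage(TaxonName))
print(lineage(Herbarium))
```

Under these assumed links, a client that only understands the Core layer could still treat a domain `TaxonName` as a `CoreTaxonName`.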

  9. Ontology Governance • Allow people to create Domain sub-ontologies easily, to prevent alienation. • Each ontology construct (concept) has a status. • Status is increased by passing through explicit gates defined by actual usage. • Status levels: Experimental, Shared, Recommended
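The status progression could be sketched as below. The three status names come from the slide; the gate thresholds and the idea of counting independent users are hypothetical, since the presentation does not define what the gates actually measure.

```python
from enum import IntEnum


class Status(IntEnum):
    EXPERIMENTAL = 1
    SHARED = 2
    RECOMMENDED = 3


# Hypothetical gates: minimum number of independent users needed to reach each status
GATES = {Status.SHARED: 2, Status.RECOMMENDED: 5}


def promote(status: Status, independent_users: int) -> Status:
    """Move a concept up by at most one level, and only if the next gate is met."""
    if status is Status.RECOMMENDED:
        return status                      # already at the top level
    next_status = Status(status + 1)
    if independent_users >= GATES[next_status]:
        return next_status
    return status                          # gate not met, status unchanged


print(promote(Status.EXPERIMENTAL, 3).name)
```

Promoting one level at a time keeps each gate explicit, which matches the slide's point that status follows actual usage.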

  10. What about RDF? • The need to share identifiable objects has been established without reference to a technology. • We are interested in objects not triples. • Typical use case involves a client consuming semantically heterogeneous data from multiple sources. • Semantic Web technologies would be ideal – but aren’t part of the TDWG culture and there are ‘unbelievers’.

  11. Current ‘Standards’ • DarwinCore & DiGIR: based on Z39.50; HTTP-based XML message/response; simple ‘flat’ application schemas (RDF-like) • ABCD & BioCASe: based on DarwinCore & DiGIR; complex document structure • TAPIR: unification of BioCASe and DiGIR • No RDF, objects or GUIDs here yet!

  12. Combining Data • GBIF data portal is the only ‘application’ that does data integration between these formats. • No standard way to include XML fragments from other XSDs other than xs:any. • There is overlap between the different schemas and no easy way to merge them.

  13. What about LSIDs? • GUID-1 meeting considered several GUID technologies, including LSID, DOI and Handle. • Life Science Identifiers are being assessed. • I3C & OMG URNs • urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434 • getData() • getMetadata()
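The example identifier on this slide follows the LSID layout `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The sketch below splits an LSID into those parts. The `LSID` tuple and `parse_lsid` function are hypothetical helpers written for this illustration, not part of any LSID library, and the sketch does not attempt the `getData()` or `getMetadata()` resolution calls.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, raising ValueError if malformed."""
    parts = text.split(":")
    # Expect "urn", "lsid", authority, namespace, object and an optional revision
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434")
print(lsid.authority, lsid.namespace, lsid.object_id)
```

The authority part is a domain name, which is how a client would locate the service that resolves the identifier.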

  14. LSID Permanence • LSIDs should not be recycled, i.e. used for more than one object. • LSIDs should always resolve but it is OK for them to resolve to a 404 (Gone) error. • No central authority to control these things. • Even DOIs go away if there isn’t institutional backing!
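The first two rules can be sketched with a small in-memory registry. `LsidRegistry` and its method names are hypothetical and exist only for this illustration; a real provider would keep this state in persistent storage and map the "gone" state to whatever error response it chooses.

```python
class LsidRegistry:
    """Hypothetical in-memory registry that never reissues an identifier."""

    def __init__(self):
        self._live = {}        # identifier -> object currently served
        self._retired = set()  # identifiers whose objects have been withdrawn

    def assign(self, lsid: str, obj) -> None:
        # An identifier that was ever issued must not be attached to a new object
        if lsid in self._live or lsid in self._retired:
            raise ValueError(f"{lsid} has already been issued and must not be reused")
        self._live[lsid] = obj

    def retire(self, lsid: str) -> None:
        # Withdraw the object but remember that the identifier existed
        del self._live[lsid]
        self._retired.add(lsid)

    def resolve(self, lsid: str):
        """Return a (state, object) pair: 'ok', 'gone' or 'unknown'."""
        if lsid in self._live:
            return ("ok", self._live[lsid])
        if lsid in self._retired:
            return ("gone", None)
        return ("unknown", None)


registry = LsidRegistry()
registry.assign("urn:lsid:example.com:names:1234", {"name": "Bellis perennis"})
registry.retire("urn:lsid:example.com:names:1234")
print(registry.resolve("urn:lsid:example.com:names:1234"))
```

Keeping retired identifiers on record is what lets the registry answer "gone" rather than silently handing the identifier to a different object.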

  15. LSIDs for Everything? • Are there some things for which LSIDs are inappropriate? • <logo rdf:resource="urn:lsid:example.com:branding:logo.gif" /> • xsi:schemaLocation="urn:lsid:example.com:xsd:taxon.xsd" • xmlns:tn="urn:lsid:example.com:ontology:taxon/" • There are definitely places where we will use something else. • Other people will use their own identifiers, e.g. DOI, Handle etc.

  16. So what’s cooking? • Recognised need for GUIDs • Different GUID technologies • XSD-based conceptual schemas • A TDWG ontology • XML-based exchange protocols • Emergent Semantic Web • OGC standards (GML) • Other! • 200+ data providers • 50+ million anonymous ‘records’ • BioMOBY Clients?

  17. Possible Roadmap • Build the ontology as a focus for semantics. • Resolution and Harvest protocols should be relatively easy to plug into or wrap round existing service providers, so approach these first. • Search/Query is more problematic: BioCASe, DiGIR, TAPIR, SPARQL, other?

  18. Thank You • Gordon and Betty Moore Foundation • Global Biodiversity Information Facility • NESC • TDWG Members
