
Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs



Presentation Transcript


  1. Converting an Existing Taxonomic Data Resource to Employ an Ontology and LSIDs
     Jessie Kennedy, Rob Gales, Robert Kukla

  2. Introduction
     • Data sharing is fundamental to biodiversity and taxonomic data applications.
     • Previous attempts to facilitate sharing have had limited success:
        • lack of take-up of data exchange standards (now slowly happening due to the TDWG standards initiative)
        • the absence of a common terminology or vocabulary for use in taxonomic data
        • the lack of reference database systems for serving authoritative data
     • Proposed new technologies:
        • a Core Ontology for taxonomic data to model the biodiversity domain
        • adoption of Life Science Identifiers (LSIDs) by the TDWG GUID group for uniquely identifying taxonomic data objects, e.g. specimens, names, concepts
        • LSIDs can make use of an ontology to define the data to be returned
     • A mechanism is needed for migrating existing data to the new technologies:
        • explore the issues in using LSIDs and RDF according to an ontology

  3. Re-using LSIDs
     • Using LSIDs per se will not address the issue of data sharing.
     • Repositories must reuse LSIDs to cross-reference data within and outwith their own repository.
     • It is important that we use the same LSID to refer to the same entity.
        • If multiple LSIDs exist for the same entity, we would have to decide whether two LSIDs really refer to the same thing.
        • We would be in a similar situation to the one we are in today, for example trying to decide whether two taxonomic names are really the same.
     • Generating LSIDs for any self-contained data set is a fairly trivial task.
     • Appointing LSIDs from an authoritative repository to existing data, so that they are re-used, is more challenging.
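To make the "same LSID for the same entity" point concrete, here is a minimal Python sketch that splits an LSID into its parts and compares two of them. It relies only on the general LSID layout `urn:lsid:authority:namespace:object[:revision]`; the example identifiers, the `example.org` authorities and the helper names `parse_lsid` and `same_entity` are hypothetical, and treating everything after the `urn:lsid:` prefix as an exact, case-sensitive match is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid prefix: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


def same_entity(a: str, b: str) -> bool:
    """Assumption: two references name the same entity only if every part matches exactly."""
    return parse_lsid(a) == parse_lsid(b)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(parse_lsid("urn:lsid:names.example.org:taxonname:12345"))
    print(same_entity("urn:lsid:names.example.org:taxonname:12345",
                      "urn:lsid:other.example.org:taxonname:12345"))
```

A purely syntactic comparison like this only works if repositories actually re-use identifiers; two different LSIDs minted for the same name would compare as different, which is exactly the situation the slide warns about.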

  4. Project Overview
     • Imagining the future
        • Assume we have authority providers for certain data
        • Publications, names etc., e.g. IPNI, ZooBank, IF, Pubbank…
     • Want to convert an existing data repository
        • Relational database: the Hexacorallians of the World
     • Represent existing data as RDF triples
        • Use LSIDs to uniquely identify entities in the data
        • according to a domain ontology which extends the TDWG core ontology
     • Use LSIDs to cross-reference between the data in the repository
        • Some LSIDs re-used from external sources
        • Some LSIDs generated locally (owned data)
     • Development of a tool to aid the process of converting internal database keys to LSIDs
        • aid users in appointing the appropriate LSID from some external LSID authority

  5. Creating Domain Ontology
     • Draft Core Ontology
        • Core and BDI ontology
        • Classes and optional relationships between classes
     • Extend to Domain Ontology
        • Domain classes inherit from the core classes
        • Extended with additional classes
        • Re-use existing ontologies where possible
        • Specify additional literal properties where necessary
        • Straightforward for developer for Hexacorallia data
     • Creating RDF triples
        • Manual mapping of relational data to RDF triples according to OWL specification
        • Used WASABI mapping extensions & custom code for generation
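The mapping from relational rows to RDF triples can be sketched in plain Python. This is not the WASABI mapping or the project's custom code; it is an illustrative stand-in that emits N-Triples text. The ontology namespace `http://example.org/hexacorallia-ontology#`, the LSID authority `hexacorallia.example.org`, the class and property names, and the sample row are all hypothetical.

```python
# Hypothetical ontology namespace; the real domain ontology is not given in the slides.
ONT = "http://example.org/hexacorallia-ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def local_lsid(namespace: str, key) -> str:
    """Mint a locally owned LSID from an internal database key (hypothetical authority)."""
    return f"urn:lsid:hexacorallia.example.org:{namespace}:{key}"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def person_row_to_triples(row: dict) -> list:
    """Map one row of a hypothetical 'person' table to (subject, predicate, object) terms."""
    subject = f"<{local_lsid('person', row['person_id'])}>"
    triples = [(subject, f"<{RDF_TYPE}>", f"<{ONT}Person>")]
    # Column-to-property mapping chosen by hand, mirroring the manual mapping step.
    for column, prop in (("surname", "surname"), ("initials", "initials")):
        value = row.get(column)
        if value:
            triples.append((subject, f"<{ONT}{prop}>", f'"{escape_literal(value)}"'))
    return triples


def to_ntriples(triples) -> str:
    """Serialise triples one per line in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    row = {"person_id": 7, "surname": "Smith", "initials": "J."}  # made-up row
    print(to_ntriples(person_row_to_triples(row)))
```

Using the LSID as the triple subject is what later allows a locally minted identifier to be swapped for one issued by an external authority without touching the property values.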

  6. Simulate Authority Providers
     [Diagram] The Hexacorallian database (test data set) is mapped into five separate triple stores (Name, Specimen, Concept, Person, Publication), each through a “Map + AutoLSID” step. The stores act as simulated authority data providers, e.g. IPNI/ZooBank, Pubbank, Museum_specimens.
     • Generate LSID and RDF instances according to classes in the ontology appropriate to each “authority”

  7. Convert Existing Provider
     • Convert existing thematic data provider to use existing LSIDs and ontology
     [Diagram] The original data repository (Hexacorallia thematic provider) is mapped to the ontology and stored in the Hexacorallia thematic triple store (observation subset), giving RDF data to be updated with LSIDs from “authority” providers. The linker tool matches each class (“Match + -> LSID”) against the simulated authority triple stores (Person, Publication, Name, Specimen, Concept), which sit behind LSID resolution services.

  8. Linking…
     [Diagram] Two providers, each fronted by a WASABI Service Request Dispatcher with OAI, LSID and SPARQL interfaces:
     • local (“target”) provider: Hexacorallia Thematic Triple Store, with the Linker Client
     • authoritative (“source”) provider & linker: Person Triple Store, with the Linker

  9. Configure Provider for Update
     [Screenshot callouts]
     • Name the local repository
     • Select class to be linked

  10. Linking…
     [Diagram repeated from slide 8: local (“target”) provider with Linker Client; authoritative (“source”) provider & linker.]

  11. Configure the linker
     [Screenshot callouts]
     • Name authority provider with linking service
     • Select class to link on

  12. Linking…
     [Diagram repeated from slide 8: local (“target”) provider with Linker Client; authoritative (“source”) provider & linker.]

  13. Request Annotations

  14. Linking Service…
     • Communication between linking service and linking client

  15. Linking Service
     • Determines properties for matching
     • Return suggestions to the client
     • Weight possible matches
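The slides do not say how the linking service weights candidates, so the following Python sketch shows only one plausible approach: compare a few properties with a string-similarity measure, combine them with fixed weights, and return the best-scoring LSIDs as suggestions. The property names, the weights in `WEIGHTS`, the use of `difflib.SequenceMatcher` and the sample records are all assumptions, not the tool's actual algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical properties and weights; the real service determines these itself.
WEIGHTS = {"surname": 0.7, "initials": 0.3}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(local: dict, candidate: dict) -> float:
    """Weighted sum of per-property similarities."""
    return sum(
        weight * similarity(str(local.get(prop, "")), str(candidate.get(prop, "")))
        for prop, weight in WEIGHTS.items()
    )


def suggest(local: dict, candidates: list, limit: int = 5) -> list:
    """Return up to `limit` (score, lsid) pairs, best match first."""
    scored = [(score(local, c), c["lsid"]) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Made-up records for illustration.
    local_person = {"surname": "Smith", "initials": "J."}
    authority_persons = [
        {"lsid": "urn:lsid:people.example.org:person:1", "surname": "Smith", "initials": "J."},
        {"lsid": "urn:lsid:people.example.org:person:2", "surname": "Smyth", "initials": "K."},
    ]
    for value, lsid in suggest(local_person, authority_persons):
        print(f"{value:.2f}  {lsid}")
```

Returning a ranked list rather than a single answer matches the client screens that follow, where the user confirms or skips each suggested match.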

  16. [Screenshot callouts]
     • Person to find LSID for
     • Suggested match
     • Confirm/Skip Annotations

  17. [Screenshot callouts]
     • Person to find LSID for
     • Choice of possible persons with LSIDs
     • Confirm/Skip Annotations

  18. Research Questions
     • How effective is the draft ontology for representing existing data sources?
        • Can suitable extensions be easily defined?
        • Straightforward for developer
        • Need independent verification…
     • What are the issues for an existing data provider in converting their data to use the ontology and LSIDs?
        • Replace or annotate existing data
           • If, for example, I replace an author with a person LSID, what I get when I resolve the person is unlikely to be what I had when I held the data for an author.
        • Dependencies between LSID'able objects
           • If you link via a taxon name LSID, the resolved name should have an embedded LSID for a publication, so there shouldn't be any need (in principle) to match publications for names.
        • What about authorities that issue LSIDs but don't map to other authorities?
           • e.g. name providers not mapping to either publication or specimen providers
           • and don't want to!

  19. Research Questions…
     • What support would a linking tool need to provide end users?
        • How would users want to process this data?
        • How much automation?
           • E.g. above a certain confidence level
           • Would this be trusted?
        • Order of matching
           • E.g. match all instances of persons at once
           • Match persons by publication?
     • Other issues…
        • Performance of existing linking tool approach
           • Lots of data passing going on
           • Need better batch or one at a time
        • Finding authorities that provide linking services
           • How do you find out about authorities with linking services?
           • How do you know which ones to use?
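The "how much automation?" question can be made concrete with a small Python sketch that accepts a match automatically only when the best suggestion is above a confidence level and clearly ahead of the runner-up, and otherwise leaves the decision to the user. The thresholds `auto_accept` and `margin`, the `(score, lsid)` input shape and the function name `triage` are assumptions chosen for illustration; the slides leave the right policy open.

```python
def triage(suggestions, auto_accept=0.95, margin=0.10):
    """Decide how to handle ranked (score, lsid) suggestions, best first.

    Returns ("auto", lsid) for a confident, unambiguous match,
    ("review", lsid) when a person should confirm or skip,
    and ("no_match", None) when there are no suggestions.
    The threshold values are illustrative assumptions.
    """
    if not suggestions:
        return ("no_match", None)
    best_score, best_lsid = suggestions[0]
    runner_up = suggestions[1][0] if len(suggestions) > 1 else 0.0
    if best_score >= auto_accept and best_score - runner_up >= margin:
        return ("auto", best_lsid)
    return ("review", best_lsid)


if __name__ == "__main__":
    # Made-up scores and identifiers.
    print(triage([(0.98, "urn:lsid:people.example.org:person:1"),
                  (0.52, "urn:lsid:people.example.org:person:2")]))
    print(triage([(0.98, "urn:lsid:people.example.org:person:1"),
                  (0.95, "urn:lsid:people.example.org:person:3")]))
```

Requiring a gap to the runner-up is one way to address the trust question: a high score alone says little when two authority records are nearly identical.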

  20. Acknowledgements
     • TDWG / Gordon and Betty Moore Foundation
