Life Sciences Identifiers
Ricardo Pereira
TDWG Infrastructure Team (TIP)
Data is Available
• Many projects are exchanging data over the Net
Existing identifiers
• Taxon names
• Catalog numbers
• Others
  • Institution codes, collection codes
[Figure: example record from Botanicus.org]
Problems to solve
• Integrating data
With some effort, a scientist who comes along can put all that information together. A computer can't, because it is all only loosely linked by the taxon name.
[Diagram labels: Specimens, Taxon Name, Taxon, Observations, Publications]
Problems to solve
• Giving proper attribution
It is difficult to keep attribution information with the data.
[Diagram labels: Museum A, Museum B, Data + Terms & Conditions, Reports]
Problems to solve
• Tracking provenance
A scientist can tell where a record came from by looking at the collection code. A computer may find it awkward to do the same based on collection codes alone.
Solution
• Globally Unique Identifiers (GUIDs)
  • A scheme to identify and access data objects on the Web
• Identifiers are persistent
  • They are permanently associated with a data object
• Identifiers are globally unique
• Identifiers are actionable or locatable
• The scheme provides mechanisms to describe objects: metadata
[Diagram: TDWG Architecture, made up of Globally Unique Identifiers, TDWG Ontology, and Exchange Protocols]
Existing GUID Systems
• Life Science Identifiers (LSID): urn:lsid:herbimi.info:specimens:100069
• Handle System: 10.1045/january99-bearman
• Digital Object Identifiers (DOI): doi:10.1000/182
• Persistent URLs (PURLs): http://purl.oclc.org/OCLC/PURL/FAQ
TDWG picked LSIDs
• Existing standard for retrieving data and metadata
• Decentralised
  • Easy to assign large numbers of LSIDs
• Conceptually distinct from URLs
  • LSIDs are names, not addresses like URLs
• Integrates with the TDWG architecture
  • Returns RDF
• LSIDs get you to the data (resolvable)
Life Science Identifiers (LSID)
• urn:lsid:zoobank.org:act:20889795-7EC7-42F3-A4C3-D1D97704A609
  A taxon name from ZooBank
• urn:lsid:herbimi.info:specimens:100069
  A fungal specimen from Herb. IMI
• urn:lsid:ubio.org:classificationbank:1164063
  The description of a genus from uBio
• urn:lsid:gdb.org:GenomicSegment:GDB132938
  A segment of the human genome from GDB
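The examples above all share one shape: `urn:lsid:` followed by an authority, a namespace, and an object identifier, with an optional revision as a further colon-separated field. The following is a minimal sketch of splitting an LSID into those parts. The `LSID` tuple and the `parse_lsid` function are names made up for this illustration, not part of any LSID library, and the validation is deliberately loose.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Illustrative container for the parts of an LSID (hypothetical name)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    # The 'urn:lsid:' prefix is compared case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    # An absent or empty sixth field means "no revision".
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:herbimi.info:specimens:100069"))
```

Because the authority is an ordinary domain name owned by the data provider, each provider can mint identifiers under its own namespaces without central coordination.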
LSID Resolution Protocol
• Well-defined way to get data and metadata from an LSID
[Diagram: urn:lsid:gdb.org:GenomicSegment:GDB132938 resolves to Data and Metadata]
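To make the first step of resolution concrete: as far as we understand the LSID resolution specification, a client locates the resolution service for an LSID by looking up a DNS SRV record named `_lsid._tcp.<authority>`, and then asks that service for the data and the metadata. The sketch below only builds the SRV query name and, as an alternative, a URL for an HTTP proxy. It performs no network calls. The function names and the proxy address `https://lsid-proxy.example.org/` are placeholders invented for this example.

```python
def lsid_authority(lsid: str) -> str:
    """Return the authority (third field) of an LSID, lower-cased."""
    parts = lsid.strip().split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[2].lower()


def srv_query_name(lsid: str) -> str:
    """Name of the DNS SRV record a client would look up for this LSID.

    Assumption: the resolution service is advertised under _lsid._tcp.
    """
    return f"_lsid._tcp.{lsid_authority(lsid)}"


def proxy_url(lsid: str, proxy_base: str) -> str:
    """URL for resolving an LSID through an HTTP proxy.

    proxy_base is a hypothetical proxy address supplied by the caller.
    """
    lsid_authority(lsid)  # validates the LSID before building the URL
    return proxy_base.rstrip("/") + "/" + lsid.strip()


if __name__ == "__main__":
    example = "urn:lsid:gdb.org:GenomicSegment:GDB132938"
    print(srv_query_name(example))  # _lsid._tcp.gdb.org
    print(proxy_url(example, "https://lsid-proxy.example.org/"))
```

A proxy of this kind lets an ordinary web browser follow an LSID without a dedicated LSID client.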
LSIDs at work
• Integrating data
LSIDs make links unambiguous. A computer can integrate all the information.
[Diagram labels: Specimens, Taxon Concept, Taxon, Observations, Publications]
LSIDs at work
• Giving proper attribution
Every record has an LSID that can take the user back to the attribution information.
[Diagram labels: Museum A, Museum B, Metadata]
LSIDs at work
• Tracking provenance
By inspecting the metadata associated with an LSID, a computer can find the original source of a record that has been aggregated.
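The provenance idea can be sketched as following "source" links from one record's metadata to the next until a record with no source is reached. In this minimal sketch the metadata is a plain dictionary and the `source` key is an assumed field name. Real LSID metadata is RDF, and the property used for the link would depend on the vocabulary in use. All LSIDs under `example.org` are invented for the illustration.

```python
from typing import Dict, List, Optional

# Hypothetical metadata store: LSID -> metadata, with an optional "source" link.
Metadata = Dict[str, Dict[str, Optional[str]]]


def trace_origin(lsid: str, metadata: Metadata) -> List[str]:
    """Follow 'source' links from an aggregated record back to the original.

    Returns the chain of LSIDs visited, starting with the one given and
    ending with the record that has no recorded source.
    """
    chain = [lsid]
    current = lsid
    while True:
        source = metadata.get(current, {}).get("source")
        if source is None:
            return chain
        if source in chain:
            raise ValueError(f"cycle in provenance links at {source!r}")
        chain.append(source)
        current = source


if __name__ == "__main__":
    store: Metadata = {
        "urn:lsid:portal.example.org:specimens:42": {
            "source": "urn:lsid:aggregator.example.org:specimens:7",
        },
        "urn:lsid:aggregator.example.org:specimens:7": {
            "source": "urn:lsid:museum-a.example.org:specimens:A-1001",
        },
        "urn:lsid:museum-a.example.org:specimens:A-1001": {"source": None},
    }
    for step in trace_origin("urn:lsid:portal.example.org:specimens:42", store):
        print(step)
```

The last LSID in the returned chain identifies the original record, so its authority tells the computer which institution to credit.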
What gets an LSID
• Data objects that:
  • You serve to your clients
  • You are an authority for
  • You have aggregated
    • Assign new LSIDs
    • Keep a link to the source
• Examples:
  • Taxon Names
  • Taxon Concepts
  • Observations
  • Specimens
  • Images
What we have done
• Support: LSID Website, Proxy & Software
• Spec: LSID Applicability Statement
  • Specifies how our community uses LSIDs
• TIP Funded Projects
  • Deployed LSID Resolvers for Taxon Names
  • Also a few other data types covered
  • Development of LSID clients
What needs to be done
• Ratify the TDWG LSID Applicability Statement
  • Specifies how our community uses LSIDs
• Documentation:
  • LSID Setup Guide
• Continue deployment of LSIDs
  • Increase coverage of other data types:
    • Specimens, Observations, Organizations, People
What can you do?
• Respond to the Request for Comments on the LSID Applicability Statement
  • Follow the instructions at www.tdwg.org
• Assign LSIDs to your data objects
  • Lets users refer to them unambiguously
  • Gives you credit for them
  • Lets you express attribution (for derived data)
  • Asserts the legal status of the data
• Information: lsids.sf.net