
Globally Unique Identifiers and Life Science Identifiers



Presentation Transcript


  1. Globally Unique Identifiers and Life Science Identifiers. Dave Thau, thau@learningsite.com, University of Kansas, California Academy of Sciences, www.learningsite.com

  2. Outline • Describe Globally Unique Identifiers • Show how they’re relevant • Describe one GUID system (LSIDs) • Outline some issues around using GUIDs for TDWG-related activities • Provide some resources • Open discussion

  3. GUID Is Not An Ugly Word • “It’s guid to be merry and wise, / It’s guid to be honest and true” (Robert Burns, “Here’s a Health to Them That’s Awa’”) • Pteroptochos tarnii, AKA Guidguid • Image from: animaldiversity.ummz.umich.edu

  4. GUID: Globally Unique Identifier • A short name for a complex entity • Useful for locating information about the entity • Each name identifies only one entity • There is some sense of permanence

  5. Some things which fit this description • GenBank accession numbers: AP006480.1 • US Patent numbers: 5443036 (laser guided cat exercise) • Digital Object Identifier: 10.121/3212

  6. In Our Domain
     SDD Document – Representing some data set.
         <ClassName id="1">
           <Label>
             <Representation language="en">
               <Text>Cypselurus heterurus (Rafinesque, 1810)</Text>
             </Representation>
           </Label>
           <Link>
             <LSID>lsid.gbif.net:www.fishbase.org:1029</LSID>
           </Link>
           <Rank>sp</Rank>
         </ClassName>
     Napier Schema Document – Representing some taxon.
         <TaxonConcept id="urn:lsid:bioguid.org:seek:121212" type="original">
           <Name type="scientific">
             <NameSimple>Canis lupus</NameSimple>
           </Name>
           …
           <Relationships>
             <Relationship type="is child of">
               <ToTaxonConcept ref="urn:lsid:bioguid.org:seek:5743" />
             </Relationship>
           </Relationships>
         </TaxonConcept>

  7. Features of a GUID system • Global uniqueness scoped to Internet • Should be easily resolvable by a computer or human • Should identify things down to whatever level of granularity necessary • Should not be limited to proprietary systems • Should serve up all sorts of data • Database records • Text files • Images • It would be nice if the identifier had associated metadata

  8. Life Science Identifiers • Official standard of the Object Management Group (OMG) • Support for metadata and authentication • Supports multiple protocols (e.g. HTTP, SOAP) • Can serve up data in any format • Decentralized – anyone can issue an LSID • LSID code available in Java and Perl. • A young standard, but increasingly used.

  9. Organizations Using LSIDs • National Center for Biotechnology Information (NCBI): PubMed, GenBank • European Bioinformatics Institute (EBI) • US Long Term Ecological Research Network (LTER) • BioMOBY – a biological database interoperability program (biomoby.org) • Open Bioinformatics Foundation (open-bio.org) • myGrid – a BioGRID project (mygrid.org.uk)

  10. A Small Pause For More Squid Humor

  11. LSID Format: urn:lsid:bioguid.org:seek:117866:v1
     • urn – indicates that this is a URN
     • lsid – indicates that it’s an LSID-type URN
     • bioguid.org – the authority that issued the LSID
        • Doesn’t have to be a domain name, but for now probably should be.
        • bioguid.org does not necessarily have the data or metadata.
        • There may not even be a machine called bioguid.org.
     • seek – a namespace ID internal to that authority
        • The namespace is meaningless to systems outside that authority.
     • 117866 – the local identifier within that authority
        • Also internal to the authority
     • v1 – an optional version number
        • If there is no version, there is no trailing colon either.
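The format on this slide can be checked mechanically. Below is a minimal sketch of a parser that follows only the rules listed above (five colon-separated parts, an optional sixth version part, no trailing colon). The names `Lsid` and `parse_lsid` are hypothetical, and the sketch does not attempt the full character-set rules of the LSID or URN specifications.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The parts of an LSID as described on the slide (hypothetical type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    scheme, nid, authority, namespace, object_id = parts[:5]
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not an LSID-type URN: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    if revision == "":
        # "If there is no version, there is no trailing colon either."
        raise ValueError(f"trailing colon without a version: {text!r}")
    return Lsid(authority, namespace, object_id, revision)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:bioguid.org:seek:117866:v1"))
    print(parse_lsid("urn:lsid:bioguid.org:seek:117866"))
```

Under these rules, the string `lsid.gbif.net:www.fishbase.org:1029` from the SDD fragment on slide 6 is rejected, because it lacks the `urn:lsid:` prefix.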

  12. Data and Metadata • An LSID has data • Examples • The gene sequence in GenBank • The actual LTER data set, maybe in Excel, or in a text file • The data should never change • An LSID also has metadata • Example metadata • The format of the data • A display title for clients displaying the LSID • Dublin Core metadata • Anything you want • The metadata can change
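The split between fixed data and changeable metadata can be sketched as a small in-memory store. This is an illustration only: `LsidStore` and its methods are hypothetical names, not part of any LSID library, and a real authority would persist both kinds of content.

```python
from typing import Dict


class LsidStore:
    """Toy store: data is write-once per LSID, metadata may be updated."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._metadata: Dict[str, Dict[str, str]] = {}

    def publish(self, lsid: str, data: bytes, **metadata: str) -> None:
        """Bind data to an LSID; refuse to rebind it to different data."""
        if lsid in self._data and self._data[lsid] != data:
            raise ValueError(f"data behind {lsid} must never change")
        self._data[lsid] = data
        self._metadata.setdefault(lsid, {}).update(metadata)

    def update_metadata(self, lsid: str, **metadata: str) -> None:
        """Metadata, unlike data, is allowed to change."""
        if lsid not in self._data:
            raise KeyError(lsid)
        self._metadata[lsid].update(metadata)

    def get_data(self, lsid: str) -> bytes:
        return self._data[lsid]

    def get_metadata(self, lsid: str) -> Dict[str, str]:
        return dict(self._metadata[lsid])  # copy, so callers cannot edit in place


if __name__ == "__main__":
    store = LsidStore()
    lsid = "urn:lsid:limnology.wisc.edu:dataset:ntlfi02"
    store.publish(lsid, b"year,count\n2001,42\n", title="Fish abundance", format="text/csv")
    store.update_metadata(lsid, title="LTER fish abundance")
    print(store.get_metadata(lsid))
```

In this sketch, changed data would be published under a new LSID, for example one with a different version suffix, rather than overwriting the old one.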

  13. Example LSIDs • An LTER fish abundance data set: urn:lsid:limnology.wisc.edu:dataset:ntlfi02 • A PubMed reference: urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:pubmed:12441808 • A GenBank sequence: urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:30350027

  14. How LSIDs work
     An LSID client (maybe Launchpad, maybe Haystack, maybe BioFerret, maybe myGRID, maybe yours!) resolves an LSID in three steps:
     1. Find the authority for this LSID. The client finds the DNS record for the authority and resolves it to get the address of the authority. DNS returns the LSID authority server.
     2. Query the authority for available services. The LSID authority returns the WSDL for this LSID.
     3. Choose a service and get the goods. The data store and metadata store are reached over HTTP, SOAP, FTP, or other protocols.
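The three resolution steps can be walked through without touching the network. The sketch below replaces DNS and the authority server with plain dictionaries, so everything in it is simulated: `resolve`, `FAKE_DNS`, `FAKE_AUTHORITIES`, and the host `authority.example.org` are hypothetical. The `_lsid._tcp.<authority>` lookup name is an assumption about the DNS SRV naming convention and should be checked against the LSID specification. The dictionary of callables only stands in for the services a real authority would describe in WSDL.

```python
from typing import Callable, Dict, Tuple

Address = Tuple[str, int]                   # (host, port) of an authority server
Services = Dict[str, Callable[[], bytes]]   # service name -> fetch function


def authority_of(lsid: str) -> str:
    """Third colon-separated field of 'urn:lsid:authority:...'."""
    return lsid.split(":")[2]


def resolve(lsid: str,
            dns: Dict[str, Address],
            authorities: Dict[Address, Dict[str, Services]]) -> Dict[str, bytes]:
    # Step 1: find the authority server for this LSID (simulated DNS lookup).
    address = dns[f"_lsid._tcp.{authority_of(lsid)}"]
    # Step 2: ask that authority which services it offers for this LSID.
    services = authorities[address][lsid]
    # Step 3: choose services and fetch; here we simply call all of them.
    return {name: fetch() for name, fetch in services.items()}


EXAMPLE_LSID = "urn:lsid:bioguid.org:seek:117866:v1"
FAKE_DNS = {"_lsid._tcp.bioguid.org": ("authority.example.org", 8080)}
FAKE_AUTHORITIES = {
    ("authority.example.org", 8080): {
        EXAMPLE_LSID: {
            "data": lambda: b"<TaxonConcept>...</TaxonConcept>",
            "metadata": lambda: b"title=Example concept; format=text/xml",
        }
    }
}

if __name__ == "__main__":
    for name, payload in resolve(EXAMPLE_LSID, FAKE_DNS, FAKE_AUTHORITIES).items():
        print(name, payload)
```

Note that the address found in step 1 need not be a machine named after the authority, which matches the earlier point that there may not even be a machine called bioguid.org.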

  15. LSID Promises • I promise to never change the data behind an LSID. • I will make sure my LSIDs are being served, or give them to someone who can do it. • I will give my LSIDs metadata – at least give them a title and a format

  16. Other GUID systems • URLs • Files move • The data change • Unstructured metadata • UUIDs – 128-bit value, designed to be unique • 58f202ac-22cf-11d1-b12d-002035b29092 • No resolution • No metadata • Handle System / DOIs (10.12/2312) • Non-standard protocol • Centralized resolution • Unstructured metadata (for Handle System) • High costs (for DOI)
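To make the UUID entry concrete, here is a short sketch using Python's standard `uuid` module on the example value from the slide. It shows that a UUID is just a 128-bit number with a few structural fields. Nothing in it says where to find the thing it names or what that thing is, which is the "no resolution, no metadata" point.

```python
import uuid

# The example UUID from the slide.
example = uuid.UUID("58f202ac-22cf-11d1-b12d-002035b29092")

# A UUID is 16 bytes (128 bits); the version field says how it was generated.
size_in_bits = len(example.bytes) * 8
version = example.version

# Anyone can mint a new one locally, with no authority and no registration.
fresh = uuid.uuid4()

if __name__ == "__main__":
    print(size_in_bits, version)
    print(fresh)
```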

  17. Issues For This Community • What gets a GUID? • For each of those things, what’s the data, what’s the metadata? • One GUID per item? • Centralization – who issues GUIDs?

  18. What Gets a GUID? • These things probably should get GUIDs • Taxonomic concepts • Specimens • Publications • People • These things might get GUIDs • Taxonomic names • Journals • Data providers • Observations

  19. Specimen Data? Metadata? • If specimens get a GUID – what does it identify? • The physical specimen? • A collection’s database record of the specimen? • What about multiple labels? • Main question – what doesn’t change about a specimen? • Other main question – how should the data be represented? • Darwin Core includes the current institution location. That is not a good fit for the data of a GUID, since the location may change.

  20. One GUID Per Item? • No GUID system inherently enforces a 1:1 mapping between GUID and data. • Everyone should TRY to limit the number of GUIDs per item. • Should there be any centralization to help achieve this?
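One way an index could help limit the number of GUIDs per item is to look for identifiers that point at identical data. The sketch below groups GUIDs by a SHA-256 digest of their data. The function name `guids_sharing_data` and the sample LSIDs are hypothetical. Byte-identical content is only a rough proxy for "the same item", so this catches exact copies and nothing subtler.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def guids_sharing_data(records: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Map content digest -> GUIDs, keeping only digests with more than one GUID."""
    by_digest = defaultdict(set)
    for guid, data in records:
        by_digest[hashlib.sha256(data).hexdigest()].add(guid)
    return {digest: sorted(guids)
            for digest, guids in by_digest.items()
            if len(guids) > 1}


if __name__ == "__main__":
    sample = [
        ("urn:lsid:a.example.org:names:1", b"Canis lupus"),
        ("urn:lsid:b.example.org:taxa:77", b"Canis lupus"),
        ("urn:lsid:a.example.org:names:2", b"Cypselurus heterurus"),
    ]
    for digest, guids in guids_sharing_data(sample).items():
        print(digest[:12], guids)
```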

  21. Degrees of Centralization
     • An index
        • List your GUID authority in an index so your GUIDs are easy to find.
     • A central authority
        • One authority could be responsible for issuing GUIDs to the community for specific types of information – you’d have to get one from there.
           • GBIF?
           • The IC_Ns? (ICZN, ICBN, …)
           • lsidauthority.org?
        • This would help enforce a 1:1 mapping of GUIDs and data items.
        • It would also relieve data providers of the need to maintain their own authorities.
        • It MAY also reduce the likelihood of GUIDs becoming unresolvable.
        • It may also be infeasible, technically or socially.
     • A respected authority
        • With LSIDs, an authority can be set up to serve its own GUIDs and proxy other authorities.
        • This would help enforce a 1:1 mapping for those who use the authority.
        • It may also be more feasible.

  22. LSID Resources • LSID Articles and code from IBM • http://www-124.ibm.com/developerworks/oss/lsid/#whatislsid • Current LSID specification • http://www.omg.org/cgi-bin/doc?dtc/04-05-01 • Launchpad – An LSID resolver for Windows IE • available from first link • A website which resolves LSIDs • http://lsid.biopathways.org/resolver/ • URN specification • http://www.ietf.org/rfc/rfc2141.txt

  23. Acknowledgements • My work on GUIDs has been funded by the SEEK project – seek.ecoinformatics.org. • SEEK is funded by National Science Foundation award 0225676. • Thanks to Ben Szekely at IBM for his LSID articles, his LSID Java code, and for answering all my questions.

  24. Questions for Discussion • Do we need GUIDs? • What gets a GUID? • One GUID per item? • Centralization?
