Globally Unique Identifiers in Biodiversity Informatics. Kevin Richards, Landcare Research NZ. TDWG 2008.
Globally Unique Identifiers in Biodiversity Informatics Kevin Richards Landcare Research NZ TDWG 2008
Introduction GUID (Globally Unique IDentifier) • What, Why, Which, How • LSIDs • Issues
What are GUIDs Globally Unique IDentifier • A short name for a complex entity on the web • Each name identifies only one entity • Examples: • UUID eg 3E9D6B68-A08C-4F15-BC8A-1265F15D30E2 • DOI eg doi:10.1006/jmbi.1998.2354 • Handle eg hdl:123.456/abc • LSID eg urn:lsid:indexfungorum.org:names:213645 • PURL eg http://purl.oclc.org/abc/123
What is a GUID • Properties • Persistent • Opaque • Resolvable, sometimes - useful for locating information about the entity
Why use GUIDs • Data at Provider 1: BOOK : “The three little pigs”, 3 copies • Data at Provider 2: BOOK : “Three little pigs”, 2 copies • Data Consumer (no shared ID, so the same book is listed twice): BOOKS: “Three little pigs” … (2) • “The three little pigs” … (3)
… but with GUIDs … • Data at Provider 1 (ID = P1): BOOK : “The three little pigs”, ID (eg ISBN) = A123, 3 copies • Data at Provider 2 (ID = P2): BOOK : “Three little pigs”, ID (eg ISBN) = A123, 2 copies • Data Consumer: BOOKS: ID : A123 : “The three little pigs” … (5) • BOOK Titles: • ID A123 : Provider P1 : “The three little pigs” • ID A123 : Provider P2 : “Three little pigs”
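The consumer's merge in the slide above can be sketched in a few lines of Python. The record layout (dictionaries with `provider`, `id`, `title` and `copies` keys) and the function name `merge_by_id` are assumptions made for illustration; only the values come from the slide.

```python
from collections import defaultdict

# Records as each provider might publish them (hypothetical layout).
records = [
    {"provider": "P1", "id": "A123", "title": "The three little pigs", "copies": 3},
    {"provider": "P2", "id": "A123", "title": "Three little pigs", "copies": 2},
]


def merge_by_id(records):
    """Group records that share an identifier and total their copies."""
    merged = defaultdict(lambda: {"copies": 0, "titles": {}})
    for rec in records:
        entry = merged[rec["id"]]
        entry["copies"] += rec["copies"]
        # Keep each provider's own spelling of the title.
        entry["titles"][rec["provider"]] = rec["title"]
    return dict(merged)


merged = merge_by_id(records)
print(merged["A123"]["copies"])  # 5
```

Without the shared ID the two title strings differ, so a consumer matching on titles alone would keep them as two books.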
Example in our domain Consensus Id : urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81 Name : Anthemis gaudium-solis Velen.
Which GUID • GUID Subgroup Recommendations: • Use LSIDs for identifying biodiversity data • Reuse GUIDs where they already exist • GUID type • Existing assignments • See GUID Report - http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1 • Also Canberra LSID Workshop report: http://www.tdwg.org/fileadmin/subgroups/guid/LSID_policy_workshop_Report_Canberra.pdf
What is an LSID? • Life Science IDentifier • Developed by the Object Management Group (OMG) & I3C • Implemented by the team at IBM • Used for – data objects, datasets, images, files
LSID Format urn:lsid:bioguid.org:taxon:1122:v1 • Prefix (urn) - indicates that this is a URN • URN type (lsid) - indicates that it’s an LSID-type urn • Authority (bioguid.org) - the authority who issued the LSID • Namespace (taxon) - internal to that authority • Object identifier (1122) - within that authority • Version (v1) - optional
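The format above can be sketched as a small parser. This is a minimal illustration and not the IBM or .NET implementation; the names `Lsid` and `parse_lsid` are hypothetical, and the sketch assumes that no component contains a colon.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID into the components listed on the slide."""
    parts = text.split(":")
    # urn, lsid, authority, namespace, object id, plus an optional version
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], version)


print(parse_lsid("urn:lsid:bioguid.org:taxon:1122:v1"))
```

The same function accepts the unversioned form, for example `urn:lsid:indexfungorum.org:names:213645`, and returns `version=None`.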
LSID Rules • Data doesn’t change (byte identical) • Always available for resolution • Hand over to another authority if necessary • At least some basic metadata
Pros of LSIDs • Not tied to physical addresses (as URLs are) • Comparison can be done without resolving the ID – eg for cases like “does object a = object b” • Do not require any central registration or central service • Quick to adopt • Encourage thought and planning before they are allocated
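Comparison without resolution comes down to comparing strings. The sketch below assumes that the `urn:lsid:` prefix and the authority compare case-insensitively while the namespace, object identifier and version are case-sensitive; that rule should be checked against the LSID specification before relying on it. The function names are hypothetical.

```python
def normalize_lsid(lsid: str) -> str:
    """Lower-case the parts assumed to be case-insensitive (urn, lsid, authority)."""
    parts = lsid.strip().split(":")
    if len(parts) < 5:
        raise ValueError(f"not an LSID: {lsid!r}")
    head = [part.lower() for part in parts[:3]]
    return ":".join(head + parts[3:])


def same_object(a: str, b: str) -> bool:
    """Answer "does object a = object b" without any network call."""
    return normalize_lsid(a) == normalize_lsid(b)


print(same_object("URN:LSID:IndexFungorum.org:names:213645",
                  "urn:lsid:indexfungorum.org:names:213645"))
print(same_object("urn:lsid:indexfungorum.org:names:213645",
                  "urn:lsid:indexfungorum.org:names:213649"))
```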
Cons of LSIDs However … • Requires DNS SRV record • Requires specialised software to resolve an LSID (not built in to most software) • The restriction - “LSID data cannot change” can be difficult
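The DNS requirement comes from how a client finds an authority's resolver: it looks up an SRV record under the authority's domain. The sketch below only builds the record name a client would query, assuming the `_lsid._tcp` service label; the DNS lookup itself is left out because it needs a network connection and a DNS library. The function name is hypothetical.

```python
def srv_query_name(lsid: str) -> str:
    """Return the DNS SRV record name to query for an LSID's authority."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2].lower()
    # Assumed service label and protocol: _lsid over _tcp.
    return f"_lsid._tcp.{authority}"


print(srv_query_name("urn:lsid:ubio.org:namebank:11815"))  # _lsid._tcp.ubio.org
```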
How • What data/objects to apply Ids to • Decide on • Authority • Namespace • Local ids (new vs existing) • Issue LSIDs • Setup resolver
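Once the authority, namespace and local id policy are decided, issuing an LSID is string assembly. A minimal sketch, assuming a hypothetical `mint_lsid` helper that reuses an existing local id when one is supplied and otherwise generates a UUID (as in the compositae.org example earlier):

```python
import uuid


def mint_lsid(authority, namespace, local_id=None, version=None):
    """Build an LSID from an authority, a namespace and a local id."""
    if local_id is None:
        # No existing id to reuse: generate a new, opaque one.
        local_id = str(uuid.uuid4()).upper()
    local_id = str(local_id)
    for label, value in (("authority", authority),
                         ("namespace", namespace),
                         ("local_id", local_id)):
        if not value or ":" in value:
            raise ValueError(f"{label} must be non-empty and contain no colon")
    lsid = f"urn:lsid:{authority}:{namespace}:{local_id}"
    if version is not None:
        lsid += f":{version}"
    return lsid


# Reusing an existing database key as the local id.
print(mint_lsid("indexfungorum.org", "names", 213645))
```

Setting up the resolver is a separate step that this sketch does not cover.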
LSID Code • Current Code Stacks • Open Source (sourceforge.net) • Java, C++, Perl (IBM) • Microsoft .NET (Myself) • TAPIR LSID configuration
LSID Tools • IBM LSID Launchpad • Firefox LSID Browser • LSID Tester (Rod Page) • Web based resolver – http://lsid.tdwg.org/ • http://lsid.tdwg.org/urn:lsid... to get LSID metadata • http://lsid.tdwg.org/summary/urn:lsid... to get summary info of LSID object • Example LSID servers: • Index Fungorum - urn:lsid:indexfungorum.org:names:213649 • IPNI – urn:lsid:ipni.org:names:30000959-2:1.1.2.1 • uBio - urn:lsid:ubio.org:namebank:11815
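The two URL patterns of the web based resolver can be built from any LSID by prefixing it. A minimal sketch using the patterns shown on the slide; the function names are hypothetical, and the sketch makes no assumption about whether the service is reachable or what it returns.

```python
RESOLVER = "http://lsid.tdwg.org/"


def metadata_url(lsid: str) -> str:
    """URL for the LSID's metadata via the web based resolver."""
    return RESOLVER + lsid


def summary_url(lsid: str) -> str:
    """URL for summary info about the LSID object."""
    return RESOLVER + "summary/" + lsid


lsid = "urn:lsid:ubio.org:namebank:11815"
print(metadata_url(lsid))
print(summary_url(lsid))
```

Either URL can be opened in a browser or fetched with any HTTP client, which avoids the need for LSID-specific resolution software on the client side.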
Issues to think about • Who assigns new LSIDs? • Who maintains LSID resolvers? • What to assign LSIDs to: • Physical or Digital • Granularity • Only objects that need to be resolved / identified externally • Is there any data, or only metadata?
Issues to think about • When to resolve LSIDs • Every time an LSID is encountered, or only when a client requests it? • TDWG standards for metadata • Which ones? • Consistent application
References • LSID Source Forge - http://lsids.sourceforge.net/ • LSID .NET Source Forge - http://sourceforge.net/projects/lsid-dotnet • LSID Tutorial - http://www-128.ibm.com/developerworks/opensource/library/os-lsid/ • LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01 • LSID Tester - http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/ • LSID Launchpad - http://www-124.ibm.com/developerworks/downloads/detail.php?group_id=124&what=rele&id=553 • GUID Subgroup - http://www.tdwg.org/activities/guid/ • GUID Subgroup Reports • http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1 • http://wiki.tdwg.org/twiki/pub/TIP/TipDocuments/GUID1Report.pdf • Firefox LSID developer site - http://lsid.mozdev.org/