Globally Unique Identifiers: What, why, when, which and what now? Dave Thau University of Kansas thau@learningsite.com
WHAT? Why? When? Which? What now? (is a GUID?)
More GUIDs • Patent numbers: 5443036 (laser guided cat exercise) • GenBank accession numbers: AP006480.1 • Digital Object Identifier: 10.121/3212 • Life Science ID: urn:lsid:pdb.org:1AFT:1
Common Features of GUIDs • A short name for a complex entity • Useful for locating information about the entity • Each name identifies only one entity • There is some sense of permanence
Differences Between GUIDs • Can an item have more than one GUID? • Patents, no • GenBank accession numbers, yes • Web URLs, YES • Is issuance of the GUIDs at all decentralized? • Patents, no • ISBNs, yes (publishers get a block) • LSIDs, YES (there’s no central control)
What? WHY? When? Which? What now? (do we want them for taxonomic concepts?)
IDs and the TES
<TaxonConcept id="1883/t3_17555" type="revision">
  <Name type="scientific"> … </Name>
  <AccordingTo> … </AccordingTo>
  <Kingdom>Plantae</Kingdom>
  <Rank>Genus</Rank>
  <Relationships>
    <TaxonConcept ref="1883/t3_17661" type="congruence"/>
    <TaxonConcept ref="1883/t3_17657" type="congruence"/>
    <TaxonConcept ref="1883/t2416_17656" type="congruence"/>
    <TaxonConcept ref="1883/t3_17515" type="congruence"/>
  </Relationships>
  <TaxonConceptCircumscription type="complete">
    <TaxonConcept ref="1883/t4_17555"/>
    <TaxonConcept ref="1883/t5_17555"/>
    <TaxonConcept ref="1883/t6_17555"/>
  </TaxonConceptCircumscription>
</TaxonConcept>
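A minimal sketch of how a consuming system might read the identifiers out of a record like the one on this slide. It uses only Python's standard `xml.etree.ElementTree`; the function name `summarize_concept` and the shape of the returned dictionary are assumptions made for illustration, not part of any schema or tool mentioned in the talk.

```python
import xml.etree.ElementTree as ET

# Copy of the slide's record; the elided "…" content is left elided.
TCS_SNIPPET = """
<TaxonConcept id="1883/t3_17555" type="revision">
  <Name type="scientific"> … </Name>
  <AccordingTo> … </AccordingTo>
  <Kingdom>Plantae</Kingdom>
  <Rank>Genus</Rank>
  <Relationships>
    <TaxonConcept ref="1883/t3_17661" type="congruence"/>
    <TaxonConcept ref="1883/t3_17657" type="congruence"/>
    <TaxonConcept ref="1883/t2416_17656" type="congruence"/>
    <TaxonConcept ref="1883/t3_17515" type="congruence"/>
  </Relationships>
  <TaxonConceptCircumscription type="complete">
    <TaxonConcept ref="1883/t4_17555"/>
    <TaxonConcept ref="1883/t5_17555"/>
    <TaxonConcept ref="1883/t6_17555"/>
  </TaxonConceptCircumscription>
</TaxonConcept>
"""


def summarize_concept(xml_text):
    """Hypothetical helper: collect the GUIDs a concept record carries and points to."""
    root = ET.fromstring(xml_text.strip())
    # Each relationship is a (referenced GUID, relationship type) pair.
    relationships = [
        (el.get("ref"), el.get("type"))
        for el in root.findall("./Relationships/TaxonConcept")
    ]
    # The circumscription lists the GUIDs of the member concepts.
    members = [
        el.get("ref")
        for el in root.findall("./TaxonConceptCircumscription/TaxonConcept")
    ]
    return {
        "id": root.get("id"),
        "rank": root.findtext("Rank"),
        "relationships": relationships,
        "circumscription": members,
    }


summary = summarize_concept(TCS_SNIPPET)
print(summary["id"], summary["rank"], len(summary["relationships"]))
```

Every cross-reference in the record is a bare GUID, so the record stays small, but it is only useful if those GUIDs can later be resolved.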
Goals for a GUID • Useful internally for systems dealing with data objects (e.g. databases). • Useful for communicating between separate, unaffiliated systems which deal with data objects. • Integration with other communities • Typical GUID goals • Short • Permanent • Unique • Resolvable
Why not use… • Existing database IDs (e.g. IPNI, ITIS)? • Most are currently name based • Can’t guarantee permanence • Taxon_author_year_publication • Very long! • How do you represent publication? • How do you deal with non-ASCII characters? • Just a name is not resolvable
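The length and non-ASCII objections to a Taxon_author_year_publication key can be illustrated with a small sketch. The taxon, author, year and publication below are invented placeholders, and `name_based_key` is a hypothetical helper; the only real behaviour relied on is Unicode normalization from Python's standard `unicodedata` module.

```python
import unicodedata


def name_based_key(taxon, author, year, publication):
    """Hypothetical name-based identifier: Taxon_author_year_publication."""
    return "_".join([taxon, author, str(year), publication])


# Invented example values, for illustration only.
taxon = "Examplea"
publication = "Journal of Hypothetical Botany"

# The same author name in two Unicode forms that look identical on screen:
# NFC stores "ü" as one code point, NFD as "u" plus a combining diaeresis.
author_nfc = unicodedata.normalize("NFC", "Müller")
author_nfd = unicodedata.normalize("NFD", "Müller")

key_nfc = name_based_key(taxon, author_nfc, 1900, publication)
key_nfd = name_based_key(taxon, author_nfd, 1900, publication)

# Two systems that typed "the same" key can disagree on string equality.
print(key_nfc == key_nfd)
# Compare the key length with the handle used elsewhere in these slides.
print(len(key_nfc), len("1883/t3_17555"))
```

Agreeing on one normalization form would remove the mismatch, but that is one more convention every data provider would have to follow, and the key would still be long and still not resolvable.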
What? Why? WHEN? Which? What now? (should a GUID be assigned?)
What gets a GUID? • Taxonomic Concepts • Publications • Specimens • Data Providers? • Authors? • Journals?
When is a GUID assigned? • When a “new” concept is added • How do you define concept for the system? • When is a concept new enough to get a new GUID? • What minor changes are allowable?
Examples of a New Concept • A revision adds a new species to a genus • The species is a new concept • So is the genus • A revision adds a synonym to a taxon • A flora misspells a scientific name
Do These Get New Concepts? • Page numbers in the reference are wrong • The journal title is misspelled • The author is misspelled • Solution: Let the data provider decide, and trust that decision!
What? Why? When? WHICH? What now? (kind of GUID?)
Candidates • Home Grown Web Service • t3_17555 • GRID resource locator • ecogrid://ku.edu/tcs/t3_17555 • LSID • urn:lsid:ku.edu:tcs:t3_17555 • Handle System • 1883/t3_17555
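The four candidates differ mainly in how they wrap the same local identifier. The sketch below rebuilds each example string from this slide out of its parts; the class `ConceptId`, its field names, and `split_handle` are assumptions introduced for illustration and do not come from any of the four systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptId:
    """Hypothetical container for the parts shared by the candidate schemes."""
    authority: str      # issuing organisation, e.g. "ku.edu"
    namespace: str      # collection within the authority, e.g. "tcs"
    local_id: str       # identifier within the namespace, e.g. "t3_17555"
    handle_prefix: str  # prefix assigned to the issuer, e.g. "1883"

    def home_grown(self):
        # A home-grown web service would only need the local identifier.
        return self.local_id

    def grid(self):
        return f"ecogrid://{self.authority}/{self.namespace}/{self.local_id}"

    def lsid(self):
        return f"urn:lsid:{self.authority}:{self.namespace}:{self.local_id}"

    def handle(self):
        return f"{self.handle_prefix}/{self.local_id}"


def split_handle(handle):
    """Split a handle into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


concept = ConceptId("ku.edu", "tcs", "t3_17555", "1883")
for candidate in (concept.home_grown(), concept.grid(), concept.lsid(), concept.handle()):
    print(candidate)
```

Note that only the handle form omits the issuing organisation's domain name; the issuer is represented by a numeric prefix instead.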
LSID • urn:lsid:pdb.org:1AFT:1 • Backed by IBM, HP, UCSD, Chiron, U. Manchester, to name a few • Uses web services protocols • Support for caching, authentication, and metadata about the service • Completely decentralized • Specification under 2 years old
The Handle System • 1883/t3_17555 • Underlies DOIs • Many journals use them (Nature, ACM) • More than 11 million DOIs are in use • Mature – specification 10 years old • Proprietary central system assigns and resolves prefixes • Doesn’t use internet standards
For Our First Prototype • Functionally, they’re very similar • We’re going with the Handle System • More mature • Better built-in authentication methods • Easier to separate handles from issuers • More likely to be accepted by publishers
What? Why? When? Which? WHAT NOW?
Challenges and Questions • Specification and Implementation! • When is one concept different from another? • Can there be more than 1 GUID per concept? • What will encourage people to assign and utilize GUIDs?
Topics For Discussion • Are GUIDs really necessary? • Are there alternative GUID systems? • Is the “only 1 GUID per concept” rule necessary? • What will encourage people to use and assign GUIDs?