
An Overview of Persistent Identifiers

An Overview of Persistent Identifiers. George M. Garrity Microbiology and Molecular Genetics Michigan State University. The phone call from Peru…. To provide the TEG with an overview of persistent identifiers and digital objects Explore both the technical and social/policy issues


Presentation Transcript


  1. An Overview of Persistent Identifiers. George M. Garrity, Microbiology and Molecular Genetics, Michigan State University

  2. The phone call from Peru…
     My assignment
     • To provide the TEG with an overview of persistent identifiers and digital objects
     • Explore both the technical and social/policy issues
     • Provide some perspective on how persistent identifiers have been applied in two settings
       • Mature application: CrossRef
       • Evolving application: NamesforLife
     • Offer some thoughts on how PIDs might be applied to Certificates of Origin and Traditional Knowledge
     Disclaimers
     • An end-user of persistent identifiers
     • Dual interests and IP in this space

  3. So, what’s the problem?
     • “…link heterogeneous electronic libraries. The difficulties inherent in this third objective ultimately led to this paper.” (Kahn and Wilensky 1993)
     • “But for the bioinformatician concerned with integrating and computing upon distributed information… In second place is perhaps naming (identifying), with all the gloriously idiosyncratic embedded semantics of local identifiers in disparate forms.” (Clark 2003)

  4. So, what’s the problem?
     • “Even well-formed and properly applied names can serve as a source of confusion and considerable frustration. This is hardly a new problem.” (Garrity and Lyons 2003)
     • “Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.” (Report of the NISO Identifiers Roundtable 2006)
     • “And now, a much more succinct way to say this: our systems are autistic. They don’t make inferences. When we learn something in one system or one area, it doesn’t carry over to other areas.” (McComb 2006)

  5. Let’s start with some working definitions
     Digital objects
     • An instance of an abstract data type that has two components: data and key metadata
     • Key metadata includes a handle
     • A handle is a globally unique identifier that is bound to the digital object
     Other key properties: digital objects
     • differ from database records and files,
     • are stored in network-accessible repositories,
     • and are accessed using a repository access protocol.
     From: Kahn and Wilensky 2006, Int. J. Digit. Libr. 6: 115-123

  6. Identifiers
     Essential elements in
     • Human-machine communications
     • Machine-machine communications
     Ideally…
     • Exist as an unambiguous string
     • Context and application dependent
     • Actionable
     • Resolvable
     Other points to consider
     • Semantically opaque
     • Global or local
     • Unique or non-unique
     • Unanticipated uses

  7. Definitions (continued)
     Persistent Identifiers
     • A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
     • Use of a well-managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.
     From: Diana Dack, “Persistence is a Virtue”, Information Online Conference, Sydney, January 2001

  8. Definitions (continued)
     Name resolution
     The process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).
     [Diagram: name resolution maps PID1, PID2, PID3 to URL1, URL2, URL3; the PID identifies the resource and the URL locates it.]

  9. Inherent in the design of such systems….
     • Name registration & name resolution
     • Authority
     [Diagram labels: User, Global registry, PID, URL, PID1 to PID3, URL1 to URL3, Identifies, Locates, Resource, Metadata, Key metadata.]
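The registration and resolution steps on slides 8 and 9 can be sketched in a few lines. This is only an illustrative toy, not how any real resolver is built: the class name `PidRegistry`, its methods, and the `example.org` URLs are all assumptions made up for the sketch. It shows the two properties the slides stress: a PID is never reassigned, and moving the resource changes only the URL behind the PID.

```python
class PidRegistry:
    """Toy in-memory registry (hypothetical): maps each PID to a URL plus metadata."""

    def __init__(self):
        self._records = {}

    def register(self, pid, url, metadata=None):
        # A persistent identifier is never reassigned to another resource.
        if pid in self._records:
            raise ValueError(f"{pid} is already assigned and cannot be reused")
        self._records[pid] = {"url": url, "metadata": dict(metadata or {})}

    def relocate(self, pid, new_url):
        # The resource moved: only the location changes, the PID stays the same.
        self._records[pid]["url"] = new_url

    def resolve(self, pid):
        # Name resolution: map the PID to the URL that currently locates the resource.
        return self._records[pid]["url"]


registry = PidRegistry()
registry.register("PID1", "http://example.org/old/location", {"title": "Example resource"})
registry.relocate("PID1", "http://example.org/new/location")
print(registry.resolve("PID1"))
```

Anything that cites `PID1` keeps working after the move, because citations hold the name and only the registry holds the location.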

  10. [Diagram: DOIs mapped to URLs through DOI directories; labels: Assigner, Content, DOI directory.] Source: Norman Paskin, International DOI Foundation

  11. Comparing identifiers
      A single unambiguous string
      • A label that identifies an entity
      • ISBN 0-387-98771-1
      • ATCC 27126*
      • L-681,572-001
      A numbering scheme
      • A method of providing consistent syntax to denote class membership of an entity
      • A formal standard or industry convention
      • An arbitrary internal system
      • Key point is establishing a 1:1 correspondence between labels and members
      Enumeration
      • The numbers or labels are simply strings

  12. Comparing identifiers (cont.)
      An infrastructure specification
      • A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.
      • Actionable identifiers
        • URI (URN and URL)
        • ISBN numbers as UPC/EAN identifiers
      • Does not mandate a method of creating labels
      • Does not create a managed environment

  13. Comparing identifiers (cont.)
      A fully implemented identifier system
      • Includes
        • Unique identifiers
        • A formalized infrastructure
        • Management policies for registration, structured interoperable metadata, policy, and governance mechanisms
      • Examples
        • UPC/EAN barcodes and RFID tags
        • Digital object identifiers (digital identifiers of objects)

  14. Desired properties of a candidate PID
      • Semantically opaque - avoid the pitfalls of embedded meaning
      • Governance - is there a technical and social framework overseeing the development, implementation and “marketing” of the PID?
      • Persistence - is there a mechanism in place to guarantee persistence of issued PIDs, when so desired?
      • Registration - is there a mechanism for global registration of the PIDs or can anyone issue PIDs?
      • Metadata - is there a minimal requirement for metadata associated with each identified object?
      • Accepted standard - is there evidence that the PID is an accepted standard?
      • Globally unique - are the PIDs globally unique?
      • Widespread usage - how many PIDs have been issued and what is the rate of issuance of new PIDs?

  15. Desired properties of a candidate PID (cont.)
      • Object/location - what does the PID identify?
      • Actionable - are network services attached/embedded?
      • Unique - does the resolution service check for uniqueness at the local level?
      • Interoperability - can the identifiers be readily incorporated into other applications without modification or permission?
      • Granularity - can the identifiers be assigned to subcomponents (nesting of entities within entities)?
      • Business model - is there a compelling business need for the PIDs to ensure that the infrastructure can be maintained in a self-supporting manner?

  16. Comparison of identifier properties

  17. What does a Digital Object Identifier look like?
      10.1234/myownnumbers-123.00001 (diagram labels: prefix, suffix, subsuffix)
      • The prefix is assigned to the content provider by a DOI Registration Agency, or the Handle System directly.
      • The suffix is an opaque string supplied by the content provider.
      • Handle software stores a mapping of the Handle to one or more locations (or services).
      • In virtually all cases today, the Handle is mapped to a location (URL).
      http://dx.doi.org/10.1007/bergeysoutline resolves to http://141.150.157.80/bergeysoutline/main.htm, which used to be http://www.springer-ny.com/bergeysoutline
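As a rough illustration of the prefix/suffix structure on slide 17, the sketch below splits a DOI at its first slash and builds the resolver link shown on the slide. The helper names `split_doi` and `resolver_url` are hypothetical, and the only validation is the check that the prefix begins with `10.`; nothing here contacts the network or confirms that a DOI is actually registered.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes begin with "10."; the suffix is opaque, so it is not inspected.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI of the form prefix/suffix: {doi!r}")
    return prefix, suffix


def resolver_url(doi, resolver="http://dx.doi.org/"):
    """Build the actionable link by prepending the resolver address from the slide."""
    split_doi(doi)  # raises ValueError if the string is not shaped like a DOI
    return resolver + doi


prefix, suffix = split_doi("10.1234/myownnumbers-123.00001")
print(prefix, suffix)
print(resolver_url("10.1007/bergeysoutline"))
```

Because the suffix is treated as an opaque string, the code never tries to interpret it, which matches the "semantically opaque" property listed among the desired properties of a PID.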

  18. Syntax of some other PIDs in “common” use
      Handles
      • <Handle>::=<Handle Prefix> "/"<Handle Suffix>
      • http://hdl.handle.net/10.1099/ijs.0.64483-0
      Persistent URLs
      • <purl>::=<protocol>/<resolver>/<name>
      • http://purl.oclc.org/OCLC/OCLC/PURL/FAQ
      Life Science Identifiers (LSID)
      • urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>
      • http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
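To make the LSID syntax on slide 18 concrete, here is a minimal sketch that splits an LSID into the colon-separated fields named in the template. The `Lsid` tuple and `parse_lsid` function are hypothetical names, and treating the revision field as optional is an assumption of the sketch rather than something stated on the slide.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # assumed optional in this sketch


def parse_lsid(text):
    """Parse urn:<LSID>:<AuthorityID>:<Namespace>:<Object>[:<Rev>] into its fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# The example string is the URN embedded in the resolver link on the slide.
lsid = parse_lsid("urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2")
print(lsid)
```

Unlike the opaque DOI suffix, every field of an LSID carries meaning (authority, namespace, object, revision), which is the kind of embedded semantics the "semantically opaque" criterion asks a reviewer to weigh.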

  19. Two implementations using DOIs
      • CrossRef: an independent membership association, founded and directed by STM publishers. Its mission is to connect users to primary research literature through a DOI RA that performs reference cross-linking, subject to publisher-access controls. The largest and most successful implementation of DOI services.
      • NamesforLife: a proprietary semantic resolution service developed at MSU. It provides a method for persistently linking the occurrence of a biological name or other technical term in third-party content to managed information about its origins, formal definition, current usage, and related goods and services.

  20. Rumsfeld’s axiom and knowledge bleed
      “…because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.”

  21. The knowledge gradient
      [Diagram labels: Unknown unknowns, Unknown knowns, Known unknowns, Known knowns]
      • Basic and applied research advances knowledge
      • Knowledge bleed results in a loss of knowledge that has already been gained
      • Semantic resolution provides a mechanism to combat knowledge bleed

  22. Ramifications of misunderstanding a name
      Highly significant
      • Wrong assumptions, assertions, or hypotheses
      • Misdiagnosis of infectious diseases
      • Misapplication of public policies
      Significant
      • Lost opportunities
      • Failure to reach customers potentially interested in marketed content, goods, and services at point of need
      • The long-tail phenomenon*
      Names trigger specific responses
      • But the concepts to which names apply are not static
      • May not always map 1:1
      • May require expertise for accurate interpretation

  23. Some thoughts on selecting a PID for CO and TK
      • The intended use of the identifier
      • Syntactic rules governing the form of the identifier
      • What the identifier resolves to
      • The technical infrastructure that is available to support the identifier and the parties operating it
      • Policies governing creation, maintenance, support, and persistence of the identifier
      • Information about any metadata related to the identifier that is or must be made available
      • A history of the identifier, including any changes in any of the above points over time
      Source: Report of the NISO Identifiers Roundtable 2006
      Questions?
