Persistent identifiers – an Overview Juha Hakala The National Library of Finland 2011-02-01
Traditional identifiers • Traditional (bibliographic) identifiers are systems like ISBN (International Standard Book Number) which provide unique and persistent identification for certain types of resources (books, serials, etc.) • They were designed for printed resources before the Internet was invented; thus the match with digital resources and the Web may be a forced one • These identifiers are well established international standards with relatively clear roles • It is not always clear how to apply them to e-resources, except that the identified resources themselves should be persistent
Persistent identifiers (PIDs) • A new category of identifiers which are actionable on the Internet, that is, they enable persistent linking (resolution) to the resource or a surrogate such as a bibliographic description of the resource • Most PIDs are also “traditional” identifiers • When using a DOI, one can identify a book with a DOI and an embedded ISBN, or with a DOI and a local ID string • URN is the only exception to this; URNs must include a traditional identifier • URN namespaces inherit the rules of the traditional identifier used; there is no need to discuss the scope of the URN itself
Traditional versus persistent identifiers • Assigning a traditional identifier such as ISBN is (should be?) a controlled process with precise rules • What is identified, by whom • Assigning a PID such as ARK may or may not be a controlled process and the rules of application may be vague • Sometimes the rules are different: • A book must have just one ISBN, but it may have two PIDs (for instance, ARK and DOI) • The National Library of Finland uses Handles in its DSpace system, but URN is the “official” identifier of these resources
Recommendations • Conflicts between the two identifier groups should be avoided at all costs • If a traditional identifier can be assigned to the resource, use that identifier as a part of the PID • It follows that PIDs that cannot (easily) incorporate traditional identifiers may cause problems • Any identifier (traditional / PID) should have explicit implementation guidelines • If no general guidelines exist, rules must be developed locally; such rules should eventually be aligned at the level of the PID community
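The recommendation to use a traditional identifier as part of the PID can be illustrated with the urn:isbn namespace, where the URN simply carries the ISBN. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, only ISBN-13 is handled, and the sample ISBN is just a commonly used test value.

```python
import re


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = re.sub(r"[\s-]", "", isbn)  # drop hyphens and spaces
    if not re.fullmatch(r"\d{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def urn_from_isbn(isbn: str) -> str:
    """Build a URN that embeds the ISBN unchanged (hypothetical helper)."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return "urn:isbn:" + isbn.strip()


print(urn_from_isbn("978-0-306-40615-7"))  # urn:isbn:978-0-306-40615-7
```

Because the ISBN is embedded verbatim, the assignment rules of the ISBN system apply to the URN as well, and no separate rule set is needed for the PID.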
Persistent identifiers and the Web: Cool URIs • From the library point of view, cool URIs (URLs) are not proper identifiers at all • The same resource may be available from many URLs • Over time, different resources or variant versions of the same resource may be available at the same URI • There is absolutely no control over cool URI assignment • A user cannot know if a URI is cool or not (most of them aren’t) • Instead, cool URIs are just shelf marks • What is a realistic time frame for cool URI persistence? • Cool URIs can support only resolution; persistent identifiers can be more versatile in this respect • Match with the current / future long term preservation systems
Services provided by PIDs • Basic question: what services do we need? • Some examples: • Find all locations (URLs) related to the PID • Find bibliographic metadata related to the PID • Retrieve the preservation commitment of the owning organization (concerning the resource at hand) • There is no overall framework / context within which to design the resolution services • Each PID provides a slightly different set
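The three example services can be sketched as a small in-memory resolver. This is only an illustration under stated assumptions: the class names, field names, PID and URLs are all hypothetical, and no existing PID system is claimed to expose exactly this interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PidRecord:
    # All field names are illustrative assumptions, not taken from a standard.
    locations: list = field(default_factory=list)   # URLs of copies
    metadata: dict = field(default_factory=dict)    # bibliographic data
    preservation_commitment: Optional[str] = None   # owner's statement


class Resolver:
    """Hypothetical resolver offering one method per resolution service."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, pid: str, record: PidRecord) -> None:
        self._records[pid] = record

    def find_locations(self, pid: str) -> list:
        return list(self._records[pid].locations)

    def find_metadata(self, pid: str) -> dict:
        return dict(self._records[pid].metadata)

    def find_preservation_commitment(self, pid: str) -> Optional[str]:
        return self._records[pid].preservation_commitment


resolver = Resolver()
resolver.register(
    "urn:example:report-1",  # made-up PID in the example namespace
    PidRecord(
        locations=["https://a.example.org/r1", "https://b.example.org/r1"],
        metadata={"title": "Sample report"},
        preservation_commitment="permanent",
    ),
)
print(resolver.find_locations("urn:example:report-1"))
```

Keeping each service behind its own method makes it visible which services a given PID system supports and which it does not.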
PID-based services in the future • Theoretical basis could be twofold: • Functional Requirements for Bibliographic Records (FRBR) model: work, expression, manifestation • Current theory and practice of long-term preservation based on the migration strategy (and a long tail of manifestations for each work) • This means it must be possible for instance to: • Find all works related to the work at hand • Find all expressions related to the work at hand • Find all manifestations of the work at hand • Find out differences between these manifestations
PID-based services in the future (2) • It should also be possible to • Find out who is preserving the resource • Retrieve the rights metadata related to the resource • Retrieve the preservation metadata related to the resource • Retrieve the most original version (the oldest preserved manifestation) of the resource • Retrieve the latest (and supposedly the easiest to use) manifestation of the resource • …
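A minimal sketch of how the work / expression / manifestation levels could be modelled so that the oldest and latest manifestations can be retrieved. The class layout, the PIDs and the dates are assumptions made for illustration; FRBR itself does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Manifestation:
    pid: str
    file_format: str
    created: date  # assumed: the date this manifestation was produced


@dataclass
class Expression:
    pid: str
    language: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    pid: str
    title: str
    expressions: list = field(default_factory=list)

    def all_manifestations(self) -> list:
        # Flatten the manifestations of every expression of this work.
        return [m for e in self.expressions for m in e.manifestations]

    def oldest(self) -> Manifestation:
        return min(self.all_manifestations(), key=lambda m: m.created)

    def latest(self) -> Manifestation:
        return max(self.all_manifestations(), key=lambda m: m.created)


# Made-up sample data: one work migrated from an old format to a newer one.
work = Work(
    pid="urn:example:work-1",
    title="Sample work",
    expressions=[
        Expression(
            pid="urn:example:expr-1",
            language="fi",
            manifestations=[
                Manifestation("urn:example:man-1", "wordperfect", date(1995, 3, 1)),
                Manifestation("urn:example:man-2", "pdf", date(2010, 6, 15)),
            ],
        )
    ],
)
print(work.oldest().pid, work.latest().pid)
```

Under a migration strategy the list of manifestations grows over time, so "oldest" and "latest" are computed rather than stored.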
Example: qualitative social scientific data set • The work itself should be described; one metadata element should be the PID • Expressions (translations to other languages) should have their own PIDs, linked to the work level record • There may be multiple manifestations (relational database, Excel table, etc.) of each expression; each one should have its own PID, and there should be links to the work / expressions • In this environment, it would make sense to provide links to the work, and let the users choose the most appropriate manifestation • Choice of the language, file format, etc.
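Letting the user choose the most appropriate manifestation could look roughly like the sketch below. It assumes each manifestation record carries a language and a file format, and that the user states preferences in order; the record layout, the `choose` function and the sample PIDs are all hypothetical.

```python
from typing import NamedTuple, Optional


class Item(NamedTuple):
    pid: str
    language: str
    file_format: str


def choose(items, languages, formats) -> Optional[Item]:
    """Return the best match for the ordered preferences, or None."""
    candidates = [
        i for i in items
        if i.language in languages and i.file_format in formats
    ]

    def rank(item):
        # Lower is better: language preference first, then file format.
        return (languages.index(item.language), formats.index(item.file_format))

    return min(candidates, key=rank, default=None)


# Made-up manifestations of one data set, linked from the work-level record.
items = [
    Item("urn:example:ds-fi-sql", "fi", "sql"),
    Item("urn:example:ds-fi-xlsx", "fi", "xlsx"),
    Item("urn:example:ds-en-xlsx", "en", "xlsx"),
]

best = choose(items, languages=["en", "fi"], formats=["csv", "xlsx"])
print(best.pid)  # urn:example:ds-en-xlsx
```

A link to the work-level PID can stay stable while the set of manifestations behind it changes.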
Recommendations (2) • Services supported by PID systems need a facelift • Many systems were designed 10+ years ago, when digital object management systems were still in their infancy • Upgrades must be done in a non-destructive manner (existing implementations must be compliant with the new version) • All aspects of PID systems should be standardized • Some PIDs (e.g. ARK and PURL) have never reached a standard status, and at best only one part of the system (identifier syntax) has been published as a standard • More (and better) open source implementations are needed
Conclusion • There will be multiple PIDs in existence in the future (just like there are now) • Once a system has been chosen, you cannot give it up • PID supporters and cool URI proponents will most likely continue talking past one another for quite some time, but: • Given the time frame the national libraries & archives must preserve resources (centuries) and the technical complexity of this task, cool URIs fall short of the requirements in several ways; instead, PIDs must be used • PID systems are to some extent “work in progress”