rfc2141bis, rfc3406bis and the ISBN + NBN namespaces
IETF 83, Paris, France
Juha Hakala, The National Library of Finland
The need for modernization
• RFC 2141 was adopted in 1997. It is based on the original specification of URLs (RFC 1808) and therefore does not use <fragment> and <query>
• Other PID systems (Handle, ARK) are similar in this respect
• RFC 3406 does not conform to RFC 5226 (the IANA procedures document), and the revision of RFC 2141 will have an impact on namespace definition procedures as well
Concerning namespace registrations
• The changes made to RFC 2141 and RFC 3406 so far do not necessitate re-registration of existing namespaces; we have to revise RFCs 3044 and 3187 (ISSN and ISBN) because the identifier standards have changed substantially
• However, the RFC 3188 (NBN) revision process was started because the national libraries want to use all the functionality the new syntax will offer
URN syntax (2141bis, version 02)
• Conforms fully to RFC 3986
• Adding <fragment> support was non-trivial, since there were many things to consider:
  • RFC 3986 requirements and the way in which Web browsers use fragments (they do not pass fragments to the server, but use them "internally" to identify positions within retrieved documents)
  • Varying practices in different namespaces
• The outcome was a multi-tiered solution where RFC 3986 is always followed, but namespaces may have their own internal solutions to fragment identification
URN syntax (2)
• The role of <query> is restricted to indicating the requested URN resolution service and (possibly) the parameters of that service
  • For instance, retrieving descriptive metadata about the resource in a particular format such as Dublin Core or MARC (used by libraries)
• The character set has been adjusted slightly to align the text with RFC 3986; the namespace identifier (NID) syntax was discussed in more detail, but the issue is now settled: we will trust the common sense of the IANA experts and of the people writing namespace registration requests
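To make the two syntax slides concrete, here is a minimal Python sketch that splits a URN into its NID, NSS, query and fragment using only the generic RFC 3986 delimiters ("?" starts the query, "#" starts the fragment). It assumes the RFC 2141 NID rule (1 to 32 letters, digits or hyphens, not starting with a hyphen, and not "urn") and does not model any finer rules of rfc2141bis version 02; the names `ParsedURN` and `parse_urn`, and the `format=marc` query in the usage example, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: RFC 2141 NID rule, a letter or digit followed by up to 31
# letters, digits or hyphens (1 to 32 characters in total).
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


@dataclass
class ParsedURN:
    nid: str
    nss: str
    query: Optional[str]     # requested resolution service (and parameters), if any
    fragment: Optional[str]  # handled by the client; not sent to a resolver


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN with the generic RFC 3986 delimiters (a sketch, not a full validator)."""
    # The fragment starts at the first "#" and runs to the end of the string.
    rest, hash_sign, fragment = urn.partition("#")
    # The query starts at the first "?" that precedes the fragment.
    rest, qmark, query = rest.partition("?")
    scheme, _, rest = rest.partition(":")
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    nid, colon, nss = rest.partition(":")
    if not colon or not NID_RE.fullmatch(nid) or nid.lower() == "urn" or not nss:
        raise ValueError("bad NID or empty NSS: " + urn)
    return ParsedURN(nid, nss, query if qmark else None, fragment if hash_sign else None)


# Usage (the "format=marc" parameter is a made-up service request):
p = parse_urn("URN:ISBN:978-0-395-36341-6?format=marc#chapter2")
```

Because the fragment is split off first, a "?" appearing inside a fragment is never mistaken for the start of a query, which matches the RFC 3986 ordering of the components.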
Remaining issues
• In general, version 02 of rfc2141bis is a mature document
• Since the draft builds upon RFC 2141 and RFC 3986, there were few open issues to start with, and nothing that would have been highly controversial, politically or technically
• Practical experience from using URNs (tens of millions have been assigned) has not revealed any design flaws in the syntax
Remaining issues (2)
• In order to prepare the draft for publication, we may want to:
  • Align the statements concerning the URN scope in different parts of the document. The introduction says that URNs do not have a specific scope, since their scope is the sum of the scopes of the namespaces; section 7.1 claims that URNs serve as resource identifiers for concrete and abstract objects that have network-accessible instances and/or metadata
  • Use the term "resource" when referring to what is being identified (instead of object, document, artefact etc.)
Remaining issues (3)
• Functional equivalence
  • Not properly specified in 2141bis; the possible cases are:
    • Two URNs within the same namespace resolve to the same instance of a resource; this should not happen
    • Two URNs within the same namespace resolve to different instances of a resource; this is OK in some namespaces (but not in all of them; see e.g. rfc3187bis and rfc3188bis)
    • Two URNs from different namespaces resolve to the same or different instances of a resource; this is OK
    • Two URNs resolve to the same resource at different levels (work, manifestation, fragment of a manifestation); this is OK
  • Existing namespace registrations do not discuss functional equivalence; in most namespaces this is not necessary since, e.g., two URN:ISBNs should never be functionally equivalent (however, rfc3188bis will discuss this)
rfc3406bis
• The aim is to outline a mechanism and provide a template for URN namespace definition
• There are 40+ URN namespaces; the level of use, and of control over that use, varies a lot
• Tens of millions of URN:NBNs have been assigned, making it the most popular bibliographic identifier ever; some other namespaces are "dead"
• Standard-based namespaces are strictly controlled as regards identifier assignment; there is virtually no control in some other namespaces such as URN:UUID
URN namespace definition mechanisms, version 02
• Takes into account both the new features in rfc2141bis and the experience gained so far from the namespace registration processes
• There have been no difficult issues, but the fact that RFC 2483 (URI Resolution Services Necessary for URN Resolution) is out of date does have an impact on rfc3406bis as well
• There is a need to specify which services must / should be supported in a namespace; this is hard to do when some services are missing or lack essential functionality
Remaining issues
• Like rfc2141bis, rfc3406bis is much more detailed than the RFC it is based on, owing to the understanding gained since the URN system was established
• Apart from the problems related to service specification, there are few open issues to discuss (as reflected by the lack of discussion on the URN-WG list)
• IMHO the most vital issue is a practical one: how can we make sure that the IANA experts approve only those namespace registrations that deserve it, and how can rfc3406bis support their work?
• A badly managed namespace undermines the value of the URN system as a whole
• Overlap between namespaces is inevitable, but should be avoided if and when possible
rfc3188bis: general
• The National Bibliography Number is not a standard identifier, but a set of identifier systems used (primarily) by the national libraries, following local practices and needs
• NBNs used to be local identifiers, but using them as URNs renders them globally unique and actionable on the Internet
• The namespace has been in production use for over a decade; tens of millions of identifiers have been assigned in several countries, primarily in Europe
• NBNs are assigned to digitized content, harvested Web documents and e-deposit; generally materials that a) do not qualify for a true standard identifier, and b) are preserved long-term
NBN syntax & semantics
• Every NBN string has some embedded meaning
• URN:NBN consists of:
  • An ISO 3166-1 two-letter country code
    • URN:NBN:FI = Finland
  • A sub-division element (optional); the national library must maintain a registry of these
    • URN:NBN:FI:STAT = Statistics Finland
  • A publication element
    • Beyond the requirements of the URI/URN syntax specifications, there are no additional requirements for this element
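The element structure above can be illustrated with a small Python decomposer. It assumes the colon-separated layout shown on the slide (URN:NBN:<country>[:<sub-division>]:<publication>), which is an assumption: deployed NBNs may use other separators. Because the slide says sub-division elements are listed in a registry kept by each national library, the sketch only treats an element as a sub-division if a (hypothetical) in-memory registry knows it; `SUBDIVISIONS`, `NBN`, `parse_nbn` and the identifier `2012-001` are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical registry of sub-division elements per country code. Each national
# library maintains the real registry; these entries are only for illustration.
SUBDIVISIONS: Dict[str, Set[str]] = {"FI": {"STAT"}}


@dataclass
class NBN:
    country: str                # ISO 3166-1 two-letter code, e.g. "FI"
    subdivision: Optional[str]  # optional, registered sub-division element
    publication: str            # no syntax imposed beyond the URI/URN rules


def parse_nbn(urn: str) -> NBN:
    """Split URN:NBN:<cc>[:<subdivision>]:<publication> (assumed colon layout)."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0].upper() != "URN" or parts[1].upper() != "NBN":
        raise ValueError("not a URN:NBN with a publication element: " + urn)
    country = parts[2].upper()
    if len(country) != 2 or not (country.isascii() and country.isalpha()):
        raise ValueError("expected a two-letter country code: " + parts[2])
    rest = parts[3:]
    subdivision = None
    # Treat the next element as a sub-division only if the registry knows it
    # and a publication element still follows it.
    if len(rest) > 1 and rest[0].upper() in SUBDIVISIONS.get(country, set()):
        subdivision = rest[0].upper()
        rest = rest[1:]
    publication = ":".join(rest)  # the remainder, any further colons included
    if not publication:
        raise ValueError("empty publication element: " + urn)
    return NBN(country, subdivision, publication)
```

Consulting the registry, rather than guessing from the number of elements, reflects the slide's point that sub-division elements are controlled by the national library; the publication element is passed through untouched because no extra syntax is imposed on it.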
URN:NBN and fragments
• An NBN can be used to identify a fragment of a publication (section, chapter)
• There will not be a namespace-specific internal method for fragment identification; instead:
  • Physical fragments may be identified using the RFC 3986 procedure; this will produce standard browser functionality (the entire resource is retrieved)
  • Logical fragments may be identified by "normal" NBNs; in this case the result (e.g. a journal article) may not be a physical fragment but a complete file
  • Logical fragments may also be identified by a local fragment syntax (to be recognized by the relevant resolvers)
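The RFC 3986 behaviour for physical fragments can be sketched as follows: the resolver only ever sees the part of the URN before "#", returns the location of the entire resource, and the client re-attaches the fragment so the browser can jump to the position inside the retrieved document. The resolver table, the example URN and the example.org URL are hypothetical, and a real resolver would of course not be an in-memory dictionary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical resolver table: URN:NBN -> location of the complete resource.
RESOLVER: Dict[str, str] = {
    "URN:NBN:FI:STAT:2012-001": "https://example.org/stat/2012-001.html",
}


def split_fragment(urn: str) -> Tuple[str, Optional[str]]:
    """Return (part sent to the resolver, fragment kept by the client)."""
    base, hash_sign, fragment = urn.partition("#")
    return base, (fragment if hash_sign else None)


def resolve(urn: str) -> str:
    """Resolve a URN:NBN the way a browser handles a URL with a fragment."""
    base, fragment = split_fragment(urn)
    url = RESOLVER[base]  # the resolver never sees the fragment
    # The entire resource is retrieved; the fragment is re-attached for client-side use.
    return url if fragment is None else url + "#" + fragment
```

A logical fragment with its own "normal" NBN would simply be another key in the table pointing at a complete file, so no fragment handling is needed in that case.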
rfc3188bis: status and plans
• Under development since 2010, first as a private contribution, then as a WG deliverable
• The text is mature as regards the syntax, but scope and functional equivalence could / should be discussed in more detail
• If two national libraries harvest the same resource into their web archives, they may assign different URN:NBNs to it
  • This is not a problem, since these URNs will resolve to different physical copies of the resource
rfc3187bis: about ISBN
• An ISO standard (ISO 2108), established in the early 1970s
• Persistent and unique identifier for books
• Each manifestation (hardcover, softcover, PDF, EPUB) gets its own ISBN
• In theory the system has spread almost everywhere; in practice, there are a lot of countries where ISBN assignment is not working (properly / at all)
• There are two variants, ISBN-10 (used up to 2006) and ISBN-13, specified in 2005 and used since 2007
• Examples:
  • 978-0-395-36341-6 (ISBN-13)
  • 951-0-18435-7 (ISBN-10)
• The syntactical differences are the "978" or "979" prefix at the beginning and the check digit calculation algorithm, which in ISBN-13 is compliant with EAN
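The check digit difference between the two variants can be shown with a short Python sketch of the two well-known algorithms: ISBN-10 uses weights 10 down to 2 and modulus 11 (a check value of 10 is written "X"), while ISBN-13 follows the EAN-13 rule with alternating weights 1 and 3 and modulus 10. The function names are hypothetical, and the code only checks digits; it does not verify that the prefix, registration group or registrant ranges are actually allocated.

```python
def _digits(isbn: str) -> str:
    """Drop hyphens and spaces, the usual ISBN presentation separators."""
    return isbn.replace("-", "").replace(" ", "")


def isbn10_check_digit(first9: str) -> str:
    # Weights 10, 9, ..., 2; the check value makes the weighted sum divisible by 11.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    # EAN-13 rule: weights 1, 3, 1, 3, ...; the check value makes the sum divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn(isbn: str) -> bool:
    s = _digits(isbn).upper()
    if len(s) == 10 and s[:9].isdecimal() and (s[9].isdecimal() or s[9] == "X"):
        return isbn10_check_digit(s[:9]) == s[9]
    if len(s) == 13 and s.isdecimal() and s[:3] in ("978", "979"):
        return isbn13_check_digit(s[:12]) == s[12]
    return False


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix "978" to the first nine digits and recompute the check digit."""
    s = _digits(isbn10).upper()
    if len(s) != 10 or not is_valid_isbn(s):
        raise ValueError("not a valid ISBN-10: " + isbn10)
    body = "978" + s[:9]
    return body + isbn13_check_digit(body)
```

Both examples on the slide pass `is_valid_isbn`, and `isbn10_to_isbn13("951-0-18435-7")` returns `"9789510184356"`; only the 978 prefix has ISBN-10 counterparts, which is why the conversion always adds "978".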
Resolution of URN:ISBNs
• ISBN is a "semantic" (non-dumb) identifier:
  • 978 = Prefix element (EAN "Bookland" code; also 979)
  • 978-0 = Registration group element 0 (English language; also 978-1)
  • 395 = Registrant element (publisher ID)
  • 36341 = Publication element
  • 6 = Check digit
• There is no single point where all ISBNs could be resolved (note the difference from the ISSN), so a URN:ISBN must contain a hint of where to find a resolver
• This hint is the registration group element; in some cases it provides a good hint (951 = Finland), but occasionally it is less useful (3 = Germany, Austria and German-speaking parts of Switzerland)
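The idea of using the registration group element as a resolver hint can be sketched in Python. Because element lengths vary, the sketch only accepts ISBNs that already carry their hyphens; splitting an unhyphenated ISBN would require the range tables published by the International ISBN Agency, which are not modelled here. The resolver addresses in `GROUP_RESOLVERS` (example.fi, example.de) and all function names are hypothetical, and ISBN-10s are treated as implicitly belonging to the 978 prefix.

```python
from typing import Dict, NamedTuple, Optional


class ISBNElements(NamedTuple):
    prefix: Optional[str]  # "978" or "979"; ISBN-10 has none (implicitly 978)
    group: str             # registration group element
    registrant: str        # registrant (publisher) element
    publication: str       # publication element
    check: str             # check digit


# Hypothetical resolver addresses keyed by "<prefix>-<group>"; not a real registry.
GROUP_RESOLVERS: Dict[str, str] = {
    "978-951": "https://resolver.example.fi/",  # Finland: a precise hint
    "978-3": "https://resolver.example.de/",    # Germany, Austria, German-speaking Switzerland
}


def split_hyphenated_isbn(isbn: str) -> ISBNElements:
    """Split an ISBN that already carries its hyphens into its elements."""
    parts = isbn.split("-")
    if len(parts) == 5:  # ISBN-13: prefix-group-registrant-publication-check
        return ISBNElements(*parts)
    if len(parts) == 4:  # ISBN-10: group-registrant-publication-check
        return ISBNElements(None, *parts)
    raise ValueError("expected a hyphenated ISBN-10 or ISBN-13: " + isbn)


def resolver_hint(isbn: str) -> Optional[str]:
    """Pick a resolver from the registration group element (None if unknown)."""
    e = split_hyphenated_isbn(isbn)
    return GROUP_RESOLVERS.get((e.prefix or "978") + "-" + e.group)
```

Keying the table on prefix plus group lets the ISBN-10 and ISBN-13 forms of the same Finnish book (951-0-18435-7 and 978-951-0-18435-6) reach the same resolver, while a group such as 3 can only point at a resolver serving several countries.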
rfc3187bis version 02
• The current draft is (relatively) mature
• The namespace registration request has been extended so that it takes into account both ISBN-10 and ISBN-13
• Fragment usage has been specified
  • Complete ISBNs can be assigned to logical fragments of a book, but it is not possible to add anything to the identifier string to indicate a fragment, either in the spirit of RFC 3986 or otherwise
rfc3187bis: status and plans
• Include a discussion of functional equivalence:
  • Two different ISBNs should never resolve to the same thing (e.g. a manifestation of a book)
  • Two ISBNs may resolve to different manifestations of the same work (and be interconnected via the work-level metadata)
  • Two ISBNs may resolve to the same manifestation of a book at different levels (an entire book / a single chapter within the book)
rfc3187bis: status and plans (2)
• Indicate which resolution services are necessary in the URN namespace
  • For instance: retrieve descriptive / administrative metadata; fetch the resource or a list of locations; retrieve metadata about the work and the related manifestations of the work
• Polish the language
  • Make sure that only the terms "resource" or "book" are used
  • Remove remaining occurrences of RFC 2119 terms not written in capital letters, so as to avoid confusion