Identification of Electronic Resources: identifiers and resolution services Juha Hakala Helsinki University Library 2003-01-29
Background • Rapid growth of electronic publishing has revealed fundamental problems in our existing identifier systems • Rules of implementation and/or syntax must be changed, new systems developed, and resolution systems utilising the identifiers need to be built • Every resource must have an identifier!
Scope of the work • Identifiers for authors (ISADN) • Identifiers for works (ISTC, ISAN, …) • Identifiers for manifestations (ISBN,…) • Identifiers for component parts (SICI, BICI) • Resolution services (DOI, URN) • Always incorporate an identifier
ISSN • Capacity of the system is sufficient • 1.5 million out of 10 million IDs in use • Many open issues • How to implement ISBD(CR) in practice? • Staffing: nat. ISSN centres need more cataloguers • Web journals are not stable; 856 versus OpenURL • Data utilisation problem: the global ISSN database system requires modernisation • Faster updates from nat. centres, Z39.50 access on-line
ISBN • Capacity is an issue • We will run out of ISBNs within a decade • Rule problem: publishers also want to use ISBN for component parts (instead of BICI) in order to simplify their systems • Utilisation problem: there is no global ISBN database (and there may never be one)
The New ISBN: current plans • ISO/CD 2108, dated 2003-01-17 • For the ISO/TC 46/SC 9/WG 4 meeting, 30-31 January 2003 • Bookland EAN prefix (978) will be added; otherwise the structure remains the same • ISBN-13 978-90-70002-34-3 • -> every old number can be re-used • Check digit calculated using the Modulus 10 algorithm (ISBN-10 uses Modulus 11, with check characters 0-9 and X)
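The conversion described on this slide can be sketched in a few lines. This is a minimal illustration, not text from the draft standard: the function names are hypothetical, and the check-digit weights used are the usual ones (10 down to 2 for Modulus 11, alternating 1 and 3 for the EAN-style Modulus 10).

```python
def isbn10_check_digit(first9: str) -> str:
    """Modulus 11 check character for the first nine digits of an ISBN-10."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Modulus 10 check digit (weights 1, 3, 1, 3, ...) for an ISBN-13."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix an existing ISBN-10 with 978 and recompute the check digit."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    if isbn10_check_digit(chars[:9]) != chars[9]:
        raise ValueError(f"bad ISBN-10 check character: {isbn10!r}")
    body = "978" + chars[:9]  # the old check character is dropped
    return body + isbn13_check_digit(body)


print(isbn10_to_isbn13("90-70002-34-5"))
```

Only the check digit changes; the nine digits that carry the group, publisher and title parts are reused as they are, which is why every old number remains usable.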
The New ISBN: some cancelled ideas • Make ISBN an ISSN-like dumb number • Enhanced capacity, but reduced usefulness • Create the global ISBN database • Technically, organisationally and politically controversial idea • Instead, national centres will make their data available • Extend ISBN to 16/25/32 digits • Would have broken the EAN system
National Bibliography Number • Traditionally: the identifier for records in the national bibliography, if the publication did not have an identifier • New scope: identifier for (electronic) resources to which no other identifier applies • Implemented as URNs in order to guarantee global uniqueness
National Bibliography Number: examples • All implementations based on RFC 3188 • Finnish Web Archive (11.7 million files) • Machine generated ID based on MD-5 • urn:nbn:fi:fa<MD-5> • Koninklijke Bibliotheek’s E-depot • urn:nbn:nl:kb:eDepot-<UNIX time>
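As a rough sketch of how such machine-generated NBNs might be produced: the two prefixes below are copied from the examples above, but everything else is an assumption. In particular, the slides do not say how the MD5 value is encoded in the identifier (a hexadecimal digest is assumed here), and the function names are hypothetical.

```python
import hashlib
import time
from typing import Optional


def nbn_from_md5(content: bytes, prefix: str = "urn:nbn:fi:fa") -> str:
    """Build an NBN from the MD5 digest of the archived file's bytes.

    Assumption: the digest is appended as 32 hexadecimal characters.
    """
    return prefix + hashlib.md5(content).hexdigest()


def nbn_from_unix_time(timestamp: Optional[int] = None,
                       prefix: str = "urn:nbn:nl:kb:eDepot-") -> str:
    """Build an NBN from a UNIX timestamp (seconds since 1970-01-01 UTC)."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{prefix}{timestamp}"


print(nbn_from_md5(b"<html>archived page</html>"))
print(nbn_from_unix_time(1043798400))
```

A content hash gives identical files the same identifier, while a timestamp only stays unique if no two identifiers are issued within the same second; a real implementation would need to handle both cases.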
Uniform Resource Name • Internet standard; approved in fall 2002 • Both an identifier and a resolution service (a mechanism for linking an identifier to a resource on the Internet) • Designed to be protocol independent; the current version is built on top of DNS, but the infrastructure can be changed
URN: syntax • Specified in RFC 2141 (1997) • Three sections, separated by colons • String urn • Namespace identifier (NID) • Namespace specific string (NSS) • urn:nid:nss
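The three-part structure can be illustrated with a small parser. This is a minimal sketch: the names Urn and parse_urn are hypothetical, and the code only splits the string. It does not check the character restrictions that RFC 2141 places on the NID and NSS.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # namespace identifier, e.g. "nbn"
    nss: str  # namespace specific string, e.g. "fi:fa..."


def parse_urn(text: str) -> Urn:
    """Split 'urn:nid:nss' into its NID and NSS parts."""
    scheme, _, rest = text.partition(":")
    nid, _, nss = rest.partition(":")  # the NSS may itself contain colons
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {text!r}")
    # The leading "urn" and the NID are case-insensitive; the NSS is kept as is.
    return Urn(nid.lower(), nss)


print(parse_urn("urn:nbn:fi:fa123"))
print(parse_urn("URN:ISBN:978-90-70002-34-3"))
```

Only the first two colons are structural, so an NBN such as urn:nbn:fi:fa123 keeps its country code and sub-namespace inside the NSS.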
URN: services • Supply the actual document • Deliver metadata related to the document • Pass the list of URLs from which the resource can be found
URN: namespace registration • Each namespace must be registered as specified in RFC 2611 • Registration must contain the proposed NID (such as “nbn”) and an outline of how the global URN resolver discovery service will function within the namespace • Registrations are approved as informational or normative RFCs
Administration of the NBN namespace • Each national library is allowed to do whatever it wants with its own part of the NBN namespace (as long as the identifiers remain unique and persistent) • National Library of Finland has assigned some organisations their own sections • Library of Congress could do the same
URN: resolution process • Based on DNS; there is a resource record which describes the location of the service that can resolve URNs with a given NID/NSS combination • Complexity of the resolution process varies • ISSN – single database is enough • ISBN – databases of national centres will do • SICI – a huge number of A&I services needed
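The idea of resolver discovery can be sketched without real DNS: a lookup table stands in for the resource records, mapping a NID and a leading part of the NSS to the service that can resolve it. Everything in the table is hypothetical (the example.org addresses, the choice of prefixes and the longest-prefix rule); it only illustrates why one entry is enough for ISSN while ISBN needs one per national centre.

```python
# Stand-in for DNS resource records: NID -> list of (NSS prefix, resolver URL).
# All entries are illustrative placeholders, not real services.
RESOLVER_RECORDS = {
    "issn": [("", "https://issn.example.org/resolve")],
    "isbn": [
        ("978-90", "https://centre-a.example.org/resolve"),
        ("978-951", "https://centre-b.example.org/resolve"),
    ],
}


def find_resolver(urn: str) -> str:
    """Return the resolver registered for the URN's NID/NSS combination."""
    scheme, _, rest = urn.partition(":")
    nid, _, nss = rest.partition(":")
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    records = RESOLVER_RECORDS.get(nid.lower())
    if records is None:
        raise LookupError(f"no resolver registered for namespace {nid!r}")
    # Prefer the most specific (longest) matching NSS prefix.
    for prefix, resolver in sorted(records, key=lambda r: len(r[0]), reverse=True):
        if nss.startswith(prefix):
            return resolver
    raise LookupError(f"no resolver covers {urn!r}")


print(find_resolver("urn:issn:1234-5678"))
print(find_resolver("urn:isbn:978-90-70002-34-3"))
```

A namespace such as SICI would need a far larger table, since each abstracting and indexing service would register its own entry.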
URN: some benefits • No assignment cost • Trivial to create from existing identifiers • Add a fixed prefix, such as urn:nbn:us:cornell: • Internet standard; support will gradually be included into the basic tools we use • Present architecture for resolver discovery service is robust and scalable
URN: some problems • Someone must pay for the implementation of resolution services in e.g. ILSs • Commercial publishers prefer DOI • Only a handful of systems have registered namespaces • E.g. ISSN, ISBN, NBN • Dumb identifier with multiple resolution services does not fit into the system well (although there may be a cascade of resolvers)
URN versus DOI • DOI system is a technology, not a standard • Standardisation of DOI syntax is not enough; services and the practical implementation of the resolution mechanism must also be “fixed” • Handle system has failed to attract IETF • In DOI system, anything can be used as an identifier (suffix) • DOI requires registration of registrants (publishers)
URN versus DOI (2) • DOI syntax is mandated by the Handle system • Actual DOI implementations are dependent on HTTP protocol (which will not last forever) • Handle system may become (but is not yet truly) distributed • DOI has been widely implemented and works OK • Only one DOI service: retrieval of the resource
URN and ENCompass • Endeavor has no immediate plans to develop a URN resolution service • I.e. a mechanism for receiving URN resolution requests arriving via DNS • URNs can however be stored in metadata or in the documents themselves, and indexed • This means that implementing a URN resolution service should not be complicated (it was designed to be easy to develop)