Persistent identifiers: the 7 levels of identification Juha Hakala Helsinki University Library ELAG 2005 1-3 June 2005, CERN
Persistence? • Is not dependent on the identifier itself, but on legal, organisational and technical infrastructure • ISSN would collapse without the ISSN standard, a community using it according to the generally accepted principles, ISSN International Centre governing the system and the ISSN database linking the non-semantic (that is, dumb) identifiers to serials • Even a technically brilliant system may be discontinued if its mission breaks apart
”Normal” identifiers and resolution services • Resolution services are a new brand of identifiers which render traditional identifier systems actionable in the Internet (Web) environment • Resolve: provide a link from a reference to the resource • Prime examples: DOI and URN • Both may encompass, at least in principle, any existing identifier (URN namespaces have been defined for e.g. ISSN and ISBN) • Both are useless without an existing identifier adding flesh to the DOI/URN bones • From now on, only ”normal” identifiers will be discussed • Complex enough topic for 35 minutes…
Seven levels of identifiers • After the collapse of the integrated library system paradigm, and the implementation of IR portals, digital asset management systems, digital archives and e-resource management systems, what do we need to identify? • This can be analysed from top to bottom, from organisations to search attributes • Such analysis may show gaps and help in the design of identifier systems
Top level: libraries • The identifier system must also cover at least other (memory) organisations • National level (union catalogue codes) exists; due to the Internet / Web it became necessary to develop an international system • ISIL, International Standard Identifier for Libraries and Related Organisations; ISO 15511 • Consists of ISO country code, hyphen and union catalogue (UC) code • FI-H (Helsinki University Library) • Danish Library Authority hosts the ISIL IC; national centres have been established in some countries but the system needs wider acceptance
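The ISIL structure described on this slide (country code, hyphen, union catalogue code) can be sketched in a few lines of Python. This is only an illustration: the function name `split_isil` is hypothetical, and the character set accepted for the local code is an assumption made for the sketch, not a rule quoted from ISO 15511.

```python
import re

# Assumption for illustration: a two-letter country code, a hyphen,
# then a local (union catalogue) code made of letters, digits, "/" or "-".
ISIL_PATTERN = re.compile(r"^([A-Z]{2})-([A-Za-z0-9/\-]+)$")


def split_isil(isil: str) -> tuple[str, str]:
    """Split an ISIL into (country code, union catalogue code)."""
    match = ISIL_PATTERN.match(isil)
    if match is None:
        raise ValueError(f"not a well-formed ISIL in this sketch: {isil!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    country, local_code = split_isil("FI-H")
    print(country, local_code)
```

Because the country code comes first, a national centre can hand out local codes from its existing union catalogue without coordinating each assignment internationally.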
2nd level: collections and services • These identifiers are important for IR portals; international exchange of collection & service (e.g. a Z39.50 server) metadata is cumbersome unless there is an efficient means for duplicate control • These identifiers do not exist yet • Helsinki University Library is writing a New Work Item proposal for ISO TC 46 on ISCI, the International Standard Collection Identifier • No on-going efforts to develop a service ID
ISCI: design principles • Will be based on ISIL in order to allow efficient decentralization of the ISCI assignment and creation of Internet-wide resolution service without a global ISCI DB • Will consist of three parts: ISIL, delimiting character (colon) and the actual (colon-less) collection identifier • FI-H:Slavica (Slavic collection in HUL) • Need for an international support center?
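A minimal sketch of the proposed three-part ISCI syntax, assuming only what the slide states: an ISIL, a colon, and a collection identifier that itself contains no colon. The function name `split_isci` is hypothetical, and ISCI was only a proposal at the time, so this is not an implementation of any published standard.

```python
def split_isci(isci: str) -> tuple[str, str]:
    """Split a proposed ISCI into (ISIL, collection identifier).

    The collection part is colon-less by design, so splitting on the
    last colon is unambiguous in this sketch.
    """
    isil, colon, collection = isci.rpartition(":")
    if not colon or not isil or not collection:
        raise ValueError(f"expected '<ISIL>:<collection>', got {isci!r}")
    return isil, collection


if __name__ == "__main__":
    isil, collection = split_isci("FI-H:Slavica")
    # The ISIL part alone says which library assigned the identifier,
    # which is what makes resolution possible without a global ISCI database.
    print(isil, collection)
```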
3rd level: authors • International exchange of authority records can be made more efficient with persistent and unique identification • ISADN, International Standard Authority Data Number, has been discussed for quite a few years, but it is not yet formally under development • Retrospective assignment may create interesting ”ownership” problems, especially if the future ISADN contains country of origin • Is Franz Liszt German or Hungarian?
4th level: identifiers for works • ISWC: International Standard Musical Work Code • T-345246800-1 • Letter T, 9-digit unique number and check digit • ISAN: International Standard Audiovisual Number • ISAN 006A-15FA-002B-C95F-A • 12-digit root segment + 4-digit segment for episode identification and check digit • ISTC: International Standard Text Code • ISTC OA9 2005 12B4A105 6 • agency code, year, work element & check digit • These systems were developed at the same time, but their syntax and terminology vary • This should not complicate usage too much
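To make the difference in syntax concrete, the following sketch splits the ISWC and ISAN examples from this slide into the parts the slide names. It is purely syntactic: the patterns are assumptions derived from the two example strings, the function names are hypothetical, and the check characters are extracted but not verified.

```python
import re

# Patterns inferred from the slide's examples (assumptions, not normative).
ISWC_PATTERN = re.compile(r"^T-(\d{9})-(\d)$")
ISAN_PATTERN = re.compile(
    r"^(?:ISAN )?([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-Z])$"
)


def split_iswc(code: str) -> tuple[str, str]:
    """Return (9-digit work number, check digit) of an ISWC."""
    match = ISWC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unexpected ISWC layout: {code!r}")
    return match.group(1), match.group(2)


def split_isan(code: str) -> tuple[str, str, str]:
    """Return (12-digit root, 4-digit episode segment, check character)."""
    match = ISAN_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unexpected ISAN layout: {code!r}")
    root = match.group(1) + match.group(2) + match.group(3)
    return root, match.group(4), match.group(5)


if __name__ == "__main__":
    print(split_iswc("T-345246800-1"))
    print(split_isan("ISAN 006A-15FA-002B-C95F-A"))
```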
ISTC/ISWC/ISAN issues • Many library system vendors are investigating the possibility of implementing FRBR, but few have been capable of doing it (VTLS, OCLC) • Once an ILMS is frbrized, implementing work identifiers is essential, but there is more than technology to consider here: • Do we need to pay for these identifiers; even when retrospectively generating them for old works? • Who will establish the national centers and create the identifiers (and work level records they require)?
5th level: manifestations • This used to be familiar terrain for us • ISBN, ISSN, NBN belong here • E-publishing has destroyed the old status quo: • Systems that worked well for decades have adaptation problems for different reasons • It is not yet entirely clear if the revisions done (or planned) are sufficient
E-problems with manifestations • It is increasingly difficult to define valid ”targets” • ISSN could be assigned to any Web site out there • Publishers want to give ISBNs to anything that can in principle be sold separately (e-book chapters, images within a book, teddy bears on sale in book stores) • The number of things to be identified is growing fast; this will cause syntax problems (ISBN revision was done to make more room) and staff issues in ISSN/ISBN national centers • There is no point in giving a persistent identifier to a non-persistent resource; therefore resources must be identified, described & archived, which is a labour-intensive process
Case ISBN • The old ISBN was running out of number space • Several extension options were discussed: • 13, 16, even 32-digit ISBNs • The idea to make ISBN a ”dumb” number such as ISSN was voted down (for this the librarians in the WG are to blame) • The new ISBN will be compliant with the EAN system • 13 digits, starting with 978, 979 or in the future with something else to extend the scope of the system further • New check digit calculation algorithm adopted from EAN • It is possible to convert from an old ISBN to the new (starting with 978) and back • Publishers retroconvert to new ISBNs; libraries will keep the old ones • ILMS need to do sophisticated things with old/new ISBNs
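The conversion between old and new ISBNs mentioned above is one of the "sophisticated things" a library system has to do. A minimal sketch follows, assuming the usual check digit rules: weights 10 down to 2 modulo 11 for the 10-digit form, and alternating weights 1 and 3 modulo 10 for the EAN-style 13-digit form. Function names are illustrative, and input validation is deliberately light.

```python
def isbn13_check_digit(first12: str) -> str:
    """EAN-style check digit: alternating weights 1 and 3, modulo 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_check_digit(first9: str) -> str:
    """Old ISBN check digit: weights 10 down to 2, modulo 11 ('X' means 10)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    core = "978" + digits[:9]  # drop the old check digit, add the 978 prefix
    return core + isbn13_check_digit(core)


def isbn13_to_isbn10(isbn13: str) -> str:
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.startswith("978"):
        # ISBNs starting with 979 have no 10-digit equivalent
        raise ValueError("only 13-digit ISBNs starting with 978 can be converted back")
    core = digits[3:12]
    return core + isbn10_check_digit(core)


if __name__ == "__main__":
    print(isbn10_to_isbn13("0-306-40615-2"))
    print(isbn13_to_isbn10("978-0-306-40615-7"))
```

The check digit has to be recalculated in both directions, so a system that stores both forms cannot simply prepend or strip the 978 prefix.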
6th level: component parts • Libraries have not done too well in this area in the past due to staff limitations • We catalogue serials but not the articles • E-publishing may force us to change tactics since now even component parts are separate items accessible directly • Manual processing must be partially or fully replaced by automated processes; this will also have an impact on identifiers • Automated ID generation solves the staff bottleneck
SICI: still alive, but not kicking • Serial Item and Component Identifier, 1991- • NISO standard; has never really taken off • Can be generated programmatically provided that the article is structured enough • 0095-4403(199502/03)21:3<>1.0.TX;2-Y • Complex; consists of ISSN and stuff identifying the issue and article within it • Publishers have their own systems like PII which have been easier to create and maintain (for them) • Still not clear how popular SICI will eventually be
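As a rough illustration of how a SICI is put together, the sketch below splits the example from this slide into the ISSN, the chronology in parentheses, the enumeration (volume and issue), the contribution segment in angle brackets and the trailing control segment. The regular expression is an assumption fitted to this one example and does not cover the full NISO syntax; `parse_sici` and `issn_is_valid` are hypothetical names. The ISSN check uses the standard weights 8 down to 2, modulo 11.

```python
import re

# Layout assumed from the slide's example only.
SICI_PATTERN = re.compile(
    r"^(\d{4}-\d{3}[\dX])\(([^)]*)\)([^<]*)<([^>]*)>(.*)$"
)


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit (weights 8..2, modulo 11, 'X' means 10)."""
    chars = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    value = (11 - total % 11) % 11
    expected = "X" if value == 10 else str(value)
    return chars[7] == expected


def parse_sici(sici: str) -> dict[str, str]:
    """Split a SICI into its main segments."""
    match = SICI_PATTERN.match(sici)
    if match is None:
        raise ValueError(f"unexpected SICI layout: {sici!r}")
    issn, chronology, enumeration, contribution, control = match.groups()
    return {
        "issn": issn,
        "chronology": chronology,
        "enumeration": enumeration,
        "contribution": contribution,
        "control": control,
    }


if __name__ == "__main__":
    parts = parse_sici("0095-4403(199502/03)21:3<>1.0.TX;2-Y")
    print(parts)
    print(issn_is_valid(parts["issn"]))
```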
BICI: Dead On Arrival, or conflict between theory and practice • Book Item and Contribution Identifier • NISO draft standard, never completed • Consists of ISBN and extra stuff to identify the relevant section within the book; may be automatically generated • Publishers & book stores prefer to rely solely on ISBN in their systems • Using ISBN only is not a neat solution (uses a lot of ISBNs, and giving ISBN both for the thing as a whole and its component parts is messy)
7th level: search attributes etc. • Within Z39.50, sets (e.g. attribute and diagnostic), record syntaxes etc. are identified by ISO Object Identifiers • MARC21: 1.2.840.10003.5.10 • Bib-1: 1.2.840.10003.3.1; term examples: • Author: 1.2.840.10003.3.1.1.1003 • Name: 1.2.840.10003.3.1.1.1002 • Author-name personal: 1.2.840.10003.3.1.1.1004 • Personal name: 1.2.840.10003.3.1.1.1
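Object Identifiers are hierarchical, so a system can tell from the numbers alone whether an identifier belongs to a given set. A minimal sketch, using only the OIDs listed on this slide; the helper names are hypothetical, and `Z3950_ARC` is simply the prefix shared by all the OIDs above.

```python
def parse_oid(text: str) -> tuple[int, ...]:
    """Turn a dotted OID string into a tuple of integer arcs."""
    return tuple(int(part) for part in text.split("."))


def is_under(oid: str, arc: str) -> bool:
    """True if `oid` lies at or below `arc` in the OID tree."""
    oid_arcs, parent_arcs = parse_oid(oid), parse_oid(arc)
    return oid_arcs[: len(parent_arcs)] == parent_arcs


Z3950_ARC = "1.2.840.10003"   # common prefix of the OIDs on this slide
BIB1 = "1.2.840.10003.3.1"    # Bib-1 attribute set
MARC21 = "1.2.840.10003.5.10"  # MARC21 record syntax

if __name__ == "__main__":
    print(is_under("1.2.840.10003.3.1.1.1003", BIB1))  # Author
    print(is_under(MARC21, BIB1))
    print(is_under(MARC21, Z3950_ARC))
```

Comparing tuples of integers rather than raw strings avoids false matches such as treating `1.2.840.10003.3.10` as part of `1.2.840.10003.3.1`.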
OID problems • Bib-1 attribute set is not quite as coherent as it should be, there are lots of (domestic) search attributes missing from it, and sometimes there are too many alternatives • Attempt to develop Bib-2 failed, and even if we succeed in the future, co-existence of Bib-1 and Bib-2 may cause trouble • ISO OIDs can be applied to anything • Not clear how to use them in ”bibliographic context” to e.g. identify government publications or parts of them; this is currently being investigated in Finland
Conclusion • E-publishing and new applications (and their novel metadata) have expanded both the scope of identifiers needed and the requirements towards existing systems, especially at the manifestation & component part levels • Standards developers have reacted to these needs, but progress has been slow; still, in some areas system builders have been even slower
Conclusion (2) • An identifier is more than just a string of characters • There must be an agent which assigns the identifier to a resource, and (usually) describes it • As long as all parts in this picture are stable, identification is a routine process • Agent breakdowns have been the most common reason for problems in the past • A number of national ISSN agencies are inactive • E-resources have destroyed the balance, and it may take a while before the identification system works again in ”business as usual” style