Emerging Standards for Libraries and Publishers

Emerging Standards for Libraries and Publishers • Cliff Morgan, John Wiley & Sons Ltd • UKSG briefing session, 15-17 April 2002

What I’ll be covering • Identifiers • Metadata • E-books

What I won’t be covering • Graphics (e.g. JPEG, GIF, PNG, SVG) • Character sets (ASCII, Unicode) • Relationship models (RDF, Topic Maps/XTM) • E-commerce (UN/EDIFACT, XML-edi, ebXML) • XML stuff (Schemas, Xlink, XSL, XSLT, etc.) • Usage stats standards (e.g. COUNTER, ANSI/NISO Z39.7-1995) • Rights metadata (XrML, ODRL)

Identifiers • ISSN • ISBN • SICI • BICI • PII • DOI • ISTC • Multimedia identifiers

ISBN • International Standard Book Number • ISO 2108 • e.g. 0-471-92755-4 • Geog location/language - publisher/imprint - title (print format) - check character • Has been a standard for > 30 years
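The check character in the example above can be verified with the ISBN-10 modulus-11 rule: weight the ten characters 10 down to 1, and the weighted sum must be divisible by 11. A minimal sketch; the function name is illustrative and not from the talk.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Return True if the ISBN-10 check character is consistent."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # 'X' stands for 10, allowed only as the check character
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value  # weights run 10, 9, ..., 1
    return total % 11 == 0


print(isbn10_is_valid("0-471-92755-4"))  # the example ISBN from the slide
```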

New ISBN • ISBN is being revised - 13 digits from 1/1/05 • Can double capacity by giving a 979 prefix • Issues: - hexadecimal or decimal? - limit ISBN to print - do something else for electronic? versions? formats? - assign to components (e.g. chaps)? - should number be completely dumb? - metadata deposit at assignment?
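As a rough illustration of the 13-digit form, the sketch below converts the ISBN-10 from the earlier example by prefixing 978 and recomputing the check digit with EAN-13 weighting (alternating 1 and 3, modulus 10). It assumes the decimal, EAN-compatible answer to the issues listed above; the function name and the default prefix argument are illustrative.

```python
def isbn10_to_isbn13(isbn10: str, prefix: str = "978") -> str:
    """Convert an ISBN-10 to a 13-digit ISBN using EAN-13 check-digit rules."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = prefix + chars[:9]  # drop the old modulus-11 check character
    if len(body) != 12 or any(c not in "0123456789" for c in body):
        raise ValueError("prefix and ISBN body must be 12 decimal digits")
    # Weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("0-471-92755-4"))
```

Passing `prefix="979"` shows how the second prefix would open up a new number range, although numbers under 979 would have no ISBN-10 equivalent.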

ISSN • International Standard Serial Number • ISO 3297 • e.g. 0749-503X • If publisher has not applied for an ISSN, any 3rd party can apply for their own data management needs • Different media get different ISSNs, e.g. print ISSN is different from CD-ROM ISSN

But different file formats don’t get different ISSNs, so offline is different from online, but PDF is same as HTML • If online contains only abstracts of print full text, no new ISSN for e-version • If use print and eISSNs, must change both if title changes • http://www.issn.org:8080/English/pub/getting-checking
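The final character of an ISSN is a modulus-11 check character: the first seven digits are weighted 8 down to 2, and a result of 10 is written as X. A minimal sketch that reproduces the X in the example 0749-503X; the function names are illustrative.

```python
def issn_check_character(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    if len(first_seven) != 7 or any(c not in "0123456789" for c in first_seven):
        raise ValueError("expected exactly seven decimal digits")
    weights = range(8, 1, -1)  # 8, 7, ..., 2
    total = sum(w * int(d) for w, d in zip(weights, first_seven))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def issn_is_valid(issn: str) -> bool:
    """Return True if an ISSN such as '0749-503X' has a consistent check character."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or any(c not in "0123456789" for c in chars[:7]):
        return False
    return issn_check_character(chars[:7]) == chars[7]


print(issn_check_character("0749503"))
print(issn_is_valid("0749-503X"))
```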

SICI • Serial Item and Contribution Identifier • ANSI/NISO Z39.56-1996 - reaffirmed • e.g. issue = 0749-503X(20010115)18:1<>1.0.TX;2-X; article = 0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X (Check digits in above examples have not been calculated.) • Well used at issue level - bar codes • Less used at article level

SICIs at Article Level • Requires publication info - but publishers want to assign article Ids before pubn • Long-winded • Unfortunate syntax for Internet transfer (<>, #) - needs SGML entifying and hex encoding • Unclear what to do with special characters in Title Code • Not unique ID if two untitled articles on same page (e.g. Letters)

C = Contribution, not Component • SICI allows identification of article, issue ToC, issue Index and article abstract (DPIs of 0, 1, 2, 3 respectively) • No way of using SICI to identify any other component (such as Figure, Table, Section) • Not surprising since it’s a canonicalisation nightmare • http://sunsite.berkeley.edu/SICI/version2.html

BICI • Book Item and Component Identifier • ISO DSFTU (Draft Standard for Trial Use) • e.g. 0387119787(1982)<174:ADTATO>2.2.TX;1-Q • ISBN, date, location, title, component type, etc. • Trial was Aug 2000 to Jan 2002 - not much evidence of use • Many issues the same as for SICI, but also less business push

PII • Publisher Item Identifier • Proposed in 1995 by ACS, AIP, APS, IEEE and Elsevier, but never became a standard • e.g. S0749-503X011234 • Some publishers use as internal id since doesn’t suffer from any of the SICI problems • But no registration/maintenance agency

DOI • Digital Object Identifier • ANSI/NISO Z39.84-2000 • e.g. issue = 10.1002/yea.v18:1 article = 10.1002/yea.1234 • Well established in academic journals publishing - esp. ‘cos of CrossRef • 4.2 million DOIs deposited to date • http://www.doi.org

Some publishing issues regarding DOIs • What are they assigned to? • Need for matching URL, so can’t assign to anything you wouldn’t give a URL to • Individual publishers need to decide their DOI structure • Doesn’t have to be human-friendly but must be unique, easily generated, and matched with URL • Application profiles for different genres

Processes • Apply to Registration Agency (IDF, CDI, CrossRef, Enpia, LON) for Registrant Prefix • For individual DOIs, batch-process - generate DOIs and URLs from electronic metadata and send to RA for deposit • DOIs never change (even if journal changes ownership) but matched URLs (or other locators) can
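The batch step can be sketched as follows: take a Registrant Prefix, a publisher-chosen suffix pattern, and a list of article numbers, and produce DOI/URL pairs ready for deposit. The suffix pattern mirrors the article example 10.1002/yea.1234 given earlier; the class name, function name, and URL template are hypothetical, and the real deposit format is whatever the Registration Agency specifies.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DoiDeposit:
    doi: str  # permanent identifier, never changes
    url: str  # current locator, may be updated later


def build_deposits(prefix: str, journal_code: str,
                   article_ids: Iterable[int], url_template: str) -> List[DoiDeposit]:
    """Generate unique DOI/URL pairs from simple article metadata."""
    deposits: List[DoiDeposit] = []
    seen = set()
    for article_id in article_ids:
        doi = f"{prefix}/{journal_code}.{article_id}"
        if doi in seen:
            raise ValueError(f"duplicate DOI: {doi}")  # DOIs must be unique
        seen.add(doi)
        url = url_template.format(journal=journal_code, article_id=article_id)
        deposits.append(DoiDeposit(doi, url))
    return deposits


# Hypothetical URL template; only the DOI pattern comes from the slides.
batch = build_deposits("10.1002", "yea", [1234, 1235],
                       "https://publisher.example/{journal}/{article_id}")
for deposit in batch:
    print(deposit.doi, "->", deposit.url)
```

If the journal changes ownership, only the `url` half of each pair would be redeposited; the `doi` half stays fixed.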

ISTC • International Standard Textual Work Code • ISO Committee Draft 21047 - circulated Oct 01, voting finished Jan 02: progressed to Enquiry stage • http://www.nlc-bnc.ca/iso/tc46sc9/21047.htm • E.g. 0A9-2002-1223F332-0 (RA+year+WorkID+check) • A Work (= abstract creation) id - replaces the ISWC(L)

Creator-centric - authors may apply to ISTC Agency directly or via agents or via publisher • Requires metadata deposit too • Publishers therefore need to capture these numbers if they’ve been assigned to Works • Will authors really bother with this?
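For publishers who need to capture these numbers, a small parser can split an ISTC into its four parts (RA + year + WorkID + check). The pattern below is an assumption inferred from the single example 0A9-2002-1223F332-0 (hexadecimal characters, hyphen-separated, fixed lengths); it does not verify the check character, since the algorithm is not given in the talk.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the one example on the slide.
ISTC_PATTERN = re.compile(r"^([0-9A-F]{3})-(\d{4})-([0-9A-F]{8})-([0-9A-F])$")


class IstcParts(NamedTuple):
    agency: str   # registration agency element
    year: int     # year element
    work_id: str  # work element
    check: str    # check character (not verified here)


def parse_istc(code: str) -> IstcParts:
    """Split an ISTC string into its elements, or raise ValueError."""
    match = ISTC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not in the assumed ISTC layout: {code!r}")
    agency, year, work_id, check = match.groups()
    return IstcParts(agency, int(year), work_id, check)


print(parse_istc("0A9-2002-1223F332-0"))
```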

A couple of non-text, non-graphic Ids you might want to know about • ISAN • ISWC

ISAN • International Standard Audiovisual Number • ISO Draft International Standard 15706 • E.g. 153C-7365-B36F-844C-N • Can be issued to movies, trailers, TV programmes, episodes or series, ads, multimedia works if A/V component is significant • http://www.nlc-bnc.ca/iso/tc46sc9/isan.htm • Work has also started on a V-ISAN for Versions

ISWC • International Standard Musical Work Code (used to be ISWC(T)) • ISO 15707 • e.g. T-034524680-1 • Identifies any musical work, including arrangements, movements, medleys, samples • http://www.iswc.org/iswc/iswc/en/html/home.html

Metadata • Resource discovery (Dublin Core, OAI-PMH), incl. Linking (CrossRef) • Product metadata (ONIX and ONIX for Serials) • Preservation metadata (OAIS) • I am not going to talk about library-specific sets such as MARC, Z39.50, AACR2, etc.

Dublin Core • Defined Universal Bibliographic Language for Internet Navigation and Coherent Online Resource Exploration [not really!] • ANSI/NISO Z39.85 • DC 1.1 (simple, unqualified set of 15 elements) • Qualified set (DCQ? dcterms?) needed to do anything more than basic - not standard yet

DC has been mandated by UK Government (“e-GMS”) • Application Profiles will deal with defined local extensions via namespace declarations

OAI-PMH • Open Archives Initiative Protocol for Metadata Harvesting • Not really an archive in the sense of repository, more of a political statement and a metadata harvesting protocol • Came out of the E-print community, but they welcome commercial publishers • Supported by DLF and CNI • Uses simple (unqualified) Dublin Core as its metadata • E.g. <creator>Cliff Morgan</creator> • Version 2 of protocol due for release June 2002 • http://www.openarchives.org
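A harvesting request and the simple Dublin Core it returns can be sketched with the Python standard library. The repository address, the wrapper element name `record`, and the sample title are hypothetical; the `ListRecords` verb, the `oai_dc` metadata prefix, and the Dublin Core element namespace are taken from the published protocol and element set rather than from the talk.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core 1.1 element namespace


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a harvesting request asking for records as unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def dc_fragment(fields) -> str:
    """Serialise (element, value) pairs as namespaced Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper, not the full OAI envelope
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


request_url = list_records_url("http://archive.example.org/oai")
xml_text = dc_fragment([("creator", "Cliff Morgan"),
                        ("title", "Example title")])
print(request_url)
print(xml_text)
```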

CrossRef metadata set • CrossRef matches the metadata in a citation with the metadata in its Metadata Database (MDDB), which includes the DOI for the resource • Participating publishers (91 of ‘em) deposit the m/data with DOI into the MDDB • To date, 3.7M DOIs, covering 5000+ jnls • http://www.crossref.org

New version • Version 2 much more complicated - full schema is 113 pages long • In addition to journals, covers books and conference proceedings, at whole title and chapter level • Some element names are different from CrossRef 1.0

ONIX • OnLine Information eXchange • Latest release is 2.0 • Original focus was message format for books through the trade, but is fast becoming a universal metadata set for describing publications • http://www.editeur.org

ONIX being championed by a number of publishers and online retailers • Swedish Royal Library using ONIX as an input medium

ONIX for Serials • Provides rich cataloguing information for agents, librarians, users • Supports alerting, despatch and library check-in • Structured, multi-level bibliographic descriptions, including ToCs • Descriptions for library holdings (direct to OPACs)

Draft 2 just released this month • Subscription Package Record provides product catalogue info about subscription packages • Serial Title Record provides catalogue info about an individual serial • Serial Item Record provides structured multi-level bibliographic description of serial parts

So is the CrossRef set like the ONIX for Serials set? • No • They both include metadata that can be used to describe journals, issues and articles • But they don’t use the same element names • CrossRef has mapped to ONIX but not to ONIX for Serials yet - but has said will support when released

OpenURL • NISO Work Item • Separates metadata for resource from metadata for location • Resolver services (such as SFX, CrossRef) make the context-sensitive link • Solves the “appropriate copy” problem, where more than one legit copy of an article may be available to a library, e.g. local holding, consortium, aggregator service, mirror site, publisher

OpenURL metadata • OpenURL comprises BASEURL and QUERY, joined by “?” • BASEURL identifies the resolver; QUERY is a resource description • e.g. (simplified): http://resolver.ukoln.ac.uk/?genre=article&atitle=Information%20gateways:…&issn=14684527&volume=24&spage=40&aulast=Heery&aufirst=Rachel
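The BASEURL/QUERY split is what makes the "appropriate copy" idea work: the same resource description can be attached to whichever resolver a given library runs. A minimal sketch using the field names from the example above; both resolver addresses and the article title are hypothetical placeholders.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_openurl(base_url: str, description: dict) -> str:
    """Join a resolver BASEURL and a percent-encoded QUERY with '?'."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(description, quote_via=quote)
    return f"{base_url}?{query}"


# Resource description: field names follow the slide's example.
article = {
    "genre": "article",
    "atitle": "Example article title",  # placeholder, not the real title
    "issn": "14684527",
    "volume": "24",
    "spage": "40",
    "aulast": "Heery",
    "aufirst": "Rachel",
}

# Same QUERY, different BASEURL per institution (hypothetical resolvers).
link_a = build_openurl("http://resolver.library-a.example/", article)
link_b = build_openurl("http://resolver.library-b.example/", article)
print(link_a)
print(link_b)

# A resolver would parse the QUERY back into fields before choosing a copy.
fields = parse_qs(urlsplit(link_a).query)
print(fields["issn"], fields["spage"])
```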

Genres defined as “referent-types”, such as book, chapter, journal, article, conf proc and paper, dissertation, patent, report - each has its own metadata spec • High-level concept is the Bison-Futé model http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html

Preservation metadata • OAIS (Open Archival Information System) underlies all digital preservation models • Nothing to do with OAI • Based on SIPs (Submission Info Packages), AIPs (Archival Info Packages) and DIPs (Dissemination Info Packages) • The Producer wraps the stuff up in a SIP, it gets ingested into an AIP, and sent out as a DIP
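The SIP to AIP to DIP flow can be pictured as three record types and two functions. This is only a toy sketch of the information flow: the field names, the SHA-256 fixity value, and the function names are assumptions for illustration and are not taken from the OAIS reference model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class SubmissionPackage:  # SIP: what the Producer hands over
    producer: str
    content: bytes
    descriptive_metadata: Dict[str, str]


@dataclass
class ArchivalPackage:  # AIP: what the archive stores
    archive_id: str
    content: bytes
    descriptive_metadata: Dict[str, str]
    preservation_metadata: Dict[str, str]


@dataclass
class DisseminationPackage:  # DIP: what a consumer receives
    archive_id: str
    content: bytes
    descriptive_metadata: Dict[str, str]


def ingest(sip: SubmissionPackage, archive_id: str) -> ArchivalPackage:
    """Turn a SIP into an AIP, adding preservation metadata on the way in."""
    preservation = {
        "producer": sip.producer,
        "sha256": hashlib.sha256(sip.content).hexdigest(),  # fixity check value
    }
    return ArchivalPackage(archive_id, sip.content,
                           dict(sip.descriptive_metadata), preservation)


def disseminate(aip: ArchivalPackage) -> DisseminationPackage:
    """Produce a DIP from an AIP, leaving internal preservation data behind."""
    return DisseminationPackage(aip.archive_id, aip.content,
                                dict(aip.descriptive_metadata))


sip = SubmissionPackage("Example Publisher", b"<article>...</article>",
                        {"title": "Example title"})
aip = ingest(sip, "aip-0001")
dip = disseminate(aip)
print(aip.preservation_metadata["sha256"][:12], dip.archive_id)
```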

Some other metadata activities • LOM - Learning Object Metadata • IMS - Instructional Management Systems (builds on LOM) • PRISM - Publishing Requirements for Industry Standard Metadata • MEG - cross-sectoral Metadata for Education Group • SCORM - Sharable Content Object Reference Model - US DoD project, also builds on IMS/LOM

How are we supposed to cope with all these metadata sets? • A publisher’s metadata becomes an important asset for describing product to the outside world, esp. for trading and linking • If publishers have their publications in electronic form, the metadata will be in there in the file so it just needs extracting and mapping to whatever metadata set the publisher chooses • Production issue: who checks the metadata?

E-books • OEBPS - Open E-Book Publication Structure • Three components: a) XML DTD for content b) DC-based metadata (but some non-compliant qualifier attributes) c) description of package’s structure, reading order, navigation • Many OEB files are just (a) • Version 2 being worked on, esp. M&I, and Rights

Formats • Front runners are Adobe E-Book Reader (PDF based) and Microsoft Reader (.lit based) • .lit limited to simple stuff, and not as robust as PDF, but can’t underestimate M/soft • New versions of Adobe will have built-in DOI capability

Text reflow • Acrobat 5 introduced structured PDF • The Holy Grail synthesis of structure and presentation • Writes a PDF file in XML(ish) • Asserts reading order • Allows for reflow into different reader devices • Works best for simple documents only, but good start

Conclusions • There are lots of standards out there • Some of them compete with one another • Not all of them are formal • They may change over time • Publishing industry standards are not only developed by the publishing industry • Not always easy to judge the winners
