This briefing session covers identifiers, metadata, and e-books in the emerging standards for libraries and publishers. It does not cover graphics, character sets, relationship models, e-commerce, XML, usage stats standards, or rights metadata.
Emerging Standards for Libraries and Publishers Cliff Morgan, John Wiley & Sons Ltd UKSG briefing session, 15-17 April 2002
What I’ll be covering • Identifiers • Metadata • E-books
What I won’t be covering • Graphics (e.g. JPEG, GIF, PNG, SVG) • Character sets (ASCII, Unicode) • Relationship models (RDF, Topic Maps/XTM) • E-commerce (UN/EDIFACT, XML-edi, ebXML) • XML stuff (Schemas, Xlink, XSL, XSLT, etc.) • Usage stats standards (e.g. COUNTER, ANSI/NISO Z39.7-1995) • Rights metadata (XrML, ODRL)
Identifiers • ISSN • ISBN • SICI • BICI • PII • DOI • ISTC • Multimedia identifiers
ISBN • International Standard Book Number • ISO 2108 • e.g. 0-471-92755-4 • Geog location/language - publisher/imprint - title (print format) - check character • Has been a standard for > 30 years
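The "check character" at the end of the slide's example (the final 4 of 0-471-92755-4) is computed from the first nine digits with a modulus-11 rule. The weights run from 10 down to 2, and a value of 10 is written as X. The sketch below recomputes and validates it; the function names are illustrative only.

```python
def isbn10_check_char(first_nine: str) -> str:
    """Return the ISBN-10 check character for nine leading digits.

    Digits are weighted 10, 9, ..., 2; the check value (0-10, with 10
    written as 'X') makes the full weighted sum divisible by 11.
    """
    if len(first_nine) != 9 or not first_nine.isdigit():
        raise ValueError("expected exactly nine digits")
    total = sum(weight * int(digit)
                for weight, digit in zip(range(10, 1, -1), first_nine))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Validate an ISBN-10, ignoring the hyphens between its groups."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    return isbn10_check_char(chars[:9]) == chars[9]


print(is_valid_isbn10("0-471-92755-4"))  # True
```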
New ISBN • ISBN is being revised - 13 digits from 1/1/05 • Can double capacity by giving a 979 prefix • Issues: - hexadecimal or decimal? - limit ISBN to print - do something else for electronic? versions? formats? - assign to components (e.g. chaps)? - should number be completely dumb? - metadata deposit at assignment?
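The slide lists the questions that were still open in 2002. Under the 13-digit scheme that was eventually adopted, an existing ISBN-10 becomes an ISBN-13 by adding the 978 prefix (the EAN "Bookland" range) and recomputing the check digit with the EAN-13 modulus-10 rule. The 979 prefix supplies the extra capacity. Here is a minimal sketch of that conversion, with illustrative function names:

```python
def isbn13_check_digit(first_twelve: str) -> str:
    """EAN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits."""
    if len(first_twelve) != 12 or not first_twelve.isdigit():
        raise ValueError("expected exactly twelve digits")
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first_twelve))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Re-express an ISBN-10 in the 978 range and recompute the check digit.

    The old mod-11 check character is dropped because ISBN-13 uses a
    different (mod-10) rule. Only 978 is used here: numbers in the 979
    range have no ISBN-10 equivalent.
    """
    core = isbn10.replace("-", "")[:9]
    if len(core) != 9 or not core.isdigit():
        raise ValueError("expected an ISBN-10")
    body = "978" + core
    return body + isbn13_check_digit(body)


print(isbn10_to_isbn13("0-471-92755-4"))  # 9780471927556
```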
ISSN • International Standard Serial Number • ISO 3297 • e.g. 0749-503X • If publisher has not applied for an ISSN, any 3rd party can apply for their own data management needs • Different media get different ISSNs, e.g. print ISSN is different from CD-ROM ISSN
But different file formats don’t get different ISSNs, so offline is different from online, but PDF is the same as HTML • If online contains only abstracts of print full text, no new ISSN for e-version • If you use both a print ISSN and an eISSN, both must change if the title changes • http://www.issn.org:8080/English/pub/getting-checking
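The X in 0-749-503X is a check character and not part of the serial number proper. ISSN uses the same modulus-11 idea as ISBN-10, but with weights 8 down to 2. So a print ISSN and an eISSN for the same title each carry their own check character. A minimal sketch, with illustrative function names:

```python
def issn_check_char(first_seven: str) -> str:
    """Return the ISSN check character: weights 8..2, modulus 11, 10 -> 'X'."""
    if len(first_seven) != 7 or not first_seven.isdigit():
        raise ValueError("expected exactly seven digits")
    total = sum(weight * int(digit)
                for weight, digit in zip(range(8, 1, -1), first_seven))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def format_issn(first_seven: str) -> str:
    """Append the check character and format as NNNN-NNNC."""
    full = first_seven + issn_check_char(first_seven)
    return f"{full[:4]}-{full[4:]}"


print(format_issn("0749503"))  # 0749-503X
```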
SICI • Serial Item and Contribution Identifier • ANSI/NISO Z39.56-1996 - reaffirmed • e.g. issue = 0749-503X(20010115)18:1<>1.0.TX;2-X, article = 0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X (Check digits in above examples have not been calculated.) • Well used at issue level - bar codes • Less used at article level
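The two example SICIs share an item segment of the form ISSN(chronology)enumeration. They differ in the contribution segment inside the angle brackets and in the control segment after it. The sketch below splits a SICI into those parts using a regular expression. The segment names are my reading of the Z39.56 layout: CSI is the code structure identifier (1 for the issue example, 2 for the article example), DPI is the derivative part identifier, and MFI is the medium/format identifier (TX in both examples). This is a simplified assumption, not a full implementation of the standard, and it does not verify the check character.

```python
import re

# Simplified layout: ISSN(chronology)enumeration<contribution>CSI.DPI.MFI;version-check
SICI_PATTERN = re.compile(
    r"(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"
    r";(?P<version>\d+)-(?P<check>[0-9A-Z#])"
)


def parse_sici(sici: str) -> dict:
    """Split a SICI into named segments (check character is not verified)."""
    match = SICI_PATTERN.fullmatch(sici)
    if match is None:
        raise ValueError(f"not a SICI in the expected layout: {sici!r}")
    parts = match.groupdict()
    # The contribution segment is location:title-code, e.g. "1:YGPIWG";
    # it is empty ("<>") when the SICI identifies a whole issue.
    location, _, title_code = parts["contribution"].partition(":")
    parts["location"] = location
    parts["title_code"] = title_code
    return parts


article = parse_sici("0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X")
print(article["enumeration"], article["title_code"])  # 18:1 YGPIWG
```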
SICIs at Article Level • Requires publication info - but publishers want to assign article IDs before publication • Long-winded • Unfortunate syntax for Internet transfer (<>, #) - needs SGML entity encoding and hex encoding • Unclear what to do with special characters in Title Code • Not a unique ID if two untitled articles are on the same page (e.g. Letters)
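The "unfortunate syntax" point is easy to see in practice. The same article SICI needs one form of escaping to sit inside SGML/HTML/XML text and a different one to travel in a URL. This sketch uses Python's standard `html` and `urllib.parse` modules. HTML escaping stands in here for SGML entity encoding, since both use `&lt;` and `&gt;` for the angle brackets.

```python
import html
from urllib.parse import quote, unquote

SICI = "0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X"


def for_markup(identifier: str) -> str:
    """Escape '&', '<' and '>' as entity references for markup text."""
    return html.escape(identifier, quote=False)


def for_url(identifier: str) -> str:
    """Percent-encode every reserved character, including '<', '>', ';', '#'."""
    return quote(identifier, safe="")


print(for_markup(SICI))  # 0749-503X(20010115)18:1&lt;1:YGPIWG&gt;2.0.TX;2-X
print(for_url(SICI))     # 0749-503X%2820010115%2918%3A1%3C1%3AYGPIWG%3E2.0.TX%3B2-X
```

Both forms round-trip (`html.unescape`, `unquote`). Still, any system that stores or compares SICIs has to know which form it is holding.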
C = Contribution, not Component • SICI allows identification of article, issue ToC, issue Index and article abstract (DPIs of 0, 1, 2, 3 respectively) • No way of using SICI to identify any other component (such as Figure, Table, Section) • Not surprising since it’s a canonicalisation nightmare • http://sunsite.berkeley.edu/SICI/version2.html
BICI • Book Item and Component Identifier • NISO DSFTU (Draft Standard for Trial Use) • e.g. 0387119787(1982)<174:ADTATO>2.2.TX;1-Q • ISBN, date, location, title, component type, etc. • Trial was Aug 2000 to Jan 2002 - not much evidence of use • Many issues are the same as for SICI, but there is also less business push
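Both SICI and BICI embed a short title code built from the title's initial letters, such as YGPIWG and ADTATO in the examples above. The sketch below is hypothetical. For illustration, it assumes a rule of taking the first character of each of the first six words. It simply discards punctuation and non-ASCII letters, which is only one possible answer to the "special characters" question, not the standard's rule.

```python
import re


def title_code(title: str, max_words: int = 6) -> str:
    """Hypothetical simplified title code: upper-cased first character of
    each of the first `max_words` words. Punctuation and non-ASCII letters
    are discarded, which is only one possible policy."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return "".join(word[0].upper() for word in words[:max_words])


print(title_code("Emerging standards for libraries and publishers"))  # ESFLAP
```

The sketch also shows the uniqueness problem from the SICI slides. Two untitled letters starting on the same page get the same location and the same (empty) title code.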
PII • Publisher Item Identifier • Proposed in 1995 by ACS, AIP, APS, IEEE and Elsevier, but never became a standard • e.g. S0749-503X011234 • Some publishers use it as an internal ID since it doesn’t suffer from any of the SICI problems • But there is no registration/maintenance agency
DOI • Digital Object Identifier • ANSI/NISO Z39.84-2000 • e.g. issue = 10.1002/yea.v18:1 article = 10.1002/yea.1234 • Well established in academic journals publishing - esp. ‘cos of CrossRef • 4.2 million DOIs deposited to date • http://www.doi.org
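A DOI has two parts separated by the first slash. The prefix (starting "10.") identifies the registrant, and the publisher chooses the suffix. A link is formed by appending the DOI to a resolver address. This sketch uses the public https://doi.org/ proxy. Percent-encoding characters such as ":" in the suffix is a cautious assumption here, not a requirement stated on the slide.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"  # public DOI proxy


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI at its first '/' into (registrant prefix, suffix)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a resolver link for a DOI, encoding characters unsafe in URLs."""
    split_doi(doi)  # validate the overall shape first
    return RESOLVER + quote(doi, safe="/")


print(split_doi("10.1002/yea.1234"))         # ('10.1002', 'yea.1234')
print(resolver_url("10.1002/yea.v18:1"))     # https://doi.org/10.1002/yea.v18%3A1
```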
Some publishing issues regarding DOIs • What are they assigned to? • Need for matching URL, so can’t assign to anything you wouldn’t give a URL to • Individual publishers need to decide their DOI structure • Doesn’t have to be human-friendly but must be unique, easily generated, and matched with URL • Application profiles for different genres
Processes • Apply to Registration Agency (IDF, CDI, CrossRef, Enpia, LON) for Registrant Prefix • For individual DOIs, batch-process - generate DOIs and URLs from electronic metadata and send to RA for deposit • DOIs never change (even if journal changes ownership) but matched URLs (or other locators) can
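The batch step can be sketched as a function that turns electronic article metadata into (DOI, URL) pairs ready for deposit. The suffix scheme (journal code, a dot, then the article number) mirrors the slide's example 10.1002/yea.1234. The record fields and the URL template are hypothetical, and the actual deposit format required by a Registration Agency is not shown.

```python
from dataclasses import dataclass

REGISTRANT_PREFIX = "10.1002"  # prefix taken from the slide's example DOI
# Hypothetical landing-page pattern; this can change later without touching DOIs
URL_TEMPLATE = "https://publisher.example/{journal}/article/{number}"


@dataclass
class ArticleRecord:
    journal_code: str    # e.g. "yea" (hypothetical field name)
    article_number: int  # publisher's internal article number (hypothetical)


def doi_for(record: ArticleRecord) -> str:
    """Generate a DOI from metadata alone: unique and easily generated."""
    return f"{REGISTRANT_PREFIX}/{record.journal_code}.{record.article_number}"


def deposit_batch(records: list[ArticleRecord]) -> list[tuple[str, str]]:
    """Return (DOI, URL) pairs to send to the Registration Agency."""
    return [
        (doi_for(r), URL_TEMPLATE.format(journal=r.journal_code,
                                         number=r.article_number))
        for r in records
    ]


print(deposit_batch([ArticleRecord("yea", 1234)]))
```

Keeping the URL template separate from the DOI rule reflects the last point on the slide. The DOI is permanent, and only the locator it is matched with gets updated.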
ISTC • International Standard Textual Work Code • ISO Committee Draft 21047 - circulated Oct 01, voting finished Jan 02: progressed to Enquiry stage • http://www.nlc-bnc.ca/iso/tc46sc9/21047.htm • E.g. 0A9-2002-1223F332-0 (RA+year+WorkID+check) • A Work (= abstract creation) id - replaces the ISWC(L)
Creator-centric - authors may apply to ISTC Agency directly or via agents or via publisher • Requires metadata deposit too • Publishers therefore need to capture these numbers if they’ve been assigned to Works • Will authors really bother with this?
A couple of non-text, non-graphic Ids you might want to know about • ISAN • ISWC
ISAN • International Standard Audiovisual Number • ISO Draft International Standard 15706 • E.g. 153C-7365-B36F-844C-N • Can be issued to movies, trailers, TV programmes, episodes or series, ads, multimedia works if A/V component is significant • http://www.nlc-bnc.ca/iso/tc46sc9/isan.htm • Work has also started on a V-ISAN for Versions
ISWC • International Standard Musical Work Code (used to be ISWC(T)) • ISO 15707 • e.g. T-034524680-1 • Identifies any musical work, including arrangements, movements, medleys, samples • http://www.iswc.org/iswc/iswc/en/html/home.html
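The final 1 in T-034524680-1 is a check digit over the nine-digit work number. The rule assumed below is the one commonly published for ISWC, and it reproduces the slide's example. Take 1 (for the "T"), add each digit multiplied by its position (1 to 9), and then take the amount needed to reach the next multiple of 10.

```python
def iswc_check_digit(work_number: str) -> str:
    """Check digit for the nine digits after 'T' (assumed published rule)."""
    if len(work_number) != 9 or not work_number.isdigit():
        raise ValueError("expected exactly nine digits")
    total = 1 + sum(pos * int(d) for pos, d in enumerate(work_number, start=1))
    return str((10 - total % 10) % 10)


def is_valid_iswc(iswc: str) -> bool:
    """Validate an ISWC written with or without '-' and '.' separators."""
    compact = iswc.replace("-", "").replace(".", "").upper()
    return (len(compact) == 11 and compact[0] == "T"
            and compact[1:].isdigit()
            and iswc_check_digit(compact[1:10]) == compact[10])


print(is_valid_iswc("T-034524680-1"))  # True
```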
Metadata • Resource discovery (Dublin Core, OAI-PMH), incl. Linking (CrossRef) • Product metadata (ONIX and ONIX for Serials) • Preservation metadata (OAIS) • I am not going to talk about library-specific sets such as MARC, Z39.50, AACR2, etc.
Dublin Core • Defined Universal Bibliographic Language for Internet Navigation and Coherent Online Resource Exploration [not really!] • ANSI/NISO Z39.85 • DC 1.1 (simple, unqualified set of 15 elements) • Qualified set (DCQ? dcterms?) needed to do anything more than basic - not standard yet
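In simple Dublin Core, a record may use only the 15 element names, and every element is optional and repeatable. This sketch serialises such a record with Python's `xml.etree.ElementTree` under the DC 1.1 namespace. It rejects any name outside the set, so a refinement like "abstract" (from the qualified terms) is refused. The `record` wrapper element is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of simple (unqualified) Dublin Core 1.1
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def simple_dc_xml(fields: dict[str, list[str]]) -> str:
    """Serialise a simple DC record; each element may repeat."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a simple Dublin Core element")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(simple_dc_xml({"title": ["Emerging Standards for Libraries and Publishers"],
                     "creator": ["Cliff Morgan"]}))
```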
DC has been mandated by UK Government (“e-GMS”) • Application Profiles will deal with defined local extensions via namespace declarations
OAI-PMH • Open Archives Initiative Protocol for Metadata Harvesting • Not really an archive in the sense of repository, more of a political statement and a metadata harvesting protocol • Came out of the E-print community, but they welcome commercial publishers • Supported by DLF and CNI • Uses simple (unqualified) Dublin Core as its metadata • E.g. <dc:creator>Cliff Morgan</dc:creator> • Version 2 of protocol due for release June 2002 • http://www.openarchives.org
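An OAI-PMH request is an ordinary HTTP request to a repository's base URL. It carries a `verb` argument (one of six) plus verb-specific arguments. `metadataPrefix=oai_dc` asks for the simple Dublin Core records mentioned above. The base URL below is hypothetical. A real harvester would go on to fetch and parse the XML response and follow any `resumptionToken` for large result sets.

```python
from urllib.parse import urlencode

# The six OAI-PMH request verbs
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL from a base URL, verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


print(oai_request_url("http://archive.example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc"))
```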
CrossRef metadata set • CrossRef matches the metadata in a citation with the metadata in its Metadata Database (MDDB), which includes the DOI for the resource • Participating publishers (91 of ‘em) deposit the m/data with DOI into the MDDB • To date, 3.7M DOIs, covering 5000+ jnls • http://www.crossref.org
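The matching idea can be illustrated as normalising a citation's metadata into a key and looking that key up in a table of deposited records. The sketch below does an exact-key match on ISSN, volume and first page. The "database" holds one toy entry assembled from the slides' illustrative identifiers, not real data, and the actual service matches on richer metadata than this.

```python
from typing import NamedTuple, Optional


class CitationKey(NamedTuple):
    issn: str
    volume: str
    first_page: str


# Hypothetical miniature stand-in for the Metadata Database (toy data only)
METADATA_DB = {
    CitationKey("0749-503X", "18", "1"): "10.1002/yea.1234",
}


def normalise_issn(issn: str) -> str:
    """Accept '0749503x' or '0749-503X' and return the hyphenated form."""
    compact = issn.replace("-", "").upper()
    return f"{compact[:4]}-{compact[4:]}"


def lookup_doi(issn: str, volume: str, first_page: str) -> Optional[str]:
    """Return the deposited DOI for a citation, or None if there is no match."""
    key = CitationKey(normalise_issn(issn), volume.strip(), first_page.strip())
    return METADATA_DB.get(key)


print(lookup_doi("0749503x", "18", "1"))  # 10.1002/yea.1234
```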
New version • Version 2 much more complicated - full schema is 113 pages long • In addition to journals, covers books and conference proceedings, at whole title and chapter level • Some element names are different from CrossRef 1.0
ONIX • OnLine Information eXchange • Latest release is 2.0 • Original focus was message format for books through the trade, but is fast becoming a universal metadata set for describing publications • http://www.editeur.org
ONIX is being championed by a number of publishers and online retailers • The Swedish Royal Library is using ONIX as an input medium
ONIX for Serials • Provides rich cataloguing information for agents, librarians, users • Supports alerting, despatch and library check-in • Structured, multi-level bibliographic descriptions, including ToCs • Descriptions for library holdings (direct to OPACs)
Draft 2 just released this month • Subscription Package Record provides product catalogue info about subscription packages • Serial Title Record provides catalogue info about an individual serial • Serial Item Record provides structured multi-level bibliographic description of serial parts
So is the CrossRef set like the ONIX for Serials set? • No • They both include metadata that can be used to describe journals, issues and articles • But they don’t use the same element names • CrossRef has mapped to ONIX but not to ONIX for Serials yet - but has said it will support it when released
OpenURL • NISO Work Item • Separates metadata for resource from metadata for location • Resolver services (such as SFX, CrossRef) make the context-sensitive link • Solves the “appropriate copy” problem, where more than one legit copy of an article may be available to a library, e.g. local holding, consortium, aggregator service, mirror site, publisher
OpenURL metadata • OpenURL comprises BASEURL and QUERY • BASEURL identifies the resolver; QUERY is a resource description • e.g. (simplified): http://resolver.ukoln.ac.uk/?genre=article&atitle=Information%20gateways:…&issn=14684527&volume=24&spage=40&aulast=Heery&aufirst=Rachel
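Building an OpenURL means appending a URL-encoded resource description (the QUERY) to the resolver's address (the BASEURL). This sketch reuses the base URL and keys from the slide's example (genre, atitle, issn, volume, spage, aulast, aufirst). The article title is a hypothetical placeholder because the slide's title is truncated. `quote_via=quote` encodes spaces as %20, as in the example.

```python
from urllib.parse import quote, urlencode


def build_openurl(base_url: str, **metadata: str) -> str:
    """Append an encoded resource description (QUERY) to a resolver BASEURL."""
    return f"{base_url}?{urlencode(metadata, quote_via=quote)}"


print(build_openurl("http://resolver.ukoln.ac.uk/",
                    genre="article", atitle="Example article title",
                    issn="14684527", volume="24", spage="40",
                    aulast="Heery", aufirst="Rachel"))
```

The same QUERY can be sent to a different library's resolver just by changing the base URL. That separation is what lets each institution point users to its own "appropriate copy".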
Genres defined as “referent-types”, such as book, chapter, journal, article, conf proc and paper, dissertation, patent, report - each has its own metadata spec • High-level concept is the Bison-Futé model http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html
Preservation metadata • OAIS (Open Archival Information System) underlies all digital preservation models • Nothing to do with OAI • Based on SIPs (Submission Info Packages), AIPs (Archival Info Packages) and DIPs (Dissemination Info Packages) • The Producer wraps the stuff up in a SIP, it gets ingested into an AIP, and sent out as a DIP
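The SIP, AIP and DIP flow can be pictured as three package types and two transformations. This is a deliberately minimal sketch. All field names are hypothetical, and the only preservation information added at ingest is an archive identifier and a SHA-256 fixity value, whereas a real OAIS archive records far more.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:  # Submission Information Package, produced by the Producer
    content: bytes
    descriptive_metadata: dict


@dataclass
class AIP:  # Archival Information Package, what the archive keeps
    content: bytes
    descriptive_metadata: dict
    preservation_metadata: dict


@dataclass
class DIP:  # Dissemination Information Package, what a Consumer receives
    content: bytes
    descriptive_metadata: dict


def ingest(sip: SIP, archive_id: str) -> AIP:
    """Wrap a submission for long-term storage, adding a fixity value."""
    return AIP(sip.content, dict(sip.descriptive_metadata),
               {"archive_id": archive_id,
                "fixity_sha256": hashlib.sha256(sip.content).hexdigest()})


def disseminate(aip: AIP) -> DIP:
    """Produce a delivery package; internal preservation data stays behind."""
    return DIP(aip.content, dict(aip.descriptive_metadata))
```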
Some other metadata activities • LOM - Learning Object Metadata • IMS - Instructional Management Systems (builds on LOM) • PRISM - Publishing Requirements for Industry Standard Metadata • MEG - cross-sectoral Metadata for Education Group • SCORM - Sharable Content Object Reference Model - US DoD project, also builds on IMS/LOM
How are we supposed to cope with all these metadata sets? • A publisher’s metadata becomes an important asset for describing product to the outside world, esp. for trading and linking • If publishers have their publications in electronic form, the metadata will be in there in the file so it just needs extracting and mapping to whatever metadata set the publisher chooses • Production issue: who checks the metadata?
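The "extract and map" step is usually a crosswalk, that is, a table from the publisher's own field names to the target set's element names. In this sketch the in-house field names are hypothetical and the targets are simple Dublin Core elements. Fields with no mapping are returned for review rather than dropped silently, which gives the "who checks the metadata?" question a concrete place to happen.

```python
# Hypothetical in-house field names mapped to simple Dublin Core elements
INTERNAL_TO_DC = {
    "article_title": "title",
    "authors": "creator",
    "publisher_name": "publisher",
    "online_date": "date",
    "doi": "identifier",
}


def to_dublin_core(record: dict) -> tuple[dict[str, list[str]], list[str]]:
    """Map an internal record to DC; return (dc_fields, unmapped_names)."""
    dc: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for name, value in record.items():
        target = INTERNAL_TO_DC.get(name)
        if target is None:
            unmapped.append(name)  # left for a person to check
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(str(v) for v in values)
    return dc, unmapped
```

Supporting another metadata set (ONIX, a CrossRef deposit) would mean adding another table of this kind, while the extracted source record stays the same.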
E-books • OEBPS - Open eBook Publication Structure • Three components: a) XML DTD for content b) DC-based metadata (but some non-compliant qualifier attributes) c) description of package’s structure, reading order, navigation • Many OEB files are just (a) • Version 2 being worked on, esp. M&I, and Rights
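Component (c) is what lets a reader device know the reading order. In an OEB package file, a manifest lists the content items, and a spine lists item references in reading order. The sketch below reads the order from a deliberately simplified package. It uses no namespaces, omits the metadata section, and the file names are hypothetical. A real OEBPS package carries more than this.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical package file (no namespaces, no metadata section)
PACKAGE = """<package>
  <manifest>
    <item id="ch1" href="chapter1.html"/>
    <item id="toc" href="toc.html"/>
    <item id="ch2" href="chapter2.html"/>
  </manifest>
  <spine>
    <itemref idref="toc"/>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def reading_order(package_xml: str) -> list[str]:
    """Resolve spine itemrefs against manifest items, in spine order."""
    root = ET.fromstring(package_xml)
    hrefs = {item.get("id"): item.get("href") for item in root.iter("item")}
    return [hrefs[ref.get("idref")] for ref in root.iter("itemref")]


print(reading_order(PACKAGE))  # ['toc.html', 'chapter1.html', 'chapter2.html']
```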
Formats • Front runners are Adobe E-Book Reader (PDF based) and Microsoft Reader (.lit based) • .lit limited to simple stuff, and not as robust as PDF, but can’t underestimate M/soft • New versions of Adobe will have built-in DOI capability
Text reflow • Acrobat 5 introduced structured PDF • The Holy Grail synthesis of structure and presentation • Writes a PDF file in XML(ish) • Asserts reading order • Allows for reflow into different reader devices • Works best only for simple documents, but a good start
Conclusions • There are lots of standards out there • Some of them compete with one another • Not all of them are formal • They may change over time • Publishing industry standards are not only developed by the publishing industry • Not always easy to judge the winners