
Emerging Standards for Libraries and Publishers

This briefing session covers emerging standards for libraries and publishers in three areas: identifiers, metadata, and e-books. It does not cover graphics, character sets, relationship models, e-commerce, XML, usage statistics standards, or rights metadata.


Presentation Transcript


  1. Emerging Standards for Libraries and Publishers Cliff Morgan, John Wiley & Sons Ltd UKSG briefing session, 15-17 April 2002

  2. What I’ll be covering • Identifiers • Metadata • E-books

  3. What I won’t be covering • Graphics (e.g. JPEG, GIF, PNG, SVG) • Character sets (ASCII, Unicode) • Relationship models (RDF, Topic Maps/XTM) • E-commerce (UN/EDIFACT, XML-edi, ebXML) • XML stuff (Schemas, Xlink, XSL, XSLT, etc.) • Usage stats standards (e.g. COUNTER, ANSI/NISO Z39.7-1995) • Rights metadata (XrML, ODRL)

  4. Identifiers • ISSN • ISBN • SICI • BICI • PII • DOI • ISTC • Multimedia identifiers

  5. ISBN • International Standard Book Number • ISO 2108 • e.g. 0-471-92755-4 • Geog location/language - publisher/imprint - title (print format) - check character • Has been a standard for > 30 years
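The check character on the ISBN slide can be verified with a few lines of code. The following is a minimal sketch of the ten-digit ISBN check (weights 10 down to 2, modulus 11, with "X" standing for 10); the function names are illustrative and not part of the presentation.

```python
def isbn10_check_character(first_nine: str) -> str:
    """Return the check character for the first nine digits of an ISBN-10."""
    digits = [int(c) for c in first_nine if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected nine digits before the check character")
    # Weights run 10, 9, ..., 2 across the nine digits.
    total = sum(w * d for w, d in zip(range(10, 1, -1), digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Check a hyphenated or plain ISBN-10 against its check character."""
    compact = isbn.replace("-", "").replace(" ", "").upper()
    if len(compact) != 10:
        return False
    try:
        return isbn10_check_character(compact[:9]) == compact[9]
    except ValueError:
        return False
```

Applied to the slide's example, `is_valid_isbn10("0-471-92755-4")` should hold, since the weighted sum of the first nine digits leaves a check character of 4.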

  6. New ISBN • ISBN is being revised - 13 digits from 1/1/05 • Can double capacity by giving a 979 prefix • Issues: - hexadecimal or decimal? - limit ISBN to print - do something else for electronic? versions? formats? - assign to components (e.g. chaps)? - should number be completely dumb? - metadata deposit at assignment?
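The slide leaves the design of the 13-digit ISBN open. As a sketch, and assuming the form the revision eventually took (a decimal number with a 978 prefix and an EAN-13 check digit using alternating weights 1 and 3), a conversion from the old format might look like this. The function name is illustrative.

```python
def isbn10_to_isbn13(isbn10: str, prefix: str = "978") -> str:
    """Convert an ISBN-10 to a 13-digit ISBN (assumes the EAN-13 scheme)."""
    compact = isbn10.replace("-", "").replace(" ", "")
    if len(compact) != 10 or not compact[:9].isdigit():
        raise ValueError("expected a ten-character ISBN")
    # Drop the old check character; it is recomputed under the new scheme.
    body = prefix + compact[:9]
    # EAN-13 weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)
```

Passing `prefix="979"` shows how the second prefix mentioned on the slide doubles the number space without changing the arithmetic.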

  7. ISSN • International Standard Serial Number • ISO 3297 • e.g. 0749-503X • If publisher has not applied for an ISSN, any 3rd party can apply for their own data management needs • Different media get different ISSNs, e.g. print ISSN is different from CD-ROM ISSN

  8. But different file formats don’t get different ISSNs, so offline is different from online, but PDF is same as HTML • If online contains only abstracts of print full text, no new ISSN for e-version • If use print and eISSNs, must change both if title changes • http://www.issn.org:8080/English/pub/getting-checking
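The final character of an ISSN is a check character, which is why the example above ends in "X". A minimal sketch of the calculation (weights 8 down to 2 over the first seven digits, modulus 11) follows; the function name is illustrative.

```python
def issn_check_character(first_seven: str) -> str:
    """Return the check character for the first seven digits of an ISSN."""
    digits = [int(c) for c in first_seven if c.isdigit()]
    if len(digits) != 7:
        raise ValueError("expected seven digits before the check character")
    # Weights run 8, 7, ..., 2 across the seven digits.
    total = sum(w * d for w, d in zip(range(8, 1, -1), digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)
```

For the slide's example, `issn_check_character("0749-503")` gives "X", matching 0749-503X.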

  9. SICI • Serial Item and Contribution Identifier • ANSI/NISO Z39.56-1996 - reaffirmed • e.g. issue = 0749-503X(20010115)18:1<>1.0.TX;2-X; article = 0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X (check digits in these examples have not been calculated) • Well used at issue level - bar codes • Less used at article level

  10. SICIs at Article Level • Requires publication info - but publishers want to assign article IDs before publication • Long-winded • Unfortunate syntax for Internet transfer (<, >, #) - needs SGML entifying and hex encoding • Unclear what to do with special characters in Title Code • Not a unique ID if two untitled articles are on the same page (e.g. Letters)
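The hex-encoding problem is easy to see by percent-encoding the article-level example for use in a URL. This is a small sketch using Python's standard library; the helper name is illustrative.

```python
from urllib.parse import quote, unquote


def sici_for_url(sici: str) -> str:
    """Percent-encode every reserved character in a SICI for URL transfer."""
    return quote(sici, safe="")


sici = "0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X"
encoded = sici_for_url(sici)
# encoded == "0749-503X%2820010115%2918%3A1%3C1%3AYGPIWG%3E2.0.TX%3B2-X"
```

The parentheses, colons, angle brackets and semicolon all have to be escaped, which makes an already long identifier longer still.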

  11. C = Contribution, not Component • SICI allows identification of article, issue ToC, issue Index and article abstract (DPIs of 0, 1, 2, 3 respectively) • No way of using SICI to identify any other component (such as Figure, Table, Section) • Not surprising since it’s a canonicalisation nightmare • http://sunsite.berkeley.edu/SICI/version2.html

  12. BICI • Book Item and Component Identifier • ISO DSFTU (Draft Standard for Trial Use) • e.g. 0387119787(1982)<174:ADTATO>2.2.TX;1-Q • ISBN, date, location, title, component type, etc. • Trial was Aug 2000 to Jan 2002 - not much evidence of use • Many issues the same as for SICI, but also less business push

  13. PII • Publisher Item Identifier • Proposed in 1995 by ACS, AIP, APS, IEEE and Elsevier, but never became a standard • e.g. S0749-503X011234 • Some publishers use as internal id since doesn’t suffer from any of the SICI problems • But no registration/maintenance agency

  14. DOI • Digital Object Identifier • ANSI/NISO Z39.84-2000 • e.g. issue = 10.1002/yea.v18:1 article = 10.1002/yea.1234 • Well established in academic journals publishing - esp. ‘cos of CrossRef • 4.2 million DOIs deposited to date • http://www.doi.org
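A DOI has two parts separated by the first slash: a prefix identifying the registrant and a suffix chosen by the publisher. The sketch below splits the slide's examples and builds a resolver link. The resolver address `https://doi.org/` is an assumption made for illustration and is not taken from the slide.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver link; the default resolver address is an assumption."""
    prefix, suffix = split_doi(doi)
    # The suffix may contain characters that are reserved in URLs.
    return resolver + prefix + "/" + quote(suffix, safe="")
```

Here `split_doi("10.1002/yea.1234")` separates the registrant prefix `10.1002` from the publisher-assigned suffix `yea.1234`.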

  15. Some publishing issues regarding DOIs • What are they assigned to? • Need for matching URL, so can’t assign to anything you wouldn’t give a URL to • Individual publishers need to decide their DOI structure • Doesn’t have to be human-friendly but must be unique, easily generated, and matched with URL • Application profiles for different genres

  16. Processes • Apply to Registration Agency (IDF, CDI, CrossRef, Enpia, LON) for Registrant Prefix • For individual DOIs, batch-process - generate DOIs and URLs from electronic metadata and send to RA for deposit • DOIs never change (even if journal changes ownership) but matched URLs (or other locators) can
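The batch step can be sketched as a function that turns metadata records into DOI and URL pairs ready for deposit. Everything specific here is hypothetical: the record field `article_id`, the DOI structure (journal code plus article number, following the pattern of the earlier example) and the URL template are illustrative choices, and the actual deposit format required by a Registration Agency is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    doi: str  # permanent once assigned
    url: str  # current location; may be updated later


def build_deposits(records, prefix, journal_code, url_template):
    """Generate unique DOI/URL pairs from metadata records (hypothetical layout)."""
    deposits, seen = [], set()
    for record in records:
        doi = f"{prefix}/{journal_code}.{record['article_id']}"
        if doi in seen:
            raise ValueError(f"duplicate DOI generated: {doi}")
        seen.add(doi)
        deposits.append(Deposit(doi=doi, url=url_template.format(doi=doi)))
    return deposits


batch = build_deposits(
    [{"article_id": "1234"}, {"article_id": "1235"}],
    prefix="10.1002",
    journal_code="yea",
    url_template="http://journals.example.com/article/{doi}",  # hypothetical
)
```

If the journal later moves, only the `url` values would be redeposited; the `doi` values stay as they are.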

  17. ISTC • International Standard Textual Work Code • ISO Committee Draft 21047 - circulated Oct 01, voting finished Jan 02: progressed to Enquiry stage • http://www.nlc-bnc.ca/iso/tc46sc9/21047.htm • E.g. 0A9-2002-1223F332-0 (RA+year+WorkID+check) • A Work (= abstract creation) id - replaces the ISWC(L)

  18. Creator-centric - authors may apply to ISTC Agency directly or via agents or via publisher • Requires metadata deposit too • Publishers therefore need to capture these numbers if they’ve been assigned to Works • Will authors really bother with this?

  19. A couple of non-text, non-graphic Ids you might want to know about • ISAN • ISWC

  20. ISAN • International Standard Audiovisual Number • ISO Draft International Standard 15706 • E.g. 153C-7365-B36F-844C-N • Can be issued to movies, trailers, TV programmes, episodes or series, ads, multimedia works if A/V component is significant • http://www.nlc-bnc.ca/iso/tc46sc9/isan.htm • Work has also started on a V-ISAN for Versions

  21. ISWC • International Standard Musical Work Code (used to be ISWC(T)) • ISO 15707 • e.g. T-034524680-1 • Identifies any musical work, including arrangements, movements, medleys, samples • http://www.iswc.org/iswc/iswc/en/html/home.html

  22. Metadata • Resource discovery (Dublin Core, OAI-PMH), incl. Linking (CrossRef) • Product metadata (ONIX and ONIX for Serials) • Preservation metadata (OAIS) • I am not going to talk about library-specific sets such as MARC, Z39.50, AACR2, etc.

  23. Dublin Core • Defined Universal Bibliographic Language for Internet Navigation and Coherent Online Resource Exploration [not really!] • ANSI/NISO Z39.85 • DC 1.1 (simple, unqualified set of 15 elements) • Qualified set (DCQ? dcterms?) needed to do anything more than basic - not standard yet

  24. DC has been mandated by UK Government (“e-GMS”) • Application Profiles will deal with defined local extensions via namespace declarations

  25. OAI-PMH • Open Archives Initiative Protocol for Metadata Harvesting • Not really an archive in the sense of repository, more of a political statement and a metadata harvesting protocol • Came out of the E-print community, but they welcome commercial publishers • Supported by DLF and CNI • Uses simple (unqualified) Dublin Core as its metadata • E.g. <creator>Cliff Morgan</creator> • Version 2 of protocol due for release June 2002 • http://www.openarchives.org
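Harvesting comes down to an HTTP request carrying a protocol verb, and a response containing Dublin Core records. The sketch below builds a `ListRecords` request and reads the creator out of a sample record. The repository address and the sample record are invented for illustration, and the namespaces assume version 2.0 of the protocol.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Invented sample of the Dublin Core payload inside a harvested record.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Emerging Standards for Libraries and Publishers</dc:title>
  <dc:creator>Cliff Morgan</dc:creator>
</oai_dc:dc>"""


def dc_values(xml_text: str, element: str) -> list[str]:
    """Return the text of every Dublin Core element with the given name."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(f"{{{DC_NS}}}{element}")]


request = list_records_url("http://archive.example.org/oai")  # hypothetical repository
creators = dc_values(SAMPLE_RECORD, "creator")
```

A real harvester would also have to follow resumption tokens and handle protocol errors, which this sketch leaves out.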

  26. CrossRef metadata set • CrossRef matches the metadata in a citation with the metadata in its Metadata Database (MDDB), which includes the DOI for the resource • Participating publishers (91 of ‘em) deposit the m/data with DOI into the MDDB • To date, 3.7M DOIs, covering 5000+ jnls • http://www.crossref.org

  27. New version • Version 2 much more complicated - full schema is 113 pages long • In addition to journals, covers books and conference proceedings, at whole title and chapter level • Some element names are different from CrossRef 1.0

  28. ONIX • OnLine Information eXchange • Latest release is 2.0 • Original focus was message format for books through the trade, but is fast becoming a universal metadata set for describing publications • http://www.editeur.org

  29. ONIX being championed by a number of publishers and online retailers • Swedish Royal Library using ONIX as an input medium

  30. ONIX for Serials • Provides rich cataloguing information for agents, librarians, users • Supports alerting, despatch and library check-in • Structured, multi-level bibliographic descriptions, including ToCs • Descriptions for library holdings (direct to OPACs)

  31. Draft 2 just released this month • Subscription Package Record provides product catalogue info about subscription packages • Serial Title Record provides catalogue info about an individual serial • Serial Item Record provides structured multi-level bibliographic description of serial parts

  32. So is the CrossRef set like the ONIX for Serials set? • No • They both include metadata that can be used to describe journals, issues and articles • But they don’t use the same element names • CrossRef has mapped to ONIX but not to ONIX for Serials yet - but has said will support when released

  33. OpenURL • NISO Work Item • Separates metadata for resource from metadata for location • Resolver services (such as SFX, CrossRef) make the context-sensitive link • Solves the “appropriate copy” problem, where more than one legit copy of an article may be available to a library, e.g. local holding, consortium, aggregator service, mirror site, publisher

  34. OpenURL metadata • OpenURL comprises BASEURL and QUERY • BASEURL identifies the resolver; QUERY is a resource description • e.g. (simplified): http://resolver.ukoln.ac.uk/?genre=article&atitle=Information%20gateways:…&issn=14684527&volume=24&spage=40&aulast=Heery&aufirst=Rachel
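The BASEURL and QUERY split can be sketched with standard URL tools. The keys below are the ones shown in the slide's example; the article title is truncated on the slide, so a placeholder value is used here rather than the real one.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs


def build_openurl(base_url: str, description: dict) -> str:
    """Join a resolver BASEURL and a QUERY describing the resource."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base_url + "?" + urlencode(description, quote_via=quote)


description = {
    "genre": "article",
    "atitle": "Placeholder title",  # not the real title, which is elided on the slide
    "issn": "14684527",
    "volume": "24",
    "spage": "40",
    "aulast": "Heery",
    "aufirst": "Rachel",
}

openurl = build_openurl("http://resolver.ukoln.ac.uk/", description)

# A resolver would parse the QUERY back into a resource description.
parsed = parse_qs(urlsplit(openurl).query)
```

Swapping the BASEURL for another library's resolver leaves the description untouched, which is how the same citation can lead different users to their own appropriate copy.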

  35. Genres defined as “referent-types”, such as book, chapter, journal, article, conf proc and paper, dissertation, patent, report - each has its own metadata spec • High-level concept is the Bison-Futé model http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html

  36. Preservation metadata • OAIS (Open Archival Information System) underlies all digital preservation models • Nothing to do with OAI • Based on SIPs (Submission Info Packages), AIPs (Archival Info Packages) and DIPs (Dissemination Info Packages) • The Producer wraps the stuff up in a SIP, it gets ingested into an AIP, and sent out as a DIP
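The flow from SIP to AIP to DIP can be pictured as three record types and two functions. This is only a toy sketch of the idea: the field names are invented, and the checksum stands in for the much richer preservation information a real archive would record.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SIP:  # Submission Information Package, from the Producer
    producer: str
    content: bytes
    descriptive: dict


@dataclass
class AIP:  # Archival Information Package, held by the archive
    archive_id: str
    content: bytes
    descriptive: dict
    preservation: dict


@dataclass
class DIP:  # Dissemination Information Package, sent to the consumer
    archive_id: str
    content: bytes
    descriptive: dict


def ingest(sip: SIP, archive_id: str) -> AIP:
    """Turn a submission into an archival package, adding preservation data."""
    preservation = {
        "producer": sip.producer,
        "sha256": hashlib.sha256(sip.content).hexdigest(),  # illustrative fixity check
    }
    return AIP(archive_id, sip.content, dict(sip.descriptive), preservation)


def disseminate(aip: AIP) -> DIP:
    """Package archived content for delivery, without the internal records."""
    return DIP(aip.archive_id, aip.content, dict(aip.descriptive))


sip = SIP("Example Publisher", b"<article/>", {"title": "Example"})
dip = disseminate(ingest(sip, "aip-0001"))
```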

  37. Some other metadata activities • LOM - Learning Object Metadata • IMS - Instructional Management Systems (builds on LOM) • PRISM - Publishing Requirements for Industry Standard Metadata • MEG - cross-sectoral Metadata for Education Group • SCORM - Sharable Content Object Reference Model - US DoD project, also builds on IMS/LOM

  38. How are we supposed to cope with all these metadata sets? • A publisher’s metadata becomes an important asset for describing product to the outside world, esp. for trading and linking • If publishers have their publications in electronic form, the metadata will be in there in the file so it just needs extracting and mapping to whatever metadata set the publisher chooses • Production issue: who checks the metadata?
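Extracting and mapping can be sketched as a crosswalk table from a publisher's internal field names to the target set. The internal names below are hypothetical; the targets are unqualified Dublin Core elements. Fields with no mapping are reported back, which is one place where the question of who checks the metadata becomes concrete.

```python
# Hypothetical internal field names mapped to unqualified Dublin Core elements.
DC_CROSSWALK = {
    "article_title": "title",
    "author_names": "creator",
    "doi": "identifier",
    "cover_date": "date",
    "publisher_name": "publisher",
}


def to_dublin_core(record: dict, crosswalk: dict = DC_CROSSWALK):
    """Map a record to Dublin Core; return (mapped, unmapped field names)."""
    mapped, unmapped = {}, []
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target is None:
            unmapped.append(field_name)  # left for a person to review
            continue
        values = value if isinstance(value, list) else [value]
        mapped.setdefault(target, []).extend(values)
    return mapped, unmapped


mapped, unmapped = to_dublin_core({
    "article_title": "An example article",
    "author_names": ["A. Author", "B. Author"],
    "doi": "10.1002/yea.1234",
    "page_count": 12,
})
```

Supporting a second metadata set then means adding another crosswalk table rather than re-keying the data.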

  39. E-books • OEBPS - Open E-Book Publication Structure • Three components: a) XML DTD for content b) DC-based metadata (but some non- compliant qualifier attributes) c) description of package’s structure, reading order, navigation • Many OEB files are just (a) • Version 2 being worked on, esp. M&I, and Rights

  40. Formats • Front runners are Adobe E-Book Reader (PDF based) and Microsoft Reader (.lit based) • .lit limited to simple stuff, and not as robust as PDF, but can’t underestimate M/soft • New versions of Adobe will have built-in DOI capability

  41. Text reflow • Acrobat 5 introduced structured PDF • The Holy Grail synthesis of structure and presentation • Writes a PDF file in XML(ish) • Asserts reading order • Allows for reflow into different reader devices • Works best for simple documents only, but a good start

  42. Conclusions • There are lots of standards out there • Some of them compete with one another • Not all of them are formal • They may change over time • Publishing industry standards are not only developed by the publishing industry • Not always easy to judge the winners
