
The Evolving Information Ecosystem of Publishing

Evan Owens, Chief Information Officer, Publishing, American Institute of Physics. JATS Conference, 2 November 2010.



  1. The Evolving Information Ecosystem of Publishing
     Evan Owens
     Chief Information Officer, Publishing
     American Institute of Physics
     JATS Conference, 2 November 2010

  2. This Presentation The Past & Present Standards The Future New Challenges

  3. The World View in the 1990s
     • How to prepare for the electronic publishing future:
     • Create a version of record in SGML full text
     • Make the perfect master file
     • Prepare to publish simultaneously to print and online
     • Multiple outputs was the perceived benefit of SGML
     • How did you make that happen?
     • Write your own DTD
     • Work with your vendors
     • Set up SGML-based production processes
     A very document-centric view.
     But what place did standards have in this picture?

  4. Journal Article Standards
     A much-cited paper on the history of journal standards: "A Decade of DTDs and SGML in Scholarly Publishing: What Have We Learned?", Bruce Rosenblum and Irina Golfman, Extreme Markup Languages 2002.
     “The AAP and 12083 DTDs were important projects. They laid the structural foundations for subsequent DTDs used in journal publishing. They did not succeed, however, in their goal of becoming industry-standard DTDs. This goal was not reached because, while these DTDs were generalized for the needs of the industry, they did not meet the specific business requirements of individual organizations within the scholarly publishing community.”
     • AAP Serial DTD (Z39.59, 1983 to 1987)
     • ISO 12083 (ANSI 1988, ISO 1993; last updated 1995)
     • NLM Tag Suite (v1 2003…v3 2010)
     • NISO JATS (in progress)

  5. Standards are Great: Everyone Should Have One!

  6. Standards
     • Role of standards
     • Codify existing practices
     • Enable new practices or technologies
     • Success of standards
     • Technical value
     • Business / political
     • Must meet real biz needs
     • Costs must align with benefits
     • Conventional wisdom in the 90s:
     • SGML succeeds best in highly concentrated industries with strong exchange requirements; e.g., aviation, auto, defense
     • Scholarly Publishing was a highly fragmented industry

  7. What has Changed in the Ecosystem?
     • Rise of aggregations
     • Move away from proprietary delivery platforms
     • Publishers now managing current and back content
     • Early online, current online, digitized back file
     • Exchange of data has changed business needs
     • CrossRef for metadata
     • Multiple hosting, preservation for full text
     • Text mining will drive future
     • Enormous amounts of content flowing around
     • Every publishing deal now includes “and also send to X, Y, Z”
     Business conditions are now ripe for standardization.

  8. Early Adopters
     Typesetting service providers saw the need for standards well before their customers:
     • Vendor A (1990s) produced content in their internal house DTD then exported to the customer DTD
     • Vendor B (various) produced content in the Elsevier DTD because they could, then exported to the customer DTD
     • Vendor C (2010) would rather produce content in NLM then export to the customer’s DTD
     • Vendor A (an early adopter) produced all content in SGML/XML workflows and just discarded it if the customer wanted only the PDF returned

  9. Why Adopt NLM / JATS Now?
     Preaching to the choir . . .
     • Delivery platform requirement
     • Business need for compatibility
     • Leverage the experience in the design
     • Concentrate on your specific customizations
     • Rather than reinventing the wheel
     • Good documentation
     University of Chicago Press moved to NLM when it moved to a shared delivery platform.
     AIP will move to JATS in 2011.

  10. Where are We Now?
     • Is the battle over?
     • Every problem solved?
     • Just implement NLM / JATS and all your publishing problems will be solved?
     We may have won this battle, but the real challenges of truly digital publishing are just starting to appear.
     For the first decade, online journal publishing was like old wine in new bottles; now we are seeing real innovations.

  11. SIDEBAR: Books versus Journals
     • Strong metadata exchange needs (e.g. Amazon)
     • Strong standards and groups
     • Came later to online and electronic publishing
     • E-Book readers are intrinsically different:
     • External to publisher’s platform
     • Forces standards conformance
     • EPUB standard
     • Focus was packaging rather than text structuring
     • But is evolving quickly
     A different ecosystem, but the boundaries are beginning to blur.
     Perhaps we (books and journals) will meet in the middle?

  12. The Future

  13. Current and Future Trends in Journal Publishing
     • Articles, not issues
     • Rapid publication with limited prepress
     • Multimedia and “supplemental” stuff
     • Multiple “manifestations” and “expressions”
     • HTML, PDF, app, reader
     • Article, Podcast
     • Revisions (?)
     • Comments, annotations, blogs
     • Magazine-like features
     • Semantics, text mining
     • Information, not articles

  14. Ecosystem: The XML Instance
     We have come a long way!
     • Mechanics are easier
     • Unicode, MathML, table models, etc.
     • Managing the structure of the content
     • Much of this conference
     • XML Versioning Workshop at Balisage 2008
     • Managing the instances
     • Version & validation checking
     But the journal publishing world is becoming less static, less document-centric . . . and a lot more complicated!
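The "version & validation checking" item on slide 14 could look roughly like the sketch below. It is only an illustration: the set of accepted versions is an assumption, and the Python standard library checks well-formedness only, so real validation against the DTD would need a separate validating parser. The `dtd-version` attribute on `<article>` is the one the NLM / JATS tag suite defines.

```python
import xml.etree.ElementTree as ET

# Assumption: the tag-suite versions this hypothetical pipeline accepts.
SUPPORTED_VERSIONS = {"2.3", "3.0"}

def check_instance(xml_text, supported=SUPPORTED_VERSIONS):
    """Report well-formedness and the declared version of one XML instance."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Not well-formed: nothing further can be checked.
        return {"well_formed": False, "version": None,
                "accepted": False, "error": str(err)}
    version = root.get("dtd-version")  # None if the attribute is absent
    return {"well_formed": True, "version": version,
            "accepted": version in supported, "error": None}

current = check_instance('<article dtd-version="3.0"><front/></article>')
legacy = check_instance('<article dtd-version="1.0"><front/></article>')
broken = check_instance('<article dtd-version="3.0"><front></article>')
print(current["accepted"], legacy["accepted"], broken["well_formed"])
```

A check like this would typically run at every hand-off point (vendor delivery, platform load, archive deposit) so that instances in an unexpected version are routed for conversion rather than failing downstream.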

  15. Ecosystem: Content and Metadata
     • The XML instance as pseudo-database: <article copyeditor="XYZ" maildate="00/00/00">
     • What metadata goes inside and what lives outside?
     • Descriptive (bibliographic)
     • Provenance (process history)
     • Structural (components)
     • Technical (formats, versions)
     • Is the XML instance just a piece of a larger system?
     • How does it fit into a larger information architecture?
     • Is the XML instance where this information should live?
     • An implementation / design decision
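One way to picture the inside/outside question on slide 15 is a step that moves process-history attributes out of the instance into an external record. This is a minimal sketch, not a description of any publisher's system: `copyeditor` and `maildate` come from the slide's example, while the `id` attribute, the title, and the choice of which attributes count as provenance are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# copyeditor/maildate are from the slide; id and title are invented here.
SOURCE = ('<article id="a1" copyeditor="XYZ" maildate="00/00/00">'
          '<title>Sample</title></article>')

# Assumption: these attributes are provenance (process history), not content.
PROVENANCE_ATTRS = ("copyeditor", "maildate")

def externalize_provenance(xml_text):
    """Return (cleaned XML string, provenance record keyed by article id)."""
    root = ET.fromstring(xml_text)
    record = {"article_id": root.get("id")}
    for name in PROVENANCE_ATTRS:
        if name in root.attrib:
            # Remove from the instance and keep in the external record.
            record[name] = root.attrib.pop(name)
    return ET.tostring(root, encoding="unicode"), record

cleaned, provenance = externalize_provenance(SOURCE)
print(cleaned)
print(provenance)
```

The record could then live in a workflow database or content management system, leaving the instance to carry only descriptive and structural information. Whether that split is right is, as the slide says, an implementation and design decision.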

  16. Ecosystem: Reference Linking
     • Connecting XML documents to external resources
     • Do we rewrite the XML or externalize the links?
     • An implementation question only?
     • ApJ, NASA ADS, bibcodes
     • Linking identifiers that could be pre-calculated
     • Resolution could be added afterwards
     • CrossRef and DOI linking
     • Backfill problem: early or late binding
     • Dynamic resolution solutions; e.g., Elsevier, AIP
     • Externalizes big parts of the document
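The late-binding option on slide 16 can be sketched as an external link table consulted at delivery time, so that backfilling a newly registered DOI never touches the XML. This is an assumed design for illustration, not how Elsevier or AIP implement dynamic resolution; the reference ids, citation text, and DOI values are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment; only the ref ids matter for linking.
ARTICLE = """<article>
  <ref-list>
    <ref id="r1"><mixed-citation>First cited work</mixed-citation></ref>
    <ref id="r2"><mixed-citation>Second cited work</mixed-citation></ref>
  </ref-list>
</article>"""

# External link table (placeholder DOI), kept outside the instance.
LINK_TABLE = {"r1": "10.5555/12345678"}

def resolve_links(xml_text, link_table, resolver="https://doi.org/"):
    """Late binding: map each ref id to a URL, or None if not yet linked."""
    root = ET.fromstring(xml_text)
    resolved = {}
    for ref in root.iter("ref"):
        doi = link_table.get(ref.get("id"))
        resolved[ref.get("id")] = resolver + doi if doi else None
    return resolved

before = resolve_links(ARTICLE, LINK_TABLE)

# Backfill: the second work gets a DOI later; the XML is not rewritten.
LINK_TABLE["r2"] = "10.5555/87654321"
after = resolve_links(ARTICLE, LINK_TABLE)
print(before)
print(after)
```

Early binding would instead write the identifier into each `<ref>` at production time, which keeps the instance self-contained but means every backfill is a rewrite of archived content.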

  17. Ecosystem: Semantic Enrichment
     • An old-school example: updating classification schemes
     • Do you update the instances retroactively?
     • Some approaches to semantic enrichment:
     • Known entity identification
     • Generic entity extraction
     • Resolution/identification done later
     • Inline markup; e.g., <named-content>
     • Entities are known in advance
     • Completely externalized solutions
     • In a separate delivery system or repository
     • In a search engine or XML database, not in the content
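The contrast on slide 17 between inline markup and externalized enrichment can be illustrated with a small known-entity matcher that produces stand-off annotations (character offsets plus an identifier) and, optionally, renders them inline as `<named-content>`. Everything here is a sketch under assumptions: the entity list and identifiers are invented, annotations are assumed not to overlap, and putting the identifier in `content-type` is just one possible convention.

```python
import re
from xml.sax.saxutils import escape

TEXT = "Graphene samples were compared with silicon wafers."

# Hypothetical known-entity list: surface form -> identifier.
KNOWN_ENTITIES = {"graphene": "mat:001", "silicon": "mat:002"}

def annotate(text, entities):
    """Externalized form: offsets and ids, stored apart from the content."""
    found = []
    for term, ident in entities.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append({"start": m.start(), "end": m.end(), "id": ident})
    return sorted(found, key=lambda a: a["start"])

def to_inline(text, annotations):
    """Inline form: wrap each (non-overlapping) match in named-content."""
    out, pos = [], 0
    for a in annotations:
        out.append(escape(text[pos:a["start"]]))
        out.append('<named-content content-type="%s">%s</named-content>'
                   % (a["id"], escape(text[a["start"]:a["end"]])))
        pos = a["end"]
    out.append(escape(text[pos:]))
    return "".join(out)

annotations = annotate(TEXT, KNOWN_ENTITIES)
inline = to_inline(TEXT, annotations)
print(annotations)
print(inline)
```

Keeping only the stand-off list means a revised classification scheme can be re-run over the archive without editing any instance; the inline form is simpler to deliver but raises the retroactive-update question the slide poses.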

  18. Ecosystem: Identity Management
     • ORCID (Open Research Contributor ID)
     • Logistical issues:
     • Known in advance or applied retroactively?
     • Future publications and/or historical?
     • Store in article instances or an external layer?
     • Larger identity management issues:
     • Bibliographic identity
     • Business identity (author, reviewer, subscriber, etc.)
     • Community identity (ORCID, social networking, etc.)
     • Another potential use of layered information architectures
     • Feels like an RDF kind of problem!
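The "RDF kind of problem" remark on slide 18 suggests an external identity layer of subject-predicate-object statements. The sketch below uses plain Python tuples rather than an RDF library to stay self-contained; the person and article identifiers, the predicate names, and the ORCID value (a placeholder in the 16-digit format) are all hypothetical.

```python
# Each statement is (subject, predicate, object); names are illustrative only.
identity_layer = [
    ("person:p1", "authorOf", "article:a1"),    # bibliographic identity
    ("person:p1", "reviewerOf", "article:a2"),  # business identity
]

def query(layer, subject=None, predicate=None, obj=None):
    """Return statements matching every field that is not None."""
    return [t for t in layer
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

def orcids_for_article(layer, article):
    """Look up ORCID iDs for an article's authors via the external layer."""
    authors = [s for s, _, _ in query(layer, predicate="authorOf", obj=article)]
    return {a: [o for _, _, o in query(layer, subject=a, predicate="orcid")]
            for a in authors}

before = orcids_for_article(identity_layer, "article:a1")

# Retroactive assignment: one new statement, no article instance is edited.
identity_layer.append(("person:p1", "orcid", "0000-0002-1825-0097"))
after = orcids_for_article(identity_layer, "article:a1")
print(before)
print(after)
```

Because the identifier attaches to the person rather than to each article, a single added statement covers both historical and future publications, which is the appeal of the layered approach the slide mentions.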

  19. Some Things to Think About
     • Content management strategy
     • Standards, standards, standards
     • Versioning, formats, validation, necessary metadata
     • Information lifecycle should inform everything
     • Not just publish once and we’re done
     • Formats change, needs change, even content changes
     • Content is going to come at us from many directions
     • User-contributed, not just the formal publishing process
     • Information architecture strategy
     • Think beyond just fixed documents
     • Plan for interactions with external systems

  20. NLM’s Contribution to Our Industry

  21. Questions? Comments?
     Evan Owens
     Chief Information Officer, Publishing
     American Institute of Physics
     eowens@aip.org
