Author Generated JATS XML Markup • Andy Gajetzki, CIO, ispub.com • Olivier Wenker, MD, MBA, Founder and CEO, ispub.com
How We Started • Co-founded Worldwide Cars Online in 1990 • Sent images of cars and car parts via CompuServe email (modem speed 7kb/sec) • No official Internet • Closed the company in 1994 • Created online content while at Baylor in 1994 • Netscape goes public in 1995 • Officially launched 1st online journal in 1995
How We Continued • Started with The Internet Journal of Anesthesiology • Added more journals over time • All were open access from the beginning (no registration required for readers) • Some of the first articles were submitted in print via mail and I retyped them in Word • Articles were later submitted to me via email (attached as Word documents)
How We Continued • Initially used a Mosaic Browser tool and then a Netscape Browser tool to create HTML for the web pages • Then used 1st version of FrontPage to create a more complex web site • We decided in 1997 to convert Word documents into SGML data sets and then to use XML in 1998
What We Are Today • We currently publish 82 titles (online medical journals) at www.ispub.com • We use our own article submission system (home-grown) at www.quickmedpub.com • We just implemented a new backend for article submissions and article flow • We decided to have authors generate much of the markup
And Now Let's Get Technical • Author Generated JATS XML Markup by Andy Gajetzki
What is our JATS editor? • Represents a move to author generated markup for our XML • Based on a customizable and reusable PHP component • Symfony2 – popular PHP framework • Easy to use • Form based, WYSIWYG, and linear workflow
Our old workflow • How we used to do things: • Three separate workflows for each article: • Header generation • Body markup • Conversion from proprietary XML to JATS as the last step
Problems with our current method • Time consuming • Delays in publishing • Error prone • Data entry is performed by programmers • Authors don’t like the delay to publish and the delay to correct errors
Design Rationale • We can’t support the whole spec. • How did we determine what to support? • Statistical analysis of the most common markup in our current article corpus • How can we offload as much markup to the author as possible but still have a clean and intelligible end product?
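The corpus analysis mentioned above could look roughly like the following. This is a minimal sketch, not the authors' actual tooling: the function names `element_frequencies` and `coverage` and the two-article corpus are hypothetical, and it assumes each article is available as a well-formed XML string.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_frequencies(xml_documents):
    """Count how often each element name occurs across a corpus of XML strings."""
    counts = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():  # iter() includes the root element itself
            counts[element.tag] += 1
    return counts


def coverage(counts, supported):
    """Fraction of all element occurrences that fall inside the supported tag set."""
    total = sum(counts.values())
    covered = sum(n for tag, n in counts.items() if tag in supported)
    return covered / total if total else 0.0


# Hypothetical two-article corpus, for illustration only.
corpus = [
    "<article><sec><title>A</title><p>x <italic>y</italic></p></sec></article>",
    "<article><sec><title>B</title><p>z</p><p>w</p></sec></article>",
]
counts = element_frequencies(corpus)
print(counts.most_common())
print(coverage(counts, {"article", "sec", "title", "p"}))
```

Ranking elements by frequency and measuring how much of the corpus a candidate subset covers gives a concrete basis for deciding which parts of the spec to support first.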
What is supported • NLM Blue 3.0 • Two separate support levels • Inline-level • Block-level • Our level of JATS support is determined by each level.
Inline Level • Italics, bold, and all other presentation-layer markup supported
Block level • Single-level sections only, as the WYSIWYG editor is based on the HTML DOM • Other tools providing a more XML-oriented approach are expensive and more difficult for the author to use • General structure is <sec> <title> <xyz> • A <sec> may contain: boxed-text, fig, graphic, preformat, table-wrap, p, list
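As an illustration of the single-level structure described above, here is a small sketch that builds one `<sec>` with a `<title>` followed by block-level children. The helper `build_section` and the sample content are hypothetical; only the element names come from the slide.

```python
import xml.etree.ElementTree as ET

# Block-level children listed on the slide.
ALLOWED_BLOCKS = {"boxed-text", "fig", "graphic", "preformat", "table-wrap", "p", "list"}


def build_section(title, blocks):
    """Build a single-level <sec>: one <title>, then block-level children.

    `blocks` is a list of (tag, text) pairs. Nested <sec> elements are
    rejected, mirroring the single-level restriction.
    """
    sec = ET.Element("sec")
    ET.SubElement(sec, "title").text = title
    for tag, text in blocks:
        if tag not in ALLOWED_BLOCKS:
            raise ValueError(f"unsupported block element: {tag}")
        ET.SubElement(sec, tag).text = text
    return sec


sec = build_section("Methods", [("p", "First paragraph."), ("preformat", "n = 40")])
section_xml = ET.tostring(sec, encoding="unicode")
print(section_xml)
```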
Titles • Support of presentational elements with, for the most part, a non-mixed content-type
Contributors • Flexible • Single / collaborative authors • Most JATS <contrib-group> markup supported • Inline-level formatting in block elements
Keywords • Keywords should be based on MeSH entries • Validation constraints can be applied based on that
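One way such a validation constraint might work is a case-insensitive lookup against a list of MeSH headings. The sketch below is an assumption about the approach, not a description of the actual editor: `validate_keywords` is a hypothetical helper, and `MESH_SAMPLE` is a three-term stand-in for the full vocabulary, which would be loaded from the NLM's published MeSH data.

```python
def validate_keywords(keywords, mesh_terms):
    """Split keywords into (accepted, rejected) by case-insensitive MeSH match.

    Accepted keywords are returned in their canonical MeSH spelling.
    """
    lookup = {term.casefold(): term for term in mesh_terms}
    accepted, rejected = [], []
    for keyword in keywords:
        # Collapse stray whitespace before comparing.
        key = " ".join(keyword.split()).casefold()
        if key in lookup:
            accepted.append(lookup[key])
        else:
            rejected.append(keyword)
    return accepted, rejected


# Tiny stand-in vocabulary; the real list would come from the MeSH data files.
MESH_SAMPLE = {"Anesthesiology", "Pain Management", "Hypertension"}

accepted, rejected = validate_keywords(
    ["anesthesiology", " pain  management", "Car Parts"], MESH_SAMPLE
)
print(accepted, rejected)
```

Rejected keywords could then be shown back to the author in the form, with the closest MeSH headings offered as suggestions.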
Other article-meta • Article IDs • Author notes • Supplemental content • Funding/grants • Article history • Permissions
Abstract / Body / Appendices • Currently a moving target • MathML is not currently supported • Current subset of JATS covers 99% of our cases, but we will always try to expand coverage
WYSIWYG HTML Editor • Utilize a specific subset of HTML that we can unambiguously map to JATS via data transformations • XSLT • regexp • If no mapping is possible, another method must be devised
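To make the idea of an unambiguous HTML-to-JATS mapping concrete, here is a minimal sketch. The slide names XSLT and regular expressions as the transformation tools; this example swaps in Python's `xml.etree.ElementTree` purely for illustration. The mapping table `INLINE_MAP` and the function `html_to_jats` are hypothetical, and the input is assumed to be a well-formed XHTML fragment.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset: each permitted HTML tag maps to exactly one JATS element.
INLINE_MAP = {
    "p": "p",
    "b": "bold",
    "strong": "bold",
    "i": "italic",
    "em": "italic",
    "u": "underline",
    "sup": "sup",
    "sub": "sub",
}


def html_to_jats(fragment):
    """Rename every element of a well-formed XHTML fragment to its JATS name.

    Raises ValueError for any tag outside the supported subset, so that
    unmappable content is surfaced instead of silently passed through.
    """
    root = ET.fromstring(fragment)
    for element in root.iter():
        if element.tag not in INLINE_MAP:
            raise ValueError(f"no JATS mapping for <{element.tag}>")
        element.tag = INLINE_MAP[element.tag]
        element.attrib.clear()  # drop HTML-only attributes such as style
    return ET.tostring(root, encoding="unicode")


jats = html_to_jats('<p>H<sub>2</sub>O is <em style="x">wet</em></p>')
print(jats)
```

Restricting the editor to tags that appear in the table is what keeps the mapping unambiguous; anything else raises an error and needs a separate handling path.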
Images / Table Capture / Media • Images / Figures are handled via out-of-band file upload on a separate page • Authors are requested to upload the highest-quality format that they can • Tables can either be captured as an image, or inserted via a Word-style table creation tool • Other media types have not been implemented yet
Endnote Handling – Document references • JavaScript annotation tool • Endnote number / reference is highlighted in the text and a resolution is made to a back-matter citation entry
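The resolution step can be pictured as follows. The slide describes a JavaScript tool; this sketch uses Python only to show the logic, and it assumes in-text references appear as bracketed numbers such as `[1]`. The function `link_endnotes` and the `B1`-style ids are hypothetical; `<xref ref-type="bibr" rid="...">` is the standard JATS way to point at a reference-list entry.

```python
import re

ENDNOTE_RE = re.compile(r"\[(\d+)\]")


def link_endnotes(text, ref_ids):
    """Replace [n] markers with JATS <xref> links to back-matter citations.

    `ref_ids` maps endnote numbers to citation ids. Returns the rewritten
    text and the list of endnote numbers that could not be resolved.
    Assumes `text` is already XML-escaped.
    """
    unresolved = []

    def replace(match):
        number = int(match.group(1))
        rid = ref_ids.get(number)
        if rid is None:
            unresolved.append(number)
            return match.group(0)  # leave the marker untouched
        return f'<xref ref-type="bibr" rid="{rid}">{number}</xref>'

    return ENDNOTE_RE.sub(replace, text), unresolved


linked, unresolved = link_endnotes("As shown [1] and [3].", {1: "B1", 2: "B2"})
print(linked)
print(unresolved)
```

A non-empty `unresolved` list corresponds to the endnote problems an author would have to fix before submitting.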
Supported Back Matter • Acknowledgments • Appendices • Biography • Glossaries • Citations • Notes • Content-type attribute of note element supported
Citation Handling – Back matter • One citation per line • Regular expression search for meta-data service identifiers at PMC and Crossref • If a match is found, correct metadata is pulled from the service • Simple JavaScript annotation tool to tokenize citation string • Before submission, author must resolve all endnote problems
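The identifier search could be sketched like this. The patterns are assumptions rather than the expressions actually used: a DOI is taken to start with `10.` followed by a registrant code and a slash, a PMCID to look like `PMC` plus digits, and a PMID to be labelled `PMID`. The sample citations are made-up placeholders, and the metadata lookup itself is left out.

```python
import re

# Assumed identifier shapes; real-world citations may need looser patterns.
DOI_RE = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)")
PMCID_RE = re.compile(r"\b(PMC\d+)\b")
PMID_RE = re.compile(r"\bPMID:?\s*(\d+)\b", re.IGNORECASE)


def find_identifiers(citations):
    """Scan a block of text, one citation per line, for service identifiers.

    Returns a list of (citation, ids) pairs, where ids is a dict with any of
    the keys "doi", "pmcid", "pmid" that were found on that line.
    """
    results = []
    for line in citations.splitlines():
        line = line.strip()
        if not line:
            continue
        ids = {}
        match = DOI_RE.search(line)
        if match:
            ids["doi"] = match.group(1).rstrip(".,;")  # trim sentence punctuation
        match = PMCID_RE.search(line)
        if match:
            ids["pmcid"] = match.group(1)
        match = PMID_RE.search(line)
        if match:
            ids["pmid"] = match.group(1)
        results.append((line, ids))
    return results


# Placeholder citations, not real references.
SAMPLE = """Doe J. Example title. J Example. 2010. doi:10.1234/example.5678.
Roe R. Another example. PMID: 12345678
No identifier on this line.
"""
found = find_identifiers(SAMPLE)
for citation, ids in found:
    print(ids)
```

Lines that come back with an empty dict are the ones that would fall through to manual tokenization by the author.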
From browser to JATS XML • The block-level components operate on the HTML DOM • CSS classes are added to elements to distinguish content types • Through various transformations, we interpret the resultant DOM and produce the JATS XML • [Diagram: HTML, mapping, JATS XML]
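A simplified picture of the class-driven conversion is sketched below. The CSS class names (`jats-title`, `jats-p`, and so on) are invented for this example, since the slides do not list the real ones, and the function `dom_to_sec` flattens inline children to plain text to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSS classes the editor might attach to block elements.
CLASS_TO_JATS = {
    "jats-title": "title",
    "jats-p": "p",
    "jats-preformat": "preformat",
    "jats-boxed-text": "boxed-text",
}


def dom_to_sec(html):
    """Convert one editor section (a wrapper element with classed children)
    into a JATS <sec>. The CSS class, not the HTML tag, decides the output."""
    wrapper = ET.fromstring(html)
    sec = ET.Element("sec")
    for child in wrapper:
        css_class = child.get("class", "")
        if css_class not in CLASS_TO_JATS:
            raise ValueError(f"unrecognized content type: {css_class!r}")
        target = ET.SubElement(sec, CLASS_TO_JATS[css_class])
        # Simplification: inline markup inside the block is flattened to text.
        target.text = "".join(child.itertext())
    return ET.tostring(sec, encoding="unicode")


editor_html = (
    '<div class="jats-sec">'
    '<h2 class="jats-title">Results</h2>'
    '<div class="jats-p">Text.</div>'
    '<pre class="jats-preformat">a  b</pre>'
    "</div>"
)
sec_xml = dom_to_sec(editor_html)
print(sec_xml)
```

Keying on the class rather than the tag lets two HTML `<div>` elements map to different JATS elements, which the tag name alone could not express.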
Validation • When things go wrong: • 1) XSD validation: intervention required by staff • 2) Style/presentation problems: intervention required by author/staff • 3) Copy editing • 4) Peer review
Amazon Mechanical Turk • For predictable failures, Amazon Mechanical Turk, a platform for “human intelligence tasks”, can be used • For a small price, work units are created and human workers get paid to perform the task • 24x7 availability
Contact For Questions Technical questions: Andy Gajetzki andy@ispub.com General questions: Olivier Wenker, MD, MBA wenker@ispub.com