Implementation of TaxPub, a JATS extension for domain-specific markup in taxonomy: the experience of a biodiversity publisher. Lyubomir Penev, Terry Catapano, Donat Agosti, Teodor Georgiev, Guido Sautter, Pavel Stoev. JATS-Con, 16-17 Oct 2012. Plazi.
This presentation will focus on:
• Implementation of TaxPub, an extension to the general NLM JATS DTD for taxonomy publishing
• Semantic tagging of, and enhancements to, published texts
• Dissemination of published information to aggregators
• Current and future development of TaxPub
Quick facts about Plazi
• Plazi, founded in 2008: Swiss-based NGO with members in Switzerland, Germany, the US and Iran
• Plazi is a research-based think tank whose mission is to promote open access to scientific content
• Plazi has four pillars: legal advice, technical solutions (e.g., TaxPub), maintenance of a treatment repository, and advocacy
• Plazi GmbH, founded in 2012 as a service SME owned by Plazi, provides document conversion services and consultation
• Funding from public (e.g., EU) and private donors
• Clients are global
Context
• Conservation: a global biodiversity crisis. Species are increasingly being lost, but there are no tools to measure and document the loss
• Science: ca. 1.8M species described, ca. 8M expected
• Scientific publications:
  • ca. 17,000 species described per annum; ca. 100,000 redescriptions per annum -> rich content
  • highly fragmented, with over 2,500 journals and books involved -> difficult access
• Solution: Open Access and semantically enhanced publications allow immediate registration of new taxa and dissemination of content -> TaxPub JATS DTD
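A back-of-envelope reading of these figures, assuming the ca. 8M estimate holds and the description rate stays constant at ca. 17,000 new species per year (both are simplifying assumptions, not claims from the talk):

```latex
t \approx \frac{8\,000\,000 - 1\,800\,000}{17\,000\ \text{species/year}}
  = \frac{6\,200\,000}{17\,000}\ \text{years}
  \approx 365\ \text{years}
```

At the current pace, completing the inventory would take more than three centuries, which motivates the call for faster registration and dissemination of new taxa.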
TaxPub
• Lightweight extension of the Blue DTD (the NLM/NCBI Journal Publishing DTD)
• Described at JATS-Con 2010: “TaxPub: An Extension of the NLM/NCBI Journal Publishing DTD for Taxonomic Descriptions” (http://www.ncbi.nlm.nih.gov/books/NBK47081/)
• Treatments (i.e., species descriptions):
  • <tp:taxon-treatment>, <tp:nomenclature>, <tp:treatment-sec>
• Domain-specific content:
  • <tp:taxon-name>: taxonomic names
  • <tp:material-citation>: references to specimens
  • <descriptive-statement>: descriptions of morphological features
<tp:taxon-treatment>
  <tp:nomenclature>
    <tp:taxon-name>
      <tp:taxon-name-part taxon-name-part-type="genus">Platyscelio</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">mzantsi</tp:taxon-name-part>
      <object-id>urn:lsid:zoobank.org:act:D084EF48-4736-444F-916F-2C8CDE23E29B</object-id>
      <object-id>urn:lsid:biosci.ohio-state.edu:osuc_concepts:242617</object-id>
    </tp:taxon-name>
    <tp:taxon-authority>Taekul &amp; Johnson</tp:taxon-authority>
    <tp:taxon-status>sp. n.</tp:taxon-status>
  </tp:nomenclature>
  <tp:treatment-sec sec-type="materials_examined">
  ...
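To show what the nomenclature markup buys a downstream consumer, here is a minimal Python sketch that reads the name parts, identifiers, authority and status out of a treatment like the one above. It assumes the TaxPub namespace URI is `http://www.plazi.org/taxpub` (check it against the DTD version you use); the function name `read_nomenclature` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: TaxPub namespace URI; verify against the DTD version in use.
TP = "http://www.plazi.org/taxpub"
NS = {"tp": TP}

SAMPLE = """<tp:taxon-treatment xmlns:tp="http://www.plazi.org/taxpub">
  <tp:nomenclature>
    <tp:taxon-name>
      <tp:taxon-name-part taxon-name-part-type="genus">Platyscelio</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">mzantsi</tp:taxon-name-part>
      <object-id>urn:lsid:zoobank.org:act:D084EF48-4736-444F-916F-2C8CDE23E29B</object-id>
      <object-id>urn:lsid:biosci.ohio-state.edu:osuc_concepts:242617</object-id>
    </tp:taxon-name>
    <tp:taxon-authority>Taekul &amp; Johnson</tp:taxon-authority>
    <tp:taxon-status>sp. n.</tp:taxon-status>
  </tp:nomenclature>
</tp:taxon-treatment>"""


def read_nomenclature(xml_text):
    """Hypothetical helper: summarise the nomenclature block of one treatment."""
    root = ET.fromstring(xml_text)
    nom = root.find("tp:nomenclature", NS)
    name = nom.find("tp:taxon-name", NS)
    # Map each name part to its rank, e.g. {"genus": "Platyscelio", ...}
    parts = {
        p.get("taxon-name-part-type"): p.text.strip()
        for p in name.findall("tp:taxon-name-part", NS)
    }
    # <object-id> is a plain JATS element, so it has no tp: prefix
    ids = [o.text.strip() for o in name.findall("object-id")]
    return {
        "name": " ".join(parts[k] for k in ("genus", "species") if k in parts),
        "ids": ids,
        "authority": nom.findtext("tp:taxon-authority", default="", namespaces=NS),
        "status": nom.findtext("tp:taxon-status", default="", namespaces=NS),
    }


if __name__ == "__main__":
    print(read_nomenclature(SAMPLE)["name"])  # Platyscelio mzantsi
```

Because each name part carries its rank as an attribute, the full binomial and the registry identifiers (ZooBank LSID) come out without any text parsing.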
<tp:treatment-sec sec-type="materials_examined">
  <p>
    <tp:material-citation>
      <tp:type-status>Holotype</tp:type-status> worker.
      <tp:taxon-type-location>King Saud Museum of Arthropods (KSMA), College of Food and Agriculture Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia.</tp:taxon-type-location>
      <tp:collecting-event>
        <tp:collecting-location>SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq governorate, </tp:collecting-location>
        <named-content content-type="dwc:verbatimCoordinates">20°12'N, 41°13'E</named-content>, 1881 m.a.s.l. 19.V.2010 (M. R. Sharaf &amp; A. S. Aldawood Leg.);
      </tp:collecting-event>
    </tp:material-citation>
  </p>
</tp:treatment-sec>
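A material citation marked up this way can be flattened into a Darwin Core-style occurrence record. The sketch below uses the real Darwin Core terms `typeStatus`, `locality`, `verbatimCoordinates`, `decimalLatitude` and `decimalLongitude`; the mapping itself, the helper names and the assumed TaxPub namespace URI are illustrative, and the regex only handles the degree/minute/hemisphere format seen in the example.

```python
import re
import xml.etree.ElementTree as ET

NS = {"tp": "http://www.plazi.org/taxpub"}  # assumed TaxPub namespace URI

SAMPLE = """<tp:material-citation xmlns:tp="http://www.plazi.org/taxpub">
  <tp:type-status>Holotype</tp:type-status> worker.
  <tp:collecting-event>
    <tp:collecting-location>SAUDI ARABIA, Al Bahah province, Amadan forest, Al Mandaq governorate, </tp:collecting-location>
    <named-content content-type="dwc:verbatimCoordinates">20°12'N, 41°13'E</named-content>
  </tp:collecting-event>
</tp:material-citation>"""

# Degrees, minutes, hemisphere, e.g. 20°12'N (seconds are not handled here)
DMS = re.compile(r"(\d+)°(\d+)'([NSEW])")


def to_decimal(deg, minutes, hemi):
    """Convert degrees and minutes to signed decimal degrees."""
    value = int(deg) + int(minutes) / 60
    return -value if hemi in "SW" else value


def to_darwin_core(xml_text):
    """Hypothetical mapping of one material citation to Darwin Core terms."""
    root = ET.fromstring(xml_text)
    record = {"typeStatus": root.findtext("tp:type-status", namespaces=NS)}
    loc = root.find(".//tp:collecting-location", NS)
    # Trim the trailing ", " that the source text carries inside the element
    record["locality"] = loc.text.strip().rstrip(",") if loc is not None else None
    for nc in root.iter("named-content"):
        if nc.get("content-type") == "dwc:verbatimCoordinates":
            record["verbatimCoordinates"] = nc.text
            for deg, minutes, hemi in DMS.findall(nc.text):
                key = "decimalLatitude" if hemi in "NS" else "decimalLongitude"
                record[key] = round(to_decimal(deg, minutes, hemi), 5)
    return record
```

Keeping the verbatim string alongside the computed decimals follows Darwin Core practice: the original wording is preserved while aggregators get numbers they can map.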
TaxPub: Recent and Future Developments
• Largely stable
• <x>
• Greenfication
• Interest from journals:
  • European Journal of Taxonomy
  • Zootaxa (via EOL)
• Markup of morphological descriptions
<p>Spreading shrub; stems erect,
  <Categorical uri="http://ontology.org/plant/stem-color">
    <State uri="http://ontology.org/plant/greenish">greenish</State>
  </Categorical>.
  Leaves deciduous early in summer (particularly when infected with Diseasomyces), oblong, apex obtuse, glabrous or weakly hirsute; stipules sharply pointed,
  <Quantitative uri="http://ontology.org/plant/stipule-width"><value value="3.2">3,2mm</value></Quantitative> wide,
  <Categorical uri="http://ontology.org/plant/stipule-color">
    <State uri="http://ontology.org/plant/black">black</State> or
    <State uri="http://ontology.org/plant/brown">darkish brown</State>
  </Categorical>, extremely rarely yellow, often shallowly joined around the node; spines stout.</p>
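The point of this inline markup is that character states become queryable by ontology URI while the prose stays readable. A minimal Python sketch, assuming the `Categorical`/`State`/`Quantitative`/`value` elements from the example (the `ontology.org` URIs are the slide's illustrative ones, and `extract_characters` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p>Spreading shrub; stems erect,
<Categorical uri="http://ontology.org/plant/stem-color"><State uri="http://ontology.org/plant/greenish">greenish</State></Categorical>.
Stipules <Quantitative uri="http://ontology.org/plant/stipule-width"><value value="3.2">3,2mm</value></Quantitative> wide,
<Categorical uri="http://ontology.org/plant/stipule-color"><State uri="http://ontology.org/plant/black">black</State> or
<State uri="http://ontology.org/plant/brown">darkish brown</State></Categorical>.</p>"""


def extract_characters(xml_text):
    """Return (character URI, values) pairs from inline morphological markup."""
    root = ET.fromstring(xml_text)
    characters = []
    # Categorical characters: collect the ontology URI of every listed state
    for cat in root.iter("Categorical"):
        states = [s.get("uri") for s in cat.findall("State")]
        characters.append((cat.get("uri"), states))
    # Quantitative characters: use the normalised @value, not the display text
    for quant in root.iter("Quantitative"):
        values = [float(v.get("value")) for v in quant.findall("value")]
        characters.append((quant.get("uri"), values))
    return characters
```

Note that the quantitative value is read from the `value` attribute (3.2) rather than the printed text ("3,2mm"), so the author's decimal comma and unit spelling do not leak into the data.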
TaxPub: Challenges
• Maintenance
  • SourceForge
  • Volunteer effort, little time, no funding…
  • Supported by Plazi
• Documentation
  • Comments with ad hoc markup in extension files
  • Converted to HTML by an NCBI tool
  • Maintained at the Species-ID wiki
Quick facts about Pensoft & ZooKeys
• Pensoft, founded in 1992: more than 700 books published; two offices, in Sofia and Moscow; 16 employees
• ZooKeys, launched in July 2008 as the first mandatory Open Access journal in taxonomy: 205 issues and 20,000 pages IN FOUR YEARS
• All new taxa are registered in ZooBank and supplied to EOL, Plazi and the Species-ID wiki
• CrossRef member; covered by ISI and Scopus; indexed in Zoological Record, DOAJ, CABI Abstracts and Google Scholar; archived in PubMed Central and CLOCKSS
• Pensoft Journal System: an XML-based online editorial system; publishing services offered to society and institutional journals
The XML landscape for legacy and prospective taxonomic literature (diagram)
• Prospective publishing: TaxPub XML schema; Pensoft mark-up tool; automated submission and peer review; marked-up publications in PDF, HTML and XML
• Historical literature: TaxonX and taXMLit schemas; Plazi's GoldenGATE editor
• Unified marked-up final output: taxon treatments, keys, images, localities; archiving
• Content management systems & repositories (e.g., EOL, GBIF, Scratchpads)
• End users: wikis (Species-ID, Wikispecies, Wikipedia); aggregators (EOL, GBIF); electronic archives and data centers; indexing services (IPNI, ZooBank, MycoBank, GNA)
Four stages of the XML-based editorial workflow
• SUBMISSION: XML-tagged or untagged manuscripts?
• PEER REVIEW/EDITORIAL PROCESS: the technical challenges of XML markup
• PUBLICATION: which publishing formats, and for whom?
• DISSEMINATION: how to achieve maximum distribution of published information?
Nomenclature • Literature • Descriptions • Images • Occurrences
But why mark up? Is it really needed? Who will use it?
Plazi
Automated export of species descriptions from the XML markup to the Encyclopedia of Life (EOL)
Automated harvesting and deposition of taxon treatments in Plazi
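Because each treatment is a self-contained `<tp:taxon-treatment>` element, harvesting reduces to finding those elements and serialising each one on its own. The Python sketch below does only that splitting step; how records are actually deposited in Plazi is not described in the talk and is not shown. It assumes the TaxPub namespace URI `http://www.plazi.org/taxpub`; `harvest_treatments` is a hypothetical name and "Aus bus" is a placeholder, not a taxon from the talk.

```python
import copy
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed TaxPub namespace URI
NS = {"tp": TP}
ET.register_namespace("tp", TP)  # serialise with the familiar tp: prefix

# "Aus bus" is a hypothetical placeholder name used only for illustration
ARTICLE = f"""<article xmlns:tp="{TP}">
  <body>
    <tp:taxon-treatment>
      <tp:nomenclature><tp:taxon-name>
        <tp:taxon-name-part taxon-name-part-type="genus">Platyscelio</tp:taxon-name-part>
        <tp:taxon-name-part taxon-name-part-type="species">mzantsi</tp:taxon-name-part>
      </tp:taxon-name></tp:nomenclature>
    </tp:taxon-treatment>
    <tp:taxon-treatment>
      <tp:nomenclature><tp:taxon-name>
        <tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
        <tp:taxon-name-part taxon-name-part-type="species">bus</tp:taxon-name-part>
      </tp:taxon-name></tp:nomenclature>
    </tp:taxon-treatment>
  </body>
</article>"""


def harvest_treatments(article_xml):
    """Split an article into standalone treatment documents keyed by taxon name."""
    root = ET.fromstring(article_xml)
    harvested = {}
    for treatment in root.iter(f"{{{TP}}}taxon-treatment"):
        # Take the name from the nomenclature block only, not from later sections
        name_el = treatment.find("tp:nomenclature/tp:taxon-name", NS)
        parts = name_el.findall("tp:taxon-name-part", NS)
        name = " ".join(p.text.strip() for p in parts)
        clone = copy.deepcopy(treatment)
        clone.tail = None  # drop whitespace that belongs to the parent element
        harvested[name] = ET.tostring(clone, encoding="unicode")
    return harvested
```

Each serialised treatment carries its own `xmlns:tp` declaration, so it remains valid XML once separated from the article.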
The Future of TaxPub and its implementations
More semantic Web enhancements!
• Pensoft Writing Tool (PWT): a collaborative article-writing platform
• Community-based and open peer-review process
• Biodiversity Data Journal will publish any kind of “small data”: checklists, nomenclatural acts, taxon treatments
RE-USE OF CONTENT
• Publishing and sharing of primary data
• Primary data (drawings: Slavena Peneva)
Biodiversity Data Journal
• All data matter: no lower or upper limit on manuscript size!
• ALL within a single online collaborative platform, including the writing of the manuscript!
• Collaborative article authoring tool
• Community peer review with “open” and “public” options, on top of conventional peer review
• Online editorial process and version control
• Standards-compliant (Darwin Core, Dublin Core, NLM JATS, etc.)
• Predefined article templates compliant with the biological nomenclatural Codes
Life cycle of data published in the BDJ (diagram)
• Biodiversity manuscript with genome data, occurrence data, phylogenetic data, morphometric data, environmental data, image galleries and any other data
• XML markup turns the articles into structured text (data!): taxon names, taxon treatments, occurrence data, bibliographies
• Destinations shown: COL, Plazi, Wiki, BHL
The lessons learned
The main difficulties are caused by:
• The specificity of the domain (e.g., taxon names, synonyms, instability of nomenclature, lack of a global LSID infrastructure)
• Markup of occurrence data (certainly a great challenge)
• Cost efficiency of the markup process
• Sociological barriers: most authors are not willing to change their writing habits, and most are still not aware of the tremendous advantages of Web 2.0 technologies
• Most small taxonomy publishers (and some bigger ones) have no experience with XML-based editorial workflows, or simply can't afford them
“Semi-automatically generated semantic, enhanced e-publications are the only way to describe the missing 10M species, and to deal with an increasing flood of data.” (Donat Agosti)
It is not easy, but... it is exciting... and it is possible only through Open Access!