HarperCollins
Agenda
• Content Creation Process
• What is DITA?
• What is DITA Open Toolkit?
• What does RSuite do?
• Demo
• Manuscript to ICML: Word -> DITA -> ICML
• Workflow Engine
• InDesign
• Code – Java, XSLT, XQuery, Java APIs
• “Groundbreaking” Topic
Current Book Composition Process
• Step 1: Editorial – Manuscript (docx)
• Step 2: Composition – InDesign (indd)
New Book Composition Process
• Step 1: Editorial – Manuscript (docx)
• Transform 1
• Step 2: Generate DITA XML (xml)
• Transform 2
• Step 3: Generate ICML (icml)
• Download ICML
• Step 4: Composition – InDesign (indd)
What is DITA?
• Darwin Information Typing Architecture
• An XML Data Model for Authoring and Publishing
• Topic Oriented
  • Each Topic is a separate XML file
  • DocBook is Book Oriented and more Complex: One Big XML file
• DITA Initial Spec in 2001
• DocBook Initial Spec in 1991
• Core DITA Topic Types:
  • Concept
  • Task
  • Reference
• Specialization: Subtyping, where New Topic Types are derived from existing ones
What is DITA?
• A Topic must have at least: an id attribute on the root element, a title, and a body.
• A DITA Map stitches topics together.
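
As a minimal sketch of the two points above, the following Python builds a bare topic and a map that references it, using only the standard library. The element names (`topic`, `title`, `body`, `map`, `topicref`) are standard DITA; the ids, titles, file name, and helper function names are made up for illustration, and no DTD validation is attempted.

```python
import xml.etree.ElementTree as ET


def make_topic(topic_id, title, paragraphs):
    """Build a minimal DITA topic: id on the root, a title, and a body."""
    topic = ET.Element("topic", {"id": topic_id})
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return topic


def is_minimal_topic(topic):
    """Check the three minimum parts listed on the slide."""
    return (
        bool(topic.get("id"))
        and topic.find("title") is not None
        and topic.find("body") is not None
    )


def make_map(map_title, hrefs):
    """Build a DITA map whose topicref elements point at topic files."""
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = map_title
    for href in hrefs:
        ET.SubElement(ditamap, "topicref", {"href": href})
    return ditamap


# Hypothetical sample content, not taken from the presentation.
chapter = make_topic("chapter-04", "Chapter 4", ["First paragraph."])
book = make_map("Sample Book", ["chapter-04.dita"])

print(ET.tostring(chapter, encoding="unicode"))
print(ET.tostring(book, encoding="unicode"))
```

Because each topic lives in its own file, the map is the only place that knows the reading order, which is what makes topics reusable across publications.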
What is DITA?
Eliot Kimber
• http://www.ditausers.org/tutorials/basics/kimber/
• http://www.xiruss.org/tutorials/dita-specialization/
Norm Walsh, post from October 2005:
• http://norman.walsh.name/2005/10/21/dita
Four key technical differences where DITA may be “better” than DocBook:
• A topic-oriented authoring paradigm.
• A cross-referencing scheme that's more practical than XML's flat ID space.
• SGML's conref, reinvented.
• An extensibility model based on "specialization".
What is DITA Open Toolkit?
• Open-source publishing system for DITA
• Provides multi-channel output
• https://github.com/dita-ot/dita-ot/
• https://dita-ot.github.io/
• Uses a Pipeline Processing Approach built on:
  • Java
  • XSLT
  • Rendering Engine (FOP, RTF, etc.)
• DITA 4 Publishers
What does RSuite do?
• Centralized Repository for “all” artifacts
• Provides:
  • Workflow
  • DITA Transforms
    • Manuscript to DITA
    • DITA to ICML
  • Multi-channel Output – PDF, ePub3, InDesign
  • Role Based Security
• Distribution:
  • FTP to Commercial Printer
  • E-Commerce Sites
[Architecture diagram]
• RSuite Tomcat Server
  • Temp Directories: 1. XSLT Transforms, 2. File Uploads
• SAN Drives: 500 GB total, 100 GB / Disk
  • Non XML Disks 1 to 5
• MySQL Disk
• MarkLogic Nodes 1 to 3, each 4 CPU, 2 Core / CPU
  • Disk 1 on each node: one Forest, 600 GB
  • Disk 2 on each node: one Forest, 600 GB
  • Disk 3 on each node: Backup, 300 GB
  • Forests 1 to 6 in total
[Architecture diagram with feature request]
• Feature Request: Use an XA Transaction across the three updates marked 1 to 3 in the diagram:
  • File Copy
  • MySQL Update
  • Metadata Update
• RSuite Tomcat Server
  • Temp Directories: 1. XSLT Transforms, 2. File Uploads
• SAN Drives: 500 GB total, 100 GB / Disk
  • Non XML Disks 1 to 5
• MySQL Disk
• MarkLogic Nodes 1 to 3, each 4 CPU, 2 Core / CPU
  • Disk 1 on each node: one Forest, 600 GB
  • Disk 2 on each node: one Forest, 600 GB
  • Disk 3 on each node: Backup, 300 GB
RSuite Demo?
• Upload
• Transforms
• PDF, ePub
• ICML to InDesign
• MarkLogic Config
Code?
• Java
• jBPM – Business Process Management Framework
• Ivy – to manage plugin dependencies
• Ember.js
• XQuery
• Groovy
• DITA-OT XSLT
• Plugins
• RSuite API Docs
Groundbreaking Opportunity
• Unleash the Tombstones!
• All Content can be reused for product development
DITA to RDF Transform!
• Semantically Linked DITA
• Link to Internal and External Content
  • DBpedia: http://wiki.dbpedia.org/Downloads39
  • NY Times
  • Dublin Core
  • US Census
  • http://dbpedia.org/page/Mark_Twain
• Semantic Links create a network of Knowledge
• Enables Inferencing (ML8)
• Uses MarkLogic Triple Index
Why RDF?
• RDF complements DITA
• Contains facts about DITA topics
• Facts are stored in the Triple Index
• Facts are used to:
  • Link internal and external documents
  • Derive other facts (inferencing)
  • Provide higher quality search results
• RDF is an efficient storage and linking mechanism
• MarkLogic turns RDF into Triples
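
One way to picture a DITA to RDF transform is a small script that reads facts stored in a topic and emits them as N-Triples. This is only a sketch under stated assumptions: the presentation does not say where facts live inside the topic, so the use of `<data name="..." value="..."/>` elements in the prolog, the `http://example.org/` namespace, and the predicate names are all hypothetical. The project's actual transform is described as XSLT, not Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; a real project would choose its own vocabulary.
BASE = "http://example.org/"

# Hypothetical topic: facts are carried as <data> elements in the prolog.
SAMPLE_TOPIC = """
<topic id="chapter-04">
  <title>Chapter 4</title>
  <prolog>
    <metadata>
      <data name="mentions" value="Eroni_Kumana"/>
      <data name="partOf" value="Profiles_in_Courage"/>
    </metadata>
  </prolog>
  <body><p>Chapter text goes here.</p></body>
</topic>
"""


def topic_to_triples(xml_text):
    """Turn each <data> element into a (subject, predicate, object) triple."""
    topic = ET.fromstring(xml_text)
    subject = BASE + "topic/" + topic.get("id")
    triples = []
    for data in topic.iter("data"):
        predicate = BASE + "vocab/" + data.get("name")
        obj = BASE + "resource/" + data.get("value")
        triples.append((subject, predicate, obj))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = topic_to_triples(SAMPLE_TOPIC)
print(to_ntriples(triples))
```

The output is plain text that a triple store can load, which is what keeps the RDF "transient": it can be regenerated from the DITA source at any time.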
Why Triples?
A Triple is a Subject-Predicate-Object (SPO) structure used to represent a fact. Triples let computers derive facts from other facts without human involvement.
Example:
• Ted lives in Chicago, Illinois
• Ted lives near Wrigley Field
• Ted has a roommate called Sam
• Ted and Sam go to Wrigley Field to watch games
Derived from these facts:
• Sam lives in Chicago
• Wrigley Field is in Chicago, Illinois
• Chicago is in Illinois
• Sam and Ted both live in the US
• Etc…
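
The derivation above can be sketched as a tiny forward-chaining loop over SPO tuples. The encoding is a simplification: the predicate names, the three rules, and the explicit "Illinois is in the US" fact are assumptions added for illustration, and a real triple store would use its own rule language rather than nested Python loops.

```python
def infer(facts):
    """Apply three simple rules repeatedly until no new triples appear."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Rule 1: located_in is transitive.
                if p == "located_in" and p2 == "located_in" and o == s2:
                    new.add((s, "located_in", o2))
                # Rule 2: living in a place means living in whatever contains it.
                if p == "lives_in" and p2 == "located_in" and o == s2:
                    new.add((s, "lives_in", o2))
                # Rule 3 (assumption): roommates live in the same place.
                if p == "roommate_of" and p2 == "lives_in" and s == s2:
                    new.add((o, "lives_in", o2))
        if new <= facts:
            return facts
        facts |= new


# Asserted facts, loosely based on the slide's example.
asserted = {
    ("Ted", "lives_in", "Chicago"),
    ("Ted", "roommate_of", "Sam"),
    ("Chicago", "located_in", "Illinois"),
    ("Illinois", "located_in", "US"),
}

derived = infer(asserted) - asserted
for triple in sorted(derived):
    print(triple)
```

Nobody stated that Sam lives in the US; it follows from the roommate fact plus two containment facts, which is the point of the slide.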
How to add Triples?
• Facts need to be curated.
• Data provenance
• Editors can add facts to DITA Topic Docs.
• New world of Semantic Publishing
• Eroni Kumana
Profiles in Courage Example
• Add Facts to Chapter 4 DITA XML:
  • “Profiles in Courage” Primary ISBN value is 0060854936
  • John F. Kennedy is the Author Of “Profiles in Courage”
  • John F. Kennedy is a Person
  • John F. Kennedy was at the Solomon Islands in August 1943
  • Eroni Kumana is a Person
  • Eroni Kumana was at the Solomon Islands in August 1943
  • Eroni Kumana rescued John F. Kennedy
  • Eroni Kumana is mentioned in Chapter 4, Profiles in Courage
• Semantic Event – NY Times News Feed
  • Eroni Kumana died on August 2, 2014
• Event Triggers Automatic Pub:
  • CMS automatically publishes “Profiles in Courage” web page with snippet to the specific Chapter referencing Eroni Kumana.
  • New web page also has link to like and/or purchase book.
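
The event-to-publication flow above can be sketched with the same tuple representation. Everything here is illustrative: the predicate names, the `part_of` triple linking the chapter to the book, the event dictionary shape, and the function names are assumptions, and the sketch only returns a description of the page to publish rather than calling any CMS.

```python
# Facts from the slide, encoded as (subject, predicate, object) tuples.
TRIPLES = {
    ("Profiles in Courage", "primary_isbn", "0060854936"),
    ("John F. Kennedy", "author_of", "Profiles in Courage"),
    ("John F. Kennedy", "is_a", "Person"),
    ("Eroni Kumana", "is_a", "Person"),
    ("Eroni Kumana", "rescued", "John F. Kennedy"),
    ("Eroni Kumana", "mentioned_in", "Profiles in Courage, Chapter 4"),
    # Assumption: an explicit link from the chapter to its book.
    ("Profiles in Courage, Chapter 4", "part_of", "Profiles in Courage"),
}

# Hypothetical shape for an incoming news-feed event.
EVENT = {"subject": "Eroni Kumana", "predicate": "died_on", "object": "2014-08-02"}


def objects(triples, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def pages_to_publish(triples, event):
    """Find every chapter that mentions the event's subject and describe a page for it."""
    person = event["subject"]
    pages = []
    for chapter in objects(triples, person, "mentioned_in"):
        for book in objects(triples, chapter, "part_of"):
            isbns = objects(triples, book, "primary_isbn")
            pages.append({
                "book": book,
                "chapter": chapter,
                "isbn": isbns[0] if isbns else None,
                "reason": f"{person} {event['predicate']} {event['object']}",
            })
    return pages


for page in pages_to_publish(TRIPLES, EVENT):
    print(page)
```

The ISBN travels with the result so that the generated page can carry the purchase link mentioned on the slide.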
Book Process Steps
• Step 1: Editorial – Manuscript (docx)
• Transform 1: Word 2 DITA
• Step 2: Generate DITA XML (xml)
• Transform 2: DITA 2 ICML
• Step 3: Generate ICML (icml)
• Download ICML
• Step 4: Composition – InDesign (indd)
• Step 3: Generate Transient RDF (rdf), via Transform 3: DITA 2 RDF, into the ML Triple Index
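
The three transforms form a small graph of formats with DITA XML in the middle. As a rough sketch, the chain needed for a given output can be looked up with a breadth-first search. The transform names and file formats come from the slide; the `plan` function and the table layout are illustrative and say nothing about how RSuite actually schedules transforms.

```python
from collections import deque

# Transform name -> (input format, output format), as named on the slide.
TRANSFORMS = {
    "Word 2 DITA": ("docx", "xml"),
    "DITA 2 ICML": ("xml", "icml"),
    "DITA 2 RDF": ("xml", "rdf"),
}


def plan(source_format, target_format):
    """Return the ordered transform names from source to target, or None."""
    queue = deque([(source_format, [])])
    seen = {source_format}
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target_format:
            return steps
        for name, (src, dst) in TRANSFORMS.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [name]))
    return None


print(plan("docx", "icml"))
print(plan("docx", "rdf"))
```

Both outputs share the first transform, so the manuscript only has to be converted to DITA once before the ICML and RDF branches diverge.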
[Architecture diagram with triple index]
• RSuite Tomcat Server
  • Temp Directories: 1. XSLT Transforms, 2. File Uploads
• SAN Drives: 500 GB total, 100 GB / Disk
  • Non XML Disks 1 to 5
• MySQL Disk
• MarkLogic Nodes 1 to 3, each 4 CPU, 2 Core / CPU
  • Disk 1 (600 GB, labeled Index): Forest 1, Forest 3, Forest 5
  • Disk 2 (600 GB, labeled Triples): Forest 2, Forest 4, Forest 6
  • Disk 3 on each node: Backup, 300 GB
De-Silo-ize
Custom APIs are used to communicate between silos.
• DAM
• Web Host Provider
• ISBN DB
• ebook Store
• Published Docs
• CMS
Hub Spoke – No Silos
Uses standardized RDF “connectors” to communicate.
• DAM
• ISBN DB
• Web Host Provider
• ebook Store
• Published Docs
• CMS
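
A quick count shows why the hub and spoke layout scales better. The slides give no numbers, so this assumes the worst case for silos (every system needs a custom API to every other system) and a separate central hub with one RDF connector per system.

```python
SYSTEMS = ["DAM", "ISBN DB", "Web Host Provider", "ebook Store", "Published Docs", "CMS"]


def point_to_point_links(n):
    """Custom integrations if every one of n systems talks to every other."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n):
    """Connectors if each of n systems talks only to a central hub."""
    return n


n = len(SYSTEMS)
print(point_to_point_links(n))  # 15
print(hub_and_spoke_links(n))   # 6
```

Adding a seventh system would cost six new custom integrations in the first layout and a single connector in the second.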
Call To Action
• Contribute to DITA RDF Project: https://github.com/ColinMaudry/dita-rdf/blob/master/README.md
• Build a Knowledge Engine