Inline Markup in XLIFF 2.0 Fredrik Estreen - Lionbridge Yves Savourel - ENLASO
Disclaimer While we believe the information presented here is pretty stable, it only reflects the general consensus of the sub-committee working on the inline markup. Things may change during the formal approval by the sub-committee and later, when it goes through review and approval by the main XLIFF TC.
Agenda • Principles and Background • Inline Markup • Characters that are invalid in XML • Native Codes • Annotations • Extensions • Processing requirements • XLIFF Toolkit
Some Principles Some of the guidelines we are trying to follow during the work: • Try to have only one way to do one thing • Provide processing requirements • Try to re-use existing standards when possible • Try to keep things simple
Containing Structure The structural part of XLIFF changes in 2.0 and the inline markup should be easy to handle in the new model. • Static structure • <file> -> <group>* -> <unit> • Contents of the concatenated <source> elements remain static during processing • Dynamic structure inside <unit> • <segment>, <ignorable> -> <source>, <target> • A processor may merge or split the contents of segments or ignorable.
What's the Inline Markup? The inline markup is what's inside the <source> and <target> elements • Characters that are invalid in XML • Original inline codes • Annotations
Inline codes and segmentation • Inline codes belong to the <unit> and not to the <segment>(s) • ID uniqueness within the <unit> • Allows simple re-segmentation of the content of <unit> • No need to clone codes that span multiple segments
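The ID rule above can be sketched as a small check. This is only an illustration, not part of any XLIFF library: the class name UnitIdCheck, the method duplicateIds, and the choice to scan raw strings with a regular expression (rather than a parsed document) are all assumptions made here. It collects the id values of <ph>, <pc> and <sc> elements across all the segments of one unit and reports any value seen twice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks that inline code IDs are unique within one <unit>.
final class UnitIdCheck {

    // Matches the id attribute of a <ph>, <pc> or <sc> start/empty tag.
    // The \b before "id" keeps attributes such as nid or rid from matching.
    private static final Pattern CODE_ID =
        Pattern.compile("<(?:ph|pc|sc)\\b[^>]*?\\bid=[\"']([^\"']*)[\"']");

    // segmentSources: the raw <source> content of each segment of ONE unit.
    // Returns the id values that occur more than once across those segments.
    static List<String> duplicateIds(List<String> segmentSources) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String source : segmentSources) {
            Matcher m = CODE_ID.matcher(source);
            while (m.find()) {
                if (!seen.add(m.group(1))) {
                    duplicates.add(m.group(1));
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> segments = new ArrayList<>();
        segments.add("Click <sc id=\"1\"/>here.");      // span starts in segment 1
        segments.add("Then<ec rid=\"1\"/> <ph id=\"2\"/>"); // and ends in segment 2
        System.out.println(duplicateIds(segments).isEmpty());
    }
}
```

Because the IDs are scoped to the unit, the span that starts in the first segment and ends in the second needs no cloning, and moving the segment boundary does not change the result of the check.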
Characters that are Invalid in XML Some characters, for example most control characters, are not allowed in XML content, so they cannot be stored as-is in XLIFF. <cp hex="0007"/> represents U+0007 (the "bell" character) - Same as Unicode LDML format - This notation must be used only for characters that are invalid in XML.
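A minimal sketch of how a writer might produce the <cp/> notation. The class and method names (CpEncoder, encode, isValidXmlChar) are invented for this illustration and are not the toolkit's API; the validity test follows the Char production of XML 1.0, and the four-digit uppercase hex form is an assumption based on the example above.

```java
// Hypothetical helper: writes text as XLIFF content, using <cp/> for
// code points that XML 1.0 does not allow.
final class CpEncoder {

    // True if the code point is allowed in XML 1.0 content (Char production).
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces each invalid code point with <cp hex="XXXX"/> and escapes
    // the XML special characters '<' and '&'. Valid characters pass through.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isValidXmlChar(cp)) {
                out.append(String.format("<cp hex=\"%04X\"/>", cp));
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // (char) 7 is U+0007, the "bell" character.
        System.out.println(encode("Ding" + (char) 7 + "!"));
        // prints: Ding<cp hex="0007"/>!
    }
}
```

Tab, line feed and carriage return are valid in XML and are therefore left untouched, which matches the rule that the notation is reserved for invalid characters.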
Inline Codes • Support any type of native markup • Standalone: <ph/> • Spanning: <pc> and <sc/> + <ec/>
Inline Codes - Use Cases All possible cases: Standalone code <ph id='1'/> Well-formed spanning code <pc id='1'>text</pc> Start marker of spanning code <sc id='1'/> End marker of spanning code <ec rid='1'/> Orphan start marker of spanning code <sc id='1' isolated='yes'/> Orphan end marker of spanning code <ec id='1' isolated='yes'/>
Inline Codes - Storage of Original • No storage: <source>A<ph id="1"/>B</source> • Store, but only outside the segment: <source>A<ph id="1" nid="d1"/>B</source> <originalData> <data id="d1">&lt;BR&gt;</data> </originalData>
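To illustrate how the nid reference could be followed when merging back to the native format, here is a rough sketch. The names OriginalDataResolver and restore are hypothetical, and the code works on raw strings with an already unescaped data map instead of a parsed document, so it is a simplification rather than how a real processor would do it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces <ph .. nid="X"/> by the native code stored
// in <data id="X"> of the <originalData> element.
final class OriginalDataResolver {

    // Matches an empty <ph/> element that carries a nid attribute.
    private static final Pattern PH_WITH_NID =
        Pattern.compile("<ph\\b[^>]*?\\bnid=[\"']([^\"']*)[\"'][^>]*/>");

    // originalData maps a <data> id to its (unescaped) native code.
    // A <ph/> whose nid has no entry in the map is left unchanged.
    static String restore(String content, Map<String, String> originalData) {
        Matcher m = PH_WITH_NID.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String nativeCode = originalData.get(m.group(1));
            String replacement = (nativeCode != null) ? nativeCode : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = new HashMap<>();
        data.put("d1", "<BR>");
        System.out.println(restore("A<ph id=\"1\" nid=\"d1\"/>B", data));
        // prints: A<BR>B
    }
}
```

Keeping the native code outside the segment means the translatable content stays short and uniform, while the lookup above remains a single map access per code.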
Annotations <mrk> for well-formed constructs <sm/> + <em/> otherwise Attributes: • id (required) • type (default=generic) • translate (yes or no, default=yes) • ref (optional type-specific URI) • value (optional type-specific text/data)
Annotation Types • Translate annotations • Term annotations • Comment annotations • Custom annotations The IDs link the same annotation in source and target if needed.
Translate Annotation • To protect (or not) a span of content: <mrk id="1" translate="no">content</mrk> Note that translate can also be used with other types of annotations.
Term Annotation • To denote a "term": <mrk id="1" type="term" value="simple definition" ref="reference to more info">content</mrk> The id links source and target if needed
Comment Annotation • Simple: <source><mrk id="1" type="comment" value="The text of the comment">content</mrk></source> • With associated note: <source><mrk id="1" type="comment" ref="#n1">content</mrk></source> <notes> <note id="n1">Text of the note</note></notes>
Custom Annotation • User-defined annotation: - The type attribute = <prefix>:<userType> - The meanings of the value and ref attributes are defined by the user. <mrk id="1" type="myPrefix:isbn" value="978-0-14-044919-8">The Epic of Gilgamesh</mrk>
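A tool that reads annotations might split the type value into its two parts along these lines. This is a sketch under assumptions: the class CustomAnnotationType and its methods are invented here, and the exact character rules for prefixes and user types are not stated on the slide, so the code only requires a non-empty part on each side of the first colon.

```java
// Hypothetical helper: tells custom annotation types (prefix:userType)
// apart from plain ones such as "term" or "comment".
final class CustomAnnotationType {

    // True if the value has the form <prefix>:<userType>, both parts non-empty.
    static boolean isCustom(String type) {
        int colon = type.indexOf(':');
        return colon > 0 && colon < type.length() - 1;
    }

    // Splits at the FIRST colon and returns { prefix, userType }.
    // Throws IllegalArgumentException if the value is not a custom type.
    static String[] parse(String type) {
        if (!isCustom(type)) {
            throw new IllegalArgumentException("Not a custom type: " + type);
        }
        int colon = type.indexOf(':');
        return new String[] { type.substring(0, colon), type.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = parse("myPrefix:isbn");
        System.out.println(parts[0] + " / " + parts[1]);
        // prints: myPrefix / isbn
    }
}
```

A processor that does not recognize the prefix can still preserve the annotation as-is, since the meaning of value and ref is left to whoever defined the type.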
Extensions • A few attributes can take user-defined values: e.g. mrk@type, ph@type, pc@type • No additional attributes are allowed in any of the inline elements • No additional elements are allowed inside <source>, <target> or <data> Custom annotations are essentially the only way to extend markup inside the inline content.
Processing Requirements • Allowed markup transforms and the related attribute mapping, for example between a <pc> element and an <sc/>, <ec/> pair • Requirements for the creation and editing of target text • Rules on cloning markup with and without reference to native data • Stricter rules on attributes and ID references • How to handle segmentation changes
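As an illustration of the first bullet, the transform from <pc> to an <sc/>, <ec/> pair could look roughly like the sketch below. The class PcToScEc and its convert method are hypothetical, the code works on raw strings and assumes every <pc> has content and an id, and the attribute mapping is reduced to the simplest case: all attributes are copied to <sc/> and the id is referenced from <ec/> through rid, as in the use cases shown earlier.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites <pc id="N">..</pc> as <sc id="N"/>..<ec rid="N"/>.
final class PcToScEc {

    // Group 1 holds the attributes of a <pc ...> start tag; it is null for </pc>.
    private static final Pattern PC_TAG = Pattern.compile("<pc\\b([^>]*)>|</pc>");
    private static final Pattern ID_ATTR = Pattern.compile("\\bid=[\"']([^\"']*)[\"']");

    static String convert(String content) {
        Deque<String> openIds = new ArrayDeque<>(); // ids of the <pc> currently open
        StringBuffer out = new StringBuffer();
        Matcher m = PC_TAG.matcher(content);
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                // Start tag: remember the id, keep the attributes on <sc/>.
                Matcher id = ID_ATTR.matcher(m.group(1));
                if (!id.find()) {
                    throw new IllegalArgumentException("<pc> without id");
                }
                openIds.push(id.group(1));
                replacement = "<sc" + m.group(1) + "/>";
            } else {
                // End tag: close the most recently opened code.
                if (openIds.isEmpty()) {
                    throw new IllegalArgumentException("</pc> without <pc>");
                }
                replacement = "<ec rid=\"" + openIds.pop() + "\"/>";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        if (!openIds.isEmpty()) {
            throw new IllegalArgumentException("<pc> without </pc>");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("<pc id=\"1\">bold <pc id=\"2\">both</pc></pc> plain"));
        // prints: <sc id="1"/>bold <sc id="2"/>both<ec rid="2"/><ec rid="1"/> plain
    }
}
```

This direction is the one needed when a re-segmentation splits a well-formed span across two segments; the reverse transform is only possible when both markers end up in the same segment again.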
XLIFF Toolkit - A Library and More • Java-based and open source (LGPL) • http://code.google.com/p/okapi-xliff-toolkit/ • Stream-based rather than DOM to handle very large documents • Reader is event-driven • Unit available as single object • Writer also available
Library - Reading a Document
    XLIFFReader reader = new XLIFFReader();
    reader.open(new File("myInput.xlf"));
    // Iterate over the events of the document
    while ( reader.hasNext() ) {
        XLIFFEvent event = reader.next();
        if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
            Unit unit = event.getUnit();
            // Do something with the unit
        }
    }
    reader.close();
Library - Updating a Document
    XLIFFReader reader = new XLIFFReader();
    XLIFFWriter writer = new XLIFFWriter();
    reader.open(new File("myInput.xlf"));
    writer.create(new File("myOutput.xlf"));
    while ( reader.hasNext() ) {
        XLIFFEvent event = reader.next();
        if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
            Unit unit = event.getUnit();
            // Do something with the unit
        }
        // Write every event, modified or not, to the output
        writer.write(event);
    }
    reader.close();
    writer.close();
Q & A Useful links • Read the latest Editor's Draft: https://wiki.oasis-open.org/xliff/ • Comment or ask questions in the mailing lists: https://lists.oasis-open.org/archives/xliff-comment/ and https://lists.oasis-open.org/archives/xliff-users/ • Try out the toolkit: http://code.google.com/p/okapi-xliff-toolkit/