What’s New in XLIFF 1.2? Tony Jewtushenko, Director Research & Development, Product Innovator Ltd. Co-Chair – OASIS XLIFF TC. The XML Localisation Interchange File Format
Agenda • Overview of XLIFF: Definition, goals, benefits, architecture and basic XLIFF concepts • What’s new in XLIFF 1.2: New and changed features of the XLIFF 1.2 normative specification • Non-Normative Representation Guides: A brief introduction to the representation guides provided with XLIFF 1.2
XLIFF Overview A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format.
What is XLIFF? A specification for the lossless interchange of localizable data and its related information, which is tool-neutral, has been formalized as an XML vocabulary, and features an extensibility mechanism.
Why XLIFF was created… Localisation Is Difficult • Insufficient interoperability between tools • Lack of support for overall localisation workflow • Necessity of localisation tools developers to deal with many formats • Large number of proprietary intermediate formats
Contributors to XLIFF - Past and Present • Microsoft • Moravia IT • Novell • Oracle • Red Hat • PASS Engineering • SAP • SDL International • Sun Microsystems • Tektronix • TRADOS • XML Intl • Alchemy Software • Bowne Global Solutions • Convey Software • Ektron, Inc • ENLASO Corp (RWS) • Globalsight • Heartsome • HP • Idiom Technologies, Inc • Lionbridge • LRC • Lotus/IBM
OASIS XLIFF TC Members as of 1 Sept 06 • TC Officers: • Chairs: Tony Jewtushenko, Product Innovator Ltd; Bryan Schnabel, Tektronix • Secretary: Peter Reynolds, Idiom Technologies, Inc. • Current Members of TC: • Mat Lovatt, Oracle • Doug Domeny, Ektron • Rodolfo Raya, Heartsome • Eiju Akahane, IBM • Steven Harris, Idiom Technologies, Inc. • Fredrik Corneliusson, Lionbridge • Joachim Schurig, Lionbridge • Milan Karasek, Moravia IT • Florian Sachse, Pass Engineering • Christian Lieske, SAP • Magnus Martikainen, SDL International • David Pooley, SDL International • Kevin Bargary, University of Limerick Localisation Research Centre • Reinhard Schaler, University of Limerick Localisation Research Centre • Andrzej Zydron, XML- Intl
OASIS: Standards Body Home of XLIFF • OASIS: Organization for the Advancement of Structured Information Standards • World’s largest independent, non-profit organization dedicated to the standardisation of XML applications and Web Services • More than 150 member companies plus individuals • Operates XML.ORG Registry, the open community clearinghouse of XML application schemas • Technical work on XML interoperability includes XML conformance and XML Registries/Repositories • General XML technical resource
XLIFF Benefits • Interoperability: reduces effort in deploying integrated best-of-breed solutions • Automation: reduces defects introduced by manual processing and handling • Open Standards: reduces vendor lock-in, re-use • Flexibility: leverages services, technologies, vendors • Cost, Time: reduces cost and turnaround time • Scalability: easy to scale and future-proof
High Level XLIFF Architecture An XLIFF document is a container for all data needed for a localisation project: • Localizable objects (e.g. text strings, graphics) in source and target languages. • Supplementary information (e.g. glossaries, or material to recreate the original format). • Administrative information (e.g. workflow data). • Custom data (e.g. initialization information for tools).
The XLIFF Document • An XLIFF document is designed to store the extracted data related to localisation. • Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF. • Each XLIFF document can include several <file> elements. • An entire localisation project could be stored in a single XLIFF document.
Bilingual Model • Each <file> element is designed to store one source language and one target language • The rationale is that translations into different target languages are usually done by different people • However, the languages in <alt-trans> elements can be different. For example, proposed matches in national Portuguese when translating into Brazilian Portuguese.
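A minimal sketch of the container model described above, using Python's standard xml.etree.ElementTree. The sample document (file names, strings, language pairs) is invented for illustration; only the namespace URI and the element and attribute names come from XLIFF 1.2.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical two-file project: each <file> carries its own language pair.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='menu.txt' source-language='en' target-language='fr' datatype='plaintext'>
    <body>
      <trans-unit id='1'><source>Open</source><target>Ouvrir</target></trans-unit>
    </body>
  </file>
  <file original='help.htm' source-language='en' target-language='de' datatype='html'>
    <body>
      <trans-unit id='1'><source>Help</source></trans-unit>
      <trans-unit id='2'><source>Index</source></trans-unit>
    </body>
  </file>
</xliff>"""


def summarize_files(xliff_text):
    """Return (original, source language, target language, unit count) per <file>."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    summary = []
    for file_el in root.findall("x:file", ns):
        units = file_el.findall(".//x:trans-unit", ns)
        summary.append((
            file_el.get("original"),
            file_el.get("source-language"),
            file_el.get("target-language"),
            len(units),
        ))
    return summary


for row in summarize_files(SAMPLE):
    print(row)
```

Each tuple corresponds to one source container, which is why a whole project can travel in one document while every <file> stays bilingual.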
Localisable Objects • Besides localisable text, XLIFF can also contain other localisable object types such as binary graphics • Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text) • Relationships between objects can be captured (e.g. a hierarchical menu or text related to a web graphic)
Supplementary Info • XLIFF provides “hooks” for storing supplementary information in reference element • Glossaries • Translation memories • Segmentation Rules (via SRX file) • The supplementary information can be referenced (i.e. reside outside of the document), or embedded within the document
Administrative Info XLIFF provides mechanisms for capturing administrative information: • For relating source material to XLIFF documents. • For storing workflow data. • For providing pre-translation entries. • For keeping track of changes.
Administrative Info – Pre-Translation A set of proposed translations can be included for each <trans-unit> element, using the <alt-trans> element.
<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>
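As a rough illustration of how a tool might consume these pre-translation entries, the sketch below collects every proposed target from a <trans-unit> fragment. The function name and the returned dictionary shape are assumptions made for this example, not part of XLIFF.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

UNIT = """<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>"""


def proposals(unit_text):
    """List the proposed translations held in <alt-trans> children of a unit."""
    unit = ET.fromstring(unit_text)
    found = []
    for alt in unit.findall("alt-trans"):
        target = alt.find("target")
        if target is None:
            continue  # an <alt-trans> without a <target> proposes nothing
        found.append({
            "origin": alt.get("origin"),
            "lang": target.get(XML_LANG),
            "text": target.text,
        })
    return found


print(proposals(UNIT))
```

The fragment is parsed without the XLIFF namespace for brevity; inside a full document the element names would need to be namespace-qualified.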
Customising XLIFF Customise XLIFF by extending (adding) user defined: • Elements • Attributes • Attribute Values
Extending Elements • Extension points in the following elements: • <alt-trans>, <bin-unit>, <group>, <header>, <tool>, <trans-unit>, and new in 1.2: <xliff> and <seg-source>. • Content of each custom element can be any valid XML content: • empty content, PCDATA, mixed content, and so forth • Custom elements are defined in a private namespace schema
Example of Extending Elements
<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <group>
      <sup:SourceInfo>
        <sup:Book>Piers Plowman, Passus 1</sup:Book>
        <sup:Author>William Langland</sup:Author>
      </sup:SourceInfo>
      <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
      <trans-unit id='1'>
        <source xml:lang='enm'>What this mountaigne bymeneth</source>
        <target xml:lang='en'>What this mountain means</target>
        <sup:Reference Type='strophe'>1-a</sup:Reference>
      </trans-unit>
    </group>
  </file>
</xliff>
Non-XLIFF elements are those with the sup: prefix (shown in bold on the original slide).
Non-XLIFF elements Defined in XSD:
<xsd:schema targetNamespace="XLFSup-v1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sup="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="SourceInfo">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="Book" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="WorkInfo">
    <xsd:complexType>
      <xsd:attribute name="Task" type="xsd:string"/>
      <xsd:attribute name="Context" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Reference">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="Type" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
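One way a tool could tell custom elements apart from core XLIFF is by namespace alone. The sketch below walks the Piers Plowman example and lists every element that is not in the XLIFF 1.2 namespace; the function name is an assumption for this example, and a tool that does not understand the extensions would typically just carry them through untouched.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <group>
      <sup:SourceInfo>
        <sup:Book>Piers Plowman, Passus 1</sup:Book>
        <sup:Author>William Langland</sup:Author>
      </sup:SourceInfo>
      <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
      <trans-unit id='1'>
        <source xml:lang='enm'>What this mountaigne bymeneth</source>
        <target xml:lang='en'>What this mountain means</target>
        <sup:Reference Type='strophe'>1-a</sup:Reference>
      </trans-unit>
    </group>
  </file>
</xliff>"""


def extension_elements(xliff_text):
    """Return local names of all non-XLIFF elements, in document order."""
    root = ET.fromstring(xliff_text)
    xliff_prefix = "{" + XLIFF_NS + "}"
    names = []
    for element in root.iter():
        if not element.tag.startswith(xliff_prefix):
            # ElementTree tags look like "{namespace}local"; keep the local part.
            names.append(element.tag.split("}", 1)[-1])
    return names


print(extension_elements(SAMPLE))
```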
Extending Attributes • Attributes from a namespace other than XLIFF can be included in these XLIFF elements: • <alt-trans>, <bin-source>, <bin-target>, <bin-unit>, <bpt>, <bx/>, <ept>, <ex/>, <file>, <g>, <group>, <it>, <mrk>, <ph>, <source>, <target>, <tool>, <trans-unit>, <x/>, and new in 1.2: <xliff>, <seg-source>. • There is no prescribed position for the non-XLIFF attributes • No limit to the number of non-XLIFF attributes that can be used in an XLIFF document
Extending Attributes Attributes from HTML extend <group> and <trans-unit>
<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
      <group restype='row'>
        <trans-unit id='1' htm:valign='top' htm:width='30%'>
          <source>Text of row 1 column 1</source>
        </trans-unit>
        <trans-unit id='2' htm:valign='top' htm:width='30%'>
          <source>Text of row 1 column 2</source>
        </trans-unit>
      </group>
      <group restype='row'>
        <trans-unit id='3' htm:valign='top' htm:width='30%'>
          <source>Text of row 2 column 1</source>
        </trans-unit>
        <trans-unit id='4' htm:valign='top' htm:width='30%'>
          <source>Text of row 2 column 2</source>
        </trans-unit>
      </group>
    </group>
  </file>
</xliff>
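A hedged sketch of how a merge tool might recover the borrowed HTML attributes: it collects, per <trans-unit>, every attribute that carries a namespace other than the reserved XML one. The function names are assumptions, and the sample is a trimmed version of the table example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_NS = "http://www.w3.org/XML/1998/namespace"

SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <group restype='row'>
      <trans-unit id='1' htm:valign='top' htm:width='30%'>
        <source xml:lang='en'>Text of row 1 column 1</source>
      </trans-unit>
    </group>
  </file>
</xliff>"""


def foreign_attributes(element):
    """Return namespace-qualified attributes, excluding xml:* ones such as xml:lang."""
    found = {}
    for name, value in element.attrib.items():
        # Unqualified XLIFF attributes (id, restype, ...) have no "{namespace}" part.
        if name.startswith("{") and not name.startswith("{" + XML_NS + "}"):
            found[name] = value
    return found


def foreign_attributes_by_unit(xliff_text):
    """Map each trans-unit id to its non-XLIFF attributes."""
    root = ET.fromstring(xliff_text)
    units = root.iter("{" + XLIFF_NS + "}trans-unit")
    return {unit.get("id"): foreign_attributes(unit) for unit in units}


print(foreign_attributes_by_unit(SAMPLE))
```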
Extending Attribute Values • Attributes where the list of values can be extended are the following: context-type, count-type, ctype, datatype, mtype, priority, purpose, restype, size-unit, state, state-qualifier, unit; new in 1.2: alttranstype, reformat • User-defined values must start with an “x-” prefix • There is no specified mechanism to validate individual user-defined values, beyond starting with “x-”
Example of Extending Attribute Values The following excerpt shows how the user-defined value “x-for-engineers” can be utilized in a document:
...
<group>
  <context-group name='EngineersData'>
    <context context-type='x-for-engineers'>Data...</context>
  </context-group>
</group>
...
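Since the only check available for user-defined values is the “x-” prefix, a validator can be very small. The sketch below is one possible shape: the set of predefined values is supplied by the caller (the two shown are only a partial, illustrative list for context-type), and requiring at least one character after the prefix is an assumption of this example rather than a stated rule.

```python
# Partial list for illustration only; consult the specification for the full set.
KNOWN_CONTEXT_TYPES = {"sourcefile", "linenumber"}


def is_acceptable_value(value, predefined):
    """Accept a predefined value, or a user-defined one starting with 'x-'."""
    if value in predefined:
        return True
    # Assumption: a bare "x-" with nothing after it is treated as invalid.
    return value.startswith("x-") and len(value) > 2


for candidate in ("sourcefile", "x-for-engineers", "for-engineers"):
    print(candidate, is_acceptable_value(candidate, KNOWN_CONTEXT_TYPES))
```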
Embedding XLIFF • All or part of an XLIFF document can be embedded in another XML document • This is valid where the host format’s XML Schema (XSD) includes an <any> element in the definition of the element where the XLIFF data is inserted
What’s new in XLIFF 1.2 New and changed features of XLIFF 1.2 normative specification
New, Deprecated or Changed 1.1 to 1.2 • Validation via Transitional and Strict models • Segmentation support added • Added mid as an optional attribute for the <alt-trans> element • Changed the name attribute for <context-group> from required to optional, and modified its description • Added extension point at <xliff> • Tracking/accepting suggested translations added: • Added an alttranstype attribute for the <alt-trans> element. • Deprecated the use of multiple <target> elements in a single <alt-trans>. • Deprecated the restype attribute for the <target> element. • Introduced the phase-name attribute for the <alt-trans> element. • Introduced a convention: more recent <alt-trans> elements should appear before older ones.
Validation in 1.2 • Validation via two “Flavours” of XSD (Schema): • Transitional: Deprecated (obsolete) elements and attributes are permitted. Use to validate when reading older version documents (XLIFF 1.1).
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliffcore-1.2-transitional.xsd'
• Strict: Deprecated items are not permitted. Use to validate when creating XLIFF 1.2 documents.
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliffcore-1.2-strict.xsd'
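A small sketch of how a reader could find out which flavour a document claims, by inspecting xsi:schemaLocation. It does not perform schema validation (the Python standard library has no XSD validator); the function name and the file-name suffix test are assumptions based on the two schema file names shown above.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
       xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliffcore-1.2-strict.xsd'>
  <file original='a.txt' source-language='en' datatype='plaintext'><body/></file>
</xliff>"""


def schema_flavour(xliff_text):
    """Return 'strict', 'transitional', or None if no XLIFF schema is declared."""
    root = ET.fromstring(xliff_text)
    tokens = root.get("{" + XSI_NS + "}schemaLocation", "").split()
    # schemaLocation is a whitespace-separated list of (namespace, location) pairs.
    locations = dict(zip(tokens[0::2], tokens[1::2]))
    location = locations.get(XLIFF_NS)
    if location is None:
        return None
    if location.endswith("-strict.xsd"):
        return "strict"
    if location.endswith("-transitional.xsd"):
        return "transitional"
    return None


print(schema_flavour(SAMPLE))
```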
XLIFF 1.2 Segmentation: seg-source How corresponding segments are referenced between <seg-source> and <target>
<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>
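The mid attribute is what ties a source segment to its translation. A minimal sketch of that lookup follows; the function name is an assumption, and inline codes inside segments are ignored for simplicity.

```python
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>"""


def aligned_segments(unit_text):
    """Pair each <seg-source> segment with the <target> segment sharing its mid."""
    unit = ET.fromstring(unit_text)

    def segments_by_mid(parent):
        if parent is None:
            return {}
        return {
            mrk.get("mid"): mrk.text
            for mrk in parent.findall("mrk")
            if mrk.get("mtype") == "seg"
        }

    source_segments = segments_by_mid(unit.find("seg-source"))
    target_segments = segments_by_mid(unit.find("target"))
    # A segment with no translation yet is paired with None.
    return [(mid, text, target_segments.get(mid)) for mid, text in source_segments.items()]


for row in aligned_segments(UNIT):
    print(row)
```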
XLIFF 1.2 Segmentation: seg-source Alt-trans may also be segmented:
<trans-unit id="3">
  <source>First sentence. Second sentence.</source>
  <alt-trans match-quality="100%">
    <source>The second sentence.</source>
    <seg-source>
      <mrk mtype="seg" mid="1">First sentence.</mrk>
      <mrk mtype="seg" mid="2">Second sentence.</mrk>
    </seg-source>
    <target>
      <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
      <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
    </target>
  </alt-trans>
</trans-unit>
XLIFF 1.2 Segmentation: merged-trans Aggregating translations across multiple trans-units:
<group merged-trans="yes">
  <trans-unit id="t1">
    <source>The German acronym v.</source>
    <target equiv-trans="no">Niemiecki skrót v. OT oznacza górną pozycję silnika.</target>
  </trans-unit>
  <trans-unit id="t2">
    <source>OT signifies the top dead center position for an engine.</source>
    <target equiv-trans="no"/>
  </trans-unit>
</group>
XLIFF 1.2 Segmentation: equiv-trans To denote when a translation is not a direct equivalent of the source:
<trans-unit id="t1">
  <source>Constrained text for limited</source>
  <target equiv-trans="no">Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t2">
  <source>display for English</source>
  <target equiv-trans="no">ograniczonego pola</target>
</trans-unit>
XLIFF 1.2 Add an alttranstype attribute for the <alt-trans> element The alttranstype attribute is optional, and has the following values and meanings:
XLIFF 1.2 Additional revisions to alt-trans • Introduce the phase-name attribute for <alt-trans> • makes it possible to find out who made the change, when, and which process the change was introduced in • Deprecate the restype attribute for the <target> element • no longer needed, as the <target> is always of the same restype as the <trans-unit> or <alt-trans> it appears in • Convention: more recent <alt-trans> elements should appear before older ones • makes it possible to determine the order of changes if multiple previous versions have been introduced
Non-Normative Representation Guides A brief walk-through of the Representation Guides provided with XLIFF 1.2
Purpose of the Guides • Synonymous with “profile” specifications • Non-normative • Not a requirement for “legal” XLIFF 1.2 • Guidance for consistently representing native formats as XLIFF across implementations • Kickstart new implementations • Better interoperability between tools
Guide Contents • Recommended Extraction Techniques and Considerations • Recommended mappings from native structures to XLIFF • Strategies for implementing Translation Memory support (using inline tags) • Detailed examples and supplementary sample files
Extract-Localize-Merge Minimalist Approach • Process: • Identify localisable content (resources) and non-localisable content (code) • Populate the XLIFF document’s trans-unit and bin-unit elements with localisable content • Create a “Skeleton File” with localisable content stripped out and replaced with tokens that map to XLIFF trans-unit or bin-unit IDs • Translate the XLIFF document • Merge translated data in the XLIFF with the Skeleton to generate the localised translated material • Skeleton file is optional and not recommended in certain circumstances (e.g., HTML or if tool interoperability is required) • In <skl>, either embed the entire Skeleton file within the XLIFF file or specify the file’s location • XLIFF doesn’t define the Skeleton file or token format
Convert/Transform Paradigm (maximalist approach) • Process: • Convert original material by mapping the entire original document to XLIFF (using representation guides) • Structural information (code) stored in the XLIFF container as non-translatable trans-units / bin-units • Translate XLIFF content • Generate the native translated material directly from the XLIFF content • Best suited for textual resource formats (RCDATA, Java, PO/POT) and mark-up languages like (X)HTML and XML • Difficult and impractical for binary resource formats (e.g., EXEs and DLLs)
[Diagram labels: Original Material, Filter, XLIFF, Translated Material]
Minimalist Example – Source Content & Skeleton A very simple HTML file (original content):
<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
Skeleton produced by the filter:
<html>
  <head>
    <title>%%%1%%%</title>
  </head>
  <body>
    <p>%%%2%%%</p>
  </body>
</html>
XLIFF produced by the filter:
…
<header>
  <skl>
    <external-file href='sample.skl'/>
  </skl>
</header>
<body>
  <trans-unit id='%%%1%%%'>
    <source xml:lang='en'>Almost the Smallest HTML File</source>
  </trans-unit>
  <trans-unit id='%%%2%%%' restype='x-html-p'>
    <source xml:lang='en'>Just some stuff here to fill up space</source>
  </trans-unit>
</body>
…
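The merge step of the minimalist approach can be sketched as a token substitution over the skeleton. XLIFF does not define the token format, so the %%%n%%% pattern below is simply the convention used in this example; the function name, the French strings, and the choice to HTML-escape translated text are assumptions made for illustration.

```python
import html
import re

# Token convention taken from the example skeleton; not defined by XLIFF itself.
TOKEN = re.compile(r"%%%\d+%%%")

SKELETON = "<html><head><title>%%%1%%%</title></head><body><p>%%%2%%%</p></body></html>"


def merge(skeleton, translations):
    """Replace each skeleton token with its translation, escaped for HTML text."""
    def substitute(match):
        token = match.group(0)
        if token not in translations:
            return token  # leave unknown tokens visible rather than dropping them
        return html.escape(translations[token], quote=False)

    return TOKEN.sub(substitute, skeleton)


translated = {
    "%%%1%%%": "Presque le plus petit fichier HTML",
    "%%%2%%%": "Du texte & rien d'autre",
}
print(merge(SKELETON, translated))
```

Leaving unmatched tokens in place makes missing translations easy to spot in the merged output.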
Maximalist Example – Transform content to XLIFF Full Transformation. Original content:
<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
XLIFF:
…
<body>
  <group restype='x-html-html'>
    <group restype='x-html-head'>
      <trans-unit id='1' restype='x-html-p-title' html:class='title'>
        <source xml:lang='en'>Almost the Smallest HTML File</source>
      </trans-unit>
    </group>
    <group restype='x-html-body'>
      <trans-unit id='2' restype='x-html-p'>
        <source xml:lang='en'>Just some stuff here to fill up space</source>
      </trans-unit>
    </group>
  </group>
</body>
…
Guides provided with XLIFF 1.2 • (X)HTML • Many flavours of HTML, guide focuses on HTML 4.01, XHTML 1.0 • Java Resource Bundles • Support for java.util.ResourceBundle abstract class’ two subclasses: PropertyResourceBundle and ListResourceBundle • Gettext PO/POT files • Linux resource format
To Get the Most from the Guides • Review the document in full before commencing design or development of an XLIFF solution • Considerations for recommended source document structure and content • Identify exceptions (e.g., dynamically generated HTML via server-side processing) • Consider the Guide’s recommended Extraction approach when designing overall architecture: • HTML recommends “maximalist”, but provides examples for “minimalist” as well. • Both PO/POT and Java make no specific recommendation, but examples are “maximalist” • Order of Extraction recommendations: typically in the order of the data in the source document • Refer to Mappings Reference in each guide when designing and building filters • Recommendations are comprehensive with many examples • Non-standard structures and conventions are dealt with (especially for (X)HTML) • Use the Sample files • Valuable reference for learning • Provides validation during development effort • Verify compliance by feeding sample files into filter – either native source or XLIFF
More Representation Guides • Late draft of Windows 32 / .NET • Not approved, but is posted on the XLIFF website • Requires more expert input • More to follow upon request
More Information • The XLIFF TC Web Site: http://www.xliff.org • Presenter: • XLIFF TC Co-Chair: Tony Jewtushenko (Product Innovator Ltd)(tony.jewtushenko@productinnovator.com)
Thank You... Questions?
Product Innovator Ltd provides product management and software process improvement training and mentoring services to technology companies seeking to maximize their productivity and revenue potential Contact: tony.jewtushenko@productinnovator.com www.productinnovator.com +353 1 8875183 / +353.87.2479057