An Introduction to XLIFF: The XML Localisation Interchange File Format
Agenda • Overview of Open Standards: benefits, drawbacks and development process; survey of localisation standards (TMX, TBX, OpenTag) • Overview of XLIFF 1.0: definition, goals, and benefits of XLIFF; business use cases; brief history of XLIFF • Architecture: main features of XLIFF 1.0 • The Real World: example of using XLIFF; tools support for XLIFF 1.0 • Current State of Affairs: XLIFF 1.1 (what's new); work at OASIS on XLIFF Slide 2
Industry Standards Overview A little bit about “Standards”… Slide 3
Definition of a “Standard” is... A definition or format that has been approved by a recognized standards organization or is accepted as a de facto standard by the industry. Standards exist for programming languages, operating systems, data formats, communications protocols, and electrical interfaces* * Definition by www.webopedia.com Slide 4
Standards are created by... • Default acceptance of private specifications by the market • Government regulation via state regulatory agency or public utility • Formal standardisation via consensus body or committee Slide 5
Categories of Standards • Units, reference, definition - temperature, weights, lengths, volumes, etc. • Similarity - screw gauges, character sets, colour schemes, UNIX operating system • Compatibility - APIs, UIs, nuts and bolts, hand tools and implements, radio transmitter & radio, modem standards (V.32, V.34), XLIFF 1.0 • Etiquette - the IETF Internet draft Protocol Extension Protocol (PEP), designed to accommodate extensions of applications such as HTTP clients, servers and proxies, 3G Slide 6
Further Reading on Standards… • The Role of Standards in Today’s Society and in the Future, Dr. Carl Cargill, Director – Corporate Standards, Sun Microsystems, Inc.: http://www.house.gov/science/cargill_091300.htm • The Business of Open-Source Software, Frank Hecker, originally published May 1998, revised 20 June 2000: http://www.hecker.org/writings/setting-up-shop.html • Standards Making: Behind the Scenes, Don Deutsch: http://otn.oracle.com/oramag/webcolumns/2003/opinion/deutsch_opinion.html Slide 7
XLIFF 1.0 Overview A glance at the definitions, goals and benefits of the XML Localisation Interchange File Format. Slide 8
What is XLIFF? • A specification • for the lossless interchange of localizable data and its related information, • which is tool-neutral, • has been formalized as an XML vocabulary (document type definition), • and features an extensibility mechanism. Slide 9
XLIFF TC’s Charter “The purpose of the OASIS XLIFF TC is to define, through XML vocabularies, an extensible specification for the interchange of localization information. The specification will provide the ability to mark up and capture localizable data and interoperate with different processes or phases without loss of information. The vocabularies will be tool-neutral, support the localization-related aspects of internationalization and the entire localization process. The vocabularies will support common software and content data formats. The specification will provide an extensibility mechanism to allow the development of tools compatible with an implementer's own proprietary data formats and workflow requirements.” Slide 10
Why is XLIFF Needed? Localization poses the following challenges: • Insufficient interoperability between tools. • Lack of support for the overall localization workflow. • The need for localization tool developers to handle many formats. • A large number of proprietary intermediate formats. Slide 11
Advantages – Localization Customer • Single format for adjunct processing (e.g. quality control in terms of spell checking). • Less dependency on vendors that are able to work with special formats. • Tighter control over what goes to localization (pre-filtering of what to translate or not). • Controlled information flow (author/developer notes, item properties, etc.). • ID-based leveraging. • All advantages of XML-based processing. Slide 12
Advantages – Tools Vendor • Focus on development of core functionality rather than on handling of source formats. • Allows usage of tools in new contexts. • All advantages of XML-based processing. Slide 13
Advantages – Service Provider • Single format for adjunct processing (e.g. quality control in terms of spell checking). • Less dependency on specific localization tools. • Controlled information flow (author/developer notes, item properties, etc.). • Allow usage of tools in new contexts. • All advantages of XML-based processing. • Open and standard solution for proprietary formats. Slide 14
Advantages – Technology (1/2) • For a given utility, only one implementation is necessary (e.g. not one spell checker for RTF, and another one for HTML). • Increases usability of utilities (i.e. all formats with XLIFF filters can be used with XLIFF-enabled utilities). Slide 15
Advantages – Technology (2/2) • All advantages of XML-based processing: • Use of its internationalization features. • Better interoperability and cross-platform support. • Powerful rendering options (XSL-FO, CSS). • Powerful transformation options (XSLT). • Greater integration with Web services. • Access to existing, and often open-source, XML implementations (lower costs). Slide 16
Basic Use Case – without XLIFF [Diagram. Publisher/Customer Domain: Developer Applications producing Native File 1 (e.g., HTML), Native File 2 (e.g., Java files), Native File 3 (e.g., Java properties), ... Native File n. Localisation Domain: Customer-Specific Tool(s) with Tool Resource Filters, Translator.] Slide 17
Basic Use Case – with XLIFF [Diagram. Publisher/Customer Domain: XLIFF-compliant Developer Applications (direct-to-XLIFF authoring), or non-XLIFF-compliant Developer Applications whose HTML, RC data, and Java properties are pre-processed into XLIFF file(s) containing the translatable resources. Localisation Domain: XLIFF-compliant Editor, Translator.] Slide 18
Simple Automated Localisation Use Case [Diagram. Roles: Developer, Localization Engineer, XLIFF Editor, Translator, Translation Repository. Steps: Generate XLIFF, Leverage, Pseudo Translate / Test (Defect Report), Translate (XLIFF Translation Kit, 0% translated, requires translation), Update (XLIFF Translation Kit, 100% translated).] Slide 19
Automated Localisation with CAT Use Case [Diagram. Roles: Developer, Localization Engineer, XLIFF Editor, Translator, Translation Repository, Translation Memory, Machine Translation. Steps: Generate XLIFF, Pseudo Translate / Test (Defect Report), leverage by 100% match, fuzzy match, and Machine Translate, Translate (XLIFF Translation Kit, 0% translated, requires translation), Update (XLIFF Translation Kit, 100% translated).] Slide 20
Genesis of XLIFF • Founded: Sept 2000 • Founding Members: Novell, Oracle and Sun • Initially named “DataDefinition” group Slide 21
XLIFF 1.0 Timeline • September 2000 - DataDefinition Kickoff • December 2000 - first face to face • March 2001 - second face to face • End March 2001 - draft 1.0 spec and DTD published • June 2001 - White Paper published • December 2001 - OASIS XLIFF Technical Committee Proposal submitted • April 2002 – XLIFF 1.0 Specification approved by formal vote as an OASIS Committee Specification Slide 22
OASIS: A New Home for XLIFF • OASIS: Organization for the Advancement of Structured Information Standards • World’s largest independent, non-profit organization dedicated to the standardisation of XML applications and Web Services • More than 150 member companies plus individuals • Operates the XML.ORG Registry, the open community clearinghouse of XML application schemas • Technical work on XML interoperability includes XML conformance and XML Registries/Repositories • General XML technical resource Slide 23
Drivers Behind XLIFF • Alchemy Software • Bowne Global Solutions • Convey Software • Ektron, Inc • Globalsight • HP • Lotus/IBM • Lionbridge • LRC • Moravia IT • Novell • Oracle • Microsoft • RWS Group • SAP • SDL International • Sun Microsystems • Tektronix Slide 24
Present OASIS XLIFF TC • TC Officers: • TC Chair: Tony Jewtushenko, Oracle Corporation • TC Vice-Chair: Jonathan Clark, Lionbridge • TC Secretary: Peter Reynolds, Bowne Global Solutions • TC Editor: Yves Savourel • Current Members of TC: • Gérard Cattin des Bois, Microsoft • Doug Domeny • Mirek Driml, Moravia-IT • Milan Karásek, Moravia-IT • Mark Levins, IBM/Lotus • Christian Lieske, SAP • Mat Lovatt, Oracle • Enda McDonnell • David Pooley, SDL • John Reid, Novell • Reinhard Schaler, LRC • Bryan Schnabel, Tektronix • Shigemichi Yazawa Slide 25
XLIFF TC in the Community • Shared interests with the OSCAR SIG at LISA • Segmentation and word count. • Content markup (inline codes). • Shared interests with the W3C i18n WG • Localization directives. • Best practices. • Localization aspects of the W3C recommendations. • Web services. Slide 26
Architecture A look at XLIFF’s main features and how they work together. Slide 27
Extract-Localize-Merge Paradigm • Separate data related to localization from parts not related to localization. • Merge translated data with codes at the end of the process to create the final document. • Skeleton file is optional, so this paradigm is also optional Slide 28
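The extract and merge steps above can be sketched in a few lines of Python. This is only an illustration for a Java properties file: the functions `extract` and `merge` and the `%%%id%%%` placeholder syntax are assumptions made for this example, not something defined by XLIFF or by any particular tool.

```python
def extract(properties_text):
    """Split a .properties file into a skeleton and a dict of translatable strings."""
    skeleton_lines, units = [], {}
    for line in properties_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            unit_id = str(len(units) + 1)
            units[unit_id] = value
            # Hypothetical placeholder marking where the extracted text came from.
            skeleton_lines.append(f"{key}=%%%{unit_id}%%%")
        else:
            # Comments and blank lines are not localizable: they stay in the skeleton.
            skeleton_lines.append(line)
    return "\n".join(skeleton_lines), units


def merge(skeleton, translations):
    """Re-insert translated strings into the skeleton to rebuild the final file."""
    for unit_id, text in translations.items():
        skeleton = skeleton.replace(f"%%%{unit_id}%%%", text)
    return skeleton


original = "# Buttons\nok=OK\ncancel=Cancel"
skeleton, units = extract(original)
translated = {"1": "OK", "2": "Annuler"}
print(merge(skeleton, translated))
```

In a real workflow the `units` dictionary would be written out as translation units in an XLIFF document, and the skeleton would be kept alongside it until the merge step.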
A Bird's-Eye View An XLIFF document can capture anything needed for a localization project: • Localizable objects (e.g. text strings) in source and target languages. • Supplementary information (e.g. glossaries, or material to recreate the original format). • Administrative information (e.g. workflow data). • Custom data (e.g. initialization information for tools). Slide 29
The XLIFF Document • An XLIFF document is designed to store the extracted data related to localization. • Each given source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF. • Each XLIFF document can include several <file> elements. • A whole localization project can possibly be stored in a single XLIFF document. Slide 30
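As a rough sketch of this structure, the following Python builds one XLIFF 1.0 document holding two `<file>` elements using the standard library's `xml.etree.ElementTree`. The helper `build_xliff`, the file names, and the sample strings are assumptions for illustration; the `<body>` wrapper around the translation units comes from the XLIFF specification rather than from these slides.

```python
import xml.etree.ElementTree as ET


def build_xliff(containers, source_lang, target_lang):
    """Create one XLIFF 1.0 document with one <file> per source container."""
    xliff = ET.Element("xliff", version="1.0")
    for original, datatype, strings in containers:
        file_el = ET.SubElement(xliff, "file", {
            "original": original,
            "datatype": datatype,
            "source-language": source_lang,
            "target-language": target_lang,
        })
        ET.SubElement(file_el, "header")
        body = ET.SubElement(file_el, "body")
        for index, text in enumerate(strings, start=1):
            unit = ET.SubElement(body, "trans-unit", id=str(index))
            ET.SubElement(unit, "source").text = text
    return xliff


# Two hypothetical source containers stored in a single document.
doc = build_xliff(
    [("JavaApp.properties", "java", ["OK", "Cancel"]),
     ("index.html", "html", ["Welcome"])],
    source_lang="en",
    target_lang="fr",
)
print(ET.tostring(doc, encoding="unicode"))
```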
Bilingual Model • Each <file> element is designed to store one source language and one target language. • The rationale is that translations into different target languages are usually done by different people. • However, the languages in <alt-trans> elements can differ. For example, proposed matches in European Portuguese when translating into Brazilian Portuguese. Slide 31
Localizable Objects – Overview • XLIFF allows not only text strings as localizable objects but also other object types, such as graphics. • Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text). • Relationships between objects can be captured (e.g. all items in a menu). Slide 32
Localizable Objects – Text
Extracted text goes into translation units (<trans-unit>), in a <source> element. The translation goes into a <target> element.
  <trans-unit id='1' datatype='winres' resname='IDCANCEL' restype='button' coord='8;80;50;14' style='0x20000'>
    <source xml:lang='en'>Cancel</source>
    <target xml:lang='fr'>Annuler</target>
  </trans-unit>
Slide 33
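A tool consuming such a translation unit might read it as sketched below with Python's standard `xml.etree.ElementTree`. Reading `coord` as four semicolon-separated integers (x, y, width, height) is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

unit = ET.fromstring("""
<trans-unit id='1' datatype='winres' resname='IDCANCEL' restype='button'
            coord='8;80;50;14' style='0x20000'>
  <source xml:lang='en'>Cancel</source>
  <target xml:lang='fr'>Annuler</target>
</trans-unit>
""")

source = unit.find("source")
target = unit.find("target")
print(unit.get("resname"), source.get(XML_LANG), source.text,
      "->", target.get(XML_LANG), target.text)

# Assumed layout of the coord attribute: x;y;width;height.
x, y, width, height = (int(n) for n in unit.get("coord").split(";"))
```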
Localizable Objects – Inline Codes (1/3)
Supplementary information for translation units (e.g. formatting, links, image references, etc.) can be encapsulated using a set of elements (<bpt>, <ept>, <it>, and <ph>) very similar to the ones used in TMX.
  <source xml:lang='en'>Text in <bpt id='1'>&lt;b&gt;</bpt>bold<ept id='1'>&lt;/b&gt;</ept>.</source>
Slide 34
Localizable Objects – Inline Codes (2/3)
Supplementary information can also be stored in the Skeleton; in this case, placeholder elements (<g>, <x/>, <bx/>, <ex/>), like the ones used in OpenTag, are inserted in the translation units.
  <source xml:lang='en'>Text in <g id='1' ctype='bold'>bold</g>.</source>
Slide 35
Localizable Objects – Inline Codes (3/3)
XLIFF also provides the general-purpose element <mrk> to associate supplementary information with an arbitrary span of text.
  <source>The <mrk mtype='part-of-speech' ts='adjective'>fat</mrk> cat sleeps soundly.</source>
Slide 36
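The difference between the two inline-code styles can be made concrete with a small Python sketch. The helpers `native_text` and `plain_text` are hypothetical names for this example: the first rebuilds the original string including the encapsulated native codes, the second keeps only the translatable text and works for both the encapsulating elements and the placeholder elements.

```python
import xml.etree.ElementTree as ET

# Elements whose content is native code rather than translatable text.
CODE_TAGS = {"bpt", "ept", "it", "ph"}


def native_text(source):
    """Rebuild the original string, including the encapsulated native codes."""
    return "".join(source.itertext())


def plain_text(source):
    """Return only the translatable text, skipping encapsulated native codes."""
    parts = [source.text or ""]
    for child in source:
        if child.tag not in CODE_TAGS:
            # Elements such as <g> or <mrk> wrap real text, so descend into them.
            parts.append(plain_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


encapsulated = ET.fromstring(
    "<source>Text in <bpt id='1'>&lt;b&gt;</bpt>bold<ept id='1'>&lt;/b&gt;</ept>.</source>"
)
placeholder = ET.fromstring(
    "<source>Text in <g id='1' ctype='bold'>bold</g>.</source>"
)

print(native_text(encapsulated))
print(plain_text(encapsulated))
print(plain_text(placeholder))
```

With encapsulation the XLIFF document alone is enough to restore the markup; with placeholders the markup has to come back from the Skeleton at merge time.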
Localizable Objects – Non-Textual
Non-textual objects such as bitmaps, cursors, etc. can be stored in the XLIFF document (internally or externally), using a <bin-unit> element.
  <bin-unit id='1' resname='IDB_OPEN' mime-type='image/bitmap' restype='bitmap'>
    <bin-source>
      <external-file href='Open.bmp'/>
    </bin-source>
  </bin-unit>
Slide 37
Localizable Objects – Relationships
Relations between objects can be captured by means of the <group> element. (Note: a <group> can also contain another <group>.)
  <group restype='menu'>
    <trans-unit id='1' resname='ID_OPENFILE'>
      <source>&amp;File...</source>
    </trans-unit>
    <trans-unit id='2' resname='ID_EXITAPP'>
      <source>E&amp;xit</source>
    </trans-unit>
  </group>
Slide 38
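Because groups can nest, a consumer typically walks them recursively. The sketch below is one possible way to do that in Python; the function `walk`, the nested `popupmenu` group, and the enclosing `<body>` element are assumptions added for the example.

```python
import xml.etree.ElementTree as ET


def walk(element, path=()):
    """Yield (group path, resname, source text) for every translation unit."""
    for child in element:
        if child.tag == "group":
            # Descend into the group, remembering its type as context.
            yield from walk(child, path + (child.get("restype", "group"),))
        elif child.tag == "trans-unit":
            yield path, child.get("resname"), child.findtext("source")


body = ET.fromstring("""
<body>
  <group restype='menu'>
    <trans-unit id='1' resname='ID_OPENFILE'><source>&amp;File...</source></trans-unit>
    <group restype='popupmenu'>
      <trans-unit id='2' resname='ID_EXITAPP'><source>E&amp;xit</source></trans-unit>
    </group>
  </group>
</body>
""")

for path, resname, text in walk(body):
    print("/".join(path), resname, text)
```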
Supplementary Info – Overview • XLIFF provides “hooks” for storing supplementary information (for example, pointers to the glossaries or translation memories that should be used). • The supplementary information can be referenced (i.e. reside outside of the document) or embedded within the document. Slide 39
Supplementary Info – References (1/2)
Pointers to reference material such as TMs or glossaries can be listed in the <header> of each <file> element.
  ...
  <header>
    <reference>
      <external-file href="TranslationStyleGuidelines.doc" />
    </reference>
  ...
Slide 40
Supplementary Info – References (2/2)
Alternatively, the reference material can be stored directly in the XLIFF document.
  ...
  <header>
    <glossary>
      <internal-file form="text"><![CDATA["English term 1","German term 1"
"English term 2","German term 2"
...]]></internal-file>
    </glossary>
  ...
Slide 41
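An embedded glossary like this can be read back with a few lines of Python. The sketch assumes the embedded text is comma-separated with quoted fields, one term pair per line, which is how the example on the slide appears to be laid out; XLIFF itself does not prescribe the internal format.

```python
import csv
import io
import xml.etree.ElementTree as ET

header = ET.fromstring('''
<header>
  <glossary>
    <internal-file form="text"><![CDATA["English term 1","German term 1"
"English term 2","German term 2"]]></internal-file>
  </glossary>
</header>
''')

# The CDATA section is returned as ordinary text.
text = header.findtext("glossary/internal-file")

# Assumption: each line is one "source term","target term" pair.
glossary = dict(csv.reader(io.StringIO(text)))
print(glossary)
```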
Supplementary Info – Skeleton (1/2)
Non-localizable parts can be stored in Skeleton files, which can be referenced from within the XLIFF document.
  ...
  <header>
    <skl>
      <external-file href="JavaApp.properties.skl" uid="3d4031aa1ab"/>
    </skl>
  </header>
  ...
Slide 42
Supplementary Info – Skeleton (2/2)
The Skeleton content can also be embedded in the XLIFF document itself.
  ...
  <header>
    <skl>
      <internal-file crc="d341e458" form="base64">PE9LRlNLTDEwMDpSRVM6OTY0MDA4MjYxPg0KI2luY2x1ZGUgInJlc291cmNlLmgiDQpJRERfRElBTE9HMSBESUFMTX01PREFMRlJBTUUgfCBXU19QpDQVBUSU9O...
      </internal-file>
    </skl>
  </header>
  ...
Slide 43
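Embedding a skeleton amounts to base64-encoding its bytes into an `<internal-file>` element. The Python sketch below shows the round trip; the helper names and the sample skeleton content are assumptions, and the `crc` attribute is left out because the slides do not say which checksum it holds.

```python
import base64
import xml.etree.ElementTree as ET


def embed_skeleton(header, skeleton_bytes):
    """Store the skeleton bytes inside <header> as base64 text."""
    skl = ET.SubElement(header, "skl")
    internal = ET.SubElement(skl, "internal-file", form="base64")
    internal.text = base64.b64encode(skeleton_bytes).decode("ascii")
    return internal


def read_skeleton(header):
    """Decode the embedded skeleton back into its original bytes."""
    internal = header.find("skl/internal-file")
    return base64.b64decode(internal.text)


# Hypothetical skeleton content for illustration.
skeleton = b'#include "resource.h"\r\n'
header = ET.Element("header")
embed_skeleton(header, skeleton)
print(ET.tostring(header, encoding="unicode"))
```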
Administrative Info – Overview XLIFF provides mechanisms for capturing administrative information: • For relating source material to XLIFF documents. • For storing workflow data. • For providing pre-translation entries. • For keeping track of changes. Slide 44
Administrative Info – Source
First, define what is in the document, and how it relates to the source.
  <?xml version="1.0" encoding="utf-8"?>
  <xliff version="1.0">
    <file original="JavaApp.properties" tool="OkapiFilter:JavaProperties" source-language="en" datatype="java" date="2002-07-25T17:13:14Z" target-language="ja">
      <header>
  ...
Slide 45
Administrative Info – Workflow (1/2)
Simple data about the steps of the process for each <file> element can be stored in its <header> element.
  ...
  <header>
    <phase-group>
      <phase phase-name="Step-001" process-name="Extraction" tool="myTool" contact-email="amity@myCorp.com"/>
    </phase-group>
  ...
Slide 46
Administrative Info – Workflow (2/2)
References to the different phases can be set on the different items of the <file> element (for example, to record where an edit came from).
  <trans-unit id='1'>
    <source xml:lang='en'>The text</source>
    <target xml:lang='fr' phase-name='Edit'>Le texte</target>
    <alt-trans>
      <target xml:lang='fr' phase-name='Trans'>Un texte</target>
    </alt-trans>
  </trans-unit>
Slide 47
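Resolving a `phase-name` reference back to the phase declared in the header could look like the Python sketch below. The sample `<file>`, the process names, and the helper `process_of` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

file_el = ET.fromstring("""
<file original='JavaApp.properties' source-language='en'
      target-language='fr' datatype='java'>
  <header>
    <phase-group>
      <phase phase-name='Trans' process-name='Translation' tool='myTool'/>
      <phase phase-name='Edit' process-name='Editing' tool='myTool'/>
    </phase-group>
  </header>
  <body>
    <trans-unit id='1'>
      <source>The text</source>
      <target phase-name='Edit'>Le texte</target>
      <alt-trans><target phase-name='Trans'>Un texte</target></alt-trans>
    </trans-unit>
  </body>
</file>
""")

# Index the phases declared in the header by name.
phases = {p.get("phase-name"): p for p in file_el.iter("phase")}


def process_of(target):
    """Return the process that produced this target, or None if unknown."""
    phase = phases.get(target.get("phase-name"))
    return phase.get("process-name") if phase is not None else None


# Targets are visited in document order: current target first, then history.
history = [(t.text, process_of(t)) for t in file_el.iter("target")]
print(history)
```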
Administrative Info – Pre-Leveraging
A set of proposed translations can be included for each <trans-unit> element, using the <alt-trans> element.
  <trans-unit id='1'>
    <source xml:lang='en'>The text</source>
    <alt-trans match-quality='high' origin='MTsystem'>
      <target xml:lang='fr'>Le texte</target>
    </alt-trans>
  </trans-unit>
Slide 48
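Pre-leveraging against a translation memory can be sketched as follows in Python. The memory here is just a dictionary of exact matches, and the function `pre_leverage`, the `100%` quality value, and the origin label are assumptions for illustration; the attribute name `match-quality` follows the XLIFF specification.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pre_leverage(body, memory, target_lang, origin):
    """Attach an <alt-trans> proposal to each unit whose source is in the memory."""
    count = 0
    # Materialise the list first so the tree is not modified while iterating.
    for unit in list(body.iter("trans-unit")):
        proposal = memory.get(unit.findtext("source"))
        if proposal is None:
            continue
        alt = ET.SubElement(unit, "alt-trans",
                            {"match-quality": "100%", "origin": origin})
        target = ET.SubElement(alt, "target", {XML_LANG: target_lang})
        target.text = proposal
        count += 1
    return count


body = ET.fromstring("""
<body>
  <trans-unit id='1'><source>The text</source></trans-unit>
  <trans-unit id='2'><source>Another text</source></trans-unit>
</body>
""")

# Hypothetical exact-match memory.
memory = {"The text": "Le texte"}
matched = pre_leverage(body, memory, target_lang="fr", origin="myMemory")
print(matched, "unit(s) pre-leveraged")
```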
Administrative Info – Tracking Changes
Modifications made during the course of the process (translation, edit, proof, review, etc.) can also be stored using <alt-trans>.
  <trans-unit id='1'>
    <source xml:lang='en'>The text</source>
    <target xml:lang='fr' phase-name='Edit'>Le texte</target>
    <alt-trans>
      <target xml:lang='fr' phase-name='Trans'>Un texte</target>
    </alt-trans>
  </trans-unit>
Slide 49
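One way an editing tool might maintain this history is to copy the previous target into an `<alt-trans>` before overwriting it. The Python below is a minimal sketch under that assumption; the function `revise` is a hypothetical name, and it assumes `<source>` is the first child of the unit.

```python
import copy
import xml.etree.ElementTree as ET


def revise(unit, new_text, phase_name):
    """Replace the unit's target, keeping the previous version in an <alt-trans>."""
    target = unit.find("target")
    if target is None:
        target = ET.Element("target")
        unit.insert(1, target)  # assumes <source> is the first child
    else:
        history = ET.SubElement(unit, "alt-trans")
        history.append(copy.deepcopy(target))
    target.text = new_text
    target.set("phase-name", phase_name)


unit = ET.fromstring(
    "<trans-unit id='1'>"
    "<source>The text</source>"
    "<target phase-name='Trans'>Un texte</target>"
    "</trans-unit>"
)
revise(unit, "Le texte", "Edit")
print(ET.tostring(unit, encoding="unicode"))
```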
Custom Data
Use the <prop> element and the ts attribute to store user-defined information.
  <trans-unit id='1' ts='ctx:23a7'>
    <source>Text</source>
    <prop-group>
      <prop prop-type='myType'>Some property data</prop>
    </prop-group>
  </trans-unit>
Slide 50