L10N Standards Warszawa 2014
Why have Standards?
L10N Standards • What are we going to cover: • Why L10N standards are important • The role XML has to play • Key L10N data standards • How to leverage L10N standards • Creating a totally data driven automated L10N process • Interoperability
Localization without Standards • [Figure: workflow diagram in which the customer's source text passes through the steps extract, tm process, translate, QA, and merge; the intermediate files are labelled extracted text, prepared text, translated text, and target text.]
Standards: Misuse • [image]
Standards: Sabotage • Sabotaged Standards: • Proprietary extensions • Bad implementations
The importance of XML • Everything is now XML • HTML/XHTML • Web Services • Adobe FrameMaker • Microsoft Office • Open Office • ASP • XAML • Java Properties • DITA • Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm • OAXAL Open Architecture for XML Authoring and Localization
The power of XML • Any electronic format not in XML can be converted to XML • FrameMaker • RTF • Microsoft Office pre 2007 • QuarkXPress • Windows resource files • Java resources • PO/POT • YAML • Etc. • And then back into the original format
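The round trip described on this slide can be sketched in a few lines. The example below is only an illustration: it assumes a very simple `key=value` properties file, and the `resources`/`entry` vocabulary is hypothetical rather than taken from any standard. Comments and escapes are ignored, so a real converter would need to keep a skeleton of the original file to restore it exactly.

```python
import xml.etree.ElementTree as ET

def properties_to_xml(text):
    """Convert simple 'key=value' lines into an XML tree (hypothetical vocabulary)."""
    root = ET.Element("resources")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blanks and comments are dropped in this simplified sketch
        key, _, value = line.partition("=")
        entry = ET.SubElement(root, "entry", {"key": key.strip()})
        entry.text = value.strip()
    return root

def xml_to_properties(root):
    """Write the entries back out as 'key=value' lines."""
    return "\n".join(f"{e.get('key')}={e.text or ''}" for e in root.iter("entry"))

source = "greeting=Hello\nfarewell=Goodbye"
tree = properties_to_xml(source)
print(ET.tostring(tree, encoding="unicode"))
print(xml_to_properties(tree))
```

Once the content is in XML, the same extraction and translation memory tooling can be applied regardless of the original format.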
Benefits of XML for L10N • Separation of form and content • Should make documents easier to translate • There are some critical design decisions • Mistakes can hinder translatability • XML can bootstrap its own localization
The significance of XML • XML is not just another electronic format • XML is an eXtensible syntax • XML is a formal IT grammar • XML is programmable • XML can bootstrap its own localization
Benefits of XML for L10N • Why use XML for Localization? • Most localizable documents are now in XML • One input format • Elegant • Uses the latest IT technology • Separation of form and content • One single data bus • Open Standards based • You can use XML to assist its own localization • One extraction + TM + SMT engine
Core L10N Standards • W3C ITS Document Rules • ETSI LIS SRX • ETSI LIS xml:tm • ETSI LIS TMX • ETSI LIS TBX • ETSI LIS GMX • OASIS XLIFF • W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary) • Linport Interoperability: TIPP XLIFF:doc
ITS • Internationalization Tag Set • http://www.w3.org/International/its • Document Rules for a given XML vocabulary: • Inline elements (within text) • Sub flows • Non-translatable • Translatable attributes • Guidelines for localizing XML documents • Internationalization and Localization Markup Requirements • Version 1.0, 2007 • Version 2.0, 2013
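To make the idea of document rules concrete, here is a minimal sketch of how a single ITS `translateRule` could mark elements as non-translatable. It is an assumption-laden illustration, not a conforming ITS processor: the sample document and the `code` element are invented, and only selectors of the form `//name` are handled because Python's `xml.etree.ElementTree` supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# One global rule: the content of every <code> element must not be translated.
RULES = f"""<its:rules xmlns:its="{ITS_NS}" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
</its:rules>"""

# Invented sample document for illustration only.
DOC = "<doc><para>Press <code>Ctrl+S</code> to save.</para><code>x = 1</code></doc>"

def blocked_elements(doc_root, rules_root):
    """Collect elements selected by translate="no" rules, plus their descendants."""
    blocked = set()
    for rule in rules_root.findall(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") == "no":
            # ElementTree needs a relative path, so "//code" becomes ".//code".
            for el in doc_root.findall("." + rule.get("selector")):
                blocked.update(el.iter())
    return blocked

def translatable_text(doc_root, rules_root):
    """Return the text runs that are not covered by a translate="no" rule."""
    blocked = blocked_elements(doc_root, rules_root)
    out = []
    def walk(el):
        keep = el not in blocked
        if keep and el.text and el.text.strip():
            out.append(el.text.strip())
        for child in el:
            walk(child)
            # Tail text sits after the child but belongs to the parent element.
            if keep and child.tail and child.tail.strip():
                out.append(child.tail.strip())
    walk(doc_root)
    return out

print(translatable_text(ET.fromstring(DOC), ET.fromstring(RULES)))
```

Keeping the rules in a separate file means the same vocabulary-level decisions can be reused for every document that uses that vocabulary.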
TMX • http://www.etsi.org/deliver/etsi_gs/lis/001_099/002/01.04.02_60/gs_lis002v010402p.pdf • Translation Memory Exchange • Current version 1.4b, 2.0 undergoing review • Allows for the interchange of translation memories between different vendor systems • No translation vendor lock-in • Free exchange of translation assets
TMX History • First LISA OSCAR Standard • Version 1.1 1998 • Version 1.2 1999 • Version 1.3 2001 • Version 1.4b 2002 • Moved to ETSI/LIS 2012 • Version 2.0 2014? • Two levels of implementation: • Level 1 (Plain Text Only) • Level 2 (Content Markup)
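As an illustration of what a Level 1 (plain text only) exchange might look like, the sketch below writes and reads a minimal TMX 1.4 document. The header values such as `example-tool` are placeholders chosen for this example, and the English/Polish pair is invented.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang="en", tgtlang="pl"):
    """Build a plain-text TMX tree from (source, target) pairs."""
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "example-tool",      # placeholder value
        "creationtoolversion": "0.1",        # placeholder value
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

def read_tmx(tmx):
    """Return the segment texts of each translation unit as a tuple."""
    return [tuple(tuv.findtext("seg") for tuv in tu.findall("tuv"))
            for tu in tmx.iter("tu")]

doc = build_tmx([("Hello", "Cześć")])
print(ET.tostring(doc, encoding="unicode"))
print(read_tmx(doc))
```

Because the segments carry no inline markup, any TMX-aware tool should be able to import them; Level 2 adds elements for preserving formatting codes inside `seg`.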
SRX http://www.gala-global.org/oscarStandards/srx/srx20.html • Segmentation Rules Exchange • Current version 2.0 2008 • How sentences are segmented • Allows for the exchange of segmentation rules using regular expressions • Complements TMX standard • Referenced by XLIFF, TMX and xml:tm
SRX Key Concepts • Unicode Regular expression syntax defined • Meta characters – Unicode regular expressions: "\X", "\s", "\S" etc. • Operators – "*", "|", "?", "+" etc. • Defines: • Language rules: segmentation rules • Map rules: how to apply the segmentation rules
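The rule mechanism can be sketched as follows. Each rule pairs a "before break" and an "after break" regular expression with a break or no-break decision, and the first rule that matches at a position wins. The two rules shown are invented for this example, and Python's `re` syntax is assumed in place of the regular expression dialect that SRX itself specifies.

```python
import re

# (is_break, before-break pattern, after-break pattern); order matters.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # exception: no break after abbreviations
    (True, r"[.?!]+", r"\s+"),                 # break after sentence-final punctuation
]

def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break rule."""
    breaks = []
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            before_ok = re.search(f"(?:{before})\\Z", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    breaks.append(pos)
                break  # first matching rule decides, later rules are ignored
    segments, start = [], 0
    for pos in breaks + [len(text)]:
        segments.append(text[start:pos].strip())  # whitespace trimmed for display
        start = pos
    return [s for s in segments if s]

print(segment("Dr. Smith arrived. He was late."))
```

Listing exceptions before the general break rule is what lets the abbreviation rule override the punctuation rule.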
GMX http://docbox.etsi.org/ISG/Open/ISGLIS/GMX-V/GMX-V/GMX-V-2.0.html • Global Information Management Metrics eXchange • GMX/V Approved LISA OSCAR Standard February 2007 • Tripartite • GMX-V : Volume, published for public comment • GMX-C : Complexity, initial specification • GMX-Q : Quality • Standard for defining an L10N job • Allows for quantifying job complexity • GMX/V 2.0 Approved ETSI LIS • added support for CJK word counts • overall character count including white space characters
GMX-V • GIM Metrics eXchange – Volume • Objectives: • Unambiguous and verifiable definition of word and character counts • A method of exchanging counts within an XML framework • Two types of count: • Verifiable, based on electronic documents • Non-verifiable • Canonical form: XLIFF based • Word boundaries: Unicode TR29 • Unicode character encoding • Minimum conformance • Total Character Count • Total Word Count
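A rough feel for these counts can be had from the sketch below. It is only an approximation: the word pattern is a simple regular expression standing in for the Unicode TR29 word boundary rules, the function and key names are invented, and the counting categories are not the exact ones defined by GMX-V.

```python
import re

# Approximate word pattern: runs of word characters, optionally joined by an
# apostrophe or hyphen. This is a stand-in for TR29, not an implementation of it.
WORD = re.compile(r"\w+(?:['’-]\w+)*")

def volume_counts(text):
    """Return simple word and character counts for a piece of text."""
    return {
        "words": len(WORD.findall(text)),
        "characters_without_whitespace": sum(1 for c in text if not c.isspace()),
        "characters_with_whitespace": len(text),
    }

print(volume_counts("Hello, world! It's fine."))
```

The point of the standard is that two tools given the same canonical text should report the same numbers, which is why the counting rules are pinned to a published word boundary definition rather than left to each tool.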
XLIFF http://www.oasis-open.org/committees/xliff • XLIFF – XML Localization Interchange File Format • Current status • XLIFF 1.1 Committee Specification (31 Oct 2003) • XLIFF 1.2 Approved as an OASIS Standard 2008 • Segmentation support • (X)HTML XLIFF 1.1 Representation Guide • PO / POT XLIFF 1.1 Representation Guide • Java / Windows / .Net Representation Guide • XLIFF 2.0 currently out for public comment (not backwards compatible)
XLIFF • Single format for exchanging L10N data from disparate sources • Lossless • Tool-neutral • Formalized as an XML vocabulary • Can embed skeleton file
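A minimal XLIFF 1.2 file can be sketched as below. The file name `example.properties`, the unit ids, and the sample strings are assumptions made for the illustration; the skeleton and inline markup features mentioned above are left out.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace

def q(name):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{name}"

def build_xliff(units, original="example.properties", source_lang="en", target_lang="pl"):
    """Build an XLIFF 1.2 tree from (id, source, target-or-None) tuples."""
    xliff = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, q("file"), {
        "original": original,            # hypothetical source file name
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit_id, source, target in units:
        unit = ET.SubElement(body, q("trans-unit"), {"id": unit_id})
        ET.SubElement(unit, q("source")).text = source
        if target is not None:
            ET.SubElement(unit, q("target")).text = target
    return xliff

def untranslated_ids(xliff):
    """List the ids of trans-units that have no target element yet."""
    ns = {"x": XLIFF_NS}
    return [u.get("id") for u in xliff.iterfind(".//x:trans-unit", ns)
            if u.find("x:target", ns) is None]

doc = build_xliff([("1", "Hello", "Cześć"), ("2", "Goodbye", None)])
print(ET.tostring(doc, encoding="unicode"))
print(untranslated_ids(doc))
```

Because every source format is extracted into the same `trans-unit` structure, a translation tool only needs to understand XLIFF, not the original file types.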
xml:tm http://www.xtm-intl.com/manuals/xml-tm/xml-tm2.0.html • XML based Text Memory • Radical rethink of how to handle Translation Memory • Donated by XML INTL to LISA OSCAR • OSCAR Standard Feb 2007 • Adopted by ETSI LIS, version 2.0 ready for adoption • Takes the DITA reuse principle down to sentence level • Author Memory • Translation Memory
xml:tm - Namespace • Namespace is a major feature of XML • Allows the mapping of different ontological entities onto the same representation • Allows different ways to look at the same data • Namespaces can be made transparent
xml:tm • XML based text memory • Revolutionary approach to translating XML documents • First significant advance in translation memory technology • Uses XML namespace to transparently embed contextual information • The one ring that binds them all
xml:tm namespace • Example of the use of the tm namespace in an XML document:
<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>
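The two ways of looking at such a document can be shown with a short sketch. It assumes the namespace URI used on this slide and a completed, well-formed version of the example; the function names are invented. One function returns the text units (the tm namespace view), the other lists only the elements outside the tm namespace (the source document view).

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as written on the slide

DOC = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm><section><para><tm:te>
    <tm:tu>Namespace is very flexible.</tm:tu>
    <tm:tu>It is very easy to use.</tm:tu>
  </tm:te></para></section></tm:tm>
</document>"""

def text_units(xml_text):
    """tm namespace view: the text of every tm:tu, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    return [" ".join("".join(tu.itertext()).split())
            for tu in root.iter(f"{{{TM_NS}}}tu")]

def document_elements(xml_text):
    """Source document view: element names that are not in the tm namespace."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if not el.tag.startswith(f"{{{TM_NS}}}")]

print(text_units(DOC))
print(document_elements(DOC))
```

A namespace-unaware consumer can simply skip the `tm:` elements, which is what makes the embedded memory transparent to the original vocabulary.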
xml:tm namespace • [Figure: two tree views of the same document. Source document view: doc, title, section, para, text. tm namespace view: tm, te, tu, sentence.]
xml:tm Text Memory • Author memory: • Maintain memory of source text • Authoring statistics • Authoring tool input • Translation memory: • Automatic alignment • Maintain perfect link of source and target text • Reduce translation costs
xml:tm DOM differencing • [Figure: the tu elements (ids 1 to 8) of the original and updated source documents side by side, with units marked as deleted, modified, or new.]
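The classification into unchanged, deleted, modified, and new units can be approximated with a sequence comparison. The sketch below uses Python's `difflib.SequenceMatcher` on lists of sentence strings; this is a stand-in technique chosen for brevity, not the differencing algorithm that xml:tm itself defines, and the sample sentences are invented.

```python
from difflib import SequenceMatcher

def diff_text_units(original, updated):
    """Classify text units by comparing the original and updated sequences."""
    changes = []
    matcher = SequenceMatcher(a=original, b=updated, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        old, new = original[i1:i2], updated[j1:j2]
        if op == "equal":
            changes += [("unchanged", s) for s in old]
        elif op == "delete":
            changes += [("deleted", s) for s in old]
        elif op == "insert":
            changes += [("new", s) for s in new]
        else:  # "replace": pair units up, treat any surplus as deleted or new
            changes += [("modified", s) for s in new[:len(old)]]
            changes += [("deleted", s) for s in old[len(new):]]
            changes += [("new", s) for s in new[len(old):]]
    return changes

before = ["A.", "B.", "C.", "D.", "E."]
after = ["A.", "C.", "D changed.", "E.", "F."]
for status, sentence in diff_text_units(before, after):
    print(status, sentence)
```

Only the units reported as modified or new need to go to a translator; unchanged units can keep their existing translations.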
xml:tm translated document in Polish • [Figure: two tree views of the translated document. Translated document view: doc, title, section, para, tekst. tm namespace view: tm, te, tu, zdanie.]
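Because the translated document keeps the same text unit structure as the source, the two can be aligned without any guesswork. The sketch below pairs units by an `id` attribute; the attribute usage, the sample documents, and the Polish sentences are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-Intl-tm"  # namespace URI as used in the earlier slide example

SOURCE = """<doc xmlns:tm="urn:xml-Intl-tm"><para><tm:te>
  <tm:tu id="1">Namespace is very flexible.</tm:tu>
  <tm:tu id="2">It is very easy to use.</tm:tu>
</tm:te></para></doc>"""

TARGET = """<doc xmlns:tm="urn:xml-Intl-tm"><para><tm:te>
  <tm:tu id="1">Przestrzeń nazw jest bardzo elastyczna.</tm:tu>
  <tm:tu id="2">Jest bardzo łatwa w użyciu.</tm:tu>
</tm:te></para></doc>"""

def units_by_id(xml_text):
    """Map each tm:tu id to its text content."""
    root = ET.fromstring(xml_text)
    return {tu.get("id"): "".join(tu.itertext()).strip()
            for tu in root.iter(f"{{{TM_NS}}}tu")}

def align(source_xml, target_xml):
    """Pair source and target text for every id present in both documents."""
    source, target = units_by_id(source_xml), units_by_id(target_xml)
    return [(source[i], target[i]) for i in source if i in target]

for src, tgt in align(SOURCE, TARGET):
    print(src, "=>", tgt)
```

The resulting pairs are exactly what a translation memory stores, so the memory can be rebuilt from the documents themselves.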
Open Architecture for XML Authoring and Localization (OAXAL) • http://wiki.oasis-open.org/oaxal/FrontPage