L10N Standards Warszawa 2014


Presentation Transcript


  1. L10N Standards Warszawa 2014

  2. Why Standards?

  3. Why have Standards?

  4. L10N Standards • What are we going to cover: • Why L10N standards are important • The role XML has to play • Key L10N standards data standards • How to leverage L10N standards • Creating a totally data driven automated L10N process • Interoperability

  5. Why have Standards?

  6. Current State of Art

  7. L10N Typical Workflow

  8. What you need is a better crane!???

  9. Localization without Standards [Workflow diagram. Process steps: extract, TM process, translate, QA, merge. Data passed between steps: customer source text, extracted text, prepared text, translated text, target text.]

  10. True Cost of Translation

  11. Standards = Uniform Data

  12. ISO Standard

  13. Standards = Efficiency

  14. Standards = Lower Costs

  15. Standards = Safe to Implement

  16. Standards = Greater Interoperability

  17. Standards: Unforeseen Benefits

  18. Standards: Unforeseen Benefits

  19. Standards: Misuse

  20. Standards: Abuse

  21. Standards: Sabotage • Sabotaged Standards: • Proprietary extensions • Bad implementations

  22. The importance of XML • Everything is now XML • HTML/XHTML • Web Services • Adobe FrameMaker • Microsoft Office • Open Office • ASP • XAML • Java Properties • DITA • Standards: TMX, XLIFF, SRX, GMX, TBX, xml:tm • OAXAL Open Architecture for XML Authoring and Localization

  23. The power of XML • Any electronic format not in XML can be converted to XML • FrameMaker • RTF • Microsoft Office pre 2007 • QuarkXPress • Windows resource files • Java resources • PO/POT • YAML • Etc. • And then back into the original format

  24. Benefits of XML for L10N • Separation of form and content • Should make documents easier to translate • There are some critical design decisions • Mistakes can hinder translatability • XML can bootstrap its own localization

  25. The significance of XML • XML is not just another electronic format • XML is an eXtensible syntax • XML is a formal IT grammar • XML is programmable • XML can bootstrap its own localization

  26. Benefits of XML for L10N • Why use XML for Localization? • Most localizable documents are now in XML • One input format • Elegant • Uses the latest IT technology • Separation of source and content • One single data bus • Open Standards based • You can use XML to assist its own localization • One extraction + TM + SMT engine

  27. Core L10N Standards • W3C ITS Document Rules • ETSI LIS SRX • ETSI LIS xml:tm • ETSI LIS TMX • ETSI LIS TBX • ETSI LIS GMX • OASIS XLIFF • W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary) • Linport Interoperability: TIPP XLIFF:doc

  28. ITS • Internationalization and Localization Tag Set • http://www.w3.org/International/its • Internationalization Tag Set • Document Rules for a given XML vocabulary: • Inline elements (within text) • Sub flows • Non-translatable • Translatable attributes • Guidelines for localizing XML documents • Internationalization and Localization Markup Requirements • Version 1.0, 2008 • Version 2.0, 2013

  29. TMX • http://www.etsi.org/deliver/etsi_gs/lis/001_099/002/01.04.02_60/gs_lis002v010402p.pdf • Translation Memory Exchange • Current version 1.4b, 2.0 undergoing review • Allows for the interchange of translation memories between different vendor systems • No translation vendor lock-in • Free exchange of translation assets
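To make the interchange idea concrete, here is a minimal sketch, assuming a TMX 1.4 style file with `tu`, `tuv` and `seg` elements. The sample content, the header attribute values and the helper name `read_tmx_pairs` are illustrative assumptions, not taken from the presentation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample memory with one translation unit (en -> pl).
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>This is a sentence.</seg></tuv>
      <tuv xml:lang="pl"><seg>To jest zdanie.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two requested languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        # Map each language code in this unit to its plain-text segment.
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


print(read_tmx_pairs(TMX_SAMPLE, "en", "pl"))
```

Because the segments are plain XML, any tool that reads this structure can reuse the memory, which is the point the slide makes about avoiding vendor lock-in. Inline markup inside `seg` (Level 2) is not handled by this sketch.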

  30. TMX History • First LISA OSCAR Standard • Version 1.1 1998 • Version 1.2 1999 • Version 1.3 2001 • Version 1.4b 2002 • Moved to ETSI/LIS 2012 • Version 2.0 2014? • Two levels of implementation: • Level 1 (Plain Text Only) • Level 2 (Content Markup)

  31. SRX http://www.gala-global.org/oscarStandards/srx/srx20.html • Segmentation Rules Exchange • Current version 2.0 2008 • How sentences are segmented • Allows for the exchange of segmentation rules using regular expressions • Complements TMX standard • Quoted XLIFF, TMX and xml:tm

  32. SRX Key Concepts • Unicode Regular expression syntax defined • Meta characters – Unicode regular expressions: "\X", "\s", "\S" etc. • Operators – "*", "|", "?", "+" etc. • Defines: • Language rules: segmentation rules • Map rules: how to apply the segmentation rules
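The rule idea can be sketched roughly as follows, assuming each rule is a break or no-break flag plus a "before break" and an "after break" regular expression, with the first matching rule deciding. The rule list and the function name `segment` are hypothetical, and Python's `re` module is only an approximation of the Unicode regular expression syntax the standard defines (for example, it has no `\X`).

```python
import re

# (is_break, before_break_regex, after_break_regex); the first matching rule wins.
# These rules are illustrative assumptions, not rules from the SRX specification.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # exception: no break after abbreviations
    (True, r"[.?!]+", r"\s+"),                 # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into segments by testing every inter-character position."""
    compiled = [
        (is_break, re.compile(before + r"\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match text ending at pos, "after" text starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule decides for this position
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


print(segment("Dr. Smith arrived. He was late! Why?"))
```

Listing the no-break exception before the general break rule is what keeps "Dr." attached to the sentence that follows it.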

  33. GMX http://docbox.etsi.org/ISG/Open/ISGLIS/GMX-V/GMX-V/GMX-V-2.0.html • Global Information Management Metrics eXchange • GMX/V Approved LISA OSCAR Standard February 2007 • Tripartite • GMX-V : Volume, published for public comment • GMX-C : Complexity, initial specification • GMX-Q : Quality • Standard for defining a L10N job • Allows for quantifying job complexity • GMX/V 2.0 Approved ETSI LIS • added support for CJK word counts • overall character count including white space characters

  34. GMX-V • GIM Metrics eXchange – Volume • Objectives: • Unambiguous and verifiable definition of word and character counts • A method of exchanging counts within an XML framework • Two types of count: • Verifiable, based on electronic documents • Non-verifiable • Canonical form: XLIFF based • Word boundaries: Unicode TR29 • Unicode character encoding • Minimum conformance • Total Character Count • Total Word Count
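As a rough illustration of what a verifiable count involves, the sketch below computes a word count and two character counts for a plain-text segment. It is a simplification under stated assumptions: word boundaries come from a plain `\w+` pattern rather than Unicode TR29, and the function name and result keys are hypothetical, not GMX-V element names.

```python
import re

# Simplified word pattern: runs of word characters, allowing internal ' or -.
WORD = re.compile(r"\w+(?:['-]\w+)*")


def count_text(text):
    """Return simplified word and character counts for one text segment."""
    return {
        "words": len(WORD.findall(text)),
        "chars_with_whitespace": len(text),
        "chars_without_whitespace": len(re.sub(r"\s", "", text)),
    }


print(count_text("It is very easy to use."))
```

A conformant implementation would apply the same counting rules to the canonical XLIFF form of a document, so that two tools counting the same file arrive at the same figures.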

  35. XLIFF http://www.oasis-open.org/committees/xliff • XLIFF – XML Localization Interchange File Format • Current status • XLIFF 1.1 Committee Specification (31 Oct 2003) • XLIFF 1.2 Approved as an OASIS Standard 2008 • Segmentation support • (X)HTML XLIFF 1.1 Representation Guide • PO / POT XLIFF 1.1 Representation Guide • Java / Windows / .Net Representation Guide • XLIFF 2.0 currently out for public comment (not backwards compatible)

  36. XLIFF

  37. XLIFF • Single format for exchanging L10N from disparate sources • Loss-less • Tool-neutral • Formalized as an XML vocabulary • Can embed skeleton file
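A minimal sketch of what tool-neutral exchange looks like in practice, assuming an XLIFF 1.2 file with `trans-unit`, `source` and `target` elements. The sample file, its attribute values and the helper name `untranslated_units` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, bound to a local prefix for ElementTree path queries.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample: one unit still to translate, one already translated.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="example.xml" source-language="en" target-language="pl" datatype="xml">
    <body>
      <trans-unit id="1">
        <source>Namespace is very flexible.</source>
        <target state="needs-translation"/>
      </trans-unit>
      <trans-unit id="2">
        <source>This is a sentence.</source>
        <target state="translated">To jest zdanie.</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def untranslated_units(xliff_text):
    """Return (id, source text) for every unit whose target is missing or empty."""
    root = ET.fromstring(xliff_text)
    pending = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not (target.text or "").strip():
            pending.append((unit.get("id"), unit.findtext("x:source", namespaces=NS)))
    return pending


print(untranslated_units(XLIFF_SAMPLE))
```

Any tool that understands this vocabulary can pick up the pending units, whatever the original source format was.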

  38. xml:tm http://www.xtm-intl.com/manuals/xml-tm/xml-tm2.0.html • XML based Text Memory • Radical rethink of how to handle Translation Memory • Donated by XML INTL to LISA OSCAR • OSCAR Standard Feb 2007 • Adopted by ETSI LIS, version 2.0 ready for adoption • Takes the DITA reuse principle down to sentence level • Author Memory • Translation Memory

  39. xml:tm - Namespace • Namespace is a major feature of XML • Allows the mapping of different ontological entities onto the same representation • Allows different ways to look at the same data • Namespaces can be made transparent

  40. xml:tm • XML based text memory • Revolutionary approach to translating XML documents • First significant advance in translation memory technology • Uses XML namespace to transparently embed contextual information • The one ring that binds them all

  41. xml:tm namespace • Example of the use of tm namespace in an XML document: <document xmlns:tm="urn:xml-Intl-tm"> <tm:tm> <section> <para> <tm:te> <tm:tu> Namespace is very flexible. </tm:tu> <tm:tu> It is very easy to use. </tm:tu> </tm:te> </para>
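The slide's excerpt stops at `</para>`. The sketch below closes the remaining elements (an assumption made only so the sample parses) and shows how a namespace-aware parser can pull out the text units without disturbing the host document. The function name `text_units` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as shown on the slide, bound to a local prefix for queries.
TM_NS = {"tm": "urn:xml-Intl-tm"}

# The slide's example, with the closing tags after </para> added as an assumption.
DOC = """<document xmlns:tm="urn:xml-Intl-tm">
  <tm:tm>
    <section>
      <para>
        <tm:te>
          <tm:tu>Namespace is very flexible.</tm:tu>
          <tm:tu>It is very easy to use.</tm:tu>
        </tm:te>
      </para>
    </section>
  </tm:tm>
</document>"""


def text_units(xml_text):
    """Return the whitespace-normalized text of every tm:tu element, in order."""
    root = ET.fromstring(xml_text)
    return [" ".join((tu.text or "").split()) for tu in root.iterfind(".//tm:tu", TM_NS)]


print(text_units(DOC))
```

A tool that ignores the `tm` namespace sees only `document`, `section` and `para`, which is one way to read the claim that the namespace can be made transparent.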

  42. xml:tm namespace [Diagram: two parallel tree views of the same source document. Source document view: doc, title, section, para, text. tm namespace view: tm, te, tu, sentence.]

  43. xml:tm Text Memory • Author memory • Maintain memory of source text • Authoring statistics • Authoring tool input • Translation memory • Automatic alignment • Maintain perfect link of source and target text • Reduce translation costs

  44. xml:tm DOM differencing [Diagram: DOM differencing between the original source document and the updated source document. tu elements (ids 1 to 8) are compared and marked as deleted, modified, or new.]
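The comparison can be approximated with a sequence diff over the text units of the two document versions. This is a sketch under assumptions, not the algorithm xml:tm specifies: it swaps in Python's `difflib.SequenceMatcher` and works on plain sentence lists rather than DOM nodes with ids. The function name and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def diff_text_units(original, updated):
    """Classify text units as unchanged, deleted, modified, or new."""
    matcher = SequenceMatcher(a=original, b=updated, autojunk=False)
    report = {"unchanged": [], "deleted": [], "modified": [], "new": []}
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        old, new = original[a1:a2], updated[b1:b2]
        if tag == "equal":
            report["unchanged"].extend(old)
        elif tag == "delete":
            report["deleted"].extend(old)
        elif tag == "insert":
            report["new"].extend(new)
        else:  # "replace": pair units positionally, classify any leftovers
            paired = min(len(old), len(new))
            report["modified"].extend(zip(old[:paired], new[:paired]))
            report["deleted"].extend(old[paired:])
            report["new"].extend(new[paired:])
    return report


original = ["One.", "Two.", "Three.", "Four.", "Five.", "Six."]
updated = ["One.", "Two.", "Four.", "Five changed.", "Six.", "Eight."]
print(diff_text_units(original, updated))
```

Only the modified and new units would need translator attention; unchanged units can keep their existing translations.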

  45. xml:tm translated document in Polish [Diagram: two parallel tree views of the translated document. Translated document view: doc, title, section, para, tekst ("text"). tm namespace view: tm, te, tu, zdanie ("sentence").]

  46. Putting It All Together

  47. Open Architecture for XML Authoring and Localization (OAXAL) • http://wiki.oasis-open.org/oaxal/FrontPage

  48. OAXAL 2.0

  49. OAXAL 2.0
