Learn about the importance of ITS in XML documents for localization. See how ITS metadata simplifies translation instructions and improves workflow efficiency. Explore the development of a browser plugin for ITS metadata preview.
Practical Visualization of ITS 2.0 Categories for Real World Localization Process
Part of the Multilingual Web-LT Program
WHAT IS ITS AND WHY IT’S SO IMPORTANT
• The Internationalization Tag Set (ITS) is a set of attributes and elements designed to provide internationalization and localization support in XML and HTML documents. It also defines implementations of these concepts
• XML developers can use the ITS namespace to integrate internationalization features directly into their own XML schemas and documents
• The set is currently almost ready/frozen
• We believe that this is one of the key standards for the localization industry
• The set includes a number of categories of crucial importance to translators:
  • Terminology and Localization Note metadata
  • Translate (yes/no) metadata to mark non-translatable text
• ITS metadata make it possible to include various instructions for translators in documents, add terminology and comments, and mark non-translatable segments
  • Will reduce inconsistency in adding translation instructions to documents
  • Provides a universal interface for transferring translation metadata between tools
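To make the three categories above concrete, here is a minimal sketch of how they can appear as local ITS attributes in an XML file and how a tool might list them. The sample document and the helper name `collect_local_its` are illustrative assumptions, not part of the project; only the ITS namespace URI and the attribute names (`its:translate`, `its:locNote`, `its:locNoteType`, `its:term`) come from ITS itself.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: a localization note, a non-translatable key name, a term.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <p its:locNote="Keep the tone informal." its:locNoteType="description">Press
  <code its:translate="no">Ctrl+S</code> to save the
  <span its:term="yes">workspace</span>.</p>
</doc>"""


def collect_local_its(xml_text):
    """Return (element tag, ITS attribute name, value) for every local ITS attribute."""
    root = ET.fromstring(xml_text)
    prefix = "{" + ITS_NS + "}"
    found = []
    for elem in root.iter():  # document order, root included
        for name, value in elem.attrib.items():
            if name.startswith(prefix):
                found.append((elem.tag, name[len(prefix):], value))
    return found


if __name__ == "__main__":
    for tag, attr, value in collect_local_its(SAMPLE):
        print(f"<{tag}> its:{attr} = {value!r}")
```

This only reads local markup; global rules (`its:rules`) would need separate handling.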
WHY ARE WE DOING THIS: DETAILS
• To make it possible to comment on translatable content irrespective of its nature
• To make these instructions easily accessible to translators and editors
  • Including recommendations, instructions, terminology suggestions
  • Independent of translation tools
• Saving time: the text is already marked with context information
  • One doesn’t have to decide whether something NEEDS TO BE TRANSLATED or not
  • One doesn’t have to decide whether something IS A TERM or not
• Key advantages/improvements:
  • Time (i.e. cost)
  • Quality (fewer translation errors)
• Also very important for machine translation applications (post-editing in context)
WHY ARE WE DOING THIS: WORKFLOW PARADIGM CHANGE
• FROM:
  • Bulk manual translation of “raw” content or post-editing of “raw” machine-translation output
  • External terminology glossaries, localization instructions and reference data are matched with content indirectly, mostly in the translator’s head on the fly, and only to the extent of his or her understanding of the instructions and personal skills
• TO:
  • Using natural language processing (NLP) tools and ITS metadata markup to pre-populate content to be translated or post-edited with context-related information
  • External terminology glossaries, localization instructions and reference data are matched with content directly, through an automated process of preliminary linguistic analysis
  • Pre-processing is controlled by dedicated, qualified linguists/terminologists/editors
• PROVIDED THAT:
  • Glossaries, instructions and reference data are converted into a format compatible with NLP tools and ITS markup
  • And corresponding content-searching algorithms are created (including fuzzy algorithms)
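The fuzzy matching of glossary entries against content mentioned above could look roughly like the following sketch. It uses Python's standard `difflib` as a stand-in for whatever NLP tooling the project actually uses; the glossary, the function name `suggest_terms` and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical glossary of approved source terms.
GLOSSARY_TERMS = ["workspace", "editor"]


def suggest_terms(text, glossary_terms, cutoff=0.8):
    """Pair each word in the text with its closest glossary term, if similar enough."""
    suggestions = []
    for token in re.findall(r"[A-Za-z]+", text):
        close = difflib.get_close_matches(
            token.lower(), glossary_terms, n=1, cutoff=cutoff
        )
        if close:
            suggestions.append((token, close[0]))
    return suggestions


if __name__ == "__main__":
    sentence = "Save the workspaces before closing the editor."
    for token, term in suggest_terms(sentence, GLOSSARY_TERMS):
        # A pre-processing step could wrap `token` in Terminology markup here.
        print(f"{token!r} looks like glossary term {term!r}")
```

Here the plural "workspaces" is matched to "workspace" even though the strings differ, which is the kind of case exact lookup would miss. Multi-word terms and inflection-heavy languages would need more than this word-by-word comparison.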
What is being developed
• ITS 2.0 implementation project, part of the Multilingual Web-LT program funded by the EU
• Developing the ITS Browser Plugin as a building block of the future “Work In Context System” (WICS)
  • Making it possible to view standard ITS (Internationalization Tag Set) translation-related metadata contained in XML, XLIFF, or HTML files
  • Can be done in parallel with translating using CAT tools or for reviewing materials
  • The JavaScript plugin would support the most popular browsers
  • For previewing XML or XLIFF, standalone filters for conversion into HTML will be used
• Implementation:
  • Standards-based preview solution: HTML5, JavaScript, Web browser
  • A script located in the same folder as the HTML files
  • The script is started by the browser automatically
  • It is expected that both scripts and filters will be publicly available
The Project Idea
• ITS metadata-enriched XML or XLIFF files: what’s inside?
• Previewing ITS metadata in a Web browser while translating content in any CAT tool
• Standards-based preview solution: HTML5, JavaScript, Web browser
• Next step: ITS metadata as a carrier for localization instructions and any reference data
The Work Breakdown: Project Components
• Visual designs
• JavaScript scripts to render and navigate metadata and content
• Rich sample files
• Content format conversion algorithms:
  • XML+ITS -> HTML5+ITS*
  • XLIFF+ITS -> HTML5+ITS*
  • XML+ITS -> XLIFF+ITS (just an example)
  • HTML+ITS -> HTML5+ITS*
* For the purposes of visualization, some redundant ITS syntax options for HTML are not supported.
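As a rough illustration of the XML+ITS -> HTML5+ITS conversion, the sketch below maps local ITS attributes to their HTML5 counterparts: `its:translate` becomes the native HTML5 `translate` attribute, and other names become `its-` plus the hyphenated lower-case form (for example `its:locNoteType` -> `its-loc-note-type`). Turning every source element into a `span`, skipping `its:version`, and the function names are simplifying assumptions; this is not the project's actual filter, and global rules and other special cases are ignored.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def html_attr_name(its_local_name):
    """Map an ITS local attribute name to its HTML5 attribute name."""
    if its_local_name == "translate":
        return "translate"  # HTML5 has a native translate attribute
    hyphenated = re.sub(r"(?<!^)([A-Z])", r"-\1", its_local_name).lower()
    return "its-" + hyphenated


def xml_to_html_span(elem):
    """Recursively copy an XML element into a <span>, keeping local ITS metadata."""
    prefix = "{" + ITS_NS + "}"
    span = ET.Element("span")
    span.text, span.tail = elem.text, elem.tail
    for name, value in elem.attrib.items():
        if name.startswith(prefix):
            local = name[len(prefix):]
            if local != "version":  # assumption: version marker is dropped
                span.set(html_attr_name(local), value)
    for child in elem:
        span.append(xml_to_html_span(child))
    return span


if __name__ == "__main__":
    source = ET.fromstring(
        '<p xmlns:its="http://www.w3.org/2005/11/its" '
        'its:locNote="UI string" its:locNoteType="alert">'
        'Click <b its:translate="no">OK</b></p>'
    )
    print(ET.tostring(xml_to_html_span(source), encoding="unicode", method="html"))
```

A real filter would also have to choose between `span` and `div` for block-level content, which this sketch sidesteps.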
THE PROJECT CORE: VISUAL DESIGNS
• Screen space limitations in localization process
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Collapsed view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Expanded view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Summary view of metadata
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Color highlighting to indicate metadata linked to content
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Visual “tags” to indicate metadata linked to content
THE PROJECT CORE: VISUAL DESIGNS (CONT.)
• Visual tags to highlight metadata (example)
DEVELOPMENT STATUS
• Sample files: to be completed by end of May
• File conversion algorithms: to be completed by Sep 30:
  • XML+ITS -> XLIFF+ITS (July) (sample)
  • XML+ITS -> HTML5+ITS (August)
  • HTML+ITS -> HTML5+ITS (August)
  • XLIFF+ITS -> HTML5+ITS (September)
• Visualization scripts: to be completed by end of June
KNOWN ISSUES: FORMAT CONVERSIONS
• “Translation” of XPath expressions from source XML to target HTML
• XLIFF: MRK element to be used instead of SPAN
• Selection between SPAN and DIV elements in output HTML
• Merging external ITS rule files into internal list of rules
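The last issue, merging external rule files, can be sketched as follows. The sketch assumes the reading of ITS 2.0 under which rules linked via `xlink:href` are processed as if they came before the rules of the linking `its:rules` element, so that internal rules (being later) can override them. The `load` callback, the file name `shared.xml` and the function name `merged_rules` are hypothetical; a real implementation would resolve URLs and guard against circular links.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def merged_rules(rules_elem, load):
    """Return one flat list of rule elements: external rules first, then internal."""
    merged = []
    href = rules_elem.get(XLINK_HREF)
    if href:
        external_root = ET.fromstring(load(href))
        # The linked file may be a bare its:rules element or contain one.
        external_rules = next(external_root.iter(ITS + "rules"), None)
        if external_rules is not None:
            merged.extend(merged_rules(external_rules, load))
    merged.extend(child for child in rules_elem if child.tag.startswith(ITS))
    return merged


# Hypothetical in-memory "files" standing in for real rule documents.
EXTERNAL = (
    '<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">'
    '<its:translateRule selector="//code" translate="no"/>'
    "</its:rules>"
)
INTERNAL = (
    '<its:rules xmlns:its="http://www.w3.org/2005/11/its" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" version="2.0" '
    'xlink:href="shared.xml">'
    '<its:termRule selector="//dfn" term="yes"/>'
    "</its:rules>"
)

if __name__ == "__main__":
    files = {"shared.xml": EXTERNAL}
    for rule in merged_rules(ET.fromstring(INTERNAL), files.__getitem__):
        print(rule.tag.split("}", 1)[1], rule.attrib)
```

Keeping the result as a single ordered list means the visualization script only has to apply rules front to back, with later entries taking precedence.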
KNOWN ISSUES: METADATA VISUALIZATION
• Parsing local standoff markup along with other rules
• Parsing list of merged ITS rules
• Hyperlinks embedded in metadata
• Static definitions like “Do not translate” for Translate category
• Highlighting active ITS item
• Displaying summary of all ITS items
• Parsing nested ITS metadata
• Differences in JavaScript implementation between browsers
• Navigation through content and ITS items
• Fragmentation of content to avoid displaying large pieces of text
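For nested ITS metadata, one simple case is the Translate category, where an element without its own `its:translate` takes the value of its nearest ancestor and the default is "yes". A minimal sketch of resolving that for local markup only (global rules are ignored, and the sample document and function name `effective_translate` are assumptions):

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"

# Hypothetical sample with a translatable element nested in a non-translatable one.
NESTED = (
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<note its:translate="no">Build <id>42</id> '
    '<em its:translate="yes">draft</em></note>'
    "<p>Hello</p>"
    "</doc>"
)


def effective_translate(elem, inherited="yes", out=None):
    """List (tag, effective translate value) for elem and all its descendants."""
    if out is None:
        out = []
    value = elem.get(ITS + "translate", inherited)  # own value wins over inherited
    out.append((elem.tag, value))
    for child in elem:
        effective_translate(child, value, out)
    return out


if __name__ == "__main__":
    for tag, value in effective_translate(ET.fromstring(NESTED)):
        print(f"<{tag}>: translate={value}")
```

In the sample, `<id>` inherits "no" from `<note>`, while `<em>` switches back to "yes" for its own content, which is the kind of nesting the visualization has to render correctly.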
Live Demo
The demo samples are built on the preliminary versions of visual designs and illustrate just a few ITS data categories:
• Localization Note
• Terminology
• Translate
THANK YOU! Questions?