
Localization and HTML5: Technical Aspects

Felix Sasaki, DFKI / W3C Fellow. Pitch: why this presentation? HTML5 is the upcoming (or existing) format for content on the Web; the Web is becoming multilingual; HTML5 localization is essential to make this happen.


Presentation Transcript


  1. Localization and HTML5: Technical Aspects • Felix Sasaki, DFKI / W3C Fellow

  2. Pitch: Why this presentation? • HTML5 is the upcoming (or existing) format for content on the Web • The Web is becoming multilingual • HTML5 localization is essential to make this happen • Localization workflows with HTML5 input / output need to take various aspects of HTML5 into account – learn more here

  3. Acknowledgement • Thanks to Jirka Kosek for introducing the participants of the W3C MultilingualWeb-LT working group to the “do” and “do not” of HTML5 content creation and processing

  4. Overview • HTML5 Serializations + Model • Localization Workflow with HTML5 • Metadata for (HTML5) Localization • What Else?

  5. HTML5 – Serializations + Model • Two serializations • <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html>

  6. HTML5 – Serializations + Model • Two serializations: HTML5 vs. XHTML5 • HTML5: <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • XHTML5: <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html>

  7. HTML5 – Serializations + Model • Two serializations: HTML5 vs. XHTML5 • HTML5: <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • XHTML5: <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html> • One Document Object Model (DOM): document.getElementsByTagName("meta")
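
The idea of this slide can be tried outside the browser. The sketch below is an illustration only and not part of the original talk: it reads both serializations and checks that the same `meta` information comes out of each. It uses only the Python standard library. Note that `html.parser` is a plain tokenizer and does not implement the HTML5 parsing algorithm, so it merely stands in for a real HTML5 parsing library. The names `MetaCollector`, `metas_from_html` and `metas_from_xhtml` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

HTML_SERIALIZATION = """<!DOCTYPE html>
<html><head><meta charset=utf-8><title>My example</title></head>
<body>...</body></html>"""

XHTML_SERIALIZATION = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta charset="utf-8"/><title>My example</title></head>
<body>...</body></html>"""


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def metas_from_html(text):
    """Read the HTML serialization with the stdlib tokenizer."""
    collector = MetaCollector()
    collector.feed(text)
    collector.close()
    return collector.metas


def metas_from_xhtml(text):
    """Read the XML serialization with an XML parser."""
    ns = {"h": "http://www.w3.org/1999/xhtml"}
    root = ET.fromstring(text)
    return [dict(el.attrib) for el in root.findall(".//h:meta", ns)]


# Both inputs lead to the same information about the document.
same = metas_from_html(HTML_SERIALIZATION) == metas_from_xhtml(XHTML_SERIALIZATION)
```

In a browser, the equivalent observation is that `document.getElementsByTagName("meta")` behaves the same whichever serialization was delivered.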

  8. Rationale • More than 90% of Web pages fail markup validation • See the Opera browser “MAMA” report • XHTML was a revolution • HTML5 is an evolution • Parsing algorithm for existing Web content • Two serializations as input • Detailed error handling • Output: one DOM

  9. Overview • HTML5 Serializations + Model • Localization Workflow with HTML5 • Metadata for (HTML5) Localization • What Else?

  10. Localization Workflow with HTML5 • Diagram labels: HTML5 as XML • HTML5 as HTML • HTML5 as HTML with errors • XLIFF-based Localization • HTML5 • XHTML5

  11. Localization Workflow with HTML5 • Diagram labels: HTML5 as XML • HTML5 as HTML • HTML5 as HTML with errors • XLIFF-based Localization • HTML5 • XHTML5 • Steps: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation

  12. Localization Workflow with HTML5 • Diagram labels: HTML5 as XML • HTML5 as HTML • HTML5 as HTML with errors • XLIFF-based Localization • HTML5 • XHTML5 • Steps: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation • Steps: Transformation > XHTML5 > HTML5 parsing > HTML5 or XHTML5

  13. Localization Workflow with HTML5 • Central: HTML5 parsing library, e.g. validator.nu • Diagram labels: HTML5 as XML • HTML5 as HTML • HTML5 as HTML with errors • XLIFF-based Localization • HTML5 • XHTML5 • Steps: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation • Steps: Transformation > XHTML5 > HTML5 parsing > HTML5 or XHTML5
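
The last two steps of the chain "HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation" can be sketched as follows. This is a minimal illustration under several assumptions that the slides do not state: the input is already the XML serialization (XHTML5), the target is XLIFF 1.2, only `p` elements are extracted, and inline markup is flattened to plain text (a real tool would preserve it). The function name `xhtml_to_xliff` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"  # assumption: XLIFF 1.2

# Serialize XLIFF elements without a prefix (default namespace).
ET.register_namespace("", XLIFF_NS)


def xhtml_to_xliff(xhtml_text, source_lang="en", original="example.html"):
    """Turn each non-empty <p> of an XHTML5 document into a trans-unit."""
    root = ET.fromstring(xhtml_text)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(
        xliff,
        f"{{{XLIFF_NS}}}file",
        {"original": original, "source-language": source_lang, "datatype": "html"},
    )
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for index, para in enumerate(root.iter(f"{{{XHTML_NS}}}p"), start=1):
        # Flatten inline markup and normalize whitespace (simplification).
        text = " ".join("".join(para.itertext()).split())
        if not text:
            continue
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(index)})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(xliff, encoding="unicode")
```

For input in the HTML serialization, or HTML with errors, an HTML5 parsing library would first have to produce the DOM and serialize it as XML, as the slide indicates.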

  14. Overview • HTML5 Serializations + Model • Localization Workflow with HTML5 • Metadata for (HTML5) Localization • What Else?

  15. Metadata for (HTML5) Localization: ITS 2.0 • “Internationalization Tag Set” 2.0 • Set of disjoint metadata items (“data categories”) for XML and HTML5 • Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

  16. Metadata for (HTML5) Localization: ITS 2.0 • “Internationalization Tag Set” 2.0 • Some items are part of the HTML5 spec • Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

  17. “Translate” <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>Translate flag test: Default</title> </head> <body> <p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p> </body> </html>
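
How a localization tool might honor the `translate` attribute can be sketched as follows. This is an illustration, not the talk's own code: it assumes well-nested input (every non-void start tag has an end tag), treats `translate=no` as "do not translate", `translate=yes` or an empty value as "translate", and lets any other case inherit from the parent element. The stdlib `html.parser` is a tokenizer, not an HTML5 parser; the names `TranslateFilter` and `split_by_translate` are hypothetical.

```python
from html.parser import HTMLParser

# Elements without end tags must not be pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TranslateFilter(HTMLParser):
    """Labels each text chunk with the translate state it inherits."""

    def __init__(self):
        super().__init__()
        self.stack = [True]  # default for the document: translatable
        self.chunks = []     # list of (text, translatable) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        state = self.stack[-1]  # inherit from the parent by default
        attributes = dict(attrs)
        if "translate" in attributes:
            value = (attributes["translate"] or "").lower()
            if value == "no":
                state = False
            elif value in ("yes", ""):
                state = True
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append((text, self.stack[-1]))


def split_by_translate(html_text):
    """Return (text, translatable) pairs in document order."""
    parser = TranslateFilter()
    parser.feed(html_text)
    parser.close()
    return parser.chunks
```

Applied to the slide's example, the text inside the `span` is flagged as not translatable while the surrounding sentence remains translatable.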

  18. ITS “global rules” • XPath-based metadata approach • Attach metadata to several nodes at once • Specify metadata for a document format or (HTML) template • Example: map proprietary HTML to ITS “translate” <its:rules ...> <its:translateRule translate="no" selector="//h:*[@class='notranslate']"/> </its:rules>
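
A global rule like the one on this slide can be applied with a few lines of code. The sketch below is an assumption-laden illustration: it works on the XML serialization, binds the prefix `h` to the XHTML namespace itself (the slide elides the namespace declarations), rewrites the absolute selector `//...` to `.//...` because ElementTree only accepts relative paths on elements, and relies on the `{namespace}*` wildcard support of Python 3.8 or later. ElementTree implements only a subset of XPath, so more complex selectors would need a full XPath engine. The function name `untranslatable_texts` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "its": "http://www.w3.org/2005/11/its",
}

RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its"
    xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
  <its:translateRule translate="no" selector="//h:*[@class='notranslate']"/>
</its:rules>"""


def untranslatable_texts(rules_xml, doc_xml):
    """Return the text of every node that the rules mark translate="no"."""
    rules = ET.fromstring(rules_xml)
    doc = ET.fromstring(doc_xml)
    decisions = {}  # element -> "yes" / "no"; later rules overwrite earlier ones
    for rule in rules.findall("its:translateRule", NS):
        selector = rule.get("selector")
        if selector.startswith("//"):
            selector = "." + selector  # ElementTree needs a relative path
        for node in doc.findall(selector, NS):
            decisions[node] = rule.get("translate")
    return ["".join(node.itertext())
            for node, value in decisions.items() if value == "no"]
```

The same rules file can be reused for every document that follows the `class='notranslate'` convention, which is the point of attaching metadata to a format or template rather than to individual nodes.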

  19. ITS “inline”, e.g. global rules in HTML5 • “Work” inside the HTML “script” element with a proper MIME type • Upcoming: application/its+xml • If possible: avoid; use linked rules <!DOCTYPE html> <html> ... <script type="application/xml"> <its:rules ...> <its:translateRule translate="no" selector="//h:code"/> </its:rules> </script> ... </html>

  20. “Terminology” <!DOCTYPE html> <html lang=en> <head> <meta charset=utf-8> <title>Terminology test: default</title> </head> <body> <p>We need a new <span its-term=yes>motherboard</span> </p> </body> </html>

  21. “Directionality” <!DOCTYPE html> <html lang=en> <head> <meta charset=utf-8> <title>Dir test: Default</title> </head> <body> <p>In Arabic, the title <q dir=rtl lang=ar>نشاط التدويل، W3C</q> means <q>Internationalization Activity, W3C</q>.</p> </body> </html>

  22. “Ruby” – XHTML vs. HTML5 • XHTML: <ruby> <rb>日本</rb> <rt>にっぽん</rt> </ruby> <ruby> <rb>電</rb> <rt>でん</rt> </ruby> <ruby> <rb>気</rb> <rt>き</rt> </ruby> • HTML5: <ruby> 日本 <rt>にっぽん</rt> 電 <rt>でん</rt> 気 <rt>き</rt> </ruby>

  23. “Domain” <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"> <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='keywords']/@content" domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/> </its:rules> Meaning: • Express domain information about the content of the “body” element • The domain information is in the “meta” element • Optional mapping of source content domains, e.g. automotive > auto • Purpose: not to define a domain vocabulary, but to pass domain information to an application (MT system, MT training tool)
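
The `domainMapping` value above is a small mini-language: comma-separated pairs, with quotes around values that contain spaces. A consumer such as an MT system could read it roughly as follows. This is a sketch under assumptions: quoted values contain no commas, the keywords in the `meta` element are comma-separated, and source values are matched case-insensitively. The function names `parse_domain_mapping` and `map_domains` are hypothetical.

```python
import shlex


def parse_domain_mapping(mapping):
    """Parse "source target, 'two words' target, ..." into a dict."""
    table = {}
    for pair in mapping.split(","):
        parts = shlex.split(pair)  # honors the quotes around multi-word values
        if len(parts) != 2:
            raise ValueError(f"expected 'source target', got {pair!r}")
        source, target = parts
        table[source.lower()] = target  # assumption: case-insensitive lookup
    return table


def map_domains(keywords_content, mapping):
    """Map the comma-separated keywords of a document to consumer domains."""
    table = parse_domain_mapping(mapping)
    result = []
    for keyword in keywords_content.split(","):
        keyword = keyword.strip()
        if not keyword:
            continue
        mapped = table.get(keyword.lower(), keyword)  # unmapped values pass through
        if mapped not in result:
            result.append(mapped)
    return result


MAPPING = ("automotive auto, medical medicine, "
           "'criminal law' law, 'property law' law")
domains = map_domains("automotive, criminal law, property law", MAPPING)
```

With the slide's mapping, a page whose keywords are "automotive, criminal law, property law" is passed on as the two domains `auto` and `law`.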

  24. “Storage Size” <!DOCTYPE html> <html lang=en> <head> <meta charset=utf-8> <title>Example</title> </head> <body> <p>String to translate:</p> <p contenteditable=true id=123 its-storage-size=25>Papua New-Guinea</p> <p contenteditable=true id=139 its-storage-size=25>Dominican Republic</p> </body> </html>
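
Storage size is counted in bytes of the stored encoding, not in characters, which matters as soon as a translation leaves ASCII. The check below is a minimal sketch that assumes UTF-8 as the storage encoding (the ITS 2.0 default when no storage encoding is given); the function name `fits_storage` is hypothetical.

```python
def fits_storage(text, storage_size, encoding="utf-8"):
    """Return True if `text` fits into `storage_size` bytes in `encoding`."""
    return len(text.encode(encoding)) <= storage_size


source = "Papua New-Guinea"        # 16 characters, 16 bytes in UTF-8
translation = "Папуа-Новая Гвинея"  # 18 characters, 34 bytes in UTF-8

source_ok = fits_storage(source, 25)
translation_ok = fits_storage(translation, 25)
```

Here the Russian translation has only 18 characters but needs 34 bytes, so it exceeds the limit of 25 although a character count would have accepted it.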

  25. “Translate” in XML and HTML5 • ITS namespace vs. HTML5 native “translate” attribute • XML: <article...> ... <para>You need a new <span its:translate="no">motherboard</span></para> ...</article> • HTML5: <!DOCTYPE html> <html>... <p>You need a new <span translate="no">motherboard</span>...</p>... </html>

  26. “Terminology” in XML and HTML5 • ITS namespace vs. HTML5 its-* “term” attribute • XML: <article...> ... <para>You need a new <span its:term="yes">motherboard</span></para> ...</article> • HTML5: <!DOCTYPE html> <html>... <p>You need a new <span its-term=yes>motherboard</span>...</p>... </html>

  27. “Quality” metadata in the browser <html>… <script id=its-standoff-1 type=application/xml> <its:locQualityIssues xml:id="lq1" …> <its:locQualityIssue locQualityIssueType="misspelling" …/> <its:locQualityIssue locQualityIssueType="typographical" …/> </its:locQualityIssues> </script>…… <p> <span its-loc-quality-issues-ref=#lq1>c'es</span> le contenu</p> …</html> • See live demo at http://tinyurl.com/its2-lq-html5

  28. Rationale for its-* • HTML attribute names are case-insensitive and cannot carry a namespace prefix • ITS 1.0/2.0 attributes use camel case (its:locNote, its:termInfo, its:withinText, …) and the ITS namespace • Good news: conversion to HTML5 is straightforward • its-loc-note, its-term-info, its-within-text, …
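
The conversion from the XML attribute names to their HTML5 counterparts is mechanical: drop the namespace prefix, insert a hyphen before each upper-case letter, lower-case everything and prepend `its-`. A minimal sketch, in which the function name `its_xml_to_html_attr` is hypothetical and the set of natively available attributes is limited to the two mentioned in these slides (`translate` and `dir`):

```python
import re

# Data categories with a native HTML5 attribute keep the native name.
NATIVE_HTML_ATTRIBUTES = {"translate", "dir"}


def its_xml_to_html_attr(name):
    """Convert e.g. 'its:locNote' to 'its-loc-note'."""
    prefix, _, local = name.partition(":")
    if prefix != "its" or not local:
        raise ValueError(f"not an ITS attribute name: {name!r}")
    if local in NATIVE_HTML_ATTRIBUTES:
        return local
    # Insert a hyphen before every upper-case letter except a leading one.
    hyphenated = re.sub(r"(?<!^)([A-Z])", r"-\1", local).lower()
    return f"its-{hyphenated}"
```

For example, `its:withinText` becomes `its-within-text`, while `its:translate` maps to the native `translate` attribute.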

  29. Effect on Localization Workflow • translate, dir, its-loc-note, its-term-info, …: interpreted like its:translate, its:termInfo, ... • Diagram labels: HTML5 as XML • HTML5 as HTML • HTML5 as HTML with errors • XLIFF-based Localization • HTML5 • XHTML5 • Steps: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation • Steps: Transformation > XHTML5 > HTML5 parsing > HTML5 or XHTML5

  30. Overview • HTML5 Serializations + Model • Localization Workflow with HTML5 • Metadata for (HTML5) Localization • What Else?

  31. Other HTML versions • “HTML legacy content”: no native support for its-* • HTML validation tools will complain • Good news: its-* attributes “work” in older versions of HTML (e.g. 3.2 or 4.01), e.g. they are recognized by an HTML DOM parser

  32. Tool support • its-* attributes in the pipeline for the W3C HTML validator • Lots of tools with (partial) XML+ITS / HTML5+ITS support being developed in the W3C MultilingualWeb-LT working group • HTML5 validation with ITS 2.0 metadata, XML tool chain, online MT system, translation package creation, simple MT, HTML-to-TMS roundtrip, CMS support (Drupal), quality check, browser-based review, named entity annotation, … • *Very raw* details (but further links!) at http://tinyurl.com/its2-use-cases

  33. What’s missing? • ITS 2.0 localization focuses on HTML markup • Elements, attributes • Server-side / client-side scripting content not taken into account • JavaScript, PHP, … • Using ITS 2.0 in HTML5 with XLIFF: still many bits missing • But: moving forward this week

  34. Overview again … • HTML5 Serializations + Model • Localization Workflow with HTML5 • Metadata for (HTML5) Localization • What Else?

  35. ありがとうございました。 (Thank you very much.) • Localization and HTML5: Technical Aspects • Felix Sasaki, DFKI / W3C Fellow

  36. Localization and HTML5: Potential Slides for “Challenges and Promises”

  37. What is HTML5? • DOM specification • Parsing algorithm to cover most of current (and future) Web content • A set of APIs • Part of the HTML5 specification • Defined in separate documents • Explanatory and other documents • For markup authors, XML tool chains etc.

  38. HTML5 – Serializations + Model • Two serializations • <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html>

  39. HTML5 – Serializations + Model • Two serializations: HTML5 vs. XHTML5 • HTML5: <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • XHTML5: <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html>

  40. HTML5 – Serializations + Model • Two serializations: HTML5 vs. XHTML5 • HTML5: <!DOCTYPE html> <html> <head> <meta charset=utf-8> <title>My example</title> </head> <body>... </body> </html> • XHTML5: <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"/> <title>My example</title> </head> <body>... </body> </html> • One Document Object Model (DOM): document.getElementsByTagName("meta")

  41. Rationale • More than 90% of Web pages fail markup validation • See the Opera browser “MAMA” report • XHTML was a revolution • HTML5 is an evolution • Parsing algorithm for existing Web content • Two serializations as input • Detailed error handling • Output: one DOM

  42. HTML5: current state • Developed within • W3C: HTML5 to become a standard • WHATWG http://www.whatwg.org/ - HTML as a “living standard” • High pressure in W3C to wrap up • Rationale: “We need one stable version” • At the same time: “We need more features!” – e.g. • ITS 2.0 • HTML accessibility http://www.w3.org/WAI/PF/html-task-force

  43. Plan: HTML5 finalized by 2014 • Finish HTML5 specification in W3C by 2014 • Work closely with WHATWG and others on new features, for next version • Don’t try to get everything into HTML5! • Allow for extension specifications, e.g. ITS 2.0 • Moving forward at their own pace

  44. HTML5 time line (from http://dev.w3.org/html5/decision-policy/html5-2014-plan.html)

               2012        2013        2014        2015        2016
               ----------  ----------  ----------  ----------  ----------
      HTML5.0  CR start    ...CR, LC   Rec         ...         ...
      HTML5.1  FPWD        ---         LC + CR     ...CR       Rec

  45. Challenge: many extensions • HTML+RDFa - RDFa WG • Web Intents - Web Apps WG / Device APIs WG • HTML Editing APIs - HTML Editing APIs CG • HTML Media Capture - Device APIs WG • Media Capture and Streams - Device APIs WG / WebRTC WG • Media Fragments URI - Media Fragments WG • Encrypted Media Extensions - HTML WG • Media Source Extensions - HTML WG • ... many rel value specifications registered at the link type registry – Microformats

  46. Promises: many extensions • See previous slide • That also means: ease of adding localization features to HTML5

  47. HTML5 and Localization Issues • Localization: mostly covered by ITS 2.0 • Technical aspects: see presentation from Felix Sasaki on Tuesday • Important: get buy-in from browser vendors • Awareness of ITS 2.0 • Fostering browser-based implementations • Ease of adoption for web developers

  48. Metadata for (HTML5) Localization: ITS 2.0 • “Internationalization Tag Set” 2.0 • Set of disjoint metadata items (“data categories”) for XML and HTML5 • Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

  49. Metadata for (HTML5) Localization: ITS 2.0 • “Internationalization Tag Set” 2.0 • Some items are part of the HTML5 spec • Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

  50. HTML5 and Internationalization Issues • Many things to do • Ruby • International layout (work done mostly via CSS3 modules) • Here: our favorite core i18n issues
