Explore XML internationalization, character encoding, language identification, content rendering, and more. Learn about XML Localization Interchange, Unicode deployment, and best practices for global content management.
XML Content Localization and Unicode • 21st IUC, Dublin, Ireland, May 2002 • Ultan Ó Broin (ultan.obroin@oracle.com), Globalization Analyst, Oracle Applications Technology Group
Agenda • XML Internationalization • Character set encoding and representation • Language identification • Content presentation and rendering • Language features • XML Localization • Defining the content • XML Localisation Interchange File Format • Summary
XML Content • “Content, Content, All is Content” (Apologies to Ecclesiastes 1:2 and Lady Margaret Thatcher) • Software user interface strings, help, documentation text, marketing collateral, procurement catalogs … • Any data stored in a database or content management system
XML and Character Set Encoding • Any character set • Encoding Declaration • IANA values • Unicode for global deployment • UTF-8 default • UTF-8 versus UTF-16 • UTF-16 requires 0xFEFF/0xFFFE Byte Order Mark
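The Byte Order Mark point can be illustrated with a small Python sketch. It is not part of the original slides: the sample string and the helper name `sniff_encoding` are assumptions made for illustration, and the check covers only the UTF-8 and UTF-16 cases named on the slide (UTF-32 is deliberately ignored).

```python
import codecs

text = "Fáilte"  # hypothetical sample string with one non-ASCII character

# UTF-8 needs no Byte Order Mark; UTF-16 entities start with one.
utf8 = text.encode("utf-8")
utf16_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # starts FE FF
utf16_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")  # starts FF FE


def sniff_encoding(data: bytes) -> str:
    """Guess a Python codec name from a leading Byte Order Mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the optional UTF-8 signature
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    return "utf-8"          # XML default: no BOM and no encoding declaration


decoded = utf16_le.decode(sniff_encoding(utf16_le))
```

A real parser would also read the encoding declaration; the sketch only shows why the two UTF-16 byte orders are distinguishable from their first two bytes.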
XML Character Representation • Unicode character representation • Numeric character references • ™ = &#8482; • Warning about character entities • ™ = &trade; • Use Unicode normalized characters
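A short Python sketch of the three points on this slide, using only the standard library. The element name `name`, the sample text, and the helper `parses` are assumptions for illustration. It shows that a numeric character reference parses without any DTD, that a named entity such as `&trade;` is rejected when no DTD defines it, and that NFC normalization collapses a base letter plus combining accent into one precomposed character.

```python
import unicodedata
import xml.etree.ElementTree as ET

# A numeric character reference needs no DTD: U+2122 is TRADE MARK SIGN.
elem = ET.fromstring("<name>Widget&#8482;</name>")
trademark_text = elem.text


def parses(xml_text: str) -> bool:
    """Return True if the document is well-formed for a non-validating parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &trade; is an HTML entity; in plain XML it is undefined unless a DTD declares it.
entity_ok = parses("<name>Widget&trade;</name>")

# Normalization: "e" + COMBINING ACUTE ACCENT becomes the single character U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
```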
Language Expression • Language identifier xml:lang attribute • Declaration <!ATTLIST p xml:lang NMTOKEN #IMPLIED> • Language and country values • Warning about multilingual documents
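As a rough illustration of how the language identifier applies to content, the Python sketch below resolves the effective language of each element, with a child inheriting the value from its nearest ancestor unless it declares its own. The sample document, its element names, and the function name `languages` are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical multilingual document: the second paragraph overrides the default.
doc = ET.fromstring(
    '<doc xml:lang="en-IE">'
    "<p>Welcome</p>"
    '<p xml:lang="ga">Fáilte</p>'
    "</doc>"
)


def languages(elem, inherited=None):
    """Yield (tag, effective language) pairs, inheriting xml:lang downward."""
    lang = elem.get(XML_LANG, inherited)
    yield elem.tag, lang
    for child in elem:
        yield from languages(child, lang)


result = list(languages(doc))
```

Here `en-IE` is a language and country value, while `ga` carries a language only.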
Presentation and Rendering • Extensible Stylesheet Language (XSL-FO) • Cascading Style Sheets (CSS) • International presentation • Fonts • Quotation marks • Lists • Eliminate conflict with Unicode markup • UTR #20 “Unicode in XML and other Markup Languages”
Presentation and Rendering • Bi-directional language support
Presentation and Rendering • Language support • Ruby text
    <h3>Example Ruby text (albeit in English)</h3>
    <p>
      <ruby>
        <rb>This is the Base Language Text Position</rb>
        <rt>This is the Ruby Language Text Position</rt>
      </ruby>
    </p>
Presentation and Rendering • Vertical writing: writing-mode properties: <p style="writing-mode: tb;">Example of vertical text</p> • Combined text: XSL and CSS text-combine properties: span.kumimoji { text-combine: letters; } span.warichu { text-combine: lines; } • White space delimiters: xml:space attribute • Emphasis: font-emphasis-style and font-emphasis-position properties • Different browsers and operating systems
Presentation and Rendering • Sorting • <xsl:sort/> element • Ascending and descending order • lang attribute • Caution • Numbers • <xsl:number/> • Date and Time • Locale independent • XML Schema (ISO 8601) date and time of day values
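The "locale independent" point for dates can be sketched in Python: store one unambiguous ISO 8601 value in UTC and leave locale-specific formatting to the presentation layer. The function names below are assumptions for illustration, and the sketch handles only the `Z` (UTC) form of the value.

```python
from datetime import datetime, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"  # numeric fields only, so no locale is involved


def to_xml_datetime(moment: datetime) -> str:
    """Serialize an aware datetime as a UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).strftime(ISO_UTC)


def from_xml_datetime(value: str) -> datetime:
    """Parse the UTC ISO 8601 form produced by to_xml_datetime."""
    return datetime.strptime(value, ISO_UTC).replace(tzinfo=timezone.utc)


stamp = datetime(2002, 1, 12, 12, 11, 21, tzinfo=timezone.utc)
stored = to_xml_datetime(stamp)
restored = from_xml_datetime(stored)
```

The stored string sorts correctly as plain text and reads the same in every locale; rendering it as "12 January 2002" or "1/12/2002" is a separate, per-locale step.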
Localization of XML • Single content format: many media • Authors define data for Localization • Provide DTD or schema definition to the Localization Group
Localization of XML Content • Define information • Localization-friendly element names • Persistent Identifier • Context • Expansion • Localization notes • Non-localizable element names and attributes
Localization of XML Content • XML Localisation Interchange File Format (XLIFF) • Oracle, Novell, IBM/Lotus, Sun Microsystems, Alchemy, Berlitz, LionBridge, Moravia-IT, and the RWS Group • Requires XML conversion (XSLT, other) • Open standard DTD • Designed for the localization process • Localization tools support • SDLx, Trados Tag Editor, Star Transit, Alchemy Catalyst, ForeignDesk or any tool that defines localizable XML elements
XLIFF Example
    <header>
      <phase-group>
        <phase phase-name="translationedit" process-name="translation" date="2002-01-12T12:11:21Z" />
      </phase-group>
    </header>
XLIFF Example
    <trans-unit id="bigirishcolumn_145" restype="title" maxwidth="90" size-unit="byte">
      <source xml:lang="EN">Database manager</source>
      <target xml:lang="GA">Feighlí feasa</target>
      <alt-trans>
        <target xml:lang="GA">Gocamán na ngiotán</target>
      </alt-trans>
      <note>The Term Manager means administration tool - not a person</note>
    </trans-unit>
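To show how a tool might use the expansion attributes on a trans-unit, here is a minimal Python sketch that reads `maxwidth` and measures the translated target against it. It is an illustration, not a description of any particular localization tool: the function name `check_width` is hypothetical, and counting bytes as UTF-8 is an assumption, since the slide does not say which encoding the byte limit refers to.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the trans-unit from the slide (alt-trans and note omitted).
TRANS_UNIT = (
    '<trans-unit id="bigirishcolumn_145" restype="title" '
    'maxwidth="90" size-unit="byte">'
    '<source xml:lang="EN">Database manager</source>'
    '<target xml:lang="GA">Feighlí feasa</target>'
    "</trans-unit>"
)


def check_width(unit_xml: str, encoding: str = "utf-8"):
    """Return (id, bytes used, fits) for a trans-unit with size-unit="byte"."""
    unit = ET.fromstring(unit_xml)
    limit = int(unit.get("maxwidth"))
    target = unit.findtext("target")
    used = len(target.encode(encoding))  # assumption: limit is counted in UTF-8
    return unit.get("id"), used, used <= limit


unit_id, used, fits = check_width(TRANS_UNIT)
```

The target has 13 characters but occupies 14 bytes in UTF-8 because of the accented í, which is why a byte-based limit has to be checked after encoding rather than by counting characters.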
Summary XML and Unicode • Unicode for all content • Global storage • XLIFF for suppliers and vendors • One localization tool set • Globalization as a commodity • “XLIFF provides for the separation of content and process. It allows a focus on automation, stops a proliferation of internal XML formats, and turns localization into a commodity for all players. Software publishers focus on producing international products and vendors focus on localizing this content without managing multiple translation tools or file formats.” Paul Quigley, i18n consultant (paul_quigley_ie@hotmail.com)
References • XML Specifications: http://www.w3.org/XML/ • XLIFF: http://www.oasis-open.org/ • Tools, templates and more: http://www.opentag.com • XML Internationalization and Localization by Yves Savourel (ISBN:0-672-32096-7, Jul-2001) • Localization Institute Seminars on XML i18n and l10n: http://www.localizationinstitute.com