
Localization Data

Localization Data. Mark Davis, PhD Chief SW Globalization Arch., IBM President, Unicode Consortium. Importance of Standards. Products developed in each country interoperate with other products: inside and outside that country


Presentation Transcript


  1. Localization Data
     Mark Davis, PhD
     Chief SW Globalization Arch., IBM
     President, Unicode Consortium
     International Summit on Localisation (MAIT/TDIL), New Delhi, 2004-12-08 (R2)

  2. Importance of Standards
     • Products developed in each country interoperate with other products, inside and outside that country
     • Mechanism for countries / industries to promulgate best practices
     • SW Localization
       • Unicode: Universal character encoding
       • CLDR: Common Locale Data Repository

  3. Universal Character Encoding • Unicode: Unique character codes for all languages …

  4. Common Locale Data Repository
     • Relatively new project: 2004
     • Hosted by Unicode Consortium
       • http://www.unicode.org/cldr/
     • Goals:
       • Common, required SW locale data for world languages
       • XML format for effective interchange
       • Freely available

  5. What is Locale Data
     • Locale = identifier string referring to linguistic and cultural preferences
     • Typical data
       • Date/time formats
       • Number/currency formats
       • Measurement system
       • Collation Specification (Collation)
         • Used for sorting, searching, matching
       • Translated names for language, territory, script, timezones, currencies, …
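
A minimal sketch of what "identifier string" means in practice, assuming identifiers of the form language[_Script][_TERRITORY] such as "da_DK". The class and function names are illustrative only and are not part of CLDR; variants and other subtags are ignored.

```python
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    """Illustrative container for the parts of a locale identifier."""
    language: str
    script: Optional[str]
    territory: Optional[str]


def parse_locale_id(identifier: str) -> LocaleId:
    """Split an identifier such as 'sr_Latn_RS' into its subtags.

    Assumption: subtags are separated by '_' or '-'; a 4-letter subtag is a
    script, a 2-letter or 3-digit subtag is a territory. Anything else
    (for example a variant) is skipped in this sketch.
    """
    parts = identifier.replace("-", "_").split("_")
    language = parts[0].lower()
    script = None
    territory = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            territory = part.upper()
    return LocaleId(language, script, territory)
```

The parsed parts are what an application would use to look up the date, number, collation, and display-name data listed on this slide.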

  6. Latest Release: CLDR 1.2
     • Released: November, 2004

                  locales  languages  territories
       Approved:      232         72          108
       Draft:          63         27           28

     • Data
       • Unique XPaths: 2,540
       • Actual Values: 56,290
       • Fully Resolved: 358,860 (not including collation, aliased data)
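
The gap between "Actual Values" and "Fully Resolved" comes from locale inheritance: a locale stores only what differs from its parent, and the rest is filled in from the parent chain down to root. The sketch below illustrates that idea under simplifying assumptions: the path keys and the root values are hypothetical stand-ins for real XPaths and data, and aliases are not handled.

```python
def fallback_chain(locale):
    """Return the lookup order for a locale, e.g. da_DK -> da -> root."""
    if locale == "root":
        return ["root"]
    chain = []
    current = locale
    while current:
        chain.append(current)
        # Drop the last subtag; an identifier with no '_' becomes ''.
        current = current.rpartition("_")[0]
    chain.append("root")
    return chain


def resolve(data, locale):
    """Merge stored values from root up to the locale (child wins)."""
    merged = {}
    for name in reversed(fallback_chain(locale)):
        merged.update(data.get(name, {}))
    return merged


# Hypothetical miniature repository: only 5 values are actually stored.
LOCALE_DATA = {
    "root": {"territories/AD": "AD", "territories/AE": "AE", "days/sun": "1"},
    "da": {"territories/AD": "Andorra", "days/sun": "søn"},
    "da_DK": {},
}

actual_values = sum(len(values) for values in LOCALE_DATA.values())
resolved_values = sum(len(resolve(LOCALE_DATA, name)) for name in LOCALE_DATA)
```

Here 5 stored values resolve to 9, because each of the three locales ends up with a value for every path.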

  7. Next Release: CLDR 1.3
     • Jan 2005: Freeze date
       • For new enhancement requests & bug reports
     • Apr 2005: Target release date
     • Planned features
       • New data / corrections / tests (ongoing)
       • Survey tool
       • POSIX conversion tool
       • Additional mechanisms
         • lenient date/time/number parsing
         • different combinations of date fields
         • names for dialects, measurement systems
         • narrative reference information

  8. Usage (direct or indirect)
     • Caveats
       • Not a complete list: usage is not tracked, so this is an estimate
       • CLDR first available in 2004, so may use precursor data
     • Companies / Organizations
       • Adobe, Apple (Mac OS X), abas Software, Argonne National Laboratory, Ascential Software, Avaya, BEA, BroadJump, BluePhoenix Solutions, BMC Software (Remedy), Business Objects, caris, CERN, Cognos, Debian Linux, Gentoo Linux, HP, Hyperion, IBM, Inktomi, Innodata Isogen, Informatica, Intel, Interlogics, IONA, IXOS, JD Edwards, Jikes, Macromedia, Mathworks, Mozilla, OpenOffice, Language Analysis Systems, Lawson Software, Leica Geosystems GIS & Mapping LLC, Mandrake Linux, Parrot, PayPal, Progress Software, Python, QNX, Rogue Wave, SAP, Siebel, SIL, SPSS, Software AG, Sun Microsystems (Solaris, Java), SuSE, Sybase, Teradata (NCR), Trend Micro, Virage, webMethods, Wine, WMS Gaming, …
     • Optional use:
       • Apache, Perl, Xalan, Xerces, …

  9. Sample: Languages, Scripts, Territories
     <localeDisplayNames>
       <languages>
         <language type="aa">Afar</language>
         <language type="ab">Abkhasisk</language>…
       <scripts>
         <script type="Arab">Arabisk</script>…
       <territories>
         <territory type="AD">Andorra</territory>
         <territory type="AE">Forenede Arabiske Emirater</territory>…
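
A small sketch of how an application might read display names out of such a file, using only Python's standard xml.etree.ElementTree. The fragment is the slide's sample with the elided parts left out and the closing tags (and an enclosing ldml element) added so that it parses; the display_names helper is a hypothetical name, not a CLDR API.

```python
import xml.etree.ElementTree as ET

# The slide's sample, completed only as far as needed to be well-formed.
LDML_FRAGMENT = """
<ldml>
  <localeDisplayNames>
    <languages>
      <language type="aa">Afar</language>
      <language type="ab">Abkhasisk</language>
    </languages>
    <scripts>
      <script type="Arab">Arabisk</script>
    </scripts>
    <territories>
      <territory type="AD">Andorra</territory>
      <territory type="AE">Forenede Arabiske Emirater</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def display_names(xml_text, group, item):
    """Map each 'type' code to its translated name for one group.

    group/item are element names such as ('territories', 'territory').
    """
    root = ET.fromstring(xml_text)
    path = f"localeDisplayNames/{group}/{item}"
    return {el.get("type"): (el.text or "").strip() for el in root.findall(path)}


territories = display_names(LDML_FRAGMENT, "territories", "territory")
languages = display_names(LDML_FRAGMENT, "languages", "language")
```

With this sample, territories maps "AE" to "Forenede Arabiske Emirater" and languages maps "aa" to "Afar".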

  10. Sample: Characters / Dates
      <characters>
        <exemplarCharacters>[a-z æ å ø á é í ó ú ý]</exemplarCharacters>
      </characters>…
      <dayContext type="format">
        <dayWidth type="abbreviated">
          <day type="sun">søn</day>
          <day type="mon">man</day>…
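
One way to read the exemplar character set above is as the letters a language customarily uses. The sketch below expands the sample set and checks which letters of a string fall outside it. It is deliberately limited: it assumes space-separated single characters and simple x-y ranges, which covers this sample but not the full set syntax, and both function names are hypothetical.

```python
def expand_exemplars(pattern):
    """Expand a simple set such as '[a-z æ å ø]' into a Python set.

    Assumption: tokens are separated by spaces and are either single
    characters or 3-character ranges like 'a-z'.
    """
    body = pattern.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    chars = set()
    for token in body.split():
        if len(token) == 3 and token[1] == "-":
            start, end = ord(token[0]), ord(token[2])
            chars.update(chr(cp) for cp in range(start, end + 1))
        else:
            chars.update(token)
    return chars


def uncovered(text, exemplars):
    """Return the sorted letters of text that are not in the exemplar set."""
    return sorted({ch for ch in text.lower() if ch.isalpha() and ch not in exemplars})


DANISH_EXEMPLARS = expand_exemplars("[a-z æ å ø á é í ó ú ý]")
```

For example, uncovered("Søndag", DANISH_EXEMPLARS) is empty, while uncovered("Zürich", DANISH_EXEMPLARS) reports the letter ü.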

  11. Sample: Timezones / Currencies
      <timeZoneNames>
        <zone type="America/Los_Angeles">
          <long>
            <standard>Pacific-normaltid</standard>
            <daylight>Pacific-sommertid</daylight>
          </long>…
      <currencies>
        <currency type="GAF">
          <displayName>Gabonesisk CFA-franc</displayName>
          <symbol>GAF</symbol>…

  12. Sample: Collation
      <collation type="standard">
        <settings normalization="on" />
        <rules>
          <reset before="primary">0</reset>
          <pc>ॐ।॥॰</pc>
          <reset>ह</reset>
          <pc> ़ँंः॒॑॓॔</pc>
          <reset>ऽ</reset>
          <p> ्</p>
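
Collation rules like these tailor the order of characters for a language. The sketch below is not an implementation of those rules or of the Unicode Collation Algorithm; it is a toy primary-level sort key, using the Danish alphabet (where æ, ø, å follow z) as an assumed example, to show why locale-specific ordering differs from plain code point order.

```python
# Assumed tailoring for illustration: Danish alphabet order.
DANISH_ORDER = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: index for index, ch in enumerate(DANISH_ORDER)}


def primary_key(word):
    """Build a sort key from each letter's position in the tailored alphabet.

    Characters outside the alphabet sort after it, by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


words = ["ål", "zebra", "æble", "øl", "abe"]
by_code_point = sorted(words)
by_tailoring = sorted(words, key=primary_key)
```

Code point order puts "ål" before "æble" and "øl"; the tailored key puts it last, giving abe, zebra, æble, øl, ål.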

  13. Committee Process
      • For most effective participation from people around the world
      • Meetings
        • By phone, never F2F
        • Short, often
        • Allows preparation between meetings
      • Written
        • Email
        • Database submissions

  14. Vetting Process for Data
      • Collect from different platforms, experts, submissions: new or revised
        • References to external sources strongly encouraged
        • Must be before freeze date for release
        • Will use Survey Tool
      • Enter in the repository
        • Mark with draft attribute
        • Add references, standards
      • Verify by CLDR committee members
        • Consulting with country contacts
        • If disagreement, decide in committee
      • Accept
        • As main form: draft attribute removed
        • As alternate form: marked with different attributes
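
A consumer of the data can use the draft attribute to separate vetted values from unvetted ones. The sketch below assumes the marker is written as draft="true" on the element; marking the "AE" entry as draft is purely hypothetical, done here to have one item of each kind. The split_by_draft helper is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the AE entry is marked draft only for illustration.
SAMPLE = """
<ldml>
  <localeDisplayNames>
    <territories>
      <territory type="AD">Andorra</territory>
      <territory type="AE" draft="true">Forenede Arabiske Emirater</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def split_by_draft(xml_text):
    """Return (approved, draft) dictionaries of territory names by code."""
    root = ET.fromstring(xml_text)
    approved, draft = {}, {}
    for el in root.iter("territory"):
        target = draft if el.get("draft") == "true" else approved
        target[el.get("type")] = (el.text or "").strip()
    return approved, draft


approved, draft = split_by_draft(SAMPLE)
```

Once an item is accepted as the main form, the attribute is removed and the same code would report it as approved.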

  15. Challenges
      • Aggressive, 6-month release schedule
      • Complex formats
        • Collation, Date Formats, Exemplar characters, etc.
        • Require close interaction of CLDR experts with language experts
      • Choosing most customary, acceptable forms
        • Regional differences, individual preferences
        • Context (months in formats vs. calendars)
        • Uncommon cases (“Interlingua”)
        • Standards vs. common modern usage
      • Obtaining references for data
      • But can have multiple, alternate versions

  16. Getting Involved
      • Simplest
        • Bug report / feature request – anyone!
      • More Involved
        • Vetting, Assessment, Tools, Policies, Decisions, …
        • Any Unicode member eligible to name representatives
          • Full members: IBM, Apple, Sun, Oracle, India, …
          • Liaison members: Ireland, Finland, …
          • Associate members: Tamil Nadu, …

  17. Example Country Process (Finland)
      • Finnish Ministry of Education made CLDR data a major goal, 2004-06
      • Research Institute for the Languages of Finland ("RILF" aka "Kotus") designated agency
      • Documenting the national preferences in the open even more important than implementations
      • Results expected to lead to new/revised national standards

  18. Example Country Process (II)
      • RILF a Unicode Liaison member, 2004-07
      • Set up fully open national group on language and cultural requirements on ICT, 2004-09
      • Two official languages (Finnish and Swedish) & four regional / minority languages (three Sámi & Romani as spoken in Finland) to be covered
      • Over 30 different parties represented: commercial, non-commercial, individuals
      • Public comments to be allowed: http://kotoistus.fi
      • Documentation for all controversial issues and deviations from any national standards

  19. For more information
      • Unicode
        • http://www.unicode.org/
      • CLDR
        • http://www.unicode.org/cldr/
      • This presentation
        • http://www.macchiato.com/slides/Localization.ppt
