Unicode

Unicode. Mark Davis, Unicode Consortium President and IBM Chief SW Globalization Architect, 2003-09-24. Unique number for every character. Universal Character Encoding. … Unifies all languages. 96 thousand characters so far. All characters accessible at the same time, in the same document.

Presentation Transcript


  1. Unicode • Mark Davis • Unicode Consortium President • IBM Chief SW Globalization Architect • 2003-09-24

  2. Unique number for every character • Universal Character Encoding • …

  3. Unifies all Languages • 96 thousand characters, so far • All characters accessible at the same time, in the same document: A, Ž, Ш, Δ, ش, क, க, ಔ, … か, 上, 각, …
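To make "unique number for every character" concrete, here is a minimal Python sketch (not from the slides) that prints the code point and official character name for the sample characters listed above. The helper name `describe` is made up for this illustration; `ord` and `unicodedata.name` are standard Python.

```python
import unicodedata

# Sample characters from the slide, one per script.
samples = ["A", "Ž", "Ш", "Δ", "ش", "क", "க", "ಔ", "か", "上", "각"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Every character maps to exactly one number, whatever its script,
# and all of them can sit in the same string at the same time.
for ch in samples:
    print(ch, describe(ch))
```

Because the numbers never collide across scripts, the list above can be stored in one document without any script-switching state.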

  4. Lingua Franca for Computers • Developed & supported by industry leaders: • Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, … • Required by modern standards: • XML, HTML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, Perl, etc. • Implemented in: • All modern operating systems, browsers, and other products

  5. International Domain Names • Approved - Unicode-Based • Examples: • http://Юникод.com • http://Βαλκανίων.com • http://हमसब.com
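As an illustration of how Unicode-based domain names travel over ASCII-only DNS, the sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules; that choice is an assumption made here for convenience, and the slides do not say which tooling was intended. The helper names `to_ascii` and `to_unicode` are invented for this example.

```python
# Host names taken from the slide, without the "http://" scheme.
hosts = ["Юникод.com", "Βαλκανίων.com", "हमसब.com"]

def to_ascii(host: str) -> str:
    """Convert a Unicode host name to its ASCII-compatible 'xn--' form."""
    return host.encode("idna").decode("ascii")

def to_unicode(ascii_host: str) -> str:
    """Convert an ASCII-compatible host name back to Unicode."""
    return ascii_host.encode("ascii").decode("idna")

for host in hosts:
    ascii_form = to_ascii(host)
    # The round trip yields the case-folded form of the original name,
    # because the codec lowercases labels during preparation.
    print(host, "->", ascii_form, "->", to_unicode(ascii_form))
```

Production code would more likely rely on a library that tracks the current IDNA rules; the built-in codec is used here only because it needs no installation.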

  6. Standard Resources • www.unicode.org • Online Standard • Technical Reports • FAQs • General Information • Discussion Forums, Conferences

  7. Programming Resources • System APIs: • Windows, Java, Unix, Oracle, DB2, Sybase, Mac, Linux, … • Languages • Java, JavaScript, C#, Perl 5.6.0, C, C++, SQL, … • Cross-platform libraries: • ICU, Rosette, …

  8. Stability • Developers / other standards need absolute stability • Characters are never moved or deleted • Ordering of characters is by collation, not binary order. See UTS #10: Unicode Collation Algorithm • Characters may be deprecated (discouraged). • Characters never change names • Annotations are used to clarify usage • See Unicode Policies
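The point that ordering comes from collation rather than binary order can be seen in a small Python sketch. The key function `crude_key` is a deliberately simplified stand-in invented for this example (it drops accents and case); it is not the Unicode Collation Algorithm of UTS #10, which a real application would get from a library such as ICU.

```python
import unicodedata

# "\u00e9clair" is "éclair" with a precomposed e-acute.
words = ["zebra", "Apple", "\u00e9clair", "banana"]

def crude_key(s: str) -> str:
    """Simplified sort key: strip combining marks, then casefold.

    An illustrative approximation only, NOT the UTS #10 algorithm.
    """
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

# Binary (code point) order: uppercase sorts before lowercase,
# and the accented word lands after "zebra".
binary_order = sorted(words)

# Order a reader would more likely expect.
friendly_order = sorted(words, key=crude_key)

print(binary_order)
print(friendly_order)
```

Since code point values carry no ordering promise, characters can stay where they were first allocated forever, and sorting behavior can still be tailored per language.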

  9. Indic Support in Unicode • ISCII is the basis for characters and allocation • Consortium actively engaged with Indian Government, which is a member • Welcomes addition of missing characters (e.g. Vedic), clarifications or corrections of usage

  10. Structural Similarities with ISCII • Within script, layout and contents nearly identical • Independent + dependent vowels • Halant model for representing conjuncts • conjuncts / half-forms not directly encoded • represented by sequences instead • Phonetic sequence – order in syllables
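The halant model and the phonetic storage order can be illustrated with a short Python sketch. The Devanagari examples (the conjunct KA + halant + SSA, and the syllable "ki") are chosen here for illustration and do not come from the slides.

```python
import unicodedata

KA = "\u0915"            # DEVANAGARI LETTER KA
VIRAMA = "\u094D"        # DEVANAGARI SIGN VIRAMA (the halant)
SSA = "\u0937"           # DEVANAGARI LETTER SSA
VOWEL_SIGN_I = "\u093F"  # dependent vowel sign I
LETTER_I = "\u0907"      # independent vowel letter I

# A conjunct has no code point of its own: it is the sequence
# consonant + halant + consonant, and the font shapes it for display.
conjunct = KA + VIRAMA + SSA

# Phonetic order: the consonant is stored first, then the vowel sign,
# even though the sign for short "i" is drawn to the left of the consonant.
syllable_ki = KA + VOWEL_SIGN_I

for label, text in (("conjunct", conjunct), ("ki", syllable_ki)):
    print(label, text, [unicodedata.name(c) for c in text])
```

Both strings are plain sequences of encoded characters; producing the conjunct glyph or the reordered vowel sign is left to the rendering layer.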

  11. Structural Differences with ISCII • Unicode is stateless: • No shifting to get different scripts • Each character has a unique number • Unicode is uniform: • No extension bytes necessary • All characters coded in the same space

  12. Additional Characters • Indian Government is developing proposals for: • Additions of missing characters: • Vedic • Individual characters for certain scripts • Annotations and Descriptions

  13. Global Applications now support languages of India • Companies supporting Indic with Unicode • OpenType fonts • Font support for Indic • Microsoft Windows • Java (IBM contributed ICU Indic Layout) • Linux • …

  14. Benefits for India • All documents, anywhere in the world, can have Indic text • Allows seamless multilingual documents in India • including scriptures and minority languages • Opens up software export market, beyond English • Connects India to the world

  15. How India Can Contribute • Effective Communication with the Unicode Consortium • Provide Resources for Development • Descriptions of Usage • Descriptions of Character Shaping • Transliteration Tables from Script to Script • Collation Information • OpenType fonts • …

  16. What Developers Can Do • Interwork with existing ISCII systems • Move to Unicode for future developments • Java, Windows, Linux, …

  17. The Future • The world is moving rapidly to Unicode • Unicode makes India open to the world • The world comes to you, and • You go to the world • You can help

  18. Q & A

  19. Backup Slides

  20. Multiple Forms • UTF-8: maximal compatibility with 8-bit systems • UTF-16: good storage, interoperability with Windows/Java • UTF-32: simplest processing • Fast, lossless conversion • See Forms of Unicode
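A brief Python sketch of the three encoding forms follows. The sample string and the helper `encoded_sizes` are invented for this illustration; the little-endian codec variants are used so that no byte order mark is added to the byte counts.

```python
# One ASCII letter, one Devanagari letter, one CJK ideograph, and one
# supplementary-plane character (U+10400, outside the 16-bit range).
text = "A\u0915\u4E0A\U00010400"

FORMS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(s: str) -> dict:
    """Return the byte length of s in each encoding form (hypothetical helper)."""
    return {form: len(s.encode(form)) for form in FORMS}

sizes = encoded_sizes(text)
print(sizes)  # {'utf-8': 11, 'utf-16-le': 10, 'utf-32-le': 16}

# Conversion between forms is lossless: each round trip restores the text.
for form in FORMS:
    assert text.encode(form).decode(form) == text
```

UTF-8 spends 1 to 4 bytes per character and keeps ASCII unchanged, UTF-16 spends 2 or 4 bytes, and UTF-32 always spends 4, which is why it is the simplest form to index.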
