Unicode 3.0.1 Mark Davis www.macchiato.com
New 3.0 Characters

Category                   V 2.1    V 3.0
Alphabetics, Symbols       6,511   10,236
CJK Ideographs            21,204   27,786
Hangul Syllables          11,172   11,172
Assigned characters       38,887   49,194
Unassigned code values    18,134    7,827

Synchronized with ISO/IEC 10646, 2nd edition

Unicode 3.0

New 3.0 Blocks

   80  Syriac
  192  Thaana
  128  Sinhala
  160  Myanmar
  384  Ethiopic
   96  Cherokee
  640  Unified Canadian Aboriginal Syllabics
   32  Ogham
   96  Runic
  128  Khmer
  176  Mongolian
  256  Braille
  128  CJK Radicals Supplement
  224  Kangxi Radicals
   16  Ideographic Description Characters
   32  Bopomofo Extended
6,582  CJK Unified Ideographs Extension A
1,168  Yi Syllables
   64  Yi Radicals

Property Updates (1)
• Bidirectional properties
• Byte order mark
• Capital letters with iota adscript
• Case
• Combining classes
• Decompositions

Property Updates (2)
• Identifier Syntax
• Layout controls
• Linebreak properties
• East-Asian width properties
• Misc. Characters: Figure Space, Tilde,…
• Ligature Control
• Unassigned Code Points

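Several of the properties named on the Property Updates slides can be inspected directly. A minimal sketch using Python's standard unicodedata module; note that the module reflects whatever Unicode version ships with the interpreter, not 3.0 specifically, and that the sample characters and the describe helper are illustrative choices, not part of the talk.

```python
import unicodedata

def describe(ch):
    """Return a few character properties for one character (hypothetical helper)."""
    return {
        "bidi": unicodedata.bidirectional(ch),               # bidirectional class
        "combining": unicodedata.combining(ch),              # canonical combining class
        "decomposition": unicodedata.decomposition(ch),      # decomposition mapping
        "east_asian_width": unicodedata.east_asian_width(ch) # East Asian width
    }

# Sample characters picked only for illustration.
for ch in ["\u0041", "\u05D0", "\u0301", "\u00E9", "\uFF21"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {describe(ch)}")
```

For these samples the values are long-standing: HEBREW LETTER ALEF has bidirectional class R, COMBINING ACUTE ACCENT has combining class 230, and U+00E9 decomposes to 0065 0301.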
Conformance
• Unicode Transformation Formats
  • UTF-16BE, UTF-16LE, UTF-16, UTF-8
• Unicode Bidirectional Behavior
• Other normative character property values
• Stability Policies
• Clarification of noncharacters
• Normalization Conformance Test
Clause numbering maintained!

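The difference between the transformation formats listed above is easiest to see in bytes. A minimal sketch, assuming Python's built-in codecs as stand-ins for the formats (the codec names "utf-16-be" and "utf-16-le" are Python's spellings, and encode_all is a hypothetical helper):

```python
import codecs

def encode_all(text):
    """Encode text in each transformation format named on the slide."""
    return {
        "UTF-8": text.encode("utf-8"),
        "UTF-16BE": text.encode("utf-16-be"),  # big-endian, no byte order mark
        "UTF-16LE": text.encode("utf-16-le"),  # little-endian, no byte order mark
        "UTF-16": text.encode("utf-16"),       # Python prepends a BOM in native order
    }

encoded = encode_all("\u20AC")  # EURO SIGN
for name, data in encoded.items():
    print(f"{name:9} {data.hex(' ')}")

# A decoder for plain UTF-16 uses the byte order mark to pick the byte order.
assert encoded["UTF-16"][:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```

The UTF-16BE and UTF-16LE forms carry no byte order mark because the byte order is already fixed by the format name; unlabelled UTF-16 relies on the mark instead.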
Unicode Standard Annexes (UAX)
• Integral part of 3.0.1 Standard
  • UAX #09: BIDI
  • UAX #11: East Asian Width
  • UAX #13: Newline Guidelines
  • UAX #14: Line Breaking
  • UAX #15: Normalization
• Included in any reference to version 3.0 or later

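As an illustration of what UAX #15 covers, the four normalization forms can be tried out with Python's standard unicodedata module. This is a sketch only; the helper names all_forms and canonically_equivalent are hypothetical, and the data comes from the interpreter's Unicode version rather than 3.0.1.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(text):
    """Return text in each of the four normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Precomposed e-acute versus e followed by a combining acute accent.
print(canonically_equivalent("\u00E9", "e\u0301"))

# The fi ligature is only a compatibility equivalent of "fi":
# NFC leaves it alone, NFKC replaces it.
forms = all_forms("\uFB01")
print([f"U+{ord(c):04X}" for c in forms["NFC"]])
print([f"U+{ord(c):04X}" for c in forms["NFKC"]])
```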
Unicode Technical Standards (UTS)
• UTS #06: Compression
  • IANA name: SCSU
• UTS #10: Collation
  • Note: defined over all Unicode code points
  • Values will be updated soon for better ordering

Technical Reports
• UTR #07: Language Tags
• UTR #16: UTF-EBCDIC
• UTR #17: Character Encoding Model
• UTR #18: Regular Expressions
• UTR #19: UTF-32
• UTR #21: Case Mappings

Draft Technical Reports
• UTR #20: Unicode in XML…
• UTR #22: Character Mapping Tables
• UTR #24: Script Names
• Open for public comment

Unicode Character Database
• More Documentation, More Data
  • UnicodeData
  • Blocks
  • ArabicShaping
  • Jamo
  • CompositionExclusions
  • SpecialCasing
  • EastAsianWidth
  • LineBreak
  • Unihan
  • BidiMirroring
  • CaseFolding
  • NormalizationTest

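The main file in the list above, UnicodeData, is a plain-text table with one character per line and semicolon-separated fields. A minimal parsing sketch follows; the field order matches the UnicodeData.txt layout, but the Python field names and the parse_line helper are hypothetical labels chosen for this example.

```python
from typing import NamedTuple

class UnicodeDataRecord(NamedTuple):
    # Field names are descriptive labels invented for this sketch.
    code: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str
    decomposition: str
    decimal_value: str
    digit_value: str
    numeric_value: str
    mirrored: str
    unicode_1_name: str
    comment: str
    uppercase: str
    lowercase: str
    titlecase: str

def parse_line(line):
    """Parse one UnicodeData.txt line into a record."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return UnicodeDataRecord(
        int(fields[0], 16), fields[1], fields[2], int(fields[3]), *fields[4:]
    )

sample = "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9"
record = parse_line(sample)
print(f"U+{record.code:04X}", record.general_category, record.decomposition)
```

Empty fields are kept as empty strings so that the sketch does not have to guess at a default for missing values.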
Website changes
• New Look & Feel
• New Navigation
• Enhanced FAQ
• Glossary
• What is Unicode?
• Where is my character?

Beyond 3.0
• Characters
  • CJK characters, symbols, music systems, ancient scripts, extra characters, etc.
  • First allocated surrogate pairs
• Properties
  • essential for Unicode enablement

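Characters allocated beyond the 16-bit range are represented in UTF-16 as a surrogate pair. A short sketch of the arithmetic; the function names are hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is used only as a sample supplementary code point.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    """Combine a high and a low surrogate back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")
```

The 2,048 surrogate code values thus address 1,024 × 1,024 additional code points.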
Unicode 3.0
• Major new version
• Over 10,000 new characters
• Enhanced character data for implementations
• Reorganized text for better reference
• The version for normalization
• Unicode Character Database 3.0.0
• Available now!

Q & A

Backup Slides

ICU: Paid Advertisement
• Open Source Unicode Enablement Library
• ICU: C/C++ and Java Versions
• IBM Public License
• Friday, 10:00 Helena Shih
• http://oss.software.ibm.com/icu

Enumerated Versions
• Unicode 1.0.0, Unicode 1.0.1
• Unicode 1.1.0, Unicode 1.1.5
• Unicode 2.0.0
• Unicode 2.1.2, Unicode 2.1.5, Unicode 2.1.8, Unicode 2.1.9
• Unicode 3.0.0
• www.unicode.org

Editorial Committee
Joan Aliprand, Julie Allen (editor), Joe Becker, Mark Davis, Asmus Freytag, John Jenkins, Mike Ksar, Rick McGowan, Lisa Moore, Ken Whistler

New Characters (2)

Category                   V 2.1    V 3.0
Private Use                6,400    6,400
Surrogates                 2,048    2,048
Controls                      65       65
Not Characters                 2        2
Assigned code values      47,402   57,709
Unassigned code values    18,134    7,827

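The totals in the two character-count tables can be cross-checked against the 65,536 code values of a 16-bit code space. A small sketch of that arithmetic, using only the figures from the slides; the dictionary layout and the summarize helper are illustrative.

```python
CODE_SPACE = 0x10000  # 65,536 sixteen-bit code values

# Per-category counts as given on the two "New Characters" slides.
COUNTS = {
    "V 2.1": {"alphabetics_symbols": 6511, "cjk": 21204, "hangul": 11172,
              "private_use": 6400, "surrogates": 2048, "controls": 65, "not_characters": 2},
    "V 3.0": {"alphabetics_symbols": 10236, "cjk": 27786, "hangul": 11172,
              "private_use": 6400, "surrogates": 2048, "controls": 65, "not_characters": 2},
}

def summarize(version):
    """Derive the assigned and unassigned totals for one version."""
    c = COUNTS[version]
    characters = c["alphabetics_symbols"] + c["cjk"] + c["hangul"]
    code_values = (characters + c["private_use"] + c["surrogates"]
                   + c["controls"] + c["not_characters"])
    return {
        "assigned_characters": characters,
        "assigned_code_values": code_values,
        "unassigned_code_values": CODE_SPACE - code_values,
    }

for version in COUNTS:
    print(version, summarize(version))
```

The derived values match the slide totals for both versions, and the difference in assigned characters between them is 10,307, consistent with "over 10,000 new characters".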
Reference to Versions
• Open repertoire, but backwards compatible
  • Characters only added, not removed
  • Two early exceptions: ISO sync. & Korean
• Don’t overspecify the version:
  • “Version 2.1.0” vs. “Version 2.1” vs. “Version 2 or later”
• Includes Technical Reports!!

Versions of the Standard
• major - significant additions
  • published as a book
• minor - character additions or more significant normative changes
  • published as a Technical Report
• update - any other changes
  • on the website in /standard/versions/
• Example: 2.1.9

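The major.minor.update scheme lends itself to simple tuple comparison when code needs to test for "version X or later". A minimal sketch, assuming that omitted fields count as zero; the helper names parse_version and at_least are hypothetical, not an API defined by the standard.

```python
def parse_version(text):
    """Parse 'major[.minor[.update]]' into a 3-tuple, padding missing fields with 0."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 numeric fields: {text!r}")
    return tuple(parts + [0] * (3 - len(parts)))

def at_least(actual, required):
    """True if version 'actual' is 'required' or later."""
    return parse_version(actual) >= parse_version(required)

print(parse_version("2.1.9"))
print(at_least("3.0.1", "2"))      # satisfies "Version 2 or later"
print(at_least("2.1.9", "3.0"))
```

A reference written as "Version 2 or later" then stays valid as new minor and update versions appear, which is the point of not overspecifying.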
Unicode 3.0
• Versioning
• Characters
• Properties
• Conformance
• Technical Reports
• Unicode Character Database
• Future

Reorganized Text
• 6: Punctuation
• 7: European Alphabetics
• 8: Middle Eastern
• 9: South Asian
• 10: East Asian
• 11: Other (Mongolian, etc.)
• 12: Symbols
• 13: Formatting, Controls, Specials

Additionally
• Shift-JIS Index
• Full Radical Stroke Index
• CJK split in several blocks
• Improved Charts
  • Especially for CJK Ideographs
• Improved Implementation Guidelines
• General Clarifications