Worldwide typography (and how to apply JIS-X-4051-1995 to Unicode)
Michel Suignard, Microsoft Corporation
Objectives
• Worldwide single binary
• Multilingual
• DTP level on all writing systems
  • Line breaking
  • Font selection
  • Word breaking
  • Line justification
Challenges
• Asian typography is not as well known as Western typography
• Conflicting requirements
  • Vertical versus horizontal layout
  • Latin word wrap off
  • Ideographic word wrap on
• Size of the Unicode repertoire (35K characters and growing)
JIS-X-4051
• First published in March 1993
  • Does not address the Unicode repertoire
  • Limited description of character classification
• 2nd edition in October 1995
  • Based on JIS-X-0221 (ISO 10646-1)
  • More detailed character classification (20 classes)
  • Covers line breaking, line composition rules, Ruby positioning, horizontal-in-vertical text, …
Issues with JIS-X-4051
• Still a subset of Unicode
• Character class contents overlap (relying on contextual information not available to general-purpose software)
• Single behavior class
• Half/full-width characters not covered (user-defined)
• Not aligned with most font designs (narrow versus wide symbols)
• Lacks some useful features (like line break analysis across white space)
Character classification
• The Unicode space is decomposed into partitions (sets of character ranges)
• Each partition shares a common behavior across all covered typographic rules
• Partitions are mapped to classes specific to each rule (e.g. line breaking, font selection, etc.)
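The partition idea above can be sketched in a few lines of Python. This is only an illustration: the partition names, the choice of ranges, and the two per-rule tables (`LINE_BREAK_CLASS`, `FONT_CLASS`) are assumptions made for the example, not the partitioning used in the presentation.

```python
from bisect import bisect_right

# Hypothetical partitions: (first code point, last code point, partition name).
# The ranges are real Unicode ranges, but this partitioning is illustrative only.
PARTITIONS = [
    (0x0041, 0x005A, "latin_upper"),
    (0x0061, 0x007A, "latin_lower"),
    (0x3041, 0x3096, "hiragana"),
    (0x4E00, 0x9FFF, "cjk_ideograph"),
]
STARTS = [first for first, _, _ in PARTITIONS]

# Each typographic rule maps partitions to its own (smaller) set of classes.
LINE_BREAK_CLASS = {
    "latin_upper": "alpha",
    "latin_lower": "alpha",
    "hiragana": "ideographic",
    "cjk_ideograph": "ideographic",
}
FONT_CLASS = {
    "latin_upper": "latin",
    "latin_lower": "latin",
    "hiragana": "japanese",
    "cjk_ideograph": "cjk",
}


def partition_of(ch):
    """Return the partition name for a character, or None if uncovered."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last partition starting at or before cp
    if i >= 0:
        _, last, name = PARTITIONS[i]
        if cp <= last:
            return name
    return None


def classify(ch, rule_table):
    """Map a character to the class a given rule assigns to its partition."""
    return rule_table.get(partition_of(ch), "unknown")
```

With these tables, `classify("あ", LINE_BREAK_CLASS)` gives `"ideographic"` while `classify("あ", FONT_CLASS)` gives `"japanese"`: one partition lookup serves several rules.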
Typical usage
[Figure: typical usage, with the labels "Before behavior class" and "After behavior class"]
Line breaking
• Kinsoku rules, to avoid a closing mark such as 。 or 」 at the start of a line, or an opening mark such as 「 at the end of a line, for example in: 何語を話しますか。「私は英語を話します。」
• Stricter rules for small kana (like ェ in フェ)
• Keep numeric expressions together, including postfix and prefix symbols
• Allow French typography rules (no break between the last word and ‘:;?!’, even if separated by a space character)
• Disable Latin word wrap
• Keep ideographic characters together
Line breaking classes
Partitions are mapped into 15 classes:
1. Opening characters
2. Closing characters
3. No start ideographic
4. Exclamation/interrogation
5. Inseparable
6. Prefix
7. Postfix
8. Ideographic
9. Numeral sequence
10. Alpha space
11. Alpha characters/symbols
12. Glue characters
13. Slash
14. Quotation characters
15. Numeric separators
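One way to use such classes is a pair table that says whether a break is allowed between a "before" class and an "after" class. The sketch below uses only five of the classes, and both the character assignments and the table entries are assumptions chosen for illustration; they are not the tables from the presentation or from JIS-X-4051.

```python
# A reduced, hypothetical subset of the line breaking classes.
OPENING, CLOSING, NO_START, IDEOGRAPHIC, ALPHA = range(5)
CLASSES = (OPENING, CLOSING, NO_START, IDEOGRAPHIC, ALPHA)

# Illustrative character assignments (assumption, far from complete).
CHAR_CLASS = {
    "「": OPENING, "（": OPENING,
    "」": CLOSING, "）": CLOSING, "。": CLOSING, "、": CLOSING,
    "ェ": NO_START, "ッ": NO_START,  # small kana
}

# Pair table: (class before, class after) pairs where a break is forbidden.
NO_BREAK = set()
for other in CLASSES:
    NO_BREAK.add((OPENING, other))   # never break right after an opening character
    NO_BREAK.add((other, CLOSING))   # never start a line with a closing character
    NO_BREAK.add((other, NO_START))  # never start a line with a small kana
NO_BREAK.add((ALPHA, ALPHA))         # keep Latin words together


def class_of(ch):
    """Return the line breaking class of a character."""
    if ch in CHAR_CLASS:
        return CHAR_CLASS[ch]
    if ch.isascii() and ch.isalnum():
        return ALPHA
    return IDEOGRAPHIC  # simplification: everything else breaks like an ideograph


def break_opportunities(text):
    """Return the indices i where a line break is allowed before text[i]."""
    return [
        i for i in range(1, len(text))
        if (class_of(text[i - 1]), class_of(text[i])) not in NO_BREAK
    ]
```

For `"話します。「私"` this yields `[1, 2, 3, 5]`: no break before 。 and none after 「, which is the kinsoku behavior. Because the rules live in `NO_BREAK`, changing behavior (for example, allowing Latin word wrap) is a table edit rather than a code change.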
Width modification and auto-spacing
• Width modification (contextual kerning): ( (text) ) becomes ((text))
• Auto-spacing (add space between ideographic text and Western or numeric text): 漢字western text漢字 becomes 漢字 western text 漢字
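The auto-spacing example can be reproduced with a small Python sketch. The code point ranges and the use of a literal space character are assumptions for illustration; a layout engine would more likely adjust glyph spacing than insert characters into the text.

```python
import re

# Assumed character sets for the sketch: kana plus CJK ideographs on one side,
# ASCII letters and digits on the other.
CJK = r"\u3040-\u30FF\u4E00-\u9FFF"
WESTERN = r"A-Za-z0-9"

# Zero-width match at every boundary between the two sets, in either order.
_BOUNDARY = re.compile(
    rf"(?<=[{CJK}])(?=[{WESTERN}])|(?<=[{WESTERN}])(?=[{CJK}])"
)


def auto_space(text, gap=" "):
    """Insert `gap` wherever ideographic text meets Western or numeric text."""
    return _BOUNDARY.sub(gap, text)
```

Calling `auto_space("漢字western text漢字")` returns `"漢字 western text 漢字"`, matching the slide's example; text that stays within one script is left unchanged.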
Font selection scenario
A new font is applied to a large multilingual selection of text:
あの映画は日本の映画ですか。Is that movie a Japanese movie? ええ、そうです。Yes, it is.
Assume we want to change the font of the English text while the whole text is still selected. When we apply the ‘Haettenschweiler’ font to the selection, it is desirable to affect only the Latin text.
The situation is similar when we apply an Asian typeface (such as HG) to the selection: only the Japanese text should change.
Font selection based on character code point and context
• There are no global Unicode fonts (a font usually covers a group of writing systems)
• Language is an important context selector for determining the appropriate font (CJK context, ASCII symbols, narrow versus wide Greek and Cyrillic characters)
• Some writing systems require several glyphs per character and are better handled by specialized fonts (e.g. Arabic, Devanagari for Hindi)
• Many punctuation marks are shared among writing systems whose typefaces cannot be shared (e.g. the period ‘.’ between Latin and Armenian)
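The font selection scenario can be sketched as follows: split the selection into script runs, let shared punctuation and spaces take their script from context, and apply the requested font only to the matching runs. The script names, the code point ranges, and the rule that neutral characters join the preceding run are assumptions made for this example.

```python
def script_of(ch):
    """Classify a character by code point (illustrative ranges only)."""
    cp = ord(ch)
    # CJK punctuation, kana, and CJK ideographs.
    if 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "japanese"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "neutral"  # spaces, digits, shared punctuation: resolved from context


def script_runs(text):
    """Split text into (script, substring) runs."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if script == "neutral" and runs:
            script = runs[-1][0]  # assumption: neutrals join the preceding run
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


def apply_font(text, font, target_script, current_font="default"):
    """Return (substring, font) pairs, changing the font only on matching runs."""
    return [
        (chunk, font if script == target_script else current_font)
        for script, chunk in script_runs(text)
    ]
```

Applying `apply_font(text, "Haettenschweiler", "latin")` to a mixed Japanese and English selection changes only the Latin runs, including the ASCII punctuation and spaces inside them, and leaves the Japanese runs on their current font.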
Ruby overhanging
• Ruby is the commonly used name for pronunciation characters associated with base characters.
• The Ruby sequence may be allowed to overhang the characters preceding or following the base characters, as long as this does not introduce confusion.
• The classification determines in which manner characters can be overhung:
  • No overhanging (e.g. CJK ideographs)
  • Allowed only before (e.g. open quotes)
  • Allowed only after (e.g. close quotes)
  • Allowed in both cases (e.g. Hiragana)
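The four overhang categories lend themselves to a small flag table. In the Python sketch below, the character assignments follow the examples on the slide, but the reading of "before" and "after" as the edge of the neighboring character that Ruby text may cover is an assumption, as are all the names.

```python
from enum import Flag


class Overhang(Flag):
    """Edges of a neighboring character that Ruby text may cover."""
    NONE = 0
    BEFORE = 1
    AFTER = 2
    BOTH = 3  # BEFORE | AFTER


OPEN_QUOTES = "「『（"
CLOSE_QUOTES = "」』）"


def overhang_class(ch):
    """Return the overhang class of a character (illustrative assignments)."""
    if ch in OPEN_QUOTES:
        return Overhang.BEFORE
    if ch in CLOSE_QUOTES:
        return Overhang.AFTER
    if 0x3041 <= ord(ch) <= 0x3096:  # Hiragana letters
        return Overhang.BOTH
    return Overhang.NONE  # e.g. CJK ideographs


def may_overhang(ch, side):
    """True if Ruby text may cover the given side of character `ch`."""
    return bool(overhang_class(ch) & side)
```

A Ruby positioning routine could call `may_overhang` for each neighbor of the base characters before deciding whether to widen the base run or let the Ruby text spill over.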
Conclusion / Findings
• A detailed analysis of the Unicode repertoire by common behavior is a powerful tool for constructing sophisticated typographical effects.
• Typographic complexity should be expressed as much as possible in tables and properties, not in code.
• Many behaviors are correlated, so a limited number of Unicode partitions can serve many behavior descriptions.