Supporting Complex Scripts (such as Arabic and Hebrew) in your Windows 2000™ Application • F. Avery Bishop, Senior Program Manager, Microsoft Corporation
Agenda: • Overview of character encoding, Unicode • Guidelines for supporting complex scripts • Right-to-left layout of applications • Multilingual User Interface
Why do character set differences matter? • Historically, they fragmented code bases for both Windows and applications • Single byte: European editions • Double byte: Far East editions • Bi-directional: Middle East editions • Make it difficult to share data • Make it difficult to develop multilingual applications
Example: Multiple Hebrew Character Encodings • 8-bit Hebrew encodings still in use: • Windows code page 1255 • OEM (DOS) code page 862 • Visual Hebrew encodings (many exist)
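The following C sketch shows how these encodings fragment data. It decodes only the 27 Hebrew letters from the two code pages above into Unicode. The same letter, Alef (U+05D0), is byte 0xE0 in Windows code page 1255 but byte 0x80 in OEM code page 862. The function names are hypothetical. To keep the tables short, every other non-ASCII byte is mapped to U+FFFD; this does not describe the full code pages.

```c
#include <stdint.h>

/* Windows code page 1255: Hebrew letters Alef..Tav occupy 0xE0..0xFA.
   Only ASCII and the letters are decoded; everything else -> U+FFFD. */
static uint32_t hebrew_cp1255_to_unicode(unsigned char b) {
    if (b < 0x80) return b;                                   /* ASCII */
    if (b >= 0xE0 && b <= 0xFA) return 0x05D0u + (b - 0xE0u); /* letters */
    return 0xFFFDu;                        /* not handled in this sketch */
}

/* OEM (DOS) code page 862: the same letters occupy 0x80..0x9A. */
static uint32_t hebrew_cp862_to_unicode(unsigned char b) {
    if (b < 0x80) return b;                                   /* ASCII */
    if (b <= 0x9A) return 0x05D0u + (b - 0x80u);              /* letters */
    return 0xFFFDu;                        /* not handled in this sketch */
}
```

An application that stored text as raw bytes had to know which of these tables applied. Once the text is converted to Unicode, both sources yield U+05D0.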
Example: Multiple Arabic Character Encodings • 8-bit Arabic encodings supported in Internet Explorer 4.0/CS: • ASMO-708 • DOS 720 • ISO 8859-6 • Windows code page 1256 • Other proprietary encodings
Logical vs. Visual Encoding • Logical: • Storage order is the same as typing order • Allows natural text processing: • Search • Resizing (e.g., in web pages) • IPC: select, cut and paste • Visual: • Natural text processing is difficult or impossible • Cannot always be mapped back to logical order
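The next sketch shows, in simplified form, how logically stored text becomes visual order. It assumes a left-to-right paragraph and treats only the Hebrew block (U+0590..U+05FF) as right-to-left. It is not the Unicode Bidirectional Algorithm: neutrals, digits, Arabic, and embedding levels are ignored, so two Hebrew words separated by a space would come out in the wrong order. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplification: only Hebrew-block characters count as right-to-left. */
static int is_hebrew(uint32_t c) { return c >= 0x0590 && c <= 0x05FF; }

/* Copy logical-order text to visual order for a left-to-right paragraph
   by reversing each maximal run of Hebrew characters in place. */
static void naive_logical_to_visual(const uint32_t *logical,
                                    uint32_t *visual, size_t n) {
    size_t i = 0;
    while (i < n) {
        if (!is_hebrew(logical[i])) {      /* left-to-right: copy as is */
            visual[i] = logical[i];
            i++;
            continue;
        }
        size_t start = i;
        while (i < n && is_hebrew(logical[i]))
            i++;                           /* i is now one past the run */
        for (size_t k = start; k < i; k++) /* reverse the run */
            visual[k] = logical[start + (i - 1 - k)];
    }
}
```

In the logical buffer, a Hebrew word is stored in typing order, so a plain substring search finds it. In the visual buffer the same letters are stored reversed, so that search fails. This is why logical encoding is preferred.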
What is Unicode? • A 16-bit character encoding • A mapping of characters to numbers • Syntax rules for display of complex scripts • Not a font or glyph encoding! • Not a sort algorithm! • Includes all characters in common use in modern scripts (and others) • Basis for the ISO 10646 character encoding standard • Native text encoding for Windows NT
Unicode™ / ISO 10646 [Figure: the code space from 0x0000 to 0xFFFF, with regions labeled (from 0xFFFF down) Compatibility, Private use, Future use, Ideographs (Hanzi, Kanji, Hanja), Hangul, Kana, Symbols, Punctuation, Thai, Indian, Arabic and Hebrew, Greek, Latin, ASCII; 0x0000 is (null)] • 16-bit international character encoding • Windows 2000 uses Unicode version 2.0
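The figure can be read as a lookup from code point to region. The sketch below encodes a few of its regions using block boundaries from the Unicode standard. The enum and function names are hypothetical, and the ranges are coarse: "Indic" covers Devanagari through Malayalam, and "Compatibility" covers U+F900..U+FFEF.

```c
#include <stdint.h>

enum UnicodeRegion {
    REGION_ASCII, REGION_LATIN, REGION_GREEK, REGION_HEBREW, REGION_ARABIC,
    REGION_INDIC, REGION_THAI, REGION_KANA, REGION_IDEOGRAPHS,
    REGION_HANGUL, REGION_PRIVATE_USE, REGION_COMPATIBILITY, REGION_OTHER
};

/* Coarse classification of a 16-bit code point into a few regions. */
static enum UnicodeRegion unicode_region(uint32_t cp) {
    if (cp <= 0x007F) return REGION_ASCII;
    if (cp <= 0x024F) return REGION_LATIN;        /* Latin-1, Extended-A/B */
    if (cp >= 0x0370 && cp <= 0x03FF) return REGION_GREEK;
    if (cp >= 0x0590 && cp <= 0x05FF) return REGION_HEBREW;
    if (cp >= 0x0600 && cp <= 0x06FF) return REGION_ARABIC;
    if (cp >= 0x0900 && cp <= 0x0D7F) return REGION_INDIC; /* Devanagari..Malayalam */
    if (cp >= 0x0E00 && cp <= 0x0E7F) return REGION_THAI;
    if (cp >= 0x3040 && cp <= 0x30FF) return REGION_KANA;  /* Hiragana, Katakana */
    if (cp >= 0x4E00 && cp <= 0x9FFF) return REGION_IDEOGRAPHS;
    if (cp >= 0xAC00 && cp <= 0xD7AF) return REGION_HANGUL; /* syllables */
    if (cp >= 0xE000 && cp <= 0xF8FF) return REGION_PRIVATE_USE;
    if (cp >= 0xF900 && cp <= 0xFFEF) return REGION_COMPATIBILITY;
    return REGION_OTHER;
}
```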
Relatives of Unicode • ISO/IEC 10646 • 32-bit ISO standard: a 64K × 64K code space organized in "planes" of 64K characters • Unicode repertoire is plane 0 • UTF-7 • 7-bit transformation format • Not widely used • UTF-8 • 8-bit transformation format • Used in web pages and some email
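UTF-8 stores each code point in one to four bytes, and the lead byte's high bits give the sequence length. The C sketch below encodes a single code point. It follows the current limits (at most U+10FFFF, no surrogate code points) rather than the longer sequences early ISO 10646 drafts permitted. The function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[0..3]. Returns the number of
   bytes written, or 0 for surrogates (U+D800..U+DFFF) and for values
   above U+10FFFF, which UTF-8 does not encode. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                                /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                               /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* surrogates */
    if (cp < 0x10000) {                             /* 1110xxxx + 2 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                           /* 11110xxx + 3 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, 'A' stays the single byte 0x41, Hebrew Alef (U+05D0) becomes D7 90, and the euro sign (U+20AC) becomes E2 82 AC. ASCII text is therefore unchanged in UTF-8, which is one reason web pages and email adopted it.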
Unicode in Win32: the W and A Entry Points • Two kinds of window classes: Unicode and ANSI • The Win32 API has two versions of most functions: • The "W" (wide) version handles Unicode • The "A" (ANSI) version assumes the system default code page (character encoding)
Unicode in Win32 … • Macros resolve to the W or A entry point • Example: macro for RegisterClassEx:
    #ifdef UNICODE
    #define RegisterClassEx RegisterClassExW
    #else
    #define RegisterClassEx RegisterClassExA
    #endif
• To create a Unicode application: • Compile with -DUNICODE (and -D_UNICODE if you also use the C run-time <tchar.h> mappings), or • Call the W routines explicitly
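The RegisterClassEx macro above follows a general pattern that can be reproduced without windows.h. In this sketch, CountCharsW/CountCharsA are a hypothetical W/A pair, and APP_TEXT is a hypothetical analogue of the Windows TEXT() macro, which turns a string literal into a wide literal when UNICODE is defined.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Normally supplied on the compiler command line (-DUNICODE). */
#ifndef UNICODE
#define UNICODE
#endif

/* Hypothetical W/A pair standing in for a real Win32 pair such as
   RegisterClassExW / RegisterClassExA. */
size_t CountCharsW(const wchar_t *s) { return wcslen(s); }
size_t CountCharsA(const char *s)    { return strlen(s); }

/* Same selection pattern as the Windows headers. */
#ifdef UNICODE
#define CountChars  CountCharsW
#define APP_TEXT(s) L##s      /* "abc" becomes L"abc" */
#else
#define CountChars  CountCharsA
#define APP_TEXT(s) s
#endif
```

Source code calls CountChars(APP_TEXT("...")) everywhere. Whether it compiles to the wide or the narrow version is decided once, by the UNICODE symbol.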
For Applications that Must Also Run on Windows 98… • Use Unicode everywhere, with a single binary and two code paths: • On Windows NT, use W entry points • On Windows 98, convert Unicode to ANSI and use A entry points • See sample GLOBALDV for an example • See the April issue of Microsoft Systems Journal for details and other options
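The next sketch models the single-binary, two-code-path approach. Text stays Unicode internally and is converted only at the API boundary when running on Windows 98. ShowTextW/ShowTextA are hypothetical entry points that just record which path ran. UnicodeToAnsi is a deliberately crude stand-in for WideCharToMultiByte(CP_ACP, ...): it keeps ASCII and substitutes '?' for everything else. A real Hebrew code page would convert Alef correctly. In a real program the platform check would typically be made once at startup, for example with GetVersionEx.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

static int  g_path;             /* 'W' or 'A': which entry point ran */
static char g_ansiCopy[64];     /* what the A entry point received   */

/* Hypothetical W/A entry points. */
static void ShowTextW(const wchar_t *s) { (void)s; g_path = 'W'; }
static void ShowTextA(const char *s) {
    g_path = 'A';
    strncpy(g_ansiCopy, s, sizeof g_ansiCopy - 1);
    g_ansiCopy[sizeof g_ansiCopy - 1] = '\0';
}

/* Stand-in for WideCharToMultiByte(CP_ACP, ...): keeps ASCII and uses
   '?' as the default character for anything else. */
static void UnicodeToAnsi(const wchar_t *src, char *dst, size_t cap) {
    size_t i = 0;
    for (; src[i] != L'\0' && i + 1 < cap; i++)
        dst[i] = (src[i] < 0x80) ? (char)src[i] : '?';
    dst[i] = '\0';
}

/* One binary, two code paths. */
static void ShowText(bool isWindowsNT, const wchar_t *s) {
    if (isWindowsNT) {
        ShowTextW(s);                       /* Unicode all the way */
    } else {
        char ansi[64];
        UnicodeToAnsi(s, ansi, sizeof ansi); /* convert at the boundary */
        ShowTextA(ansi);
    }
}
```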
Summary: Use Unicode if you can! • Represent all text with one unambiguous encoding • Support multilingual text easily • Avoid special processing for variable byte-length characters • Use standard encoding recognized throughout the industry and the world • Support new scripts that are only supported through Unicode
1. Displaying Complex Scripts in Plain Text • In Win32 applications, use the standard edit control • Use standard Win32 API display functions: • Win32 APIs: ExtTextOutW or DrawTextW • ScriptString API in Uniscribe
Pitfalls in Enabling for Complex Scripts • When displaying typed text: • Do not output characters one by one! • Do save text in a buffer and display the whole string with Uniscribe or Win32 API • To measure line lengths: • Do not sum cached character widths • Do use a GetTextExtent function or Uniscribe
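The next sketch shows why summing cached character widths fails. Arabic lam followed by alef is drawn as a single mandatory lam-alef ligature, so the string's width is not the sum of the two letters' widths. The widths and the one-rule "shaper" are hypothetical and do not come from any real font. In a real application the correct extent comes from GetTextExtentPoint32W or Uniscribe, which also account for contextual forms that change widths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached advance widths in pixels. */
static int cached_width(uint32_t c) {
    switch (c) {
    case 0x0644: return 6;   /* ARABIC LETTER LAM  */
    case 0x0627: return 3;   /* ARABIC LETTER ALEF */
    default:     return 7;
    }
}

/* Wrong: sums per-character widths, ignoring shaping. */
static int naive_extent(const uint32_t *s, size_t n) {
    int w = 0;
    for (size_t i = 0; i < n; i++)
        w += cached_width(s[i]);
    return w;
}

/* Toy shaper: lam + alef becomes one ligature glyph with its own
   (hypothetical) width; every other character uses its cached width. */
static int shaped_extent(const uint32_t *s, size_t n) {
    const int LAM_ALEF_WIDTH = 5;
    int w = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == 0x0644 && i + 1 < n && s[i + 1] == 0x0627) {
            w += LAM_ALEF_WIDTH;
            i++;                      /* the ligature consumes both */
        } else {
            w += cached_width(s[i]);
        }
    }
    return w;
}
```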
2. Displaying Complex Scripts in Simple Formatted Text • In Win32 applications use rich edit control • In web pages for Internet Explorer 5.0, use Document Object Model
3. Displaying Complex Scripts in Text with Advanced Formatting and Layout • Use the script APIs ("Uniscribe") • See the November 1998 MSJ article
Overview of Uniscribe • Background and purpose of Uniscribe • Low-level APIs • High-level APIs • For details, see the November 1998 MSJ article
The Uniscribe DLL: USP10.DLL • Platforms: • Windows 2000 • Windows NT 4 • Windows 98 • Windows 95 (excluding Far East editions) • Single worldwide binary • Installs with Windows 2000, IE5, and Office 2000
Hides language details • Syllable structure (Indian, Thai) • Contextual shaping (Arabic, Indic) • Caret placement (all) • Wordbreak (Thai) • National digits (Arabic, Indic, Thai) • Bidirectional layout (Arabic, Hebrew)
Hides Unicode and OS details • APIs are Unicode on all platforms • Hides glyph codes • Hides font differences: • Shaping tables • Fixed-repertoire fonts
Uniscribe Structure [Figure: a Uniscribe client calls Uniscribe, which itemizes text with the Unicode BiDi algorithm; shapes and places it with per-script shaping engines (Arabic, Hebrew, Hindi, Tamil, Thai, Vietnamese) using OpenType layout tables or CMAP and width tables; provides layout services (Justify, Caret, Mouse: XtoCP and CPtoX); and measures and renders through GDI calls such as GetCharABCWidthsI, GetGlyphOutline, TextOut, and ExtTextOut with ETO_GLYPH_INDEX]
Shaping engines • Per script • Understand language rules • Understand font features • OpenType provides full control • Many older fixed layout fonts
[Figure: layering of Application, USER, GDI, LPK.DLL, and Uniscribe]
Low-level APIs support: • Formatting text • Style runs • Measurement • Paragraph filling • Rendering • Information needed for font fallback
Summary of low-level APIs • Script… • Itemize • Shape, Place • Break, Layout • TextOut • CPtoX, XtoCP
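ScriptCPtoX and ScriptXtoCP convert between character positions and x coordinates within a run. The direction of the run changes the answer. The sketch below is a simplified model of that idea, not the Uniscribe implementation. It assumes one glyph per character (no clusters), pixel advances, and a run that starts at x = 0. It reports only leading edges, whereas the real functions also take a trailing-edge flag. The function names are hypothetical.

```c
#include <stddef.h>

/* Leading-edge x of character cp. In a left-to-right run the leading
   edge is the character's left side; in a right-to-left run it is the
   right side, so positions are measured back from the run's right end. */
static int cp_to_x(const int *adv, size_t n, size_t cp, int rtl) {
    int total = 0, before = 0;
    for (size_t i = 0; i < n; i++) {
        if (i < cp) before += adv[i];
        total += adv[i];
    }
    return rtl ? total - before : before;
}

/* Character whose cell contains x, clamped to the run. */
static size_t x_to_cp(const int *adv, size_t n, int x, int rtl) {
    int total = 0;
    for (size_t i = 0; i < n; i++) total += adv[i];
    int pos = rtl ? total - x : x;    /* distance from the run's start edge */
    int edge = 0;
    for (size_t i = 0; i < n; i++) {
        edge += adv[i];
        if (pos < edge) return i;
    }
    return n ? n - 1 : 0;
}
```

With advances {10, 20, 30}, character 1 starts at x = 10 in a left-to-right run. In a right-to-left run its leading edge is at x = 50, because the run is read from the right.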
High-level APIs • Purpose • Analysis • Display • Font fallback
Purpose • For Windows 2000: ExtTextOut, DrawText, and the system edit control • Cross-platform Unicode plain-text display • Easier than the low-level APIs
Summary of ScriptString APIs: • ScriptString… • Analyse • … query analysis ... • Out • Free • Provides simple font fallback
Background on RTL Layout ("Mirroring") for BiDi Localization • Localized Arabic and Hebrew Windows® is laid out from right to left • In the past, this was done "ad hoc" or not at all • Windows 2000 and BiDi Windows 98 include mechanisms to "automatically" mirror the shell and applications • Also helpful for multilingual user interface support
Mirroring in System Based on Coordinate Transformation • Origin (0,0) in upper RIGHT corner of window • X scale factor = -1: x values increase from right to left [Figure: default (LTR) window with origin at upper left and x increasing to the right, beside a mirrored (RTL) window with origin at upper right and x increasing to the left]
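In other words, a mirrored window is an ordinary window with its x axis flipped about the client area's width. The sketch applies that transformation to a point. Point is a hypothetical stand-in for the Win32 POINT structure, rtl_to_ltr is an illustrative helper rather than a system API, and pixel-boundary details are ignored.

```c
/* Hypothetical stand-in for the Win32 POINT structure. */
typedef struct { int x, y; } Point;

/* Convert a point from mirrored (RTL) logical coordinates to ordinary
   left-to-right offsets in a client area `width` units wide. The origin
   moves to the right edge and the x scale factor is -1; y is unchanged. */
static Point rtl_to_ltr(Point p, int width) {
    Point q = { width - p.x, p.y };
    return q;
}
```

Because the transformation is its own inverse, applying it twice returns the original point. The same function therefore converts in both directions.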
More Background on Mirroring… • Developers use programming interfaces and Windows style bits • Automatic inheritance of the RTL property: • A child window of an RTL window defaults to RTL • You can disable inheritance of the RTL property • APIs are provided to disable mirroring of bitmaps
Implementing Mirroring in Win32 Applications: Standard Windows • Use SetProcessDefaultLayout: • Affects all windows created thereafter • SetProcessDefaultLayout(LAYOUT_RTL); • SetProcessDefaultLayout(0); // Reset to LTR • Or call CreateWindowEx: • Use extended style WS_EX_LAYOUTRTL • To inhibit mirroring in child windows, also set WS_EX_NOINHERITLAYOUT
Changing Layout of Existing Window
    BOOL IsRTLLayout;  // TRUE iff window is to be mirrored
    // ... Get new value of IsRTLLayout
    LONG lExStyles = GetWindowLongA(hWnd, GWL_EXSTYLE);
    // Check whether new layout is opposite current layout
    if (!!(IsRTLLayout) != !!(lExStyles & WS_EX_LAYOUTRTL)) {
        lExStyles ^= WS_EX_LAYOUTRTL;  // Toggle layout
        // Set extended styles to new value
        SetWindowLongA(hWnd, GWL_EXSTYLE, lExStyles);
        // Update client area
        InvalidateRect(hWnd, NULL, TRUE);
    }
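The double negation in the test above matters. The expression (lExStyles & WS_EX_LAYOUTRTL) evaluates to the bit's own value, not to 1, so comparing it directly with a BOOL would be wrong. The portable sketch below isolates that logic. MY_LAYOUTRTL is an arbitrary bit standing in for WS_EX_LAYOUTRTL, and apply_layout is a hypothetical helper.

```c
/* Arbitrary bit standing in for WS_EX_LAYOUTRTL in this sketch. */
#define MY_LAYOUTRTL 0x00400000L

/* Return the extended style with the RTL bit set iff wantRtl is nonzero.
   !! collapses both operands to 0 or 1 before they are compared; the
   XOR then toggles the bit only when the layout actually has to change. */
static long apply_layout(long styles, int wantRtl) {
    if (!!wantRtl != !!(styles & MY_LAYOUTRTL))
        styles ^= MY_LAYOUTRTL;
    return styles;
}
```

Other style bits pass through untouched, and calling the helper with the layout already in place is a no-op. That is why the slide's code needs to invalidate the window only inside the if.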
Controlling Mirroring of a Device Context • DWORD SetLayout(HDC hdc, DWORD dwLayout):
    dwLayout = 0;           // will lay out LTR
    dwLayout = LAYOUT_RTL;  // will lay out RTL
    dwLayout = LAYOUT_RTL | LAYOUT_BITMAPORIENTATIONPRESERVED;  // will lay out RTL, but not bitmaps
• DWORD GetLayout(HDC hdc) returns the layout settings for an HDC
Mirroring in Win32 Applications: Dialogs • Set WS_EX_LAYOUTRTL in the dialog template • Visual Studio 6 dialog editor: • Has an option for RTL layout • BUG in Visual Studio 6: • Writes WS_EX_LAYOUT_RTL (instead of WS_EX_LAYOUTRTL) to the RC file! • Must correct the RC file by hand to compile • Will be fixed in a future version
Mirroring in Win32 Applications: Message Boxes • Set MB_RTLLAYOUT option bit
Guidelines for using RTL Layout • Using coordinates • Use GetWindowRect with care • Use client, rather than screen coordinates • Do not mix screen coordinates and client coordinates • Use MapWindowPoints to map rectangles, instead of ClientToScreen and ScreenToClient • Windows 95 does not support mirroring!
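The reason to prefer MapWindowPoints for rectangles is that mapping each corner separately out of a mirrored window, as ClientToScreen does, flips the x axis and leaves left greater than right. As I understand the SDK documentation, MapWindowPoints, when given a RECT as two points, adjusts the result so the rectangle stays well ordered. The sketch models that adjustment as a swap. Rect, map_corners, and map_rect are hypothetical names, and the caller is assumed to know the screen x of the client area's right edge.

```c
/* Hypothetical stand-in for the Win32 RECT structure. */
typedef struct { int left, top, right, bottom; } Rect;

/* Mirrored client x -> screen x: the client origin is the right edge. */
static int mirrored_client_to_screen_x(int clientRightOnScreen, int x) {
    return clientRightOnScreen - x;
}

/* Corner-by-corner mapping (the ClientToScreen approach): in a mirrored
   window the result has left > right. y is unchanged in this sketch. */
static Rect map_corners(Rect r, int clientRightOnScreen) {
    Rect s = r;
    s.left  = mirrored_client_to_screen_x(clientRightOnScreen, r.left);
    s.right = mirrored_client_to_screen_x(clientRightOnScreen, r.right);
    return s;
}

/* Rectangle-aware mapping: swap the edges so left < right again. */
static Rect map_rect(Rect r, int clientRightOnScreen) {
    Rect s = map_corners(r, clientRightOnScreen);
    int t = s.left;
    s.left = s.right;
    s.right = t;
    return s;
}
```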
Guidelines for Multilanguage User Interface • Initialize to current UI language • Windows 2000: GetUserDefaultUILanguage() • Others: Use the language of the O/S • See function InitUiLang in Globaldev sample code
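GetUserDefaultUILanguage returns a LANGID: the primary language in the low 10 bits and the sublanguage in the high 6 bits. The sketch reproduces that bit layout with hypothetical macro names mirroring MAKELANGID/PRIMARYLANGID/SUBLANGID, then decides whether the chosen UI language needs right-to-left layout. It checks only Arabic and Hebrew, the two scripts this talk focuses on; other RTL languages would need the same treatment.

```c
#include <stdint.h>

typedef uint16_t LangId;                 /* stands in for Win32 LANGID */

/* Same bit layout as MAKELANGID / PRIMARYLANGID / SUBLANGID. */
#define MAKE_LANG_ID(p, s)  ((LangId)(((unsigned)(s) << 10) | (unsigned)(p)))
#define PRIMARY_LANG(l)     ((unsigned)(l) & 0x3FFu)
#define SUB_LANG(l)         ((unsigned)(l) >> 10)

/* Primary language values as defined in the Windows headers. */
enum { PRIMARY_ARABIC = 0x01, PRIMARY_ENGLISH = 0x09, PRIMARY_HEBREW = 0x0D };

/* Nonzero if the UI language should use mirrored (RTL) layout. */
static int ui_lang_is_rtl(LangId lang) {
    unsigned p = PRIMARY_LANG(lang);
    return p == PRIMARY_ARABIC || p == PRIMARY_HEBREW;
}
```

With SUBLANG_DEFAULT (1), Hebrew is 0x040D and Arabic is 0x0401. The result of ui_lang_is_rtl could then drive a call such as SetProcessDefaultLayout(LAYOUT_RTL).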
Guidelines for Multilanguage User Interface • Allow user to select UI language • Put language-dependent resources in resource DLLs • Use naming convention, e.g., res<LANGID>.dll • Find all resource DLLs, put up list box of choices • See module UPDTLANG.CPP in Globaldev Sample
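The next sketch covers two small pieces of this scheme: naming a resource DLL under the res<LANGID>.dll convention, and picking the best available language. Writing the LANGID as four uppercase hex digits (res040D.dll) is an assumption, since the convention above does not specify a format. The fallback policy is likewise a hypothetical helper: exact match first, then another sublanguage of the same primary language, then a default. Finding the DLLs on disk is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* res<LANGID>.dll, with the LANGID as four hex digits (assumed format). */
static void resource_dll_name(uint16_t langid, char *buf, size_t cap) {
    snprintf(buf, cap, "res%04X.dll", (unsigned)langid);
}

/* Choose among the languages whose resource DLLs were found:
   exact LANGID, else same primary language (low 10 bits), else fallback. */
static uint16_t choose_ui_lang(uint16_t wanted, const uint16_t *avail,
                               size_t n, uint16_t fallback) {
    for (size_t i = 0; i < n; i++)
        if (avail[i] == wanted) return wanted;
    for (size_t i = 0; i < n; i++)
        if ((avail[i] & 0x3FF) == (wanted & 0x3FF)) return avail[i];
    return fallback;
}
```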
Summary • Use Unicode to encode if you can • Use controls to display text and accept user input • Use Uniscribe for advanced formatting • Use new RTL layout API for applications localized to RTL languages • Consider multilingual user interface
Further Information and Resources • http://www.microsoft.com/globaldev (Watch for updates!) • MSJ articles, e.g.: • Uniscribe: http://www.microsoft.com/msj/1198/multilang/multilangtop.htm • Multilingual UI: http://www.microsoft.com/msj/0499/multilangUnicode/multilangUnicodetop.htm • Send suggestions to nlshelp@microsoft.com