Software Globalization With Windows 2000/XP. Houman Pournasseh Lead Program Manager. Agenda. Definitions Why invest in World-Ready products? Globalization – step-by-step Universal encoding - Unicode Locale aware Handle different input methods Complex script aware Font independency
Software Globalization With Windows 2000/XP. Houman Pournasseh, Lead Program Manager.
Agenda • Definitions • Why invest in World-Ready products? • Globalization – step-by-step • Universal encoding - Unicode • Locale aware • Handle different input methods • Complex script aware • Font independency • Multi-lingual UI aware • Mirroring aware • Conclusion & References
Definitions • World-Ready: Properly globalized and localizable. • Globalization: The process of designing and implementing source code so that it can accommodate any local market (locale) or script. • Localizability: Designing software code and resources such that resources can be localized for any local market (locale) without changing the source code. • Localization: The process of adapting a product (including both text and non-text elements) to meet the language, cultural, and political expectations and/or requirements of a specific local market (locale).
Users and Locales • To define their geographical location, users set the location. • To select a UI language, users set the UI language. • To run legacy (non-Unicode) applications, users set the system locale. • To enter text in different languages, users set the input locale. • To define formatting for dates, times, and similar data, users set the user locale.
New to Windows XP • Nine new locales added to the previous list of 126: Punjabi, Gujarati, Telugu, Kannada, Kyrgyz, Mongolian (Cyrillic), Galician, Divehi, Syriac • New Indic and Arabic scripts: Gujarati, Gurmukhi, Telugu, Kannada, Syriac, Divehi • More robust font display for East Asian languages • Improved Regional Settings options • Much improved MUI support • New location (GEO) setting • Support for GB18030
Why invest in World-Ready products? • Get into the international market (World Wide Web era) • Create a single binary with the same functionality for every market to: • Reduce development effort and cost • Ease support and maintenance pain • Sim-ship (ship all language versions simultaneously) and avoid being your own competitor
Transforms of Unicode • UTF-7: 7 bit transformation format (rare) • UTF-8 • 8 bit transformation format • For transmission over unknown lines: e.g. Web pages • Codepage number CP_UTF8 = 65001 • UTF-16 and UCS-2 • Microsoft uses UTF-16 little-endian as its standard for Unicode encoding • UTF-32 and UCS-4
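To make the "8 bit transformation format" concrete, here is a minimal sketch in portable C of how a single code point maps to UTF-8 bytes. The helper name `utf8_encode` is hypothetical and not part of any Windows API; on Windows the conversion would normally go through WideCharToMultiByte with CP_UTF8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Win32 API): encodes one Unicode code point
 * as UTF-8. Writes 1 to 4 bytes to out and returns the byte count, or 0
 * if cp is a surrogate value or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 7 bits: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {   /* surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                   /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 21 bits: 11110xxx + 3 trail bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

ASCII text stays one byte per character, which is why UTF-8 travels well over channels that were designed for 8-bit text.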
Windows 2000/XP: Unicode & Single Binary • Built-in support for hundreds of languages • Any well-behaved Win32 application, in any language, can run on any language version of Windows 2000/XP • Native Unicode support for new scripts • Support for supplementary characters
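Supplementary characters are code points above U+FFFF, which UTF-16 stores as a pair of 16-bit surrogates. The following is a rough sketch of that mapping in portable C; `utf16_encode` is a hypothetical name chosen for illustration, not a Windows function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Win32 API): encodes one code point as
 * UTF-16. Returns the number of 16-bit units written (1 or 2), or 0 if
 * cp is a lone surrogate value or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                         /* reserved for surrogate pairs */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;            /* BMP: one unit, value unchanged */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;        /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 | (v >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

The practical consequence is that one 16-bit unit is not always one character, so code that indexes or truncates strings by unit count can split a pair.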
Unicode Encoding • The behavior of non-Unicode applications depends on the user's settings, which makes data exchange between OS language versions impossible.
Legacy systems support • A few valid reasons for an app not being fully Unicode: • The app has to run on both Win9x and NT • Existing Internet protocols and standards require a specific encoding • Options for apps that need to run on Win9x: • Create two separate binaries: one ANSI and one Unicode • Register as ANSI and internally convert to/from Unicode as needed • Use MSLU (the Microsoft Layer for Unicode)!
Data types
• For 8-bit and double-byte characters:
    typedef char CHAR;             // 8-bit character
    typedef char *LPSTR;           // pointer to 8-bit string
• For Unicode ("wide") characters:
    typedef unsigned short WCHAR;  // 16-bit character
    typedef WCHAR *LPWSTR;         // pointer to 16-bit string
• Generic types: TCHAR resolves to wchar_t or char, and LPTSTR to wchar_t * or char *, depending on whether UNICODE is defined.
Win32 API prototypes
• Generic function prototypes:
    // winuser.h
    #ifdef UNICODE
    #define SetWindowText SetWindowTextW
    #else
    #define SetWindowText SetWindowTextA
    #endif // UNICODE
• Behavior of the A routines under Windows 2000/XP
• Behavior of the W routines under Win9x
String manipulation functions and macros
• C run-time functions (compile with -D_UNICODE to get the Unicode version):
    Generic CRT     8-bit codepage    Unicode
    _tcscpy         strcpy            wcscpy
    _tcscmp         strcmp            wcscmp
• Win32 functions (compile with -DUNICODE to get the Unicode version):
    Generic Win32   8-bit codepage    Unicode
    lstrcpy         lstrcpyA          lstrcpyW
    lstrcmp         lstrcmpA          lstrcmpW
• Text macro:
    #ifdef UNICODE
    #define TEXT(string) L##string
    #else
    #define TEXT(string) string
    #endif // UNICODE
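The generic-text pattern above can be tried outside Windows with a small stand-in. In this sketch, `MY_UNICODE`, `MY_TEXT`, `my_tchar`, and `my_tcslen` are hypothetical names that mimic UNICODE, TEXT, TCHAR, and _tcslen; they are assumptions for illustration and not the real SDK definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Assumption: this switch plays the role of -DUNICODE / -D_UNICODE. */
#define MY_UNICODE 1

#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT_PASTE(s) L##s          /* token pasting yields L"..." */
#define MY_TEXT(s)       MY_TEXT_PASTE(s)
#define my_tcslen        wcslen
#else
typedef char my_tchar;
#define MY_TEXT(s)       s
#define my_tcslen        strlen
#endif

/* Length of a generic-text literal, in characters. */
size_t greeting_length(void)
{
    const my_tchar *greeting = MY_TEXT("hello");
    /* The literal's element type must match the generic character type. */
    assert(sizeof(my_tchar) == sizeof(MY_TEXT("x")[0]));
    return my_tcslen(greeting);
}

/* Size of the same literal in bytes, including the terminator. */
size_t greeting_bytes(void)
{
    return sizeof(MY_TEXT("hello"));
}
```

The same source compiles to either character width; only the byte size of the literal changes, while the character count stays at 5.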
Unicode and ANSI • Converting between ANSI and Unicode: • MultiByteToWideChar converts from a codepage to Unicode • WideCharToMultiByte converts from Unicode to a codepage • The codepage can be any legal codepage number or a predefined constant such as CP_ACP, CP_SYMBOL, or CP_UTF8 • Tips for writing Unicode code: • Use generic data types and function prototypes • Replace p++/p-- with CharNext/CharPrev • Compute buffer sizes in TCHARs, not bytes
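The last tip is a frequent source of buffer overruns when porting: `sizeof` gives bytes, while string functions expect a count of characters. A minimal portable sketch follows, using `wchar_t` in place of TCHAR; `COUNT_OF` and `copy_wide` are hypothetical names, not SDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of elements in an array, as opposed to its size in bytes. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical helper: copies src into dst, truncating if needed and
 * always terminating. dst_count is the capacity in characters, not
 * bytes. Returns the number of characters copied, excluding the
 * terminator. */
size_t copy_wide(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    if (dst_count == 0) {
        return 0;
    }
    while (i + 1 < dst_count && src[i] != L'\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```

A call such as `copy_wide(buf, COUNT_OF(buf), text)` stays correct whatever the character width is, whereas passing `sizeof(buf)` would overstate the capacity of a wide buffer.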
Demo! Porting an ANSI application to Unicode
Encodings in Web pages • ANSI codepages or ISO character encodings • Mono-lingual or restricted to one script • Raw Unicode: UTF-16 • OK for Windows NT networks • Numeric character references, such as &#2325; for क • OK for occasional use • UTF-8: Recommended encoding • Supported by IE 4.0+ and Netscape 4.0+
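A numeric character reference is simply the decimal code point wrapped in `&#` and `;`. As a small illustration in portable C (the function name `html_numeric_ref` is hypothetical), the reference for क (U+0915) can be produced like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes the decimal numeric character reference
 * for code point cp into out. Returns the length written (excluding the
 * terminator), or -1 if the buffer is too small. */
int html_numeric_ref(uint32_t cp, char *out, size_t out_size)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, out_size, "&#%lu;", (unsigned long)cp);
    if (n < 0 || (size_t)n >= out_size) {
        return -1;                        /* output would be truncated */
    }
    return n;
}
```

Each reference costs several bytes per character, which is consistent with the slide's advice to reserve this form for occasional use and prefer UTF-8 for whole pages.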
Setting web encoding
• HTML/DHTML: META tag in the head of the document:
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=<value>">
• XML:
    <?xml version="1.0" encoding="<value>"?>
• ASP: specify the encoding as a codepage number (for example, 65001 for UTF-8) using ASP directives:
  • Per session:
      <% Session.CodePage = <codepage> %>
  • Per page:
      <%@ CODEPAGE = <codepage> %>
Setting encodings for .NET
• Namespace: System.Text
• Distinction between file, request, and response encodings
• In code:
    Response.ContentEncoding = <value>
• In a page directive:
    <%@ Page ResponseEncoding="<value>" %>
• In the configuration file:
    <globalization requestEncoding="<value>" responseEncoding="<value>" fileEncoding="<value>" />
Windows 2000/XP: NLS • NLS APIs allow you to adjust automatically to the user's formatting preferences: • Date: 07/04/01 is 平成 13年7月4日 in Japan • Time: 9:00PM is 21:00 in France • Currency: $1,000.00 is 1.000,00 $ in Germany • Large numbers: 123,456,789.00 is 12,34,56,789.00 in Hindi • Sort order: in German, ä sorts after a; in Swedish, ä sorts after z
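The large-number example shows why grouping cannot be hard-coded as "a comma every three digits". The sketch below, in portable C, groups a digit string with one size for the rightmost group and another for the rest. `group_digits` is a hypothetical function written for illustration; a Windows application would leave this to the NLS formatting APIs rather than implement it by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts sep into a string of digits. The
 * rightmost group has `first` digits and every group to its left has
 * `rest` digits. Returns the output length, or -1 on bad arguments or
 * insufficient space. */
int group_digits(const char *digits, int first, int rest, char sep,
                 char *out, size_t out_size)
{
    char reversed[64];
    size_t n, i, used = 0;
    int in_group = 0;
    int group_len = first;

    assert(digits != NULL && out != NULL);
    if (first <= 0 || rest <= 0) {
        return -1;
    }
    n = strlen(digits);

    /* Walk from the least significant digit, building the result reversed. */
    for (i = n; i > 0; i--) {
        if (in_group == group_len) {
            if (used + 1 >= sizeof reversed) return -1;
            reversed[used++] = sep;
            in_group = 0;
            group_len = rest;             /* later groups use the second size */
        }
        if (used + 1 >= sizeof reversed) return -1;
        reversed[used++] = digits[i - 1];
        in_group++;
    }

    if (used + 1 > out_size) {
        return -1;
    }
    for (i = 0; i < used; i++) {
        out[i] = reversed[used - 1 - i];
    }
    out[used] = '\0';
    return (int)used;
}
```

With `first = 3, rest = 3` the digits 123456789 come out in the familiar Western grouping, and with `first = 3, rest = 2` they follow the Hindi pattern shown on the slide.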
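Locale awareness
• An LCID is a 32-bit value: Reserved (12 bits) | Sort ID (4 bits) | Language ID (16 bits), where the Language ID consists of Sub-language (6 bits) | Primary language (10 bits).
• Eliminate implicit locale assumptions from code, such as this macro, which only works for unaccented lowercase Latin letters:
    #define ToUpper(ch) \
        ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')
• Query the system to format locale-dependent data using NLS APIs and LCIDs.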
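The LCID bit layout can be sketched with a few shifts and masks. The functions below are hypothetical portable stand-ins for the SDK macros MAKELANGID, MAKELCID, PRIMARYLANGID, and SUBLANGID; the names and the use of `uint32_t` are assumptions made so the example compiles without Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* Language ID: sub-language in the top 6 bits, primary language in the
 * low 10 bits. */
uint32_t langid_make(uint32_t primary, uint32_t sub)
{
    assert(primary <= 0x3FF && sub <= 0x3F);
    return (sub << 10) | primary;
}

/* LCID: sort ID in bits 16 to 19, language ID in bits 0 to 15. The top
 * 12 bits are reserved and left at zero. */
uint32_t lcid_make(uint32_t langid, uint32_t sort_id)
{
    assert(langid <= 0xFFFF && sort_id <= 0xF);
    return (sort_id << 16) | langid;
}

uint32_t lcid_language_id(uint32_t lcid) { return lcid & 0xFFFF; }
uint32_t lcid_sort_id(uint32_t lcid)     { return (lcid >> 16) & 0xF; }
uint32_t langid_primary(uint32_t langid) { return langid & 0x3FF; }
uint32_t langid_sub(uint32_t langid)     { return (langid >> 10) & 0x3F; }
```

For instance, primary language 0x11 (Japanese) with sub-language 0x01 and the default sort ID 0 gives LCID 0x0411, which is 1041 in decimal.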
NLS APIs: Getting and setting locales • Querying locales: • LCID GetSystemDefaultLCID() • EnumSystemLocales • LCID GetUserDefaultLCID() • LCID GetThreadLocale() • Setting locales: • BOOL SetThreadLocale(LCID dwNewLocale) • BOOL SetLocaleInfo(LCID, …) // Works for standard locales only! • There are no APIs to set the system locale, user locale, or UI language
NLS APIs: Querying locale information • To retrieve information specific to a given locale: GetLocaleInfo • Gives information for any valid locale (takes an LCID). • The LCTYPE input tells which type of info to retrieve for the given locale (e.g. currency symbol, names of months…). • Returns the info in a string buffer (LPTSTR). • To retrieve information specific to a location: GetGeoInfo • Gives information for any valid location (takes a GEOID). • The SYSGEOTYPE input tells which type of info to retrieve for the given location (e.g. LCID, time zones…).
NLS APIs: Formatting data • To enumerate formats: • EnumCalendarInfo(Ex) • EnumDateFormats • EnumTimeFormats • To format data directly: • GetCurrencyFormat • GetDateFormat • GetTimeFormat
String comparison
• Locale-dependent comparison:
  • lstrcmp or lstrcmpi
• Locale-independent comparison:
  • Windows 2000 and below:
      Locale = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
      CompareString(Locale, …, …, …, …, …);
  • Windows XP:
      CompareString(LOCALE_INVARIANT, …, …, …, …, …);
Demo! A locale aware application
Locales in web pages • Defaults to the user locale • Supported by IE 4.x and Netscape 4.x • A server variable that can be retrieved by: Request.ServerVariables("HTTP_ACCEPT_LANGUAGE") • A property of the navigator object: navigator.userLanguage
Locale awareness in web pages
• To retrieve the user locale:
  • A server variable: Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")
  • A property of the navigator object: navigator.userLanguage
• To set a locale:
  • In DHTML:
      SetLocale("de")
      DateData = FormatDateTime(now(), vbShortDate)
  • In ASP:
      <% Session.LCID = 1041 %>
      <% Response.Write( FormatDateTime(dtNow) ) %>
Locale awareness in .NET
• Namespace: System.Globalization
• Locales are represented by CultureInfo: a set of preferences based on language and culture. Pattern: xx-XX, such as fr-CA or de-AT (RFC 1766)
• Setting the CultureInfo:
  • Implicit: picked up from the user locale
  • Explicit:
    • In code:
        Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE")
    • In a page directive:
        <%@ Page Culture="<value>" %>
    • In config:
        <globalization culture="<value>" />
Demo! Locale aware web site
Handling input methods • Easiest: use edit controls (recommended) • Responding directly to user input: • Input locales (language + input method) are identified by an HKL: • GetKeyboardLayout • ActivateKeyboardLayout • LoadKeyboardLayout • Windows messages: • WM_INPUTLANGCHANGEREQUEST • WM_INPUTLANGCHANGE • WM_IME_* (for IME support only) • WM_CHAR and WM_IME_CHAR
Windows 2000/XP: Complex Scripts • Complex Scripts have one or more of the following attributes: • Bi-directional (BiDi) reordering (Arabic, Hebrew) • Contextual shaping (Arabic, Indic family) • Display of combining characters (Arabic, Thai, Indic) • Specialized word-breaking (Thai) • Text Justification (Arabic)
Uniscribe • Clients: Windows 2000/XP, Trident, Microsoft Office 2000/XP • A collection of exported APIs (high and low level) • Hides implementation details • A shaping engine per language • [Diagram: Application, USER, GDI, LPK.DLL, USP]
Options to display text • Plain text in application • Standard edit control or • Win32 API (ExtTextOut / DrawText). • Simple formatted text • In Win32 apps, use Richedit control. • For Web pages, use Document Object Model (DHTML). • Advanced formatting • Use Uniscribe (see SDK and MSJ article).
Special considerations • When dealing with BiDi, set RTL reading order and alignment • SetTextAlign / GetTextAlign with TA_RIGHT • ExtTextOut with ETO_RTLREADING • DrawText with DT_RTLREADING • To measure line lengths: • Do not sum cached character widths • Do use a GetTextExtent function or Uniscribe • When displaying typed text: • Do not output characters one at a time! • Do save text in a buffer and display the whole string with Uniscribe or Win32 API
Windows 2000/XP: Font support • Introduction of OpenType fonts: • Extended TTF with glyphs for PE, ME, Thai, Greek, Turkish, Cyrillic… • Font fallback mechanism for CS and East Asian scripts, used by Uniscribe • Font linking mechanism, used by GDI
Font independency: Win32 programming • Don't: • Hard-code font face names • Assume a given font is installed • Assume the selected font supports the desired script • Do: • Use the MS Shell Dlg face name in dialog resources • Use EnumFontFamiliesEx or ChooseFont to select fonts
Font independency: In Web pages
• Avoid placing text formatting values in inline styles:
    <span style="font-size: 10pt; font-family: Arial;"> Hello </span>
• Declare text styles in CSS instead:
    <style> .myStyle {font-size: 10pt; font-family: Arial;} </style>
    <span class="myStyle"> Hello </span>
• Use WEFT to embed fonts in your web pages (IE only): http://www.microsoft.com/typography/web/default.htm