1. Vietnamese Support In Unicode
Chu Vũ, Development Manager, Microsoft Office Complex Scripts
2. Goals
Provide information on how Microsoft uses Unicode to support multiple languages
Provide programming techniques for supporting Vietnamese using Unicode
Provide technical information on how to move from ANSI to Unicode
Show examples through demos and case studies
3. Not covered in this talk
This talk does not provide technical information about the Unicode standard itself
It talks about globalization in general:
Enabling: covered
Locale: not covered
Localization: not covered
4. Globalization
User interface: windows, menus, dialog boxes, layout/mirroring, etc.
Locale: date/time/calendar, currency, paper size, etc.
Application: universal aware app, multi-language UI, language specifics (justification, type/replace, etc.)
Input method: simple keyboard layout, IME/telex, etc.
Localization: translation of UI, help, etc.
Fonts: bitmap, vector, TrueType, OpenType, ClearType
5. Why move to Unicode?
Support multiple languages: don't limit your application to one language
New languages will not have a codepage (e.g. languages written in Indic scripts, such as Hindi and Tamil)
The question is not how much it costs to move to Unicode, but when to move
It took Microsoft many years to move to Unicode, starting with the introduction of Windows NT
Future operating systems and applications are built as Unicode
6. Vietnamese support in Microsoft products
Windows 2000 or later (based on the NT platform)
Office 2000 SA; Office XP or later
IE 4.1 + Vietnamese language pack
Most Microsoft products use the combining method for keyboard input and storage
Most Microsoft core fonts support both combining and pre-composed characters (e.g. Arial, Courier New, Times New Roman)
CharMap applet
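To make the difference between combining and pre-composed storage concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of the original talk). It assumes the letter "ệ" as the example and uses the standard `unicodedata` module; the helper name `code_points` is hypothetical.

```python
import unicodedata

# Pre-composed storage: one code point, U+1EC7 (e with circumflex and dot below).
precomposed = "\u1ec7"

# Combining storage: base vowel U+00EA (e with circumflex)
# followed by the combining dot below, U+0323 (the nang tone mark).
combining = "\u00ea\u0323"

def code_points(text):
    """Hypothetical helper: list the code points of text as 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]

# The two strings differ code point by code point ...
print(code_points(precomposed))
print(code_points(combining))
print(precomposed == combining)

# ... but they are canonically equivalent: NFC folds the combining
# sequence into the pre-composed letter, and NFD expands both forms
# into the same base letter + marks sequence.
nfc_equal = unicodedata.normalize("NFC", combining) == precomposed
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", combining))
print(nfc_equal, nfd_equal)
```

An application that compares or searches Vietnamese text would typically normalize both sides to one form first, since input and storage may use either method.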
7. Unicode
16-bit international character encoding
Windows 2000 uses Unicode version 2.1
8. Vietnamese ranges in Unicode
Vietnamese characters are scattered across several ranges of the Unicode table
0x0041-0x005A    A-Z    uppercase Latin
0x0061-0x007A    a-z    lowercase Latin
0x00C2..0x00F4    Â, â, .., Ô, ô    Latin extended
0x0300-0x0323    huyền, hỏi, ngã, sắc, nặng (combining tone marks)
0x0102..0x01B0    Ă, ă, .., Ư, ư
0x1EA0-0x1EF9    Ạ, ạ, .., Ỹ, ỹ
0x20AB    ₫ (đồng)
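The ranges on this slide can be turned into a small lookup table. The following Python sketch is only an illustration: the range boundaries are copied from the slide, while the labels and the names `VIETNAMESE_RANGES` and `find_range` are assumptions made for the example. The ranges are coarse and also contain characters that Vietnamese does not use.

```python
# (first, last, label) triples copied from the slide; labels are descriptive only.
VIETNAMESE_RANGES = [
    (0x0041, 0x005A, "uppercase Latin"),
    (0x0061, 0x007A, "lowercase Latin"),
    (0x00C2, 0x00F4, "Latin letters with circumflex etc."),
    (0x0300, 0x0323, "combining tone marks"),
    (0x0102, 0x01B0, "breve, horn and d-with-stroke letters"),
    (0x1EA0, 0x1EF9, "pre-composed Vietnamese letters"),
    (0x20AB, 0x20AB, "dong sign"),
]

def find_range(ch):
    """Return the label of the first range containing ch, or None."""
    cp = ord(ch)
    for first, last, label in VIETNAMESE_RANGES:
        if first <= cp <= last:
            return label
    return None

# "Viet" with a pre-composed U+1EC7, followed by the dong sign.
for ch in "Vi\u1ec7t\u20ab":
    print("U+%04X" % ord(ch), find_range(ch))
```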
9. Advantages of Unicode
Unicode makes multi-lingual computing possible
Data sharing between platforms
Any language version of an application can run on any language version of Windows 2000/XP
The behavior of non-Unicode applications depends on the user's settings
10. Relatives of Unicode
UTF-7:
7-bit transformation format, seldom used
UTF-8:
8-bit transformation format
For transmission over unknown lines, e.g. Web pages
Codepage number CP_UTF8 = 65001
UTF-16 or UCS-2:
The standard 16-bit Unicode
UTF-32 or UCS-4
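As a rough illustration of how these transformation formats differ, the Python sketch below encodes the same Vietnamese word in each of them and compares sizes. The sample word and the variable names are assumptions for the example; the codec names are the ones Python's standard library provides.

```python
# "Viet" with the pre-composed letter U+1EC7: four characters.
text = "Vi\u1ec7t"

# Encode the same text with each transformation format (no byte order mark).
encoded = {name: text.encode(name)
           for name in ("utf-7", "utf-8", "utf-16-le", "utf-32-le")}

for name, data in encoded.items():
    # Print the byte count and the raw bytes in hex.
    print(name, len(data), data.hex())

# Every format is lossless: decoding gives the original text back.
round_trips = all(data.decode(name) == text for name, data in encoded.items())
print(round_trips)
```

In UTF-8 the ASCII letters take one byte each and the Vietnamese letter takes three, while UTF-16 uses two bytes and UTF-32 four bytes for every character of this word.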
11. Implementing Unicode apps
Win32 APIs and the C run-time library make it easy to port your existing apps to Unicode
Four possibilities (depending on your design):
Pure Unicode
Dual compile path
Support Unicode internally
Generic Unicode
12. Option 1: pure Unicode
Create a pure Unicode application
Advantage:
Easy to implement
Disadvantage:
Works only on Windows NT
13. Option 2: dual compile path
Create two binaries:
Default compile for Windows 9x
Compile with -DUNICODE and -D_UNICODE for NT
Advantages:
Runs on both platforms
Easy to implement
Disadvantages:
Maintenance of two binaries is messy
No Unicode support on Windows 9x
14. Option 3: support Unicode internally
Always register as an ANSI application; convert to/from Unicode as needed
Advantages:
Moderate engineering effort
Uses Unicode on Windows 9x and Windows NT
Disadvantages:
Does not support new scripts (Devanagari, Tamil, Armenian, Georgian), even when on NT
Multi-script support is more difficult
15. Option 4: generic Unicode
Use Unicode everywhere with a single binary and two code paths:
On Windows NT, use W entry points
On Windows 9x, convert Unicode to ANSI, use A entry points
Advantages:
Full Unicode support when on Windows NT
Use Unicode uniformly in all Win32 apps
Disadvantages:
Substantial engineering effort
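The two code paths of this option can be mimicked in a few lines of Python. This is only a conceptual sketch, not Win32 code: the function `call_text_api`, its parameters, and the choice of cp1258 as the ANSI codepage are all assumptions made for illustration.

```python
def call_text_api(text, on_nt, ansi_codepage="cp1258"):
    """Hypothetical dispatcher that mimics a single binary with two code paths.

    The application keeps text as Unicode everywhere; only the final
    call differs per platform.
    """
    if on_nt:
        # "W entry point" path: pass the text on as UTF-16, nothing is lost.
        return ("W", text.encode("utf-16-le"))
    # "A entry point" path: convert Unicode to the ANSI codepage first.
    # Characters missing from the codepage are replaced with "?".
    return ("A", text.encode(ansi_codepage, errors="replace"))

# Vietnamese text (combining method) followed by a Devanagari letter.
sample = "Vi\u00ea\u0323t \u0915"

entry_nt, data_nt = call_text_api(sample, on_nt=True)
entry_9x, data_9x = call_text_api(sample, on_nt=False)
print(entry_nt, data_nt.decode("utf-16-le") == sample)
print(entry_9x, data_9x.endswith(b"?"))
```

The sketch shows why full Unicode support is only available on the W path: the Devanagari letter survives there, but turns into "?" once the text is squeezed through a single codepage.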
16. Converting between ANSI & Unicode
MultiByteToWideChar for codepage → Unicode
WideCharToMultiByte for Unicode → codepage
Codepage can be:
Any legal codepage number
Predefined: CP_ACP, CP_SYMBOL, CP_UTF8, etc.
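The same two conversion directions can be tried out without a Windows machine. The Python sketch below uses the standard `cp1258` codec (the Vietnamese Windows codepage) as a stand-in: `bytes.decode` plays the role of MultiByteToWideChar and `str.encode` the role of WideCharToMultiByte. It is an analogy under that assumption, not the Win32 API itself, and the variable names are illustrative.

```python
import unicodedata

# Unicode -> codepage (the WideCharToMultiByte direction).
# The text uses the combining method: U+00EA followed by U+0323.
ansi_bytes = "Vi\u00ea\u0323t".encode("cp1258")

# Codepage -> Unicode (the MultiByteToWideChar direction).
unicode_text = ansi_bytes.decode("cp1258")

print(len(ansi_bytes))                    # one byte per stored character
print(unicode_text == "Vi\u00ea\u0323t")  # the round trip is lossless

# Normalizing to NFC afterwards yields the pre-composed spelling.
composed = unicodedata.normalize("NFC", unicode_text)
print(composed == "Vi\u1ec7t")

# The pre-composed letter U+1EC7 has no slot in codepage 1258, so a
# strict conversion of pre-composed text fails instead of decomposing it.
try:
    "Vi\u1ec7t".encode("cp1258")
    precomposed_fits = True
except UnicodeEncodeError:
    precomposed_fits = False
print(precomposed_fits)
```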
17. Behavior of W and A routines in Win32
On Windows NT, the A routines are wrappers that:
Convert ANSI text to Unicode
Call the corresponding W routine
Use the system code page for conversion
On Windows 9x:
A routines are native
Most W routines fail and set the last error to ERROR_CALL_NOT_IMPLEMENTED
On Windows 2000/XP, you can:
Set the system locale to any supported value
Reboot
Emulate a localized Windows 9x system
18. Generic text mapping routines: C run time extensions
19. Best practices
Use dual path constants/macros
Use generic data types and function prototypes
Use explicit types such as LPBYTE for byte pointers
Replace p++/p-- with CharNext/CharPrev
Compute string length in bytes by:
NumChars * sizeof(TCHAR)
Compile:
Default to ANSI application, or
-DUNICODE and -D_UNICODE for Unicode
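The rule NumChars * sizeof(TCHAR) can be checked with a small calculation. The Python sketch below is an analogy only: it assumes that a Unicode build stores text as UTF-16 (2 bytes per TCHAR) and that the ANSI build uses the single-byte Vietnamese codepage 1258 (1 byte per TCHAR). The function name `byte_length` is hypothetical.

```python
def byte_length(text, unicode_build):
    """Bytes needed to store text, without a terminating NUL.

    unicode_build=True stands in for TCHAR = WCHAR (UTF-16 code units);
    unicode_build=False stands in for TCHAR = char in codepage 1258.
    """
    encoding = "utf-16-le" if unicode_build else "cp1258"
    return len(text.encode(encoding))

# Five characters: V, i, U+00EA, U+0323 (combining dot below), t.
text = "Vi\u00ea\u0323t"
num_chars = len(text)

print(num_chars)
print(byte_length(text, unicode_build=True))   # num_chars * 2
print(byte_length(text, unicode_build=False))  # num_chars * 1
```

Code that sizes a buffer with the character count alone works in the ANSI build and silently allocates half of what it needs in the Unicode build, which is why the multiplication by sizeof(TCHAR) matters.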
20. Encoding options in Web pages
ANSI codepages or ISO character encodings
Disadvantage: mono-lingual or restricted to one script
Raw Unicode: UTF-16
OK for intranets on Windows NT networks
May not work for Internet pages
Numeric entities, e.g. &#2325; for क
OK for occasional use, such as inserting characters not in the main script of the page
Not good for large amounts of multi-lingual text
UTF-8: recommended encoding
Works just about everywhere
Supported by IE 4.0+ and Netscape 4.0+
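The numeric entity option can be reproduced with Python's standard library, which is a convenient way to see what such a page looks like on the wire. This is a small sketch; the sample string and variable names are assumptions for the example.

```python
import html

# A mostly-ASCII page fragment with one Devanagari letter (U+0915).
text = "Hindi: \u0915"

# Encode as pure ASCII; anything outside ASCII becomes a numeric entity.
ascii_page = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_page)

# A browser (here: html.unescape) turns the entity back into the character.
restored = html.unescape(ascii_page.decode("ascii"))
print(restored == text)
```

One character costs seven bytes in this form, which illustrates why the slide recommends entities only for occasional use.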
21. To set encoding in Web pages
HTML/DHTML:
Tag in the head of the document:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=<value>">
XML:
<?xml version="1.0" encoding="<value>"?>
ASP:
Defaults to system codepage (IIS 6.0 not Unicode!)
Specify charset using ASP directives:
Per session:
<%Session.CodePage=<charset>%>
Per page:
<%@CODEPAGE=<charset>%>
22. Universally encoded Web page
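A universally encoded page can be sketched as follows: the text of several scripts is placed in one document, the META tag declares UTF-8, and the bytes are produced with the same encoding. The Python function `build_page` and the sample text are assumptions for illustration; the sketch does no HTML escaping of the body text.

```python
def build_page(body_text, charset="utf-8"):
    """Hypothetical helper: return the page as bytes in the declared charset."""
    page = (
        "<HTML><HEAD>\n"
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=%s">\n'
        "</HEAD><BODY>%s</BODY></HTML>\n" % (charset, body_text)
    )
    # The declared charset and the actual byte encoding must agree.
    return page.encode(charset)

# Vietnamese ("Tieng Viet" with diacritics) and Hindi in the same page.
body = "Ti\u1ebfng Vi\u1ec7t, \u0939\u093f\u0928\u094d\u0926\u0940"
page_bytes = build_page(body)

print(len(page_bytes))
print(b"charset=utf-8" in page_bytes)
print(body in page_bytes.decode("utf-8"))
```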
23. Resources
General guidelines on internationalization:
http://www.microsoft.com/globaldev
Information on Unicode implementation:
http://www.microsoft.com/msj/0499/multilangUnicode/multilangUnicodetop.htm