Windows™ 2000 Indian Language Developers Conference. F. Avery Bishop, Senior Program Manager for Multilingual Developer Communications, and David C. Brown, Development Lead for Complex Script Enabling in Windows™ Operating Systems, Microsoft™ Corporation.
Agenda for the Day • Welcome and Keynote • International Features of Windows 2000 • Complex Script Processing in Windows 2000 • Uniscribe: The Unicode Script Processor • Lunch • Guidelines for supporting complex scripts in Win32 applications • Supporting Indian text in Enterprise applications • Introduction to OpenType Fonts • Microsoft developer programs in India
Updates on Session Materials • Today’s presentations vary slightly from your session handouts • For updates to ppt files and demos, see www.microsoft.com/globaldev
International Features in Microsoft Windows 2000™. F. Avery Bishop, Senior Program Manager, Microsoft Corporation.
Agenda: International Features of Windows 2000* • Definitions of key concepts • Windows 2000 single-binary internationalization • Multilingual content • Windows 2000 Multilanguage version • New complex script support, including: • Support for Indian languages • Complex Scripts in web pages • Right-to-left layout of shell, applications *Old name: Windows NT 5.0
Definitions • Script: A set of symbols used to write one or more languages • Locale: • A place or locality (dictionary definition) • A set of user preferences related to language and local customs • Language Group: Term used to describe the supported script families in Windows NT 5
Definitions • System Locale: Not really a locale. Determines which script non-Unicode applications will support (e.g., which Windows 9x system Windows NT emulates) • User Locale: User preferences for formatting of dates, currencies, numbers, etc. • Input Locale: Pairing of input language and method of input; determines what language is currently being entered and how
Definitions • Enabling for a script: Adding support for input, display, and output of the script • Localization: Translating user interface elements • Globalization: Developing software such that feature design and code design are not limited to a single locale or script
Definitions • Complex Scripts: Scripts that require contextual processing for display, editing, and other processing
All language versions of Windows 2000 use the same core binary files! So What? • Advantages to users: • Can enter text in any supported language on any version of Windows 2000 • Any language version of a well-written Win32 app runs on any language version of Windows 2000 • Advantages to developers: • Develop all language versions on one system • Can develop and ship a single binary for all languages
More on Unified Language Support in Windows 2000 • Effect of system default locale on application: • ANSI applications require appropriate system locale setting • ANSI/Unicode applications may require system locale setting (more on this later) • Pure Unicode applications work with any system locale • Native Unicode support: • Important: New scripts will have no codepage; they are supported through Unicode only (e.g., Indian scripts, Armenian, Georgian)
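The "no codepage" point can be illustrated outside Windows. The sketch below is only an illustration in Python, not Win32 code: it assumes Python's standard `cp1252` and `cp1256` codecs as stand-ins for legacy ANSI codepages, and the helper name `fits_codepage` is hypothetical.

```python
def fits_codepage(text, codepage):
    """Return True if every character of text can be encoded in the codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# "namaste" written in Devanagari: six Unicode code points.
hindi = "\u0928\u092e\u0938\u094d\u0924\u0947"

# Western text fits a legacy Western codepage; Devanagari fits neither
# the Western nor the Arabic codepage used here as examples.
western_ok = fits_codepage("Hello", "cp1252")
hindi_in_western = fits_codepage(hindi, "cp1252")
hindi_in_arabic = fits_codepage(hindi, "cp1256")

# Unicode encodings round-trip the text without loss.
utf16 = hindi.encode("utf-16-le")
round_trip = utf16.decode("utf-16-le")
```

An application that stores and passes such text as Unicode throughout never hits the failing branch, which is the practical meaning of "pure Unicode applications work with any system locale".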
Unicode allows processing of Multilingual Content • System components: • Internet Explorer 5.0 can do amazing things! • Others: Winlogon, File system, Notepad, etc. • Unicode applications • Office 2000 • Your application!
Windows 2000 Multilanguage Version • Language of menus and dialogs is a per-user setting • Installable language modules • Sold through MOLP, Select, and Enterprise Agreement • Available to developers through MSDN
Support for Complex Scripts in Windows 2000 • A complex script is one that requires special processing, such as: • Bi-directional (BiDi) reordering (Arabic, Hebrew) • Contextual shaping (Arabic, Indic family) • Display of combining characters (Arabic, Thai, Indian) • Specialized word-break and justification rules (Thai) • Disallowing illegal character combinations (Indian, Thai)
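Two of the properties listed above can be read directly from the Unicode character database. The following is a rough sketch in Python using the standard `unicodedata` module; the function name `complex_features` is hypothetical, and the check is a heuristic only: it does not detect contextual shaping, word-break rules, or illegal combinations.

```python
import unicodedata


def complex_features(text):
    """Return labels for the kinds of special handling the text hints at."""
    features = set()
    for ch in text:
        # Strong right-to-left classes: R (e.g. Hebrew) and AL (Arabic letters).
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            features.add("bidi")
        # Mn = non-spacing mark, Mc = spacing combining mark.
        if unicodedata.category(ch) in ("Mn", "Mc"):
            features.add("combining")
    return features


hebrew = "\u05e9\u05dc\u05d5\u05dd"   # Hebrew letters, no vowel points
devanagari = "\u0915\u093f"           # KA + vowel sign I
arabic = "\u0628\u064e"               # BEH + FATHA
```

Plain Western text yields an empty set, the Hebrew sample reports only `bidi`, the Devanagari sample only `combining`, and the Arabic sample both.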
Right-to-Left Mirroring API • One function call will “mirror” all windows in an application • Can also mirror selected windows • APIs to suppress mirroring of bitmaps • May need to modify coding practices
Support for Indian Languages in Windows 2000 • APIs handle Devanagari and Tamil text through Unicode • Locale support • Time, Date, number, currency formats • Sorting • Conversion • Explicit function calls convert to/from ISCII • No Windows 98 compatibility mode
How We Developed Indian Script Support in Windows 2000 • Worked with Government organizations • Consulted with NCST, CDAC, academics • Brought engineers from NCST • Added Indian shaping engines to Uniscribe • Helped define feature tables for OpenType • Hired Hindi/Tamil speakers to test
Complex Scripts in Web pages • IE 5.0 supports complex scripts, including Devanagari and Tamil in: • Standard HTML text • DHTML – All properties in DOM • XML • Recommended encoding is UTF-8 • Place charset=utf-8 in HTTP header • Allows mixed scripts
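The UTF-8 recommendation can be sketched as follows. This is a minimal illustration in Python, not server code from the talk: `build_response` is a hypothetical helper, and the `meta` element is an extra in-page declaration added here as an assumption; the slide itself only asks for `charset=utf-8` in the HTTP header.

```python
import html


def build_response(body_text):
    """Return (headers, payload) for an HTML page served as UTF-8."""
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "<title>Mixed scripts</title></head>"
        "<body><p>" + html.escape(body_text) + "</p></body></html>"
    )
    payload = page.encode("utf-8")
    headers = {
        # The charset parameter tells the browser how to decode the bytes.
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


# Devanagari, Tamil and Western text in one page, with no codepage switching.
mixed = "\u0939\u093f\u0928\u094d\u0926\u0940 / \u0ba4\u0bae\u0bbf\u0bb4\u0bcd / English"
headers, payload = build_response(mixed)
```

Note that `Content-Length` counts encoded bytes, not characters: each Devanagari or Tamil code point here takes three bytes in UTF-8.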
Further Information and Resources • http://www.microsoft.com/globaldev (Watch for updates!) • MSJ articles, e.g., • Uniscribe: http://www.microsoft.com/msj/1198/multilang/multilangtop.htm • Multilingual UI: Coming April 1999 • Send suggestions to nlshelp@microsoft.com
Complex Script Processing in Microsoft Windows 2000™. David Brown, Development Lead, Microsoft Corporation.
Agenda • Overview • Implementation • Details
1. Overview • Distinct language groups • Mix any and all scripts • Most apps are easy to develop • CS = Complex Script
Complex Script Language groups • Arabic, Hebrew, Indic, Thai, Vietnamese • Part of ALL versions of Windows 2000 • Enable in Control Panel - Regional Settings • Turn it on today!
All scripts, any mix • Unicode makes representation easy • Common framework and APIs • Individual script and font handlers • Multilingual for no extra effort
Built into standard system APIs • Plain text • ExtTextOut, DrawText, TabbedTextOut • System edit control • Dialog boxes • Formatted text • Richedit • HTML control • See the Win32 SDK • Don’t write your own formatting
Font fallback • Standard system fonts • For dialogs, plaintext edit controls and other plaintext display • Dialog boxes work automatically
Summary • CS support is standard in Windows 2000 • No restrictions on script combinations • Easy (unless you are implementing your own formatting)
2. Implementation • Callouts from GDI and USER • Performance • Text broken by script and direction • Script handlers • LPK.DLL
Callouts from GDI and USER • ExtTextOut, DrawText passed early to LPK.DLL • Plaintext edit control has many callouts • Caret placement • Text measurement • Line breaking • Word advance • Safe, stable changes to OS core
Fast path for non-CS text • Normal GDI • 1:1 char to glyph • Simple side-by-side placement • No CS characters • If right-to-left, no neutrals • If digit substitution, no digits • Performance is good
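The fast-path conditions can be written as a single predicate. The sketch below is an assumption-laden illustration in Python, not the test Windows performs: the code point ranges in `COMPLEX_RANGES`, the choice of neutral bidi classes, and the function names are all hypothetical, and Vietnamese combining marks are left out for brevity.

```python
import unicodedata

# Assumed complex-script ranges (illustrative only).
COMPLEX_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x0DFF),  # Indic blocks, Devanagari through Sinhala
    (0x0E00, 0x0E7F),  # Thai
]

# Bidi classes treated as neutrals in this sketch.
NEUTRAL_CLASSES = {"B", "S", "WS", "ON"}


def is_complex(ch):
    """True if the character falls in one of the assumed CS ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES)


def can_use_fast_path(text, rtl_reading=False, digit_substitution=False):
    """Apply the three conditions from the slide to every character."""
    for ch in text:
        if is_complex(ch):
            return False
        if rtl_reading and unicodedata.bidirectional(ch) in NEUTRAL_CLASSES:
            return False
        if digit_substitution and unicodedata.category(ch) == "Nd":
            return False
    return True
```

The point of such a check is that it is a cheap linear scan, so text that needs no complex processing pays very little before falling through to ordinary 1:1 glyph output.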
Split by script and direction • Separate e.g. Devanagari, Tamil, Western • Left-to-right or right-to-left • Unicode bidirectional algorithm • Atomic item of display
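Splitting text into runs of a single script and direction might look roughly like this. The Python sketch below is a simplification under stated assumptions: scripts are recognized by a small hypothetical range table, neutrals such as spaces and digits simply join the preceding run, and direction is taken from the script rather than from the full Unicode bidirectional algorithm.

```python
# Assumed script ranges: (first code point, last code point, name).
SCRIPT_RANGES = [
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0B80, 0x0BFF, "Tamil"),
]
RTL_SCRIPTS = {"Hebrew", "Arabic"}


def script_of(ch):
    """Return a script name, or None for neutrals (spaces, digits, punctuation)."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Western" if ch.isalpha() else None


def itemize(text):
    """Split text into (script, direction, substring) runs."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if script is None:
            if runs:
                runs[-1][2] += ch      # neutral joins the preceding run
                continue
            script = "Western"         # leading neutral starts a Western run
        if runs and runs[-1][0] == script:
            runs[-1][2] += ch
        else:
            direction = "RTL" if script in RTL_SCRIPTS else "LTR"
            runs.append([script, direction, ch])
    return [tuple(run) for run in runs]


items = itemize("abc \u0915\u093f \u0ba4")
```

Each resulting run can then be handed to the handler for its script and drawn as one unit.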
Handler for each script • Script shaping and reordering • Devanagari - matra I reordered before consonant cluster • Tamil - vowel sign O surrounds consonant cluster • Urdu - Initial, medial, final, isolated forms • Various font formats • Backward compatibility • Shaping - ligatures, contextual forms • Placement of marks • Script handlers understand scripts
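The Devanagari reordering rule can be shown on raw code points. The Python sketch below is a toy model, not a shaping engine: it assumes a cluster is a consonant followed by zero or more virama + consonant pairs, ignores nukta, reph and other cases, and the function names are hypothetical.

```python
VIRAMA = "\u094d"    # DEVANAGARI SIGN VIRAMA
MATRA_I = "\u093f"   # DEVANAGARI VOWEL SIGN I


def is_consonant(ch):
    """True for Devanagari consonants KA through HA."""
    return "\u0915" <= ch <= "\u0939"


def to_visual_order(text):
    """Move vowel sign I in front of the consonant cluster it follows."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if not is_consonant(text[i]):
            out.append(text[i])
            i += 1
            continue
        # Collect the cluster: consonant (virama consonant)*
        j = i + 1
        while j + 1 < n and text[j] == VIRAMA and is_consonant(text[j + 1]):
            j += 2
        cluster = text[i:j]
        if j < n and text[j] == MATRA_I:
            out.append(MATRA_I + cluster)   # matra drawn before the cluster
            j += 1
        else:
            out.append(cluster)
        i = j
    return "".join(out)


# Logical order SA, virama, THA, matra I becomes matra I, SA, virama, THA.
visual = to_visual_order("\u0938\u094d\u0925\u093f")
```

The stored text stays in logical order; only the sequence handed to the font changes, which is why this work belongs in a script handler rather than in the application.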
Language Pack: LPK.DLL • Apply NLS settings (preferred digits) • Plaintext edit control • Calls to Uniscribe string handling • LPK.DLL is the bridge between the OS and Uniscribe
Diagram: Application → USER, GDI → LPK.DLL → Uniscribe
Summary • Callouts from GDI and USER • Performance issues • Split by script and direction • Script handlers • LPK.DLL
3. Details • Clusters • Caret placement and Mouse hits • Word breaking • Font metrics • Measuring text • Metafiles
Clusters • Indivisible - Indian, Thai, Vietnamese • Divisible - Arabic
Caret, mouse hits • For indivisible clusters • Arrow keys skip over clusters • Del deletes entire cluster • Backspace decomposes cluster one character at a time • Arrows and Mouse select whole clusters • Left click snaps to nearest boundary • For divisible clusters • Caret shows proportional position • Use system controls or query Uniscribe
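The different behavior of Del and Backspace on indivisible clusters can be sketched on code points. This Python illustration assumes a very rough cluster rule (a base character plus following combining marks, with a Devanagari virama gluing on the next consonant); real applications should use the system controls or query Uniscribe, as the slide says. All function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u094d"  # only the Devanagari virama is handled in this sketch


def is_devanagari_consonant(ch):
    return "\u0915" <= ch <= "\u0939"


def clusters(text):
    """Split text into approximate indivisible clusters."""
    result = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        after_virama = (
            bool(result)
            and result[-1].endswith(VIRAMA)
            and is_devanagari_consonant(ch)
        )
        if result and (is_mark or after_virama):
            result[-1] += ch
        else:
            result.append(ch)
    return result


def delete_cluster(text, index):
    """Del: remove the whole cluster at the given cluster index."""
    parts = clusters(text)
    del parts[index]
    return "".join(parts)


def backspace(text):
    """Backspace: remove only the last code point, decomposing the cluster."""
    return text[:-1]


word = "\u0938\u094d\u0925\u093f\u0915"   # SA, virama, THA, matra I, then KA
```

Here `clusters(word)` yields two clusters, Del on the first removes four code points at once, and Backspace on the first cluster alone peels off only the vowel sign.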
Font metrics • Matching the body height
Font metrics • Matching the ascender
Font metrics • Matching the descender
Matching fonts • When CS text is predominant • Full CS line spacing • Increase Western height • When Western text is predominant • Compromise line spacing • Accept some clipping • System edit control • Line spacing from single font • Richedit, HTML control • Line spacing adjusted for multiple fonts
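Measuring text • Adding characters can make text narrower (for example, when the added character causes a ligature or conjunct to form), so measure whole strings rather than summing character widths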
Metafiles • Device independent • Store Unicode - Enhanced metafile • Use ExtTextOut(W) • Windows adjusts widths for different playback fonts • Device dependent • Avoid • Stores glyphs • Requires identical font for playback
Summary • Caret placement and Mouse hits • Word breaking • Font metrics • Measuring text • Metafiles • Format with richedit, MSHTML
Resources • Uniscribe - next talk • OpenType - later today • Win32 SDK • Richedit • RTF • messages • Text object model • HTML control • HTML • Document object model