500 likes | 683 Views
ARCH-12 Broaden Your Potential Customer Base Using Unicode. David Lund Sr. Training Program Manager, Progress. Broaden Your Potential Customer Base Using Unicode. Unicode is the best way to support multiple languages A number of recent OpenEdge ™ enhancements facilitate Unicode
E N D
ARCH-12Broaden Your Potential Customer Base Using Unicode David Lund Sr. Training Program Manager, Progress
Broaden Your Potential Customer Base Using Unicode • Unicode is the best way to support multiple languages • A number of recent OpenEdge™ enhancements facilitate Unicode • OpenEdge tools simplify the task ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Unicode Essentials • Unicode foundation for creating internationalized and localized applications • Unicode provides a unique number for every character • Lossless round tripping • Mapping from any Unicode coded character sequence S to a sequence of bytes and back will produce S again ARCH-12, Unicode
Unicode Essentials • UTF – Unicode Transformation Format • Algorithm for mapping (encoding) Unicode scalar value to a unique sequence • 3 formats (mappings) • UTF-8, UTF-16, UTF-32 • Formats vary in how they handle mapping • Impacts access, storage, and performance ARCH-12, Unicode
Unicode Essentials 20 30 40 0 0 @ 1 ! 1 A 2 2 B 3 # 3 C • Code Page = table that assigns a numeric value • Letters, numbers, punctuation, control codes, etc. • prolang\list-cp.p lists code pages in convmap • Sample Code Page – IBM850 (partial) • Character ‘2’ is hex 32 ARCH-12, Unicode
Progress I18N Essentials I18N (Internationalization) • Undefined code page • Tells Progress not to do any conversions when reading or writing data • For example • Sports database uses undefined • Can be used with any character set ARCH-12, Unicode
Progress I18N Essentials I18N (Internationalization) • Startup parameters • -cpinternal • Code page used for internal data processing • -cpstream • Code page used for stream files • Parameter file prolang\UTF\UTF-8.pf -cpinternal utf-8 -cpstream utf-8 ARCH-12, Unicode
Progress I18N Essentials • Performing code page conversions • Progress provides a character set management facility • Automatically converts data between the code pages of different data sources and targets • Must be in CONVMAP file • Targets for code page conversion • Memory (-cpinternal) • Streams (-cpstream) • Databases ARCH-12, Unicode
Progress I18N Essentials Referenced code pages must be in CONVMAP • Modifying CONVMAP • Edit convmap.dat • Compile CONVMAP • Make convmap.cp available to session • Progress installation directory • PROCONV environment variable • -convmap startup parameter proutil <dbname> –C CODEPAGE-COMPILER convmap.dat convmap.cp ARCH-12, Unicode
Progress I18N Essentials • Converting characters or strings in memory • Specify code page in functions • ASC • CHR • CODEPAGE-CONVERT • Converting input and output data • Specify code page in statements • INPUT FROM (input source to memory target) • OUTPUT TO (memory source to output target) ARCH-12, Unicode
Fonts for Unicode • Locating fonts on windows • C:\WINDOWS\Fonts • Control Panel, select Font icon • Unicode fonts may need to be purchased • Setting Unicode fonts for Progress • Progress.ini • Use ini2reg.exe to place in registry ARCH-12, Unicode
System Resources ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Migrating a Database to Unicode • Two ways to migrate database to Unicode • Dump and Load • Converting the database without doing a dump and load • Start an OpenEdge session • Use startup parameters • -cpinternal UTF-8 • -cpstream UTF-8 ARCH-12, Unicode
Migrating a Database to UnicodeCautions: Using dump and load 1 of 3 • Backup your database • Dump definitions and data • Do not do a binary dump and load • Binary data is not converted to the code page of the database when it is loaded • Always use Data Admin tool • Goes through automatic conversion ARCH-12, Unicode
Migrating a Database to Unicode Using dump and load 2 of 3 • Create an empty UTF-8 database • Data Administration tool • Database>Create Database • Create Database dialog • Select radio set to create a copy of some other database • Select an empty database from prolang/UTF-8 • For example empty4.db ARCH-12, Unicode
Migrating a Database to Unicode Using dump and load 3 of 3 • Load the Definitions • Load will convert to UTF-8 automatically • Load the Data • Data will be automatically converted to UTF-8 from the dumped code page when it is loaded ARCH-12, Unicode
Migrating a Database to Unicode Converting without a dump and load • Backup your database • Use proutil to convert the database • Load the UTF-8 collation table • prolang/UTF/_tran.df • Assign a word break rules to the database • Rebuild the indexes proutil <db-name> -C convchar convert UTF-8 proutil <db-name> -C idxbuild ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Benefits of GUI Unicode Client Added in OpenEdge 10.0A release • Multi-lingual • Able to use data from multiple languages in the same session • Fully enables AppBuilder to build multilingual UTF-8 applications • Easier deployment: • Lower costs, higher ROI • No need to have different configurations using specific settings per language • Increased competitive advantage • No (or very few changes) required to existing apps to take advantage of GUI Unicode client ARCH-12, Unicode
Unicode Editor • RichEdit editor in OpenEdge 10 • Supports Unicode • Selecting an editor • Modify UseSourceEditor in progress.ini • Default SlickEdit: • UseSourceEditor=yes • For Unicode use RichEdit: • UseSourceEditor=no ARCH-12, Unicode
Demonstration GUI Unicode Client Multiple Languages ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Linguistic Sorting The goal … • Language sensitive collations • Tailor to expectations of locale • Language • Country • Easy to use • Functions just like any other collation for 4GL ARCH-12, Unicode
Unicode Sorting • OpenEdge 10.0A supports binary sorting • Basic collation support • Sorts by value in code page • Possible to do user defined sorting • OpenEdge 10.0B also supports linguistic sorting • Supports ICU collations • International Components for Unicode • OpenEdge does not support multiple collations in the database ARCH-12, Unicode
beet carrot entry trust zoom école çedilla beet carrot çedilla école entry trust zoom Binary versus Linguistic Sorting -A Visual Linguistic Sort Binary Sort English (ICU-en) ARCH-12, Unicode
Linguistic Sorting • Progress uses collations for: • -cpcoll session startup parameter • Database collation • Collation of database CLOB column • Argument to • COMPARE function • COLLATE option of the BY phrase ARCH-12, Unicode
Linguistic SortingSupported Collations • OpenEdge supports all ICU collations in the icui18n library • Beyond icui18n one additional collation is supported • Japanese Hiragana Quaternary as case-sensitive ARCH-12, Unicode
Linguistic Sorting • 4GL Usage - Reference collation by name • For example “ICU-fr” for French • Specify using • -cpcoll<table name> • Identifies collation table to use with code page in memory at session startup • <table name> is the collation table in convmap.cp or the name of the ICU collation • 4GL Statements • COMPARE • COLLATE ARCH-12, Unicode
Linguistic Sorting Sort order depends on selected collation /* French collation */ DISPLAY “ICU-fr = ” + COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr") /* Spanish collation */ DISPLAY “ICU-es = ” + COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es") • Output of above statements ICU-fr= yes ICU-es= no ARCH-12, Unicode
Linguistic SortingExamples 1 of 4 • Examples • UTF-8 database with “basic” collation • Names: beet, carrot, çedilla, entry, école, zoom, trust FOR EACH words WHERE name < “t”: DISPLAY name. END. • Output result beet carrot entry ARCH-12, Unicode
Linguistic SortingExamples 2 of 4 FOR EACH words WHERE name >= “t”: DISPLAY name. END. • Output result trust zoom école çedilla ARCH-12, Unicode
Linguistic SortingExamples 3 of 4 FOR EACH words WHERE COMPARE(name < “t”,“case-insensitive”, “ICU-en”): DISPLAY name. END. • Output result beet carrot entry école çedilla ARCH-12, Unicode
Linguistic SortingExamples 4 of 4 FOR EACH words WHERE COMPARE(name < “t”,“case-insensitive”, “ICU-en”) BY COLLATE(name, “case-insensitive”, “ICU-en”): DISPLAY name. END. • Output result beet carrot çedilla école entry ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Unicode Normalization • Why is this needed? • Puts in “NCF” format as expected by XML (and other W3C entities) • Best way to convert from Unicode to other code pages • Useful when doing tasks such as making comparisons ARCH-12, Unicode
Unicode Normalization What is normalization? • Unicode has different ways of expressing the same characters • Base letter plus combining marks (accents) as two Unicode code points • Á = composite (composed) (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´) • Base letter and accents as one Unicode code point • Á = precomposed (U+00C1, Latin Capital Letter A with Acute) ARCH-12, Unicode
Unicode Normalization • NORMALIZE • 4GL function new in OpenEdge 10.0B • Returns either CHAR or LONGCHAR • Matches the source string • CHAR variable must be UTF-8 • LONGCHAR variable any form of Unicode • UTF-8, UTF-16, UTF-32 result-string = NORMALIZE(source-string, normalization-mode) ARCH-12, Unicode
Normalization Modes Supported • NFD • Canonical Decomposition • NFC • Canonical Decomposition, followed by Canonical Composition • NFKD • Compatibility Decomposition • NFKC • Compatibility Decomposition, followed by Canonical Composition • None • No change to source string • Turns off normalization when normalization-mode is a variable ARCH-12, Unicode
Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode
Bidi Support • Bi-directional (bidi) • Behavior of individual widgets and/or the complete window to go from right to left or left to right • Supported • Fill-in widget • Can type right to left of left to right • Not-Supported • Whole frame • Cannot switch labels from left side to right side ARCH-12, Unicode
GB18030 Code Page SupportAdded in OpenEdge 10.0B • New Chinese code page • Required for all new software sold in mainland China as of Jan. 1, 2001 ARCH-12, Unicode
Broaden Your Potential Customer Base Using Unicode In summary • Unicode is the best way to support multiple languages • A number of recent OpenEdge™ enhancements facilitate Unicode • OpenEdge tools simplify the task ARCH-12, Unicode
Documentation • OpenEdge Development • Internationalizing Applications ARCH-12, Unicode
Unicode Resources Unicode Home page • http://www.unicode.org • Unicode Standard, Unicode Consortium • International Components for Unicode • http://www-124.ibm.com/icu/docs/ • http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/ ARCH-12, Unicode
System Resources • Viewing keyboard layouts http://www.microsoft.com/globaldev/reference/keyboards.aspx • Select the language and the keyboard layout is displayed • Use shift key to toggle to ‘lower/upper’ case characters • Use MS Internet Explorer to display ARCH-12, Unicode
Questions? ARCH-12, Unicode
Thank you for your time! ARCH-12, Unicode