ARCH-12 Broaden Your Potential Customer Base Using Unicode

David Lund, Sr. Training Program Manager, Progress

Broaden Your Potential Customer Base Using Unicode
• Unicode is the best way to support multiple languages
• A number of recent OpenEdge™ enhancements facilitate Unicode
• OpenEdge tools simplify the task
ARCH-12, Unicode

Agenda - Implementing Unicode
• Essentials
• Migrating a Database
• Unicode Client
• Sorting
• Normalization
• Other Areas to Consider

Unicode Essentials
• Unicode is the foundation for creating internationalized and localized applications
• Unicode provides a unique number for every character
• Lossless round tripping
  • Mapping from any Unicode coded character sequence S to a sequence of bytes and back will produce S again
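The round-tripping property can be checked directly. The following is a minimal Python sketch (Python rather than 4GL, purely as an illustration); the sample text and the helper name `round_trips` are assumptions made for this example and do not come from the presentation.

```python
# Lossless round tripping: encode text to bytes, decode it back,
# and confirm the original sequence S is recovered.

# Hypothetical sample mixing Latin, accented, CJK and emoji characters.
SAMPLE = "\u00C1 \u00E9cole \u65E5\u672C\u8A9E \U0001F600"


def round_trips(text: str, encoding: str) -> bool:
    """Return True if encoding and then decoding yields the original text."""
    return text.encode(encoding).decode(encoding) == text


# Check the three Unicode encoding forms named on the slides.
results = {enc: round_trips(SAMPLE, enc) for enc in ("utf-8", "utf-16", "utf-32")}
print(results)
```

A non-Unicode code page offers no such guarantee for arbitrary text, which is one reason the deck recommends Unicode for multilingual data.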

Unicode Essentials
• UTF – Unicode Transformation Format
  • Algorithm for mapping (encoding) each Unicode scalar value to a unique sequence of code units
• 3 formats (mappings)
  • UTF-8, UTF-16, UTF-32
• Formats vary in how they handle mapping
  • Impacts access, storage, and performance
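To make the storage impact concrete, here is a small Python sketch (an illustration only, not OpenEdge code) that reports how many bytes one character occupies in each format. The helper name `encoded_sizes` and the sample characters are assumptions chosen for this example; the little-endian codec variants are used so that no byte-order mark is counted.

```python
# Compare how many bytes a single character needs in UTF-8, UTF-16 and UTF-32.

def encoded_sizes(ch: str) -> dict:
    """Byte length of one character in each UTF (no byte-order mark)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Hypothetical samples: ASCII letter, accented letter, CJK ideograph, emoji.
for ch in ("A", "\u00E9", "\u4E2D", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 is variable width (1 to 4 bytes), UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4, which is the trade-off between storage and direct access that the slide alludes to.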

Unicode Essentials
• Code Page = table that assigns a numeric value
  • Letters, numbers, punctuation, control codes, etc.
• prolang\list-cp.p lists code pages in convmap
• Sample Code Page – IBM850 (partial); columns are the high hex digit, rows the low hex digit

         20       30   40
    0    (space)  0    @
    1    !        1    A
    2    "        2    B
    3    #        3    C

• Character ‘2’ is hex 32
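The lookup a code page performs can be reproduced in a few lines of Python (an illustration, not 4GL). Python's codec name for IBM850 is `cp850`; the helper name `code_page_value` is an assumption made for this sketch.

```python
# A code page maps each character to a numeric value (here, a single byte).

def code_page_value(ch: str, code_page: str) -> int:
    """Return the byte value a single-byte code page assigns to one character."""
    encoded = ch.encode(code_page)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {code_page}")
    return encoded[0]


# The slide's example: character '2' is hex 32 in IBM850.
print(hex(code_page_value("2", "cp850")))

# The same character can have different values in different code pages.
print(hex(code_page_value("\u00E9", "cp850")), hex(code_page_value("\u00E9", "latin-1")))
```

The second line of output shows why conversions are needed: é has one value in IBM850 and another in ISO 8859-1.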

Progress I18N Essentials
I18N (Internationalization)
• Undefined code page
  • Tells Progress not to do any conversions when reading or writing data
  • For example, the Sports database uses undefined
  • Can be used with any character set

Progress I18N Essentials
I18N (Internationalization)
• Startup parameters
  • -cpinternal
    • Code page used for internal data processing
  • -cpstream
    • Code page used for stream files
• Parameter file prolang\UTF\UTF-8.pf

    -cpinternal utf-8
    -cpstream utf-8

Progress I18N Essentials
• Performing code page conversions
  • Progress provides a character set management facility
  • Automatically converts data between the code pages of different data sources and targets
  • Both code pages must be in the CONVMAP file
• Targets for code page conversion
  • Memory (-cpinternal)
  • Streams (-cpstream)
  • Databases

Progress I18N Essentials
Referenced code pages must be in CONVMAP
• Modifying CONVMAP
  • Edit convmap.dat
  • Compile CONVMAP

    proutil <dbname> -C CODEPAGE-COMPILER convmap.dat convmap.cp

  • Make convmap.cp available to session
    • Progress installation directory
    • PROCONV environment variable
    • -convmap startup parameter

Progress I18N Essentials
• Converting characters or strings in memory
  • Specify code page in functions
    • ASC
    • CHR
    • CODEPAGE-CONVERT
• Converting input and output data
  • Specify code page in statements
    • INPUT FROM (input source to memory target)
    • OUTPUT TO (memory source to output target)
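The two kinds of conversion listed above (in memory, and on input or output streams) follow the same pattern in any language: decode from the source code page, then encode to the target. The sketch below shows that pattern in Python as an analogy only; it does not use or model the 4GL functions named on the slide, and the helper name `convert` and the sample word are assumptions.

```python
import io

# In-memory conversion: decode from the source code page, encode to the target.

def convert(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Convert bytes from one code page to another via Unicode."""
    return data.decode(source_cp).encode(target_cp)


# Hypothetical sample: "école" as stored in IBM850 (é is byte 0x82).
ibm850_bytes = "\u00E9cole".encode("cp850")
utf8_bytes = convert(ibm850_bytes, "cp850", "utf-8")
print(ibm850_bytes, "->", utf8_bytes)

# Stream conversion: the reader and the writer each declare a code page,
# and the text is converted as it passes through.
source = io.TextIOWrapper(io.BytesIO(ibm850_bytes), encoding="cp850")
target_buffer = io.BytesIO()
target = io.TextIOWrapper(target_buffer, encoding="utf-8")
target.write(source.read())
target.flush()
stream_result = target_buffer.getvalue()
print(stream_result)
```

Both paths yield the same UTF-8 bytes, which mirrors the idea that the internal code page and the stream code page are set independently.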

Fonts for Unicode
• Locating fonts on Windows
  • C:\WINDOWS\Fonts
  • Control Panel, select Fonts icon
  • Unicode fonts may need to be purchased
• Setting Unicode fonts for Progress
  • Progress.ini
  • Use ini2reg.exe to place in registry

System Resources

Migrating a Database to Unicode
• Two ways to migrate a database to Unicode
  • Dump and load
  • Converting the database without doing a dump and load
• Start an OpenEdge session
  • Use startup parameters
    • -cpinternal UTF-8
    • -cpstream UTF-8

Migrating a Database to Unicode
Cautions: Using dump and load, 1 of 3
• Back up your database
• Dump definitions and data
  • Do not do a binary dump and load
    • Binary data is not converted to the code page of the database when it is loaded
  • Always use the Data Admin tool
    • Goes through automatic conversion

Migrating a Database to Unicode
Using dump and load, 2 of 3
• Create an empty UTF-8 database
  • Data Administration tool
    • Database > Create Database
  • Create Database dialog
    • Select the radio-set option to create a copy of some other database
    • Select an empty database from prolang/UTF
    • For example, empty4.db

Migrating a Database to Unicode
Using dump and load, 3 of 3
• Load the definitions
  • Load will convert to UTF-8 automatically
• Load the data
  • Data will be automatically converted to UTF-8 from the dumped code page when it is loaded

Migrating a Database to Unicode
Converting without a dump and load
• Back up your database
• Use proutil to convert the database

    proutil <db-name> -C convchar convert UTF-8

• Load the UTF-8 collation table
  • prolang/UTF/_tran.df
• Assign word-break rules to the database
• Rebuild the indexes

    proutil <db-name> -C idxbuild

Benefits of GUI Unicode Client
Added in OpenEdge 10.0A release
• Multi-lingual
  • Able to use data from multiple languages in the same session
  • Fully enables AppBuilder to build multilingual UTF-8 applications
• Easier deployment
  • Lower costs, higher ROI
  • No need to have different configurations using specific settings per language
• Increased competitive advantage
• No (or very few) changes required to existing apps to take advantage of the GUI Unicode client

Unicode Editor
• RichEdit editor in OpenEdge 10
  • Supports Unicode
• Selecting an editor
  • Modify UseSourceEditor in progress.ini
  • Default SlickEdit:
    • UseSourceEditor=yes
  • For Unicode use RichEdit:
    • UseSourceEditor=no

Demonstration
GUI Unicode Client
Multiple Languages

Linguistic Sorting
The goal …
• Language-sensitive collations
• Tailor to expectations of locale
  • Language
  • Country
• Easy to use
  • Functions just like any other collation for 4GL

Unicode Sorting
• OpenEdge 10.0A supports binary sorting
  • Basic collation support
  • Sorts by value in code page
  • Possible to do user-defined sorting
• OpenEdge 10.0B also supports linguistic sorting
  • Supports ICU collations
    • International Components for Unicode
• OpenEdge does not support multiple collations in the database

Binary versus Linguistic Sorting - A Visual

    Binary Sort    Linguistic Sort, English (ICU-en)
    beet           beet
    carrot         carrot
    entry          çedilla
    trust          école
    zoom           entry
    école          trust
    çedilla        zoom
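The difference between the two orders can be approximated in Python (an illustration, not 4GL, and not ICU). Two assumptions are worth stating loudly: the "binary" key below sorts by IBM850 byte value (Python codec `cp850`), which happens to reproduce the binary order shown on the slide, whereas sorting by Unicode code point would place çedilla before école; and the "linguistic" key is only a crude stand-in that compares accent-stripped base letters first, not the ICU-en collation OpenEdge uses. Both helper names are hypothetical.

```python
import unicodedata

WORDS = ["beet", "carrot", "\u00E7edilla", "entry", "\u00E9cole", "zoom", "trust"]


def binary_key(word: str) -> bytes:
    """Sort by raw byte value in one code page (IBM850 assumed here)."""
    return word.encode("cp850")


def approx_linguistic_key(word: str) -> tuple:
    """Rough stand-in for a linguistic collation (NOT the ICU algorithm):
    compare base letters first, use the accents only as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, decomposed)


binary_order = sorted(WORDS, key=binary_key)
linguistic_order = sorted(WORDS, key=approx_linguistic_key)

print("binary:    ", binary_order)
print("linguistic:", linguistic_order)
```

In the binary order, accented letters fall after "z" because their byte values are higher than any unaccented letter; a linguistic collation treats é as a variant of e and ç as a variant of c.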

Linguistic Sorting
• Progress uses collations for:
  • -cpcoll session startup parameter
  • Database collation
  • Collation of database CLOB column
  • Argument to
    • COMPARE function
    • COLLATE option of the BY phrase

Linguistic Sorting
Supported Collations
• OpenEdge supports all ICU collations in the icui18n library
• Beyond icui18n, one additional collation is supported
  • Japanese Hiragana Quaternary as case-sensitive

Linguistic Sorting
• 4GL usage - reference the collation by name
  • For example, “ICU-fr” for French
• Specify using
  • -cpcoll <table name>
    • Identifies the collation table to use with the code page in memory at session startup
    • <table name> is the collation table in convmap.cp or the name of the ICU collation
  • 4GL statements
    • COMPARE
    • COLLATE

Linguistic Sorting
Sort order depends on selected collation

    /* French collation */
    DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")).

    /* Spanish collation */
    DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")).

• Output of above statements

    ICU-fr = yes
    ICU-es = no

Linguistic Sorting
Examples 1 of 4
• UTF-8 database with “basic” collation
• Names: beet, carrot, çedilla, entry, école, zoom, trust

    FOR EACH words WHERE name < "t":
      DISPLAY name.
    END.

• Output result

    beet
    carrot
    entry

Linguistic Sorting
Examples 2 of 4

    FOR EACH words WHERE name >= "t":
      DISPLAY name.
    END.

• Output result

    trust
    zoom
    école
    çedilla

Linguistic Sorting
Examples 3 of 4

    FOR EACH words
      WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
      DISPLAY name.
    END.

• Output result

    beet
    carrot
    entry
    école
    çedilla

Linguistic Sorting
Examples 4 of 4

    FOR EACH words
      WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
      BY COLLATE(name, "case-insensitive", "ICU-en"):
      DISPLAY name.
    END.

• Output result

    beet
    carrot
    çedilla
    école
    entry

Unicode Normalization
• Why is this needed?
  • Puts text in “NFC” (Normalization Form C) format, as expected by XML (and other W3C entities)
  • Best way to convert from Unicode to other code pages
  • Useful when doing tasks such as making comparisons

Unicode Normalization
What is normalization?
• Unicode has different ways of expressing the same characters
  • Base letter plus combining marks (accents) as two Unicode code points
    • Á = decomposed sequence (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´)
  • Base letter and accents as one Unicode code point
    • Á = precomposed (U+00C1, Latin Capital Letter A with Acute)
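The two representations of Á can be compared directly. This Python sketch uses the standard `unicodedata` module as a stand-in for illustration; it is not the 4GL NORMALIZE function, and the variable names are assumptions.

```python
import unicodedata

# Two ways of writing the same character, as described on the slide.
decomposed = "\u0041\u0301"   # A + combining acute accent (two code points)
precomposed = "\u00C1"        # Latin capital letter A with acute (one code point)

# Without normalization, the strings are different code point sequences.
print(decomposed == precomposed, len(decomposed), len(precomposed))

# NFC composes to the single code point; NFD decomposes to base + mark.
as_nfc = unicodedata.normalize("NFC", decomposed)
as_nfd = unicodedata.normalize("NFD", precomposed)
print(as_nfc == precomposed, as_nfd == decomposed)
```

Normalizing both operands to the same form before comparing is what makes the two spellings of Á test as equal.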

Unicode Normalization
• NORMALIZE
  • 4GL function new in OpenEdge 10.0B
  • Returns either CHAR or LONGCHAR
    • The return type matches the source string
    • A CHAR variable must be UTF-8
    • A LONGCHAR variable can be any form of Unicode
      • UTF-8, UTF-16, UTF-32

    result-string = NORMALIZE(source-string, normalization-mode)

Normalization Modes Supported
• NFD
  • Canonical Decomposition
• NFC
  • Canonical Decomposition, followed by Canonical Composition
• NFKD
  • Compatibility Decomposition
• NFKC
  • Compatibility Decomposition, followed by Canonical Composition
• None
  • No change to source string
  • Turns off normalization when normalization-mode is a variable
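The effect of each mode is easiest to see on a string that contains both a compatibility character and an accented letter. The following Python sketch is an illustration built on `unicodedata.normalize`; the wrapper `normalize_text`, its handling of "NONE", and the sample string are assumptions for this example and do not describe how the 4GL NORMALIZE function is implemented.

```python
import unicodedata


def normalize_text(source: str, mode: str) -> str:
    """Apply a normalization mode; "NONE" returns the source unchanged,
    so the mode can be supplied through a variable."""
    mode = mode.upper()
    if mode == "NONE":
        return source
    return unicodedata.normalize(mode, source)


# Hypothetical sample: the "fi" ligature (U+FB01) followed by precomposed Á (U+00C1).
SAMPLE = "\uFB01\u00C1"

for mode in ("NFD", "NFC", "NFKD", "NFKC", "NONE"):
    result = normalize_text(SAMPLE, mode)
    print(mode, [f"U+{ord(c):04X}" for c in result])
```

The canonical forms (NFD, NFC) leave the ligature alone and only change how Á is represented, while the compatibility forms (NFKD, NFKC) also replace the ligature with the letters f and i.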

Bidi Support
• Bi-directional (bidi)
  • Behavior of individual widgets and/or the complete window to go from right to left or left to right
• Supported
  • Fill-in widget
    • Can type right to left or left to right
• Not supported
  • Whole frame
    • Cannot switch labels from left side to right side

GB18030 Code Page Support
Added in OpenEdge 10.0B
• New Chinese code page
• Required for all new software sold in mainland China as of Jan. 1, 2001

Broaden Your Potential Customer Base Using Unicode
In summary
• Unicode is the best way to support multiple languages
• A number of recent OpenEdge™ enhancements facilitate Unicode
• OpenEdge tools simplify the task

Documentation
• OpenEdge Development
  • Internationalizing Applications

Unicode Resources
• Unicode home page
  • http://www.unicode.org
• Unicode Standard, Unicode Consortium
• International Components for Unicode
  • http://www-124.ibm.com/icu/docs/
  • http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/

System Resources
• Viewing keyboard layouts
  • http://www.microsoft.com/globaldev/reference/keyboards.aspx
  • Select the language and the keyboard layout is displayed
  • Use the shift key to toggle between lower-case and upper-case characters
  • Use MS Internet Explorer to display

Questions?

Thank you for your time!
