1 / 50

ARCH-12 Broaden Your Potential Customer Base Using Unicode

ARCH-12 Broaden Your Potential Customer Base Using Unicode. David Lund Sr. Training Program Manager, Progress. Broaden Your Potential Customer Base Using Unicode. Unicode is the best way to support multiple languages A number of recent OpenEdge ™ enhancements facilitate Unicode

monte
Download Presentation

ARCH-12 Broaden Your Potential Customer Base Using Unicode

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ARCH-12Broaden Your Potential Customer Base Using Unicode David Lund Sr. Training Program Manager, Progress

  2. Broaden Your Potential Customer Base Using Unicode • Unicode is the best way to support multiple languages • A number of recent OpenEdge™ enhancements facilitate Unicode • OpenEdge tools simplify the task ARCH-12, Unicode

  3. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  4. Unicode Essentials • Unicode foundation for creating internationalized and localized applications • Unicode provides a unique number for every character • Lossless round tripping • Mapping from any Unicode coded character sequence S to a sequence of bytes and back will produce S again ARCH-12, Unicode

  5. Unicode Essentials • UTF – Unicode Transformation Format • Algorithm for mapping (encoding) Unicode scalar value to a unique sequence • 3 formats (mappings) • UTF-8, UTF-16, UTF-32 • Formats vary in how they handle mapping • Impacts access, storage, and performance ARCH-12, Unicode

  6. Unicode Essentials 20 30 40 0 0 @ 1 ! 1 A 2 2 B 3 # 3 C • Code Page = table that assigns a numeric value • Letters, numbers, punctuation, control codes, etc. • prolang\list-cp.p lists code pages in convmap • Sample Code Page – IBM850 (partial) • Character ‘2’ is hex 32 ARCH-12, Unicode

  7. Progress I18N Essentials I18N (Internationalization) • Undefined code page • Tells Progress not to do any conversions when reading or writing data • For example • Sports database uses undefined • Can be used with any character set ARCH-12, Unicode

  8. Progress I18N Essentials I18N (Internationalization) • Startup parameters • -cpinternal • Code page used for internal data processing • -cpstream • Code page used for stream files • Parameter file prolang\UTF\UTF-8.pf -cpinternal utf-8 -cpstream utf-8 ARCH-12, Unicode

  9. Progress I18N Essentials • Performing code page conversions • Progress provides a character set management facility • Automatically converts data between the code pages of different data sources and targets • Must be in CONVMAP file • Targets for code page conversion • Memory (-cpinternal) • Streams (-cpstream) • Databases ARCH-12, Unicode

  10. Progress I18N Essentials Referenced code pages must be in CONVMAP • Modifying CONVMAP • Edit convmap.dat • Compile CONVMAP • Make convmap.cp available to session • Progress installation directory • PROCONV environment variable • -convmap startup parameter proutil <dbname> –C CODEPAGE-COMPILER convmap.dat convmap.cp ARCH-12, Unicode

  11. Progress I18N Essentials • Converting characters or strings in memory • Specify code page in functions • ASC • CHR • CODEPAGE-CONVERT • Converting input and output data • Specify code page in statements • INPUT FROM (input source to memory target) • OUTPUT TO (memory source to output target) ARCH-12, Unicode

  12. Fonts for Unicode • Locating fonts on windows • C:\WINDOWS\Fonts • Control Panel, select Font icon • Unicode fonts may need to be purchased • Setting Unicode fonts for Progress • Progress.ini • Use ini2reg.exe to place in registry ARCH-12, Unicode

  13. System Resources ARCH-12, Unicode

  14. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  15. Migrating a Database to Unicode • Two ways to migrate database to Unicode • Dump and Load • Converting the database without doing a dump and load • Start an OpenEdge session • Use startup parameters • -cpinternal UTF-8 • -cpstream UTF-8 ARCH-12, Unicode

  16. Migrating a Database to UnicodeCautions: Using dump and load 1 of 3 • Backup your database • Dump definitions and data • Do not do a binary dump and load • Binary data is not converted to the code page of the database when it is loaded • Always use Data Admin tool • Goes through automatic conversion ARCH-12, Unicode

  17. Migrating a Database to Unicode Using dump and load 2 of 3 • Create an empty UTF-8 database • Data Administration tool • Database>Create Database • Create Database dialog • Select radio set to create a copy of some other database • Select an empty database from prolang/UTF-8 • For example empty4.db ARCH-12, Unicode

  18. Migrating a Database to Unicode Using dump and load 3 of 3 • Load the Definitions • Load will convert to UTF-8 automatically • Load the Data • Data will be automatically converted to UTF-8 from the dumped code page when it is loaded ARCH-12, Unicode

  19. Migrating a Database to Unicode Converting without a dump and load • Backup your database • Use proutil to convert the database • Load the UTF-8 collation table • prolang/UTF/_tran.df • Assign a word break rules to the database • Rebuild the indexes proutil <db-name> -C convchar convert UTF-8 proutil <db-name> -C idxbuild ARCH-12, Unicode

  20. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  21. Benefits of GUI Unicode Client Added in OpenEdge 10.0A release • Multi-lingual • Able to use data from multiple languages in the same session • Fully enables AppBuilder to build multilingual UTF-8 applications • Easier deployment: • Lower costs, higher ROI • No need to have different configurations using specific settings per language • Increased competitive advantage • No (or very few changes) required to existing apps to take advantage of GUI Unicode client ARCH-12, Unicode

  22. Unicode Editor • RichEdit editor in OpenEdge 10 • Supports Unicode • Selecting an editor • Modify UseSourceEditor in progress.ini • Default SlickEdit: • UseSourceEditor=yes • For Unicode use RichEdit: • UseSourceEditor=no ARCH-12, Unicode

  23. Demonstration GUI Unicode Client Multiple Languages ARCH-12, Unicode

  24. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  25. Linguistic Sorting The goal … • Language sensitive collations • Tailor to expectations of locale • Language • Country • Easy to use • Functions just like any other collation for 4GL ARCH-12, Unicode

  26. Unicode Sorting • OpenEdge 10.0A supports binary sorting • Basic collation support • Sorts by value in code page • Possible to do user defined sorting • OpenEdge 10.0B also supports linguistic sorting • Supports ICU collations • International Components for Unicode • OpenEdge does not support multiple collations in the database ARCH-12, Unicode

  27. beet carrot entry trust zoom école çedilla beet carrot çedilla école entry trust zoom Binary versus Linguistic Sorting -A Visual Linguistic Sort Binary Sort English (ICU-en) ARCH-12, Unicode

  28. Linguistic Sorting • Progress uses collations for: • -cpcoll session startup parameter • Database collation • Collation of database CLOB column • Argument to • COMPARE function • COLLATE option of the BY phrase ARCH-12, Unicode

  29. Linguistic SortingSupported Collations • OpenEdge supports all ICU collations in the icui18n library • Beyond icui18n one additional collation is supported • Japanese Hiragana Quaternary as case-sensitive ARCH-12, Unicode

  30. Linguistic Sorting • 4GL Usage - Reference collation by name • For example “ICU-fr” for French • Specify using • -cpcoll<table name> • Identifies collation table to use with code page in memory at session startup • <table name> is the collation table in convmap.cp or the name of the ICU collation • 4GL Statements • COMPARE • COLLATE ARCH-12, Unicode

  31. Linguistic Sorting Sort order depends on selected collation /* French collation */ DISPLAY “ICU-fr = ” + COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr") /* Spanish collation */ DISPLAY “ICU-es = ” + COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es") • Output of above statements ICU-fr= yes ICU-es= no ARCH-12, Unicode

  32. Linguistic SortingExamples 1 of 4 • Examples • UTF-8 database with “basic” collation • Names: beet, carrot, çedilla, entry, école, zoom, trust FOR EACH words WHERE name < “t”: DISPLAY name. END. • Output result beet carrot entry ARCH-12, Unicode

  33. Linguistic SortingExamples 2 of 4 FOR EACH words WHERE name >= “t”: DISPLAY name. END. • Output result trust zoom école çedilla ARCH-12, Unicode

  34. Linguistic SortingExamples 3 of 4 FOR EACH words WHERE COMPARE(name < “t”,“case-insensitive”, “ICU-en”): DISPLAY name. END. • Output result beet carrot entry école çedilla ARCH-12, Unicode

  35. Linguistic SortingExamples 4 of 4 FOR EACH words WHERE COMPARE(name < “t”,“case-insensitive”, “ICU-en”) BY COLLATE(name, “case-insensitive”, “ICU-en”): DISPLAY name. END. • Output result beet carrot çedilla école entry ARCH-12, Unicode

  36. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  37. Unicode Normalization • Why is this needed? • Puts in “NCF” format as expected by XML (and other W3C entities) • Best way to convert from Unicode to other code pages • Useful when doing tasks such as making comparisons ARCH-12, Unicode

  38. Unicode Normalization What is normalization? • Unicode has different ways of expressing the same characters • Base letter plus combining marks (accents) as two Unicode code points • Á = composite (composed) (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´) • Base letter and accents as one Unicode code point • Á = precomposed (U+00C1, Latin Capital Letter A with Acute) ARCH-12, Unicode

  39. Unicode Normalization • NORMALIZE • 4GL function new in OpenEdge 10.0B • Returns either CHAR or LONGCHAR • Matches the source string • CHAR variable must be UTF-8 • LONGCHAR variable any form of Unicode • UTF-8, UTF-16, UTF-32 result-string = NORMALIZE(source-string, normalization-mode) ARCH-12, Unicode

  40. Normalization Modes Supported • NFD • Canonical Decomposition • NFC • Canonical Decomposition, followed by Canonical Composition • NFKD • Compatibility Decomposition • NFKC • Compatibility Decomposition, followed by Canonical Composition • None • No change to source string • Turns off normalization when normalization-mode is a variable ARCH-12, Unicode

  41. Agenda - Implementing Unicode • Essentials • Migrating a Database • Unicode Client • Sorting • Normalization • Other Areas to Consider ARCH-12, Unicode

  42. Bidi Support • Bi-directional (bidi) • Behavior of individual widgets and/or the complete window to go from right to left or left to right • Supported • Fill-in widget • Can type right to left of left to right • Not-Supported • Whole frame • Cannot switch labels from left side to right side ARCH-12, Unicode

  43. GB18030 Code Page SupportAdded in OpenEdge 10.0B • New Chinese code page • Required for all new software sold in mainland China as of Jan. 1, 2001 ARCH-12, Unicode

  44. Broaden Your Potential Customer Base Using Unicode In summary • Unicode is the best way to support multiple languages • A number of recent OpenEdge™ enhancements facilitate Unicode • OpenEdge tools simplify the task ARCH-12, Unicode

  45. Documentation • OpenEdge Development • Internationalizing Applications ARCH-12, Unicode

  46. Unicode Resources Unicode Home page • http://www.unicode.org • Unicode Standard, Unicode Consortium • International Components for Unicode • http://www-124.ibm.com/icu/docs/ • http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/ ARCH-12, Unicode

  47. System Resources • Viewing keyboard layouts http://www.microsoft.com/globaldev/reference/keyboards.aspx • Select the language and the keyboard layout is displayed • Use shift key to toggle to ‘lower/upper’ case characters • Use MS Internet Explorer to display ARCH-12, Unicode

  48. Questions? ARCH-12, Unicode

  49. Thank you for your time! ARCH-12, Unicode

  50. ARCH-12, Unicode

More Related