
Unicode implementation in Adabas and Natural, impact on Application Design




    2. Adabas Implementation

    3. Wide Character Definition. Field type W implemented. This field type is not necessarily a Unicode field type; it can be used for any character type that requires more than 1 byte per character. Field type <U> in Natural requires field type <W> in Adabas.

    4. Data Conversion. The default character encoding in <W> type fields is UTF-16 (see the Unicode definition). Fields with format <A> stored into <W> type fields are converted according to the rules of UTF-16. Fields read from <W> type fields into <A> type fields are converted into the current code page. Characters which cannot be translated will be ignored.
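The A-to-W and W-to-A conversion rules above can be sketched in Python (not Natural); the code-page names used here are illustrative stand-ins for whatever code page is actually active:

```python
# Sketch of the A <-> W conversion rules, using Python's codec machinery
# as a stand-in for Adabas/Natural code-page handling.

# Storing A-format data into a W-type field: decode from the local code
# page (here Latin-1, an assumption) into Unicode.
latin1_bytes = "Gr\u00fc\u00dfe".encode("latin-1")   # "Grüße" in ISO-8859-1
unicode_value = latin1_bytes.decode("latin-1")       # now a Unicode string

# Reading W-type data back into an A-format field under a code page that
# lacks some characters: untranslatable characters are dropped, mirroring
# "characters which cannot be translated will be ignored".
w_value = "\u03b1\u03b2\u03b3 M\u00fcnchen"          # "αβγ München"
a_value = w_value.encode("ascii", errors="ignore").decode("ascii")
# a_value == " Mnchen": the Greek letters and the umlaut are gone
```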

    5. Parameter Definition. Parameter settings: Open Record Buffer: OPRB=(dbid=xx,wcharset='utf16-le',upd=1-13). Source format: SUTF8=ON.

    7. Natural and Unicode. In order to handle Unicode, code page support had to be implemented in Natural. A code page is a definition of which internal value corresponds to which visible character. Example: X'C1' corresponds to the character A on a mainframe (EBCDIC). All Natural objects retain the identification of the code page which was used at development time. Natural knows at startup time which code page the current end user is using. All data will be translated into this code page at execution time, but only when CPAGE=ON was used at compile time.
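The code-page example above can be checked with Python codecs: the byte X'C1' is the letter A in EBCDIC (code page 037 used here as a representative EBCDIC code page), while ASCII encodes A as X'41':

```python
# Byte 0xC1 is 'A' in EBCDIC code page 037; in ASCII, 'A' is 0x41.
ebcdic_a = b"\xc1".decode("cp037")   # EBCDIC -> 'A'
ascii_a = b"\x41".decode("ascii")    # ASCII  -> 'A'
```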

    8. Natural and Unicode. The new data type <U>: all fields defined as <U> will be considered Unicode fields. The new constant type <U> allows the definition of constants in Unicode format. The internal data format is UTF-16 in NFC. UTF-16 is an encoding form which represents most Unicode characters in 2 bytes. NFC is the composed normalization form of Unicode data. Example: â can be expressed as a followed by a combining circumflex, or as a single character. The first case means a 4-byte UTF-16 representation of the same character, whereas the second case only requires 2 bytes. Via NFC all such expanded representations will be combined into a single character (if a pre-composed character exists).
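The â example can be demonstrated with Python's `unicodedata` module, which implements the same Unicode normalization forms that Natural applies internally:

```python
import unicodedata

# 'a' followed by COMBINING CIRCUMFLEX ACCENT: two code points,
# four bytes in UTF-16.
decomposed = "a\u0302"

# NFC combines the pair into the single pre-composed character 'â':
composed = unicodedata.normalize("NFC", decomposed)

utf16_before = decomposed.encode("utf-16-be")   # 4 bytes
utf16_after = composed.encode("utf-16-be")      # 2 bytes
```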

    9. Natural and Unicode. Data conversion and comparison are done via ICU, the open source library for Unicode functionality. ICU is a separate module and will only be activated when Unicode handling is required (MF only; on Open Systems ICU is always present). Two versions are available: small, ICU conversion only (2.8 MB); large, ICU conversion plus collation functionality (4.5 MB). This can be controlled by the ICU parameter. ICU will be used for the code page support as well (see Natural 4.2 functionality). Optimization: the Unicode conversion table for the currently active code page is generated into the thread (MF only).

    10. Natural and Unicode. CPDATA (MF only): (CMSYNIN,codepage),(CMOBJIN,codepage),(CMPRINT,codepage). Specifies in which code page data are expected or written. All data will be translated (using ICU) from the internal to the external code page and vice versa. The parameter CMSYNIN is available as a separate parameter on Open Systems as well.

    11. Natural and Unicode. New components in Natural: NATICU, NATICUCV, NATICUXL (International Components for Unicode); NATCPTAB (code page character translation tables); NATSCTU (scanner character table for Unicode); NATWEB (Natural terminal driver for Web I/O).

    12. Natural and Unicode. New macros: NTCFICU (Unicode support); NTXML (support of the REQUEST DOCUMENT and PARSE statements). New parameters of macro NTPRM: CPCVERR (code page conversion error), CPOBJIN (code page of batch input file), CPPRINT (code page of batch output file), CPSYNIN (code page of batch input file for commands), DBGERR (automatic start of debugger at runtime error), SLOCK (source locking), SRETAIN (retain source format), THSEP (dynamic thousands separator), THSEPCH (thousands separator character).

    13.-17. Natural and Unicode (slides contain screenshots only; no transcript text)

    18. Natural and Unicode. CPOPT=ON: use translation tables instead of ICU functions, if possible. CPOPT=OFF: use ICU functions in any case.

    19. Natural and Unicode. *CODEPAGE (A64) contains the name of the currently active code page. These names correspond to the IANA standard (Internet Assigned Numbers Authority, http://www.iana.org/). *LOCALE (A8) contains the region and language information, based on ISO 3166 and ISO 639.

    20. Natural and Unicode. LOCALE = en_US: English language in the USA. LOCALE = en_GB: English language in the UK. LOCALE = de_DE: German language in Germany. LOCALE = de_AT: German language in Austria.

    21. Natural and Unicode. ISO 3166 extract (country codes and names):
    DE DEU 276 Germany
    GH GHA 288 Ghana, Republic of
    GI GIB 292 Gibraltar
    GR GRC 300 Greece, Hellenic Republic
    GL GRL 304 Greenland
    GD GRD 308 Grenada
    GP GLP 312 Guadeloupe
    GU GUM 316 Guam
    GT GTM 320 Guatemala, Republic of
    GN GIN 324 Guinea, Revolutionary People's Rep'c of

    22.-24. Natural and Unicode (slides contain screenshots only; no transcript text)

    25. Natural and Unicode. The size of the modules: NATICU ca. 5 MB; NATICUCV ca. 3.5 MB; NATICUXL ca. 12.5 MB; NATXML ca. 1 MB. Natural buffers (thread size): NATICU ca. 150 KB; NATXML ca. 150 KB.

    26. Natural and Unicode. MOVE NORMALIZED op1 TO op2: converts the Unicode data of op1 (type U) into normalized form in the target variable op2 (U or A). MOVE ENCODED operand1 [ENCODED [IN] [CODEPAGE] operand2] TO operand3 [ENCODED [IN] [CODEPAGE] operand4]: converts data from one code page into another. DEFINE PRINTER (n) CODEPAGE code-page-name (MF only): specifies the code page to be used for printing data. EXAMINE: contains specific features to support surrogates; based on characters, not on bytes.
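The effect of MOVE NORMALIZED and MOVE ENCODED can be sketched in Python; the code-page names below are illustrative choices, not anything mandated by Natural (which takes IANA code-page names):

```python
import unicodedata

# MOVE NORMALIZED analogue: bring Unicode data into its normalized (NFC) form.
raw = "a\u0308bc"                                  # 'a' + combining diaeresis + "bc"
normalized = unicodedata.normalize("NFC", raw)     # "äbc" as a single pre-composed 'ä'

# MOVE ENCODED analogue: convert data from one code page into another.
ebcdic = "HELLO".encode("cp037")                   # EBCDIC code page 037
text = ebcdic.decode("cp037")                      # back to Unicode
latin1 = text.encode("latin-1")                    # re-encoded in ISO-8859-1
```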

    27. Natural and Unicode. There is no direct terminal I/O for Unicode available on the mainframe (the green-screen terminals don't support Unicode). All U-type fields will be converted internally into the currently active code page and displayed as A-type fields. The alternative is Natural Web I/O.

    28. Natural and Unicode. All modern Windows operating systems are based on Unicode internally. This functionality is used by Natural, so input and display of Unicode data is no problem at all. How to enter Unicode data? Microsoft offers so-called IMEs (Input Method Editors) for various languages. These can be downloaded and installed on all newer Office packages. http://office.microsoft.com/en-us/assistance/HA010347361033.aspx

    29. Natural and Unicode. The following statements support Unicode data: ASSIGN / MOVE / :=; CALL / CALL FILE / CALL LOOP / CALLNAT; COMPRESS; CREATE OBJECT / DECIDE / IF; DEFINE DATA / CLASS / FUNCTION / PROTOTYPE / WINDOW / WORKFILE; DISPLAY / WRITE / PRINT / INPUT / REINPUT (see the MF I/O discussion); EXAMINE; EXPAND / RESIZE / REDUCE; FETCH / RUN

    30. Natural and Unicode. The following statements support Unicode data (continued): FIND / READ / STORE / GET / HISTOGRAM; INSERT (SQL); SUBSTRING; PARSE; PERFORM; PASSW; PROCESS; READ WORK; REDEFINE; REQUEST DOCUMENT; RESET; SEPARATE; SET CONTROL / SET KEY; SORT (not in the first release)

    31. Natural and Unicode The following Statements support Unicode data (Continued) COMPUTATION (ADD / SUBTRACT / MULTIPLY / DIVIDE) STACK TERMINATE

    32. Natural and Unicode. Data transfer is possible between various formats:
    Source   Target   Rule
    U        U        none
    U        B        binary move; value might be truncated or padded with blanks
    U        A        converted into the current code page
    A        U        converted from the current code page
    B1-B4    U        B1-B4 => A => U
    Bn, n>4  U        binary move; value might be truncated or padded with blanks
    D        U        D => A => U
    F        U        F => A => U
    I        U        I => A => U
    L        U        L => A => U
    N        U        N => A => U
    P        U        P => A => U
    T        U        T => A => U
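The numeric rules in the table all route through an intermediate alpha form. A minimal Python sketch of the N => A => U chain (the exact alpha formatting Natural produces is not shown here; `str()` stands in for it):

```python
# N => A => U: the numeric value is first formatted as a string (N => A),
# then that string is converted to Unicode/UTF-16 (A => U).
numeric_value = 1234.5
as_alpha = str(numeric_value)                 # N => A
as_unicode = as_alpha.encode("utf-16-be")     # A => U: 2 bytes per character
```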

    33. Natural and Unicode. Substring functionality is the same as for <A> type fields, only the unit of work is 2 bytes. This means a Unicode data string cannot be destroyed by cutting the data in the middle of the two bytes used for one character. This is true only when no surrogates are used; surrogate pairs may be cut. Use EXAMINE to identify the correct character position, or use MOVE NORMALIZED to convert combining character sequences into pre-composed characters.
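The surrogate hazard described above can be shown concretely: characters outside the Basic Multilingual Plane occupy a surrogate pair (two 2-byte units) in UTF-16, so a cut on a 2-byte boundary can still land inside one character:

```python
# 'A' + MUSICAL SYMBOL G CLEF (U+1D11E, outside the BMP).
text = "A\U0001d11e"
utf16 = text.encode("utf-16-be")           # 6 bytes: 2 for 'A', 4 for the pair

cut = utf16[:4]                            # a 2-byte-aligned cut between the surrogates
try:
    cut.decode("utf-16-be")
    pair_split = False
except UnicodeDecodeError:
    pair_split = True                      # the lone high surrogate is invalid
```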

    34. Natural and Unicode. U can be compared to A, B and U. IF op1 = op2 (formats A and U): op2 is converted to the format of op1 before the operation is done; ICU is used for the comparison. For format B a binary compare is done, with no conversion.

    35. Natural and Unicode. REDEFINE can be used for U-type fields (not for dynamic variables). This means that Unicode data can be processed as any other format type. Danger: this can destroy the data. But it allows the developer to see what the data looks like internally, and it can improve performance, especially when used in IF clauses.

    36. Natural and Unicode. The MASK function can be used for Unicode fields as well; handling is done via ICU. Example: U+0030 (the Unicode value of the digit <0>) will return true with the mask <N> (numeric).
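A Python analogue of the MASK <N> example, using the Unicode character properties that ICU-based classification also relies on (Python's `str.isdigit` stands in for the MASK check; it is not Natural's implementation):

```python
# U+0030 (DIGIT ZERO) is classified as numeric -- and so are digits
# from other scripts, such as ARABIC-INDIC DIGIT ZERO.
ascii_zero = "\u0030"
arabic_zero = "\u0660"
both_numeric = ascii_zero.isdigit() and arabic_zero.isdigit()
```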

    37. Natural and Unicode. The following system functions support Unicode data: COUNT, MAX, MIN, MAXVAL, MINVAL, NCOUNT, OLD, POS, RET, TRIM (Open Systems only), VAL.

    38. Natural and Unicode. The development environment SPoD is the only environment where the developer can make full use of all Unicode functionality (U-constants). The complete source of the Natural object will be converted into Unicode.

    39. Natural and Unicode. The library SYSEXV contains example programs for all new functions and features in new versions of Natural, including examples which demonstrate the usage of the new Unicode support functionality: V62UCONS, V62UDEF, V62UENC, V62UEXA, V62UINP, V62UINTR, V62UMOVE, V62UNORM, V62UPARM, V62UVAR.

    40. Natural Add-on Products and Unicode: SYSSEC, SYSMAIN, SYSDIC / Predict and NAF are all still to be evaluated.

    41. Natural and Unicode http://www.unicode.org/ http://oss.software.ibm.com/icu/ Helpful web pages http://i.tom.com/ (It’s all Japanese) http://www.greeknewsonline.com/ (It’s all Greek to me) http://www.russianpassportservice.com/ (Need some Russian?) http://www.aljazeera.net/ (Want Arabic data?) http://www.a7.org/ (Or better Hebrew?) http://www.omniglot.com/writing/alphabets.htm (Or some other alphabets?) http://www.decodeunicode.org/ (Unicode decoded)

    43. Contents Getting external Unicode data into the application Manipulating Unicode data Storing/saving Unicode data Retrieving Unicode data from the database Output of Unicode data

    44. Getting External Unicode Data into the Application There are various sources which can be used to enter Unicode data into a Natural application. Of course there is the database, but this will not be considered in this section. The other methods are: terminal input, READ WORK FILE, REQUEST DOCUMENT, calls to external programs, and conversion within the application.

    45. Terminal Input The first method is native Unicode data entered into a corresponding Unicode field. This is the easy and straightforward method; however, this capability is only available on Windows platforms. The newer Windows versions (XP/2000/NT) all have direct Unicode support implemented, and Natural makes full use of this functionality. All other platforms (UNIX, mainframe) do not have a terminal feature available which supports Unicode data input. This is handled by the second approach: entering data in the local code page and converting the data internally into Unicode. One of the new features of Natural is that it provides the information about which code page is currently active for data input, so it is quite easy to convert the existing data into Unicode. This is done automatically by Natural when the field format used in the corresponding INPUT statement is U (for Unicode). The third possibility is the use of the new feature in Natural 4.2 called "Web I/O". Basically, this is a new terminal type defined for Natural which transfers the output and input data to a standard Web browser. All standard Web browsers support Unicode, and with this approach Natural can do terminal I/O with Unicode data even from the mainframe.

    46. Read Workfile The READ WORK FILE feature of Natural can be used to enter any kind of data into the application. The challenge here is that work files do not carry any description of the data they contain. It is the responsibility of the designer to know the format of the work file. This information can then be used to read Unicode data directly into Unicode fields, or to read data represented in a predefined code page and convert this data into Unicode.

    47. Request Document The REQUEST DOCUMENT statement can be used to read XML documents directly into the Natural application. XML is defined in terms of the Unicode character model, so all data in the document has to be handled as Unicode data.
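The same idea sketched in Python rather than Natural: an XML document arrives as bytes in some declared encoding, and parsing it yields Unicode text regardless of that byte encoding:

```python
import xml.etree.ElementTree as ET

# A UTF-8 encoded XML document as it might arrive over the wire.
doc = '<?xml version="1.0" encoding="UTF-8"?><name>M\u00fcller</name>'.encode("utf-8")

root = ET.fromstring(doc)
parsed_name = root.text          # a Unicode string, independent of the byte encoding
```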

    48. Call external Program Data can be retrieved from calls to COBOL or other 3GL programs as well. Again, it is the responsibility of the designer to know the interface of such programs and the code page in which the data are returned. Once this information is known, it is no longer a problem to either convert the data or read the data directly into Unicode fields.

    49. Conversion within the application It is always possible to convert existing alphanumeric data into Unicode within the application. This approach has to be used when the format (code page) of the existing data is known.

    50. Manipulating Unicode Data The processing of Unicode data is different from the processing of data available in the standard code page. Let’s define what processing actually is: Transfer of Unicode data to other fields, even non-Unicode fields. Comparing Unicode data with other fields, even non-Unicode fields. Extracting data from Unicode fields based on some internal structure. Unicode and the REDEFINE Statement.

    51. Transfer of Unicode Data to Other Fields, Even Non-Unicode Fields Unicode data can easily be transferred into non-Unicode (i.e. A-type) fields. Natural will automatically convert the Unicode data into the currently active code page. There are two possible outcomes, depending on the configuration of Natural. First, Natural issues an error message if the Unicode data cannot be converted completely into the code page of the target field (the ON ERROR logic handles this situation). Second, all characters which are not available in the current code page are replaced by a substitution character, indicating that there was a character but no equivalent exists in the current code page. This conversion cannot be reversed: transferring the result back to a Unicode field will not restore the missing characters.
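Both outcomes can be sketched with Python codecs (ASCII stands in here for an arbitrary target code page that lacks some characters):

```python
# The two configurable outcomes of a lossy code-page conversion.
text = "Price: 5 \u20ac"                   # contains the euro sign

# Outcome 1: a conversion error when the target code page cannot
# represent a character.
try:
    text.encode("ascii")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True

# Outcome 2: a substitution character; the original data cannot be
# recovered by converting back to Unicode.
replaced = text.encode("ascii", errors="replace").decode("ascii")
# replaced == "Price: 5 ?" -- the euro sign is gone for good
```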

    52. Comparing Unicode Data With Other Fields, Even Non-Unicode Fields A challenge with Unicode is that many characters can be represented internally in different forms. The character <ä> can be represented as a single character, or as a sequence of the base character <a> followed by the two dots above. Externally the data is identical, but the internal representation is different. An IF comparison can therefore return the wrong result even when two fields contain the same information. Natural offers the facility to normalize Unicode data. Normalizing converts all such combining sequences into the single pre-composed character, where one is available. After normalization a comparison will always return the correct answer.
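A minimal demonstration of the comparison problem and its fix, using Python's `unicodedata` in place of Natural's MOVE NORMALIZED:

```python
import unicodedata

precomposed = "\u00e4"                     # 'ä' as one code point
combining = "a\u0308"                      # 'a' + COMBINING DIAERESIS

raw_equal = precomposed == combining       # False: internal forms differ
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", combining))   # True after NFC
```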

    53. Extracting Data from Unicode Fields Based on Some Internal Structure Many input fields have a predefined structure. A typical example is the combination <Last-name, First-name>. It is very convenient to type names in this form (or, even better, to allow both possibilities: <First-name "Space" Last-name>). The application now has the task of separating the two units, recognizing whether format 1 or 2 has been used, and saving the data in corresponding fields. This has been programmed for single-byte languages many times already, and will be required for Unicode fields as well. The statements SCAN and EXAMINE have been adapted to work on a two-byte basis for Unicode fields. So an EXAMINE of a field for U',' will analyze the Unicode field for the Unicode character ','. The resulting position information is based on two-byte characters as well. A subsequent MOVE SUBSTRING instruction can use this position and the length in Unicode characters directly to transfer the data correctly into the target Unicode field.
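The same parsing task sketched in Python; the helper name `split_name` is hypothetical, and character-based string operations stand in for EXAMINE and MOVE SUBSTRING:

```python
def split_name(field):
    """Accept 'Last, First' or 'First Last' and return (first, last).

    Operates on characters, not bytes, just as EXAMINE does for U-type fields.
    """
    if "," in field:                       # form 1: Last-name, First-name
        last, first = field.split(",", 1)
    else:                                  # form 2: First-name Last-name
        first, _, last = field.partition(" ")
    return first.strip(), last.strip()
```

Both `split_name("M\u00fcller, J\u00f6rg")` and `split_name("J\u00f6rg M\u00fcller")` yield the pair ("Jörg", "Müller"), regardless of which characters the names contain.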

    54. Unicode and the REDEFINE Statement There has been great discussion amongst the Natural development team about whether the REDEFINE statement should be allowed for Unicode fields. A developer can make a big mess by redefining Unicode data incorrectly, thus destroying the Unicode characters. The Natural development team's conclusion is that all developers programming in Natural are highly intelligent and responsible persons who will know exactly what they are doing with their data. So the REDEFINE statement is available for Unicode fields. Any Unicode field can be redefined as a hexadecimal field or as a single-byte A-type field. This allows the developer to process the Unicode data in all possible ways they can think of. The simplest use is that the developer can see how his or her data is encoded internally.

    55. Storing/Saving Unicode Data Unicode data are stored either in external work files or in databases. The work file routines accept all Unicode data without any restriction, so this is a straightforward approach to saving Unicode data for further processing either by Natural or by any other application. To store Unicode in a database, the data definition module has to be defined with "W" type fields in Adabas, or as Unicode fields in DB2. No further preparation needs to be done. Natural automatically transfers the Unicode data correctly to the database in UTF-16 format. In Adabas the data will be converted to UTF-8 for compression reasons, so in a typical Western Latin-oriented application the amount of data is only slightly larger than the original data stored in a local code page.
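Why UTF-8 storage stays close to the original size for Latin-oriented data can be shown directly: ASCII characters take 1 byte in UTF-8 but 2 bytes in UTF-16, and only accented characters grow:

```python
# Size comparison of UTF-16 (Natural's internal form) vs UTF-8
# (Adabas's stored form) for typical Western data.
record = "Customer address record"                 # pure ASCII, 23 characters
utf16_size = len(record.encode("utf-16-be"))       # 46 bytes: 2 per character
utf8_size = len(record.encode("utf-8"))            # 23 bytes: 1 per character

accented = "Gr\u00fc\u00dfe aus M\u00fcnchen"      # "Grüße aus München", 17 characters
utf8_accented = len(accented.encode("utf-8"))      # 20 bytes: only the 3 accented
                                                   # characters take 2 bytes each
```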

    56. Retrieving Unicode Data from the Database All data read into Unicode fields will be converted automatically into the UTF-16 format. No further processing is done; in particular, no normalization is done. Natural assumes that the data were normalized before they were stored in the database. It is possible to use Unicode data in search criteria, i.e. Unicode fields can be descriptor fields. This allows the storage and retrieval of name values written in any kind of character set, including Chinese, Russian, Thai and all other languages. One of the features of Adabas is the ability to convert internal data into an external representation controlled by the specification in the format buffer. This feature makes it possible to access Unicode data in Adabas from non-Unicode applications as well. These applications will get the data automatically converted into the code page of the active application. This allows coexistence of legacy applications with new Unicode-enabled applications.

    57. Output of Unicode Data Once the Unicode data have been processed, the end user typically wants to see the data either on screen or on paper. A prerequisite for this is the ability of the output device to handle Unicode data. When the data are sent via the new I/O method "Web I/O", they will appear correctly on the end user's screen. Printing Unicode data requires a printer which can handle Unicode output. As long as the printing is controlled by fonts (i.e. a Unicode font), the data will be printed correctly; this applies to all modern terminal printers. For mass-output printers the functionality needs to be checked: older versions of mainframe printers may not support Unicode data. In a Windows environment the printer type has to be <GUI>, not <TTY>.

    58. Unicode Constants The U'constant' feature is available to define Unicode constants. When you want to specify a constant value containing characters not in the current code page, the Natural compiler will not accept that value unless the U format has been specified in the statement. MOVE 'My name is ????????' will not be accepted; MOVE U'My name is ????????' will be accepted. UH'...' constants are Unicode hexadecimal constants; they can be used to move hexadecimal values into a Unicode field.
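The idea behind U'...' and UH'...' constants can be illustrated in Python, where characters outside the source code page can likewise be written as escapes; the Cyrillic name used here is just an example:

```python
# Hexadecimal escapes (the UH'...' style) and a literal containing the
# characters directly (the U'...' style) produce the same Unicode value.
name_escaped = "My name is \u0416\u043e\u0440\u0436"   # "Жорж" via hex escapes
name_literal = "My name is \N{CYRILLIC CAPITAL LETTER ZHE}\N{CYRILLIC SMALL LETTER O}\N{CYRILLIC SMALL LETTER ER}\N{CYRILLIC SMALL LETTER ZHE}"
```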
