XML and COBOL
XML
• XML = Extensible Markup Language
• Used to expose the structure and content of a document
• Becoming a universal means of exchanging data
• Tag language:
    <author>
      <firstname>Charles</firstname>
      <lastname>Dickens</lastname>
    </author>
XML
• Tags are user-defined
• Every start tag has a matching end tag: <atag> … </atag>
• An empty element can combine the start and end tags into a single tag: <media type="CD" />
• Tags can't overlap. NO: <a> <b> </a> </b>
XML
• Tags can be nested: <a> <b> </b> </a>
• Documents are tree-structured:
    <a>
      <b></b>
      <c>
        <d></d>
      </c>
    </a>
• In the tree for this document, a is the root, b and c are children of a, and d is a child of c
XML
• Text-based documents
• Case sensitive
• Must contain a single root element
• Can start with an XML declaration, which may be followed by comments:
    <?xml version="1.0"?>
    <!-- comment line -->
    <a> </a>
XML
• XML is "well formed" if it has:
  1) A single root element
  2) Matching start and end tags for all elements
  3) Proper nesting
  4) Attribute values in quotes
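The four rules above can be checked mechanically, because any conforming parser rejects a document that breaks them. The following is a minimal Python sketch, not part of the COBOL material; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Illustrative samples, one per rule on the slide.
samples = {
    "proper nesting": "<a><b></b></a>",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<a></a><b></b>",
    "unquoted attribute": "<media type=CD />",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample should be reported as well formed; each of the others breaks one rule from the list.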
XML Parsers
• An XML parser is a program that reads an XML document and provides programmatic access to it
• Two types of parsers:
  1) DOM based (Document Object Model): constructs a tree that represents the document
  2) SAX based (Simple API for XML): generates events when parts of the document are encountered
• Parsers can also be classified as "push" or "pull" parsers
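To make the difference between the two parser types concrete, here is a small Python sketch that reads the same document both ways. It uses the standard xml.dom.minidom and xml.sax modules; the function and class names (dom_lastname, EventLogger, sax_events) are illustrative only.

```python
import xml.sax
from xml.dom import minidom

DOC = "<author><firstname>Charles</firstname><lastname>Dickens</lastname></author>"

def dom_lastname(document: str) -> str:
    """DOM style: build the whole tree first, then navigate it."""
    tree = minidom.parseString(document)
    node = tree.getElementsByTagName("lastname")[0]
    return node.firstChild.data

class EventLogger(xml.sax.ContentHandler):
    """SAX style: record each event as the parser reports it."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))

def sax_events(document: str):
    handler = EventLogger()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return handler.events

print(dom_lastname(DOC))
for event in sax_events(DOC):
    print(event)
```

The event-based style is the one that corresponds to the COBOL parsers discussed later in this deck: the application never sees a tree, only a stream of events.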
XML Characters
• Documents consist of carriage returns, line feeds, and Unicode characters
• Everything in a document is either "markup" or "text"
• Markup is enclosed in < and >
• Character text is the text between a start tag and an end tag
• Child elements are considered markup
White Space
• Parsers treat whitespace inside text data as significant and must pass it to the application
• The application decides whether that whitespace is significant or insignificant
• Normalization is the process in which whitespace is collapsed or removed
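The division of labor described above can be sketched in Python: the parser hands over the text exactly as written, and the application chooses whether to normalize it. The normalize helper below is a hypothetical example of one common normalization (trim the ends, collapse runs of whitespace to one space).

```python
import xml.etree.ElementTree as ET

DOC = "<name>\n   Joe    Smith\n</name>"

# The parser passes the whitespace through unchanged.
raw_text = ET.fromstring(DOC).text

def normalize(text: str) -> str:
    """Application-level normalization: trim and collapse whitespace runs."""
    return " ".join(text.split())

print(repr(raw_text))
print(repr(normalize(raw_text)))
```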
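Entities
• &, <, >, ' (apostrophe), and " (double quote) are special characters. & and < may never appear literally in character data; the others must be escaped in some contexts, such as a quote inside an attribute value delimited by the same quote
• To use these characters, code entity references, which begin with an ampersand and end with a semicolon
• &amp; &lt; &gt; &apos; &quot;
• <mytag>David&apos;s Tag</mytag>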
Unicode
• XML supports Unicode
• Any Unicode character can be written as a numeric character reference: an ampersand, a sharp (#), the character's code point as a decimal integer, and a semicolon
• Example: &#1583; is the Arabic letter د
Determining the Encoding Type
• When XMLPARSE(XMLSS) is in effect, these sources are used to determine the encoding of an XML document:
  • The type of data item that contains the XML document (we will only consider alphanumeric)
  • The ENCODING phrase (if used) on the XML PARSE statement
  • The CCSID specified in the CODEPAGE compiler option
Two Ways to Specify the Encoding for XMLPARSE(XMLSS)
• Put the document in an alphanumeric item (PIC X), then either:
  1) Specify the encoding on the XML PARSE statement: XML PARSE MYDOC WITH ENCODING 1208 …
  2) Add a CODEPAGE compiler option
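Why the encoding has to be stated at all is easy to see outside COBOL: the same characters are stored as different bytes under different CCSIDs. The Python sketch below is only an illustration. CCSID 1208 is UTF-8 and CCSID 1140 is an EBCDIC code page; the mapping table and the decode_document helper are assumptions made for this example, not an IBM API.

```python
# Assumed mapping from CCSID to Python codec names (illustrative only).
CCSID_TO_CODEC = {1208: "utf-8", 1140: "cp1140"}

def decode_document(raw: bytes, ccsid: int) -> str:
    """Decode the bytes of a document using the codec assumed for the CCSID."""
    return raw.decode(CCSID_TO_CODEC[ccsid])

text = "<name>David</name>"
utf8_bytes = text.encode("utf-8")
ebcdic_bytes = text.encode("cp1140")

# The opening '<' is a different byte value in each encoding.
print(utf8_bytes[:1].hex(), ebcdic_bytes[:1].hex())
print(decode_document(ebcdic_bytes, 1140))
```

A parser that assumed the wrong CCSID would not even recognize the first tag, which is why the data item type, the ENCODING phrase, and the CODEPAGE option all matter.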
Markup
• Most items have distinct begin and end tags: <name>David</name>
• Empty elements begin and end with one tag: <img src="img.gif" />
• Tags can contain "attributes", such as the src attribute in the img tag above
• Attribute values must be quoted with single or double quotes
• Element and attribute names can be any length and may contain letters, digits, underscores, hyphens, and periods. They must begin with a letter or underscore
Comments
• Comments in XML have the same format as in HTML
• Start with <!--
• End with -->
• <!-- This is a comment -->
Processing Instructions
• Example: <?xml-stylesheet type="text/xsl" href="usage.xsl"?>
• Delimited by <? and ?>
• Passed through by the parser to give the application additional information about the document
• Contain a "PI Target": xml-stylesheet
• Contain a "PI Value": type="text/xsl" href="usage.xsl"
• Allow a document author to embed application-specific information in the document
CDATA Sections
• A CDATA section can contain ordinary characters, reserved characters, and white space
• Its content is not interpreted as markup by the XML parser
• Sometimes used for scripting code or for embedding XML inside a document
• Begins with <![CDATA[
• Ends with ]]>
• < and > do not need to be coded as entities inside a CDATA section
CDATA Example
    <![CDATA[
      if (x > y) { x = x + y; }
    ]]>
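A quick way to see what a CDATA section buys you is to parse the same script text with and without one. This Python sketch is illustrative; the script element and the parses helper are made up for the example.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = "<script><![CDATA[ if (x > y && y < z) { x = x + y; } ]]></script>"
WITHOUT_CDATA = "<script> if (x > y && y < z) { x = x + y; } </script>"

def parses(document: str) -> bool:
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Inside CDATA the reserved characters arrive in the text untouched.
cdata_text = ET.fromstring(WITH_CDATA).text

print(repr(cdata_text))
print(parses(WITHOUT_CDATA))   # the bare & and < make this one fail
```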
XML Namespaces
• Different authors might create the same tag names with different meanings
• Namespaces provide a means for authors to prevent collisions on tag names
• <book:title>The Idiot</book:title>
• <movie:title>Avatar</movie:title>
XML Namespaces
• Each namespace is tied to a Uniform Resource Identifier (URI)
• Authors create their own namespace prefixes
• Namespaces are typically declared in the root tag:
    <library xmlns:book="urn:woolbright:bookinfo"
             xmlns:movie="urn:woolbright:movieinfo">
• URLs are often used as namespace URIs
Default Namespaces
• To avoid coding a prefix on every tag, code a default namespace:
    <library xmlns="urn:woolbright:default"
             xmlns:book="urn:woolbright:bookinfo"
             xmlns:movie="urn:woolbright:movieinfo">
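A namespace-aware parser resolves each prefix to its URI, so two title elements with different prefixes really are different names. The Python sketch below uses the prefixes from the slides, with the default namespace URI written as urn:woolbright:default; the note element is added only to show what happens to an unprefixed tag.

```python
import xml.etree.ElementTree as ET

DOC = """<library xmlns="urn:woolbright:default"
         xmlns:book="urn:woolbright:bookinfo"
         xmlns:movie="urn:woolbright:movieinfo">
  <book:title>The Idiot</book:title>
  <movie:title>Avatar</movie:title>
  <note>unprefixed, so it falls in the default namespace</note>
</library>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as {namespace-uri}local-name.
tags = [child.tag for child in root]

print(root.tag)
for tag in tags:
    print(tag)
```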
Enterprise COBOL
• Contains two event-based parsers that allow you to read XML documents and process them with COBOL
• XML documents can be retrieved from an MQ message, a CICS TD queue, or an IMS message processing queue
• An XML document that is read from a file must be brought into storage as a single item (records can be combined using STRING or other techniques)
• If XMLPARSE(XMLSS) is in effect, you can instead parse an XML file by passing one record at a time to the parser
z/OS COBOL Features for Processing XML Input
• XML PARSE begins parsing the document and identifies the processing procedure in your program:
    XML PARSE MYDOCUMENT
        PROCESSING PROCEDURE 100-PARSE
        ON EXCEPTION
            DISPLAY 'XML DOCUMENT ERROR ' XML-CODE
            STOP RUN
        NOT ON EXCEPTION
            DISPLAY 'PARSED DOCUMENT SUCCESSFULLY'
    END-XML
z/OS COBOL Features for Processing XML Input
• Processing XML involves passing control between the parser and the processing procedure you write to handle events
• Processing procedure: receives and processes the events that are generated by the parser. This is a paragraph or section in your COBOL program
Parser/Procedure Interaction
• The parser passes control to the procedure for each XML event
• Control returns to the parser at the end of the procedure
• This continues until one of the following happens:
  1) The parser detects an error in the document and signals an EXCEPTION event
  2) The parser signals an END-OF-INPUT event and the processing procedure returns to the parser with XML-CODE still set to 0
  3) You terminate parsing deliberately by setting XML-CODE to -1 before returning to the parser
XML Parsers
• Compiler options control the parser type
• CBL XMLPARSE(XMLSS): chooses the z/OS XML System Services parser. This provides enhanced features such as namespace processing and validation with respect to a schema
• CBL XMLPARSE(COMPAT): chooses the parser built into the COBOL library
Processing Procedure
• The parser reads the document and generates an event for each part it encounters
• It gives control to the processing procedure when an event occurs
• The processing procedure responds to the event and returns control to the parser for further parsing
Parse Syntax
• (Syntax diagram for the XML PARSE statement)
XML PARSE
• XML PARSE begins parsing and identifies the source document and the processing procedure
• Specify the ENCODING phrase to describe the document's encoding
• Specify the VALIDATING phrase to identify an XML schema against which the document will be validated
COBOL Features for Processing XML Input
• Special registers:
  • XML-CODE: to determine the status of XML parsing. PIC S9(9) BINARY
  • XML-EVENT: to receive the name of the event. PIC X(30)
  • XML-NTEXT: to receive XML document fragments that are returned as national character data (Unicode). Variable-length national item
  • XML-TEXT: to receive XML document fragments that are returned as alphanumeric data. Variable-length alphanumeric item
COBOL Features for Processing XML Input
• Special registers:
  • XML-NAMESPACE: to receive the namespace identifier for a namespace declaration event, or for an element name or attribute name that is in a namespace. Variable-length alphanumeric item
  • XML-NNAMESPACE: the same namespace identifier as national data. Variable-length national item
  • XML-NAMESPACE-PREFIX: to receive a namespace prefix. Variable-length alphanumeric item
  • XML-NNAMESPACE-PREFIX: to receive a namespace prefix as national data. Variable-length national item
Prior to Parsing
• Unless you parse one record at a time with XMLPARSE(XMLSS), the entire document must be in memory before it can be processed
• Common sources of XML:
  • WebSphere MQ message
  • CICS transient data queue
  • CICS communication area
  • IMS message processing queue
  • Reading a file of records
Reading XML Off a File
• The entire XML file must be placed in a COBOL data item
• You will need:
  • A FILE-CONTROL entry to define the file
  • An OPEN statement to open the file
  • READ statements to read all the records into a data item in WORKING-STORAGE
  • Optionally, a STRING statement to string the separate records together into one continuous stream, removing extraneous blanks and handling variable-length records
Parsing with XMLPARSE(XMLSS)
    XML PARSE document
        PROCESSING PROCEDURE event-handler-name
        ON EXCEPTION …
        NOT ON EXCEPTION …
    END-XML
• Parsing continues until one of the following happens:
  1) An END-OF-DOCUMENT event occurs
  2) The parser signals an END-OF-INPUT event and the procedure returns to the parser with the XML-CODE register still set to 0, which indicates that no further XML data will be provided to the parser
  3) The parser detects an error in the document and signals an EXCEPTION event
  4) You terminate processing by moving -1 to XML-CODE
Parsing
• If XMLPARSE(XMLSS) is in effect, you can also use any of these optional phrases of the XML PARSE statement:
  • ENCODING, to specify the CCSID of the document
  • RETURNING NATIONAL, to cause the parser to automatically convert UTF-8 or single-byte characters to national characters for return to the processing procedure
  • VALIDATING, to cause the parser to validate the document against an XML schema
Events
• For each event that occurs during parsing, the parser sets the associated event name in the XML-EVENT register and passes control to the processing procedure
• Depending on the event, other registers can also be set
• Typically, XML-TEXT (or XML-NTEXT) is set with the data that caused the event
• Some typical events:
    START-OF-DOCUMENT
    START-OF-ELEMENT
    ATTRIBUTE-NAME
    END-OF-ELEMENT
    CONTENT-CHARACTERS
    START-OF-CDATA-SECTION
    END-OF-DOCUMENT
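The event-and-register pattern can be imitated in Python to show what a processing procedure sees. This is a loose analogy built on Python's SAX parser, not the COBOL parser: the class CobolStyleEvents and the trace helper are hypothetical, the event names are borrowed from the list above, and ATTRIBUTE-CHARACTERS (the event that carries an attribute's value) is added as an assumption about the fuller event set. The exact text supplied with each event by the COBOL parser may differ.

```python
import xml.sax

class CobolStyleEvents(xml.sax.ContentHandler):
    """Turn SAX callbacks into (event name, text) pairs,
    loosely mimicking the XML-EVENT and XML-TEXT registers."""

    def __init__(self, procedure):
        super().__init__()
        self.procedure = procedure   # stands in for the processing procedure

    def startDocument(self):
        self.procedure("START-OF-DOCUMENT", "")

    def endDocument(self):
        self.procedure("END-OF-DOCUMENT", "")

    def startElement(self, name, attrs):
        self.procedure("START-OF-ELEMENT", name)
        for attr_name, value in attrs.items():
            self.procedure("ATTRIBUTE-NAME", attr_name)
            self.procedure("ATTRIBUTE-CHARACTERS", value)

    def endElement(self, name):
        self.procedure("END-OF-ELEMENT", name)

    def characters(self, content):
        self.procedure("CONTENT-CHARACTERS", content)

def trace(document: bytes):
    """Parse the document and return the list of (event, text) pairs."""
    log = []
    handler = CobolStyleEvents(lambda event, text: log.append((event, text)))
    xml.sax.parseString(document, handler)
    return log

for event, text in trace(b'<media type="CD">x</media>'):
    print(f"{event:<22}{text}")
```

In a COBOL program the same dispatch is usually written as an EVALUATE on XML-EVENT inside the processing procedure.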
Parsing
• The parser checks XML documents for most aspects of well-formedness
• Documents can be parsed with or without validation
• Validation ensures that the document adheres to the content and structure described in the schema
• Validation can ensure that there are no unexpected elements, that no required elements are missing, and that element and attribute values are legal
Transforming XML Text to COBOL Data Items
• For alphanumeric items, decide whether the XML data should be at the left or right end of the field. For right justification, define the field as JUSTIFIED RIGHT
• A decorated numeric value (for example, one with a currency sign or commas) can sometimes be converted by moving it to a numeric-edited COBOL field, then moving that field to a numeric field to de-edit it
• Use intrinsic function NUMVAL to extract and decode simple numeric values
• Use intrinsic function NUMVAL-C to extract and decode XML data that represents monetary values
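The idea behind de-editing and NUMVAL-C is simply to strip the decoration from the text and keep the number. The Python function below is a rough analogue for illustration, not a reimplementation of the COBOL intrinsic functions; the name numval_c_like and the default currency sign are assumptions, and it handles only a leading currency sign and comma separators.

```python
from decimal import Decimal

def numval_c_like(text: str, currency: str = "$") -> Decimal:
    """Strip blanks, a currency sign, and comma separators, then convert."""
    cleaned = text.strip().replace(currency, "").replace(",", "").strip()
    return Decimal(cleaned)

print(numval_c_like(" $1,234.56 "))
print(numval_c_like("12.32") + numval_c_like("5.42"))
```

Using a decimal type rather than floating point keeps monetary totals exact, which matches how COBOL numeric items behave.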
Exercise #1
• Use the file BCST.SICCC01.PDSLIB(XMLDATA2)
• The file structure is similar to the one below:
    <?xml version="1.0" encoding="ibm-1140" standalone="yes"?>
    <batch>
      <trans>
        <name>Joe Smith</name>
        <amt>12.32</amt>
        <amt>5.42</amt>
      </trans>
      <trans>
        <name>Tina Louise</name>
        <amt>8.99</amt>
      </trans>
      …
    </batch>
Exercise #1
• Write an XML COBOL program that reads the file and copies it to memory
• Print out a report that lists each customer name and a total for each customer
• Print a grand total for the entire file
    Name           Amount
    Joe Smith       17.74
    Tina Louise      8.99
    Grand Total     26.73
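The report logic is the same in any event-driven parser: start a new customer at each trans element, save the name, and add each amt to the running total. The Python sketch below shows that logic only; it is not the COBOL solution. The class TotalsHandler and the function customer_totals are illustrative names, and the sample omits the XML declaration so that Python does not need the ibm-1140 encoding.

```python
import xml.sax
from decimal import Decimal

SAMPLE = """<batch>
  <trans>
    <name>Joe Smith</name>
    <amt>12.32</amt>
    <amt>5.42</amt>
  </trans>
  <trans>
    <name>Tina Louise</name>
    <amt>8.99</amt>
  </trans>
</batch>"""

class TotalsHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.totals = []    # one [name, total] pair per <trans>
        self.buffer = ""    # character data of the current element

    def startElement(self, name, attrs):
        self.buffer = ""
        if name == "trans":
            self.totals.append(["", Decimal("0")])

    def characters(self, content):
        self.buffer += content

    def endElement(self, name):
        text = self.buffer.strip()
        if name == "name":
            self.totals[-1][0] = text
        elif name == "amt":
            self.totals[-1][1] += Decimal(text)
        self.buffer = ""

def customer_totals(document: str):
    """Return a list of (customer name, total) tuples."""
    handler = TotalsHandler()
    xml.sax.parseString(document.encode("utf-8"), handler)
    return [(name, total) for name, total in handler.totals]

totals = customer_totals(SAMPLE)
grand_total = sum(total for _, total in totals)

for name, total in totals:
    print(f"{name:<15}{total:>8}")
print(f"{'Grand Total':<15}{grand_total:>8}")
```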
Parsing XML in Segments
• Read a segment (record) from the file
• Pass the record to the parser using XML PARSE to start the parser
• Control flows between the parser and the processing procedure until the end of the record
• At the end of the record, the parser returns control to the processing procedure after setting XML-EVENT to END-OF-INPUT and XML-CODE to 0
Parsing XML in Segments
• If the processing procedure reads the next record successfully, it sets XML-CODE to 1 to signal more input and returns to the parser to continue parsing
• This process continues until end of file, when the processing procedure returns to the parser leaving XML-CODE set to 0
• CSU.PUBLIC.XML(XMLSEG) is a good example program to use
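Segment-at-a-time parsing has a close counterpart in Python's incremental SAX interface, which may help when reasoning about record boundaries. In this sketch each feed() call plays the role of supplying one more record and close() plays the role of signalling that no more input will come. The record contents and the NameCollector class are made up, and the control flow differs from COBOL, where the processing procedure supplies the next record in response to END-OF-INPUT.

```python
import xml.sax

# Records split at arbitrary points, even in the middle of a name.
RECORDS = [
    "<batch><trans><name>Joe Sm",
    "ith</name><amt>12.32</amt>",
    "</trans></batch>",
]

class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self.in_name = False

    def startElement(self, name, attrs):
        if name == "name":
            self.in_name = True
            self.names.append("")

    def characters(self, content):
        # Text may arrive in several pieces, so always append.
        if self.in_name:
            self.names[-1] += content

    def endElement(self, name):
        if name == "name":
            self.in_name = False

def parse_in_segments(records):
    handler = NameCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for record in records:
        parser.feed(record)   # more input is available
    parser.close()            # end of input
    return handler.names

print(parse_in_segments(RECORDS))
```

The point to carry back to COBOL is that content can be split across records, so a handler should accumulate character data rather than assume it arrives in one event.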
Exercise #2 Convert(XMLDATA2)
• The file structure is similar to the one below:
    <?xml version="1.0" encoding="ibm-1140" standalone="yes"?>
    <batch>
      <trans>
        <name>Joe Smith</name>
        <amt>12.32</amt>
        <amt>5.42</amt>
      </trans>
      <trans>
        <name>Tina Louise</name>
        <amt>8.99</amt>
      </trans>
      …
    </batch>
Exercise #2
• For this exercise, rework the code you wrote in Exercise #1
• Use the same input file, BCST.SICCC01.PDSLIB(XMLDATA2)
• Instead of reading all the XML into a single area of memory, read and process the XML file one record at a time
Exception Processing
• Document errors cause the parser to set an exception code in XML-CODE and to signal an XML exception event
• When parsing ends with an exception, control passes to the ON EXCEPTION phrase of the XML PARSE statement; otherwise control passes to the NOT ON EXCEPTION phrase
Exception Processing
• XML-CODE contains a four-byte field that is the concatenation of two two-byte fields:
    Field         Length
    Return code   2 bytes
    Reason code   2 bytes
    XML-CODE      4 bytes (return code followed by reason code)
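Splitting a four-byte value into two two-byte halves is plain integer arithmetic, sketched here in Python. The sketch assumes the return code occupies the high-order half and the reason code the low-order half, following the order shown above; the function names and the sample values are hypothetical and are not real parser codes.

```python
def split_xml_code(xml_code: int):
    """Split a 32-bit value into (high halfword, low halfword)."""
    return divmod(xml_code & 0xFFFFFFFF, 0x10000)

def join_codes(return_code: int, reason_code: int) -> int:
    """Concatenate two 16-bit values into one 32-bit value."""
    return (return_code << 16) | reason_code

# Hypothetical sample: return code 12, reason code X'3035'.
combined = join_codes(12, 0x3035)
return_code, reason_code = split_xml_code(combined)

print(combined, return_code, f"{reason_code:04X}")
```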
Exception Processing
• COBOL definition of the return code and reason code fields:
    1 XML-DECODE.
      2 RTN COMP PIC 9(2).
      2 RSN COMP-5 PIC 9(4).
• The two values combine to describe the error. Consult the IBM XML documentation for the codes: http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/gxlza120/CCONTENTS
Printing the XML-CODE
    1 XML-DECODE.
      2 RTN COMP PIC 9(2).
      2 RSN COMP-5 PIC 9(4).
    1 HV PIC X(16) VALUE '0123456789ABCDEF'.

    DISPLAY ' RC=' RTN ',REASON=X '''
        HV(FUNCTION MOD(RSN / 4096 16) + 1:1)
        HV(FUNCTION MOD(RSN / 256 16) + 1:1)
        HV(FUNCTION MOD(RSN / 16 16) + 1:1)
        HV(FUNCTION MOD(RSN / 1 16) + 1:1)
        ''''
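The DISPLAY statement above prints the reason code as four hexadecimal digits by extracting one 4-bit digit at a time and using it as an index into the HV lookup string. The same arithmetic is sketched below in Python, using integer division; the function name reason_hex and the sample value are illustrative. Python strings are indexed from 0, so the "+ 1" that COBOL reference modification needs does not appear here.

```python
HV = "0123456789ABCDEF"

def reason_hex(rsn: int) -> str:
    """Format a 16-bit value as four hex digits, one digit per divisor."""
    return "".join(HV[(rsn // divisor) % 16] for divisor in (4096, 256, 16, 1))

sample = 0x3035   # hypothetical reason code
print(f" RC=12,REASON=X '{reason_hex(sample)}'")
```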