Learn how the Louisiana Department of Revenue uses XML to standardize data processing in their integrated tax system.
Use of XML in LDR's Integrated Tax System
Louisiana Department of Revenue
Technology Conference
San Antonio, TX
August 13 - 16, 2000

Background
LDR is currently engaged in a cooperative endeavor with IBM for a complete redesign and redevelopment of the software systems that support the administration of taxes. The system is being designed, developed and implemented using the following:
• Thin PC clients (Windows NT) with applications and data residing on a mainframe server
• Object-oriented analysis and design using Rational Rose design tools
• Java development using VisualAge for Java
• MQSeries for message handling
• MQSeries Workflow as the workflow manager
• DB2 (6.X, most current version at time of implementation) as the database on OS/390

Challenge
• The department exchanges and processes data from multiple sources in a variety of formats.
• For today and in the future, the goal is to develop a system with a standardized approach for processing data created and processed in multiple formats.

Solution: XML
XML is the most logical solution to the challenge of developing a system with a standardized approach for processing data that is created and processed in multiple formats.

Reasons for choosing XML
• XML is simple, straightforward and human readable
• XML is platform independent
• XML is programming language independent
• XML is extensible and easy to maintain
• Standardized interfaces (APIs) for processing XML data
• Many tools exist for parsing and transforming XML data
• Standardized (W3C)

Key Definitions
• XML - Extensible Markup Language is an open standard (W3C) that provides a data format and a data modeling language for defining data.
• DTD - Document Type Definition is the modeling mechanism for XML. It provides the rules for how XML data is defined and logically related.
• Well-formed XML - an XML document that follows the XML syntax rules but is not checked for structural conformance to a DTD (flat XML).
• Valid XML - a well-formed XML document that also conforms structurally to a DTD (structured XML).
• XSLT - Extensible Stylesheet Language Transformations, a language for transforming XML documents into other XML documents.

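The difference between well-formed and valid can be illustrated with a short sketch. It is written in Python purely for brevity (the LDR system itself is described as Java-based), and the function name `is_well_formed` and both sample strings are illustrative assumptions. Note that Python's standard `xml.etree.ElementTree` parser checks well-formedness only; checking validity against a DTD would require a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (syntax only, no DTD check)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the second one never closes its <field> element.
good = '<form><field id="1000">2003</field></form>'
bad = '<form><field id="1000">2003</form>'

print(is_well_formed(good))
print(is_well_formed(bad))
```
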
Uses of XML at LDR
• Data exchange format for forms processed by the system
  • External sources
  • Internal sources
• Data exchange format for data between sub-systems
• Data exchange format for legacy system data being converted into the new system
• Data exchange format for data exchanged to other LDR systems

External Sources of Forms
• Data entry of original forms
• Scanned original forms
• Others, as defined and implemented
• Electronic filings
  • EDI
  • EFT
  • Internet
• Flat files of various types of data (tape and diskette)

Original Documents from Data Entry or Scanners
• All original forms (remitted by the taxpayer) are converted into an internally developed format called Universal Data Format (UDF).
• The data is validated for syntactical and contextual correctness.
• Data passing validation is routed further into the system for conversion into flat XML.
• Data failing validation will not be processed any further within the system.

Reasons for Using UDF
• Validating data in UDF format is a relatively new process that works well.
• This step provides assurance that data the system played no part in creating is valid to the extent that it can be processed within the system.
• Cost and timing factors weighed into the decision to retain this method of validation.
• Future plans are to develop validation routines against data in XML format to eliminate this step.

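Example of UDF Record
000001INIT0101BATHDR1234567890123451998-12-31-00.00.00.00000000560013001000010001300200003000130030000400016004072894263
Fields 1 through 20, in order:
000001 | INIT | 01 | 01 | BATHDR | 123456789012345 | 1998-12-31-00.00.00.000000 | 0056 | 0013 | 0010 | 00010 | 0013 | 0020 | 00030 | 0013 | 0030 | 00040 | 0016 | 0040 | 72894263
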
Example of UDF Record (cont.)
Where,
1.) 000001 – is the record identifier attribute value of the header
2.) INIT – is the record type attribute value of the batch header
3.) 01 – is the segment number attribute value of the header
4.) 01 – is the total number of the segments attribute value of the header
5.) BATHDR – is the document type attribute value of the header
6.) 123456789012345 – is the batch identifier attribute value of the header, for illustration purposes only.
7.) 1998-12-31-00.00.00.000000 – is the processing date attribute value of the header
8.) 0056 – is the length of the variable portion of the record.
9.) 0013 – is the parameter length (parameter included in the total) of the “Number of Returns in the Batch” parameter
10.) 0010 – is the code identifier of the “Number of Returns in the Batch” parameter
11.) 00010 – is the value of the “Number of Returns in the Batch” parameter
. . .

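The field layout described above can be sketched as a small parser. This is an illustration only, not LDR's implementation, and it uses Python for brevity. The field widths in `HEADER_LAYOUT` are inferred from the sample record, the names `parse_udf` and `HEADER_LAYOUT` are hypothetical, and the sketch assumes every parameter is laid out as a 4-digit length (which counts itself), a 4-digit code, and then the value. It does not check the declared length of the variable portion.

```python
# Fixed-width header fields (name, width), inferred from the sample record.
HEADER_LAYOUT = [
    ("record_id", 6),
    ("record_type", 4),
    ("segment_number", 2),
    ("segment_total", 2),
    ("document_type", 6),
    ("batch_id", 15),
    ("processing_date", 26),
    ("variable_length", 4),
]


def parse_udf(record: str):
    """Split a UDF record into a header dict and a list of (code, value)."""
    header = {}
    pos = 0
    for name, width in HEADER_LAYOUT:
        header[name] = record[pos:pos + width]
        pos += width

    params = []
    while pos < len(record):
        length = int(record[pos:pos + 4])  # length includes these 4 digits
        if length < 8:
            raise ValueError(f"bad parameter length at offset {pos}")
        code = record[pos + 4:pos + 8]
        value = record[pos + 8:pos + length]
        params.append((code, value))
        pos += length
    return header, params


RECORD = (
    "000001INIT0101BATHDR1234567890123451998-12-31-00.00.00.000000"
    "0056"
    "0013001000010" "0013002000030" "0013003000040" "0016004072894263"
)

header, params = parse_udf(RECORD)
print(header["document_type"], params)
```
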
Conversion of UDF to XML
• UDF format records are converted to flat (well-formed) XML. The flat form of the record is simply a mapping out of the UDF data in XML format.
• The flat XML records are transformed using XSLT into structured (valid) XML. An XSLT processor requires at least a well-formed document as input.

Reasons for Two-Phased Conversion
• Allows a very simple format of flat XML data to be created by external systems for conversion, as required, to structured XML.
• Flat XML is a better format for receiving data from external sources because the single flat format is simple and can be transformed into many different structured versions of the record by applying different XSLT stylesheets, each targeting its own DTD.

Example of Flat XML Document
<?xml version="1.0"?>
<form>
  <field id="1000">2003</field>
  <field id="1010">1234567891</field>
  <field id="1015">333333333</field>
  <field id="1017">233300</field>
  <field id="1040">19991231</field>
  <field id="1050">CITM</field>
  <field id="1055">20000522</field>
  <field id="1060">20000526</field>
  <field id="1105">MAIL</field>
  <field id="1125">N</field>
  <field id="1130">N</field>
  <field id="1135">N</field>
  <field id="1140">N</field>
  . . .
</form>

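A flat document of this shape can be produced from a list of (identifier, value) pairs with very little code, which is part of its appeal as an intake format. The sketch below uses Python's standard `xml.etree.ElementTree` for brevity; the function name `to_flat_xml` is hypothetical, and how UDF parameter codes map to `field` ids is not stated in the slides, so the pairs are taken directly from the example above.

```python
import xml.etree.ElementTree as ET


def to_flat_xml(fields):
    """Build a flat <form> document from (field_id, value) pairs."""
    form = ET.Element("form")
    for field_id, value in fields:
        field = ET.SubElement(form, "field", id=field_id)
        field.text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(form, encoding="unicode")


flat = to_flat_xml([("1000", "2003"), ("1010", "1234567891")])
print(flat)
```
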
Example of DTD
<?xml encoding='UTF-8' ?>
<!-- edited with XML Spy v3.0 NT (http://www.xmlspy.com) -->
<!-- STARTER FILE CONTAINING ALL TAX FORM XML
     Includes:
       Global Definitions
       LDR Form CFT4
       LDR Form CIFT620
       LDR Form IT620ES
     Revision: DRAFT
     Date: May 30, 2000
     TBD: refine definitions for entities with strict formats?
-->
<!-- ENTITIES -->
<!-- TBD: promote appropriate constructs to entities as needed -->
<!-- ELEMENTS -->
<!-- -->
<!-- Generics -->
<!-- -->

Example of DTD (cont.)
<!-- Identification Numbers -->
<!ELEMENT LRAN (#PCDATA )> <!-- Louisiana Revenue Account Number -->
<!ELEMENT FEIN (#PCDATA )> <!-- Federal Employer Identification Number -->
<!ELEMENT BusinessCodeNumber (#PCDATA )>
<!-- Dates -->
<!ELEMENT YearMonthDay (#PCDATA )>
<!ELEMENT YearMonth (#PCDATA )>
<!ELEMENT Year (#PCDATA )>
<!ELEMENT DateIssued (#PCDATA )>
<!-- Periods -->
<!ELEMENT Period (PeriodStart? , PeriodEnd )>
<!ELEMENT PeriodStart (#PCDATA )>
<!ELEMENT PeriodEnd (#PCDATA )>
<!-- Names -->
<!ELEMENT BusinessName (#PCDATA )>
<!ELEMENT PersonName (#PCDATA )>

Example of DTD (cont.)
<!-- Addresses -->
<!ELEMENT MailingAddress (Street , StateOrProvince , Country? , ZipOrPostalCode )>
<!ELEMENT Street (#PCDATA )>
<!ELEMENT StateOrProvinceOfIncorporation (#PCDATA )>
<!ELEMENT StateOrProvince (#PCDATA )>
<!ELEMENT ZipOrPostalCode (#PCDATA )>
<!ELEMENT Country (#PCDATA )>
<!-- Telephone Numbers -->
<!ELEMENT Telephone (#PCDATA )>
. . .

Example of a Structured XML Document
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE TaxFormLDRCIFT620 (View Source for full doctype...)>
<TaxFormLDRCIFT620>
  <FormHeader formName="CIFT620" />
  <BasicBusinessInfo>
    <LRAN>1234567891</LRAN>
    <FEIN>333333333</FEIN>
    <BusinessCodeNumber>233300</BusinessCodeNumber>
  </BasicBusinessInfo>
  <Period>
    <PeriodEnd>19991231</PeriodEnd>
  </Period>
  . . .
</TaxFormLDRCIFT620>

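The flat-to-structured step is done with XSLT in the LDR system. Python's standard library has no XSLT processor, so the sketch below swaps in a plain lookup table as a stand-in to show the kind of restructuring a stylesheet would perform; it is not XSLT and not LDR's stylesheet. The `FIELD_MAP` entries are assumptions inferred by matching values between the flat and structured examples, and the names `FIELD_MAP` and `to_structured` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping: flat field id -> (parent element, child element).
FIELD_MAP = {
    "1010": ("BasicBusinessInfo", "LRAN"),
    "1015": ("BasicBusinessInfo", "FEIN"),
    "1017": ("BasicBusinessInfo", "BusinessCodeNumber"),
    "1040": ("Period", "PeriodEnd"),
}


def to_structured(flat_xml, root_name="TaxFormLDRCIFT620", form_name="CIFT620"):
    """Regroup flat <field> elements under named parent elements."""
    flat = ET.fromstring(flat_xml)
    root = ET.Element(root_name)
    ET.SubElement(root, "FormHeader", formName=form_name)
    groups = {}
    for field in flat.iter("field"):
        target = FIELD_MAP.get(field.get("id"))
        if target is None:
            continue  # fields without a mapping are skipped in this sketch
        group_name, element_name = target
        group = groups.get(group_name)
        if group is None:
            group = groups[group_name] = ET.SubElement(root, group_name)
        ET.SubElement(group, element_name).text = field.text
    return root


FLAT = (
    '<form>'
    '<field id="1000">2003</field>'
    '<field id="1010">1234567891</field>'
    '<field id="1015">333333333</field>'
    '<field id="1017">233300</field>'
    '<field id="1040">19991231</field>'
    '</form>'
)

structured = to_structured(FLAT)
print(ET.tostring(structured, encoding="unicode"))
```

Because the mapping lives in data rather than in the parser, the same flat input could be regrouped into a different structured form by supplying a different table, which mirrors the role the slides give to XSLT.
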
Document Renderer
• Developed for the specific purpose of converting form data from other formats into XML and XML to other data formats for processing within the system.
• Enables transforming between XML and Java objects for processing and efficient storage of data.
• Data is rendered using a SAX (Simple API for XML) compliant parser. SAX is a simple, event-based API for parsing XML documents, and almost all parsers support it.

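The idea of rendering XML into domain objects through SAX callbacks can be sketched as follows. The LDR renderer targets Java objects; this illustration uses Python's standard `xml.sax` module instead, and the `BusinessInfo` class, the `BusinessInfoHandler` class, and the `render` function are hypothetical names, not part of LDR's design. Only the three elements of `BasicBusinessInfo` from the structured example are handled.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class BusinessInfo:
    """Hypothetical domain object for the BasicBusinessInfo element."""
    lran: str = ""
    fein: str = ""
    business_code: str = ""


class BusinessInfoHandler(xml.sax.ContentHandler):
    # Element name -> attribute on the domain object.
    FIELDS = {
        "LRAN": "lran",
        "FEIN": "fein",
        "BusinessCodeNumber": "business_code",
    }

    def __init__(self):
        super().__init__()
        self.info = BusinessInfo()
        self._current = None
        self._buffer = []

    def startElement(self, name, attrs):
        if name in self.FIELDS:
            self._current = self.FIELDS[name]
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until endElement.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name in self.FIELDS and self._current is not None:
            setattr(self.info, self._current, "".join(self._buffer).strip())
            self._current = None


def render(xml_text: str) -> BusinessInfo:
    handler = BusinessInfoHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.info


SAMPLE = (
    "<TaxFormLDRCIFT620>"
    '<FormHeader formName="CIFT620" />'
    "<BasicBusinessInfo>"
    "<LRAN>1234567891</LRAN>"
    "<FEIN>333333333</FEIN>"
    "<BusinessCodeNumber>233300</BusinessCodeNumber>"
    "</BasicBusinessInfo>"
    "</TaxFormLDRCIFT620>"
)

info = render(SAMPLE)
print(info)
```
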
Processing of Form Data using the Document Renderer
• A file of structured XML forms that are ready to be validated is read by a form validation application.
• The application will rely on the document renderer to convert the structured XML version of the form to the corresponding domain objects required for validation.
• The validation rule engine will validate the form for correctness within the context of the taxpayer’s registration and accounting profile.
• The validated data in the domain objects is persisted in the underlying database.

Internal Sources of Forms
• Internal forms originating from a GUI
• Internal forms generated from system processes

Internal Forms Originating from a GUI
• Users key data utilizing a GUI. The resulting raw data is passed on for transformation.
• The raw data is converted to an XML version of a form and passed on for rendering.
• The XML version of the form is rendered into the domain objects for validation.
• Once validation is complete, the data is either persisted in the database or routed back to the presentation layer for correction.

Internal Forms Generated from System Processes
• Processes within the system may independently determine that there is a need for adjustments to the taxpayer’s account.
• All accounting data emanates from a form.
• A request for the creation of a form to initiate the creation of the adjusting accounting data must be made.

Legacy Data Conversion
• Taxpayer registration data must be migrated from the legacy system into the new system to populate the database.
• A COBOL program was written to extract legacy registration data and create a file of structured XML records.
• The structured XML records are parsed into domain objects and the data contained in the domain objects is persisted in the underlying database.

Data Exchanged to Other LDR Systems
• A data warehouse application will require input data from the main system database.
• The data needs to be reformatted to assume a meaningful context in the warehouse applications.

Lessons Learned
• Keep things simple
• Carefully plan, design and develop the DTD using clear and descriptive comments in its definition
• Many XML manipulation tools are available and many more are on the way!
• Keep abreast of advancements in technology
  • XML Schema
  • XML Binding
  • Others

Sites of Interest
• www.alphaworks.ibm.com
• www.apache.org
• xml.com
• xml.org
• www.ebxml.org
• www.oasis-open.org
• www.w3c.org
• java.sun.com/xml

Conclusion
• What’s the big deal? There’s nothing magic going on here.
• XML simply serves as a means of exchanging and transforming data.
• With a standardized format for data exchange and open source initiatives for software to transform this data, the bulk of the design and development effort can be targeted toward the logic of business processing.

Contact Information
Barry Aucoin
Louisiana Department of Revenue
LDR Information Services Division
email: baucoin@rev.state.la.us
phone: (225) 925-4220
fax: (225) 922-0850