DFAS XML Best Practices Version 1.0, Defense Finance and Accounting Service
Introduction • The best practices contained herein • were developed during two years of research and development by the DFAS Data Architecture (DFAS-DTB) XML team • are the result of a learning-by-trial-and-error approach • have been aligned with other XML best practices developed by government and industry groups • are not policies
DFAS XML Team • Mike Lubash, DFAS Data Architecture • Nauman Malik, XMLCG • Bruce Peat, eProcessSolutions • Kit Lueder, Mitre • Charlie Clark, EM&I
Short-Term versus Long-Term • Problem: What is the best way to manage deployment of XML at the enterprise level? • Solution: Develop and deploy over time, with short-term and long-term solutions (as explained on the next slide) • Consequences: • Pros • Effective approach for an evolutionary process • Cons • Limitations will exist in the short-term solutions (to be addressed in the long term) • A registry will cost money and requires a mature infrastructure, which does not currently exist
XML Validation Levels • Problem: At what levels should XML documents be checked? • Solution: Four levels of XML checking (as explained on the next slide) • The business requirements of each organization will dictate the level of checking • Consequences: • Pros • Business requirements dictate the level of checking • Business requirements dictate the resources allocated to checking • Cons • The potential for errors exists at lower levels of checking
XML Validation Levels • Level 0, Simple: Single or development testing; use editors and receiver feedback for assurance. In production, after development testing, no runtime checks are in place. Additional resources required: none (development tools such as XMLSpy, an editor, etc.) • Level 1, Well-formed: In production, check that the XML is well-formed. Additional resources required: parser (staging or middleware) • Level 2, Validation: In production, validate the document against its XML Schema (an in-memory requirement). For further 'if-then' checking, use a stylesheet; in a middleware solution, the rules are in a proprietary format. The rest of these best practices are based on this level. Additional resources required: parser (and XSL engine for additional checking) or middleware • Level 3, Business Checking: In production, check against large or dynamic sources, such as tables and complex element relationships, switching maps/checks based on the trading partner (or for large transactions). Additional resources required: middleware (or application)
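Level 1 needs nothing more than a parser. As a rough sketch, assuming Python's standard-library xml.etree.ElementTree as the parser (the tool choice and the function name `is_well_formed` are illustrative, not part of these practices), a well-formedness check could look like this. It says nothing about schema validity, which is Level 2.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> tuple[bool, str]:
    """Level 1 check: does the text parse as XML at all?

    Returns (True, "") on success, or (False, <parser message>) on failure.
    No schema is consulted, so a True result says nothing about validity.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


good = '<FactsATB xmlns="http://www.dfas.mil/DFAS"><ATBAccountDetails/></FactsATB>'
bad = "<FactsATB><ATBAccountDetails></FactsATB>"  # end tag does not match open tag

print(is_well_formed(good))    # (True, '')
print(is_well_formed(bad)[0])  # False
```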
Why XML Schema • Problem: How do we validate data, create and enforce structure, communicate and collaborate with trading partners, capture basic metadata, etc.? • Solution: W3C's XML Schema • A business-centric methodology dictates the use of open, non-proprietary standards • Consequences: see next slide • Alternatives (not recommended by the W3C): • DTD • RELAX • Schematron • SOX • XDR
Pros: Requirements met by XML Schema • As DFAS's preferred mechanism for managing our information assets (information resource), XML Schema is used: • to adopt open standards, such as those from the W3C • to validate data • to establish and communicate our XML accounting business vocabulary and model • to establish a mechanism of collaboration • to create reusable components (via datatypes) for heterogeneous environments spanning multiple trading partners • to encapsulate document structure • to capture structure, optionality, cardinality, enumerated code lists, etc. • to aid precise communication among our technical, functional and customer stakeholders to deliver value to our customers • in the short term, to capture basic metadata
Cons: Limitations of and supplements to XML Schema • Limitations of XML Schema and requirements it does not meet • Forces early commitment to tag names • Does not allow IF-THEN logic • Does not allow extension of enumerated lists • Does not allow value pairing (multi-fields), e.g. 21 = Dept. of Army • Lacks formal mechanisms for defining • metadata • business rules • context • constraints • code lists • Therefore, in the long term, XML Schema will be supplemented, possibly with additional mechanisms such as: • A business-centric methodology • AssemblyDocs / OASIS TC - Content Assembly Mechanism (CAM) • A registry
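To illustrate the kind of IF-THEN check that has to live outside XML Schema 1.0, here is a minimal Python sketch meant to run after schema validation. The rule itself is hypothetical, invented for illustration and not a DFAS business rule: IF an ATBAccountDetails record has CustodialNoncustodialCode "S", THEN it must also carry a TradingPartnerCode. The element names come from the FactsATB example in this deck; the function name `check_if_then` is also hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"d": "http://www.dfas.mil/DFAS"}


def check_if_then(document: str) -> list[str]:
    """Apply one co-occurrence (IF-THEN) rule that XML Schema 1.0 cannot express.

    Hypothetical rule for illustration only: IF CustodialNoncustodialCode is "S",
    THEN TradingPartnerCode must be present in the same ATBAccountDetails record.
    """
    root = ET.fromstring(document)
    errors = []
    for i, record in enumerate(root.findall("d:ATBAccountDetails", NS), start=1):
        code = record.findtext("d:CustodialNoncustodialCode", default="", namespaces=NS)
        if code == "S" and record.find("d:TradingPartnerCode", NS) is None:
            errors.append(f"record {i}: code S requires TradingPartnerCode")
    return errors


doc = """<FactsATB xmlns="http://www.dfas.mil/DFAS">
  <ATBAccountDetails><CustodialNoncustodialCode>S</CustodialNoncustodialCode></ATBAccountDetails>
  <ATBAccountDetails><CustodialNoncustodialCode>A</CustodialNoncustodialCode></ATBAccountDetails>
</FactsATB>"""

print(check_if_then(doc))  # ['record 1: code S requires TradingPartnerCode']
```

The same rule could equally be written as an XSL stylesheet, which is the Level 2 route for "further 'if-then' checking".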
UID • Problem: How do we uniquely identify DFAS XML artifacts? • Solution: UIDs • Our UID will be in the form (components discussed on the next slide) • [Steward].[ArtifactName].[Version].[FileType] • For example • DFAS.USSGLAccountType.2002-12-17.xsd • UID = file name • Not specified for instance documents • Consequences: • Pros • Business-friendly and technically identifiable • Cons • This approach to constructing UIDs is uncommon in industry, where random generation is more common • There are competing methods for creating UIDs • but not all of them aid business communication (they are technical implementations)
UID Components • [Steward] • Registration authority that controls the UID to ensure there are no conflicts • For artifacts produced at DFAS, [Steward] is simply set to DFAS • Reference <dc:publisher> in Dublin Core Element Set v1.1 • [ArtifactName] • Name of the "quasi" root, for example, USSGLAccountType • [Version] • Date of creation or last modification, for example, 2002-12-17 • Reference <dc:date> in Dublin Core Element Set v1.1 • [FileType] • File type corresponding to the Internet media (MIME) type, for example, xml, xsl, xsd, dtd, etc. • Reference <dc:format> in Dublin Core Element Set v1.1
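Because the UID equals the file name, it can be split back into its four components mechanically. A minimal Python sketch, assuming that stewards and artifact names contain only letters and digits and that versions take the YYYY-MM-DD form with an optional lower-case letter suffix (as described under Artifact Versioning); the names `UID`, `parse_uid` and `build_uid` are hypothetical. Because the pattern requires a version, version-less CodeList file names would need a separate pattern.

```python
import re
from typing import NamedTuple


class UID(NamedTuple):
    steward: str        # registration authority, e.g. DFAS
    artifact_name: str  # name of the "quasi" root, e.g. USSGLAccountType
    version: str        # YYYY-MM-DD, optionally followed by one lower-case letter
    file_type: str      # e.g. xml, xsl, xsd, dtd


# [Steward].[ArtifactName].[Version].[FileType]
UID_PATTERN = re.compile(
    r"^(?P<steward>[A-Za-z0-9]+)\."
    r"(?P<artifact_name>[A-Za-z0-9]+)\."
    r"(?P<version>\d{4}-\d{2}-\d{2}[a-z]?)\."
    r"(?P<file_type>[a-z]+)$"
)


def parse_uid(filename: str) -> UID:
    match = UID_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a [Steward].[ArtifactName].[Version].[FileType] UID: {filename!r}")
    return UID(**match.groupdict())


def build_uid(uid: UID) -> str:
    # A NamedTuple iterates over its fields in declaration order
    return ".".join(uid)


uid = parse_uid("DFAS.USSGLAccountType.2002-12-17.xsd")
print(uid.artifact_name, uid.version)  # USSGLAccountType 2002-12-17
```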
The Role of Files • Problem: What is the optimal vehicle and/or storage medium for XML artifacts? • Solution: Physical flat files, each of which contains exactly one "quasi" root XML artifact and possibly other dependent XML artifacts • The UID of the "quasi" root artifact equals the filename • Consequences: • Pros • ease of configuration management • discreteness • simple design • compactness of size • allows correspondence between filename and UID; efficient cross-referencing mechanism • Cons • many files to manage • inclusion list in assemblies and/or transaction schemas can get long • tools don't easily generate documentation for multiple discrete files
Types of XML Schema Artifacts • Problem: What are the various types of XML Schema artifacts? • Solution: The following types (which are also valid values for the Embedded Metadata Section element <dc:type>): • SelfContained • Artifacts that are not dependent on the import or inclusion of any external resources • CodeList • Artifacts that contain a code list of domain values in enumerated list form • Do not include a version in their file name; otherwise, each update could cause a chain reaction of failures in the artifacts that include them • Assembly • Composed of one or more artifacts, including SelfContaineds, CodeLists, and other Assemblies • Transaction • Defines the root element of an XML instance document • Includes and/or imports one or more artifacts, including SelfContaineds, CodeLists, Assemblies, and other Transactions
Types of XML Schema Artifacts (continued) • Consequences: • Pros • Easy identification and grouping of artifact types • Delineates roles for personnel responsible for developing artifacts • Cons • Incomplete list based on current requirements (updating expected in future) • No precedent available for this type of work • Choices of types can be considered arbitrary by outside bodies
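Whether an artifact depends on external resources can be checked mechanically by looking for top-level xs:include and xs:import elements. The Python sketch below does that with the standard library; the function names are hypothetical and xs:redefine is ignored. A SelfContained has no such dependencies (and a CodeList like the one in the file-inclusion example usually has none either), while telling an Assembly from a Transaction, which defines the instance root, still needs metadata such as <dc:type> or human judgment.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def schema_dependencies(xsd_text: str) -> list[str]:
    """Return the schemaLocation of each top-level xs:include / xs:import."""
    root = ET.fromstring(xsd_text)
    return [
        child.get("schemaLocation", "")  # xs:import may omit schemaLocation
        for child in root
        if child.tag in (XS + "include", XS + "import")
    ]


def has_external_dependencies(xsd_text: str) -> bool:
    return bool(schema_dependencies(xsd_text))


code_list = """<xs:schema targetNamespace="http://www.dfas.mil/DFAS"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CustodialNoncustodialCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="S"/>
      <xs:enumeration value="A"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

transaction = """<xs:schema targetNamespace="http://www.dfas.mil/DFAS"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.dfas.mil/DFAS">
  <xs:include schemaLocation="DFAS.BudgetSubfunctionCodeType.2002-08-27.xsd"/>
  <xs:include schemaLocation="DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd"/>
</xs:schema>"""

print(has_external_dependencies(code_list))  # False
print(schema_dependencies(transaction))
```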
elementFormDefault • Problem: What should the value of the XML Schema attribute elementFormDefault be set to in transaction schemas? • Solution: • elementFormDefault="qualified" • This requires instance documents to contain elements that are qualified via prefixes or a default namespace (see discussion on namespaces in later slides) • This attribute is not to be used in non-transaction schema artifacts • Consequences: • Pros • Identifies elements contained inside DFAS instance documents as DFAS-owned • Cons • Bulk added to instance documents by the added prefixes, unless the DFAS namespace is made the default namespace (the recommended approach whenever possible)
attributeFormDefault • Problem: What should the value of the XML Schema attribute attributeFormDefault be set to in transaction schemas? • Solution: • This attribute should not be included in any schemas • By inaction, the default value (attributeFormDefault="unqualified") will be chosen, which is the desired result • Consequences: • Pros • Relieves developers of having to concern themselves with this attribute, whose implications are minimal to none • Cons • None
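The combined effect of elementFormDefault="qualified" and the default attributeFormDefault can be seen from the parser's side. In this Python sketch (standard-library ElementTree; the currencyCode attribute is hypothetical), an instance that declares the DFAS namespace as its default namespace puts every element in that namespace without any prefixes, while the unprefixed attribute stays in no namespace.

```python
import xml.etree.ElementTree as ET

DFAS_NS = "http://www.dfas.mil/DFAS"

# currencyCode is a hypothetical attribute used only for this illustration
instance = f"""<FactsATB xmlns="{DFAS_NS}">
  <ATBAccountDetails>
    <Amount currencyCode="USD">21159598.29</Amount>
  </ATBAccountDetails>
</FactsATB>"""

root = ET.fromstring(instance)
local_names = []
for element in root.iter():
    # ElementTree writes a namespace-qualified name as "{uri}localName"
    assert element.tag.startswith("{" + DFAS_NS + "}"), element.tag
    local_names.append(element.tag.split("}", 1)[1])
    for attribute in element.attrib:
        # Unprefixed attributes are unqualified: no "{uri}" part
        assert not attribute.startswith("{"), attribute

print(local_names)  # ['FactsATB', 'ATBAccountDetails', 'Amount']
```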
Example of file usage and inclusion
CodeList Artifact: DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.dfas.mil/DFAS" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CustodialNoncustodialCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="S"/>
      <xs:enumeration value="A"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
Transaction Schema: DFAS.ATB.2002-08-27.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.dfas.mil/DFAS" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.dfas.mil/DFAS" elementFormDefault="qualified">
  <!-- Begin inclusion of reusable components -->
  <xs:include schemaLocation="DFAS.BudgetSubfunctionCodeType.2002-08-27.xsd"/>
  <xs:include schemaLocation="DFAS.CustodialNoncustodialCodeType.2002-08-27.xsd"/>
  <xs:include schemaLocation="DFAS.TradingPartnerCodeType.2002-08-27.xsd"/>
  <!-- End inclusion of reusable components -->
  <xs:element name="FactsATB">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ATBAccountDetails" maxOccurs="unbounded">
          <xs:complexType>
            <xs:all>
              <xs:element name="CustodialNoncustodialCode" type="CustodialNoncustodialCodeType" minOccurs="0"/>
              <xs:element name="BudgetSubfunctionCode" type="BudgetSubfunctionCodeType" minOccurs="0"/>
              <xs:element name="TradingPartnerCode" type="TradingPartnerCodeType" minOccurs="0"/>
            </xs:all>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Attributes versus Elements • Problem: What guidelines can help developers decide whether to use elements or attributes to store data? • Solution: Keep the guidelines simple (rules of thumb) and avoid making official policy in this area • Key guideline: The number of attributes SHOULD be minimized • Some additional guidelines are: • Attributes can be used to provide additional metadata required to better understand the business value of an element • Attributes should only be used to describe information units that cannot or will not be further extended or subdivided (elements are better suited for units that may be) • Consequences: • Pros • Assistance provided in an area of potential confusion for XML developers • Cons • Can potentially limit developer creativity
Reusable XML Schema Artifacts: Elements, Named Datatypes, Named Groups • Problem: What guidelines can help developers decide whether to use elements, named datatypes, or named groups when creating XML Schema artifacts? • Solution: Keep the guidelines simple (rules of thumb) and avoid making policy in this area • Guidelines follow on the next 2 slides • Consequences: • Pros • Assistance provided in an area of potential confusion for XML developers • Cons • Can potentially limit developer creativity
Reusable XML Schema Artifacts: Guidelines • If hiding the namespace of elements in instance documents is important, use named datatypes • Named datatypes may be 'instantiated' as either elements or attributes • The element-versus-attribute decision can be delayed • If it is important that the container element not show up in the instance document, use named groups • When in doubt, make it a named datatype • Growing industry trend (X12, HR-XML, OASIS, etc.) • Limit the number of successive derivations of a named datatype (by extension or restriction)
Reusable XML Schema Artifacts: Guidelines (continued) • An alternative to named datatypes and named groups is elements that bind to anonymous types • The elements can then be referenced using the ref attribute of xs:element • This approach is increasingly against industry convention • A discussion of the pros and cons is available at http://www.xfront.org (Best Practices) • Design Approaches (Roger Costello, Mitre: http://www.xfront.org) • Russian Doll, Salami Slice and Venetian Blind* • Simplicity via simpleType • The Embedded Metadata Section has largely eliminated the need for many attributes • For example: UID, version, DoDClassWord • However, the ability to extract that information from attributes using standard APIs such as SAX/DOM is lost; RDF APIs can make up for that loss • For SelfContaineds, complexTypes are rarely called for (unless a depth of hierarchy is needed, which is usually the case with Assemblies) * DFAS' preferred choice: declare named datatypes and bind elements to them as needed
Substitution Groups • Problem: How do we implement generic structures that allow for interchangeable SelfContaineds and Assemblies? • Solution: Substitution Groups • Consequences: • Pros • Can provide "plug-n-play" capability for reusable components • Supported in some industry implementations • Cons • Not to be used for aliases • Wide use for this purpose is a key indicator that the business semantic issues are not being addressed and a technical workaround has been pursued instead • Inside <all> model groups, substitution groups can cause problems • Because maxOccurs is limited to 1 inside <all> groups, if both the head element and its substitute need to be used, e.g. DepartmentCode and TradingPartnerCode, a validation error will result
Namespaces • Problem: What are the issues surrounding XML namespaces, and what guidelines should be followed? • Solution: • Name collisions are best resolved by business adjudication, not technical workarounds • Guidelines regarding namespaces are on the next 3 slides • Consequences: • Pros • Namespaces allow for collections of XML Schema components • Namespaces allow for disambiguation among XML Schema components • Namespaces allow for easy identification of collections of XML Schema components • Cons • Within the organization, a proliferation of namespaces can encourage a "stove-pipe" mentality instead of collaborative development • Namespaces can be cryptic and difficult to understand • Namespaces are presently not handled consistently by XML parsers
Guidelines: Namespaces (targetNamespace) • We recommend that all XML Schemas specific to an organization share the same targetNamespace • 'http://www.dfas.mil/DFAS' is the namespace to be used for all DFAS XML Schema artifacts • Situations where the targetNamespace of DFAS artifacts is not 'http://www.dfas.mil/DFAS': • If the artifact is promoted to the Enterprise namespace in the DoD XML Registry • If the artifact is established as a general artifact (not specific to any particular domain)
Guidelines: Namespaces (default namespaces) • Use of the default namespace is discouraged • If it is used, however, it is recommended that it be set to the targetNamespace • In general, a default namespace in XML Schemas can cause problems when including schemas have a different default namespace than the included schemas; therefore, strong caution is advised • The default namespace should not be set to the XML Schema specification's namespace (i.e. http://www.w3.org/2001/XMLSchema)* • If this guideline is ignored, name collisions can arise when the including XSD's targetNamespace overrides the included XSD's default namespace (which is set to XML Schema's); the composite document will then not parse * The recommended prefix for the XML Schema namespace is 'xs', as in xmlns:xs="http://www.w3.org/2001/XMLSchema"
Guidelines: Namespaces (general) • Declare all namespaces on the root element • Do not use more than one prefix per namespace per XML document • Inside instance documents: • All elements should be qualified via a default or prefixed namespace • Default namespaces do not pose the same risk inside instance documents as they do in schemas, so their use is left to the discretion of the developer • That namespace will be 'http://www.dfas.mil/DFAS' to correlate with the targetNamespace of all DFAS transaction schemas • For example: • <FactsATB xmlns="http://www.dfas.mil/DFAS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.FactsATB.2002-11-22.xsd">
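The first two general guidelines can be checked by watching namespace-declaration events while parsing. A Python sketch using the standard library's ElementTree.iterparse "start-ns" events; the function name and message wording are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def check_namespace_declarations(document: str) -> list[str]:
    """Flag declarations below the root and namespaces bound to more than one prefix."""
    problems = []
    prefixes_by_uri: dict[str, set[str]] = {}
    elements_started = 0
    source = io.BytesIO(document.encode("utf-8"))
    # A "start-ns" event is reported just before the "start" event of the
    # element that carries the declaration, so any declaration seen after the
    # root's "start" event was made on a descendant.
    for event, item in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item  # prefix is "" for a default namespace
            if elements_started > 0:
                problems.append(f"{uri} declared below the root element")
            prefixes_by_uri.setdefault(uri, set()).add(prefix)
        else:
            elements_started += 1
    for uri, prefixes in prefixes_by_uri.items():
        if len(prefixes) > 1:
            problems.append(f"{uri} bound to prefixes {sorted(prefixes)}")
    return problems


good = ('<FactsATB xmlns="http://www.dfas.mil/DFAS" '
        'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
        'xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.FactsATB.2002-11-22.xsd">'
        '<ATBAccountDetails/></FactsATB>')
bad = ('<d:FactsATB xmlns:d="http://www.dfas.mil/DFAS">'
       '<dfas:ATBAccountDetails xmlns:dfas="http://www.dfas.mil/DFAS"/></d:FactsATB>')

print(check_namespace_declarations(good))  # []
for problem in check_namespace_declarations(bad):
    print(problem)
```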
Configuration Management: Artifact Versioning • Problem: How should artifacts be versioned? • Solution: • Version numbers of artifacts will be based on the creation (or last modification) date and will be in the form YYYY-MM-DD • For multiple releases in one day, a lower-case letter will be appended in alphabetical order (this handles up to 27 releases on any given day: the bare date plus a through z) • For example: • 1st release: 2002-09-09, 2nd release: 2002-09-09a, 3rd release: 2002-09-09b, etc. • The version number will be placed both • inside the physical filename of the artifact (does not apply to CodeLists) • and inside the Embedded Metadata Section of the containing XSD, in an element called <dc:date> • Consequences: • Pros • The date readily and simply identifies the version of the artifact • The scheme allows each artifact to be versioned separately • Cons • Doesn't seem to be a commonplace method of versioning • It may be hard to keep track of the version of each low-level element
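The same-day suffix rule is easy to automate. A minimal Python sketch (function and constant names are hypothetical) that assigns the next version label for a release date, given the labels already issued. Because a bare date sorts before its lettered variants, plain string sorting also orders the versions correctly.

```python
import string

# "" for the first release of the day, then "a" through "z": 27 slots in all
SUFFIXES = [""] + list(string.ascii_lowercase)


def next_version(release_date: str, existing: list[str]) -> str:
    """Return the next unused version label for release_date (YYYY-MM-DD)."""
    taken = {v[len(release_date):] for v in existing if v.startswith(release_date)}
    for suffix in SUFFIXES:
        if suffix not in taken:
            return release_date + suffix
    raise ValueError(f"more than {len(SUFFIXES)} releases on {release_date}")


issued = []
for _ in range(3):
    issued.append(next_version("2002-09-09", issued))
print(issued)  # ['2002-09-09', '2002-09-09a', '2002-09-09b']

# Lexicographic order matches release order, across days as well
print(sorted(["2002-09-10", "2002-09-09b", "2002-09-09"]))
# ['2002-09-09', '2002-09-09b', '2002-09-10']
```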
Configuration Management: Instance Document and Schemas • Problem: How should instance documents be associated with schemas? • Solution: • Instance documents are versioned by their schemas • The structure of the instance documents is versioned, not their content • One of the following options is chosen: • (1) Validation option: a schema URL is placed in the instance document; the parser will invoke validation, and the linkage is documented • Examples: • <FactsATB xmlns="http://www.dfas.mil/DFAS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.USSGLAccountType.2002-08-13.xsd"> • <FactsATB xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="DFAS.USSGLAccountType.2002-08-13.xsd"> • (2) No-validation option (but XSD available): an attribute called 'citation' contains the filename and location of the instance document's corresponding schema; the parser will not invoke validation, but the linkage is documented • Example: • <FactsATB citation="http://www.disa.mil/DFAS.USSGLAccountType.2002-08-13.xsd"> • (3) Non-XML Schema option: the version of the alternate schema method is cited • Example: • <FactsATB citation="2002-08-13">
Configuration Management: Instance Document and Schemas (continued) • Consequences: • Pros • A creative approach to versioning instance documents - ties them to their respective schemas • Cons • Doesn’t seem to be a commonplace method of versioning
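Tools that need to find an instance document's schema can recognize the three options from the root element's attributes. A minimal Python sketch, assuming the 'citation' attribute is unqualified and that a citation consisting only of a date (optionally with a letter suffix) signals option (3); the function name and the numeric return codes are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
VERSION = re.compile(r"\d{4}-\d{2}-\d{2}[a-z]?")


def schema_linkage(document: str) -> tuple[int, str]:
    """Return (option number, schema reference); (0, "") if no linkage is found."""
    attrs = ET.fromstring(document).attrib
    if XSI + "schemaLocation" in attrs:
        # Value is whitespace-separated "namespace location" pairs; take the first location
        return 1, attrs[XSI + "schemaLocation"].split()[1]
    if XSI + "noNamespaceSchemaLocation" in attrs:
        return 1, attrs[XSI + "noNamespaceSchemaLocation"]
    if "citation" in attrs:
        citation = attrs["citation"]
        # A bare version date cites an alternate (non-XML Schema) method: option 3
        return (3 if VERSION.fullmatch(citation) else 2), citation
    return 0, ""


examples = [
    '<FactsATB xmlns="http://www.dfas.mil/DFAS" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.dfas.mil/DFAS DFAS.USSGLAccountType.2002-08-13.xsd"/>',
    '<FactsATB citation="http://www.disa.mil/DFAS.USSGLAccountType.2002-08-13.xsd"/>',
    '<FactsATB citation="2002-08-13"/>',
]
for example in examples:
    print(schema_linkage(example))
```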
Naming of Tags • Problem: How should XML tags be named and what are the surrounding issues? • Solution: • Guidelines on the next 5 slides • Consequences: • Pros • Consistent approach to naming tags • Cons • Difficult to enforce guidelines
Guidelines: Naming of Tags • All XML tag names should fully exploit the inherent hierarchical structure of XML, thus reducing redundancy of terms in the tag name and allowing for tag reuse. For example:
<USSGLDetails>
  <AccountNumberCode>1110</AccountNumberCode>
  <DebitCreditCode>D</DebitCreditCode>
  <CustodialNoncustodialCode>A</CustodialNoncustodialCode>
  <FederalNonfederalCode>N</FederalNonfederalCode>
  <Amount>21159598.29</Amount>
</USSGLDetails>
is preferred over:
<USSGL>
  <USSGLAccountNumberCode>1110</USSGLAccountNumberCode>
  <USSGLDebitCreditCode>D</USSGLDebitCreditCode>
  <USSGLCustodialNoncustodialCode>A</USSGLCustodialNoncustodialCode>
  <USSGLFederalNonfederalCode>N</USSGLFederalNonfederalCode>
  <USSGLAmount>21159598.29</USSGLAmount>
</USSGL>
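Redundant prefixes like those in the second fragment can be spotted with a simple heuristic: flag any child whose name starts with its parent's name. A Python sketch (standard library; function names hypothetical). It will also flag some legitimate names, so its output is a review hint rather than a verdict.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # Strip a "{namespace-uri}" prefix if present
    return tag.rsplit("}", 1)[-1]


def redundant_child_names(document: str) -> list[str]:
    """Heuristic: list child tags that merely repeat their parent's name as a prefix."""
    root = ET.fromstring(document)
    flagged = []
    for parent in root.iter():
        parent_name = local_name(parent.tag)
        for child in parent:
            child_name = local_name(child.tag)
            if child_name != parent_name and child_name.startswith(parent_name):
                flagged.append(child_name)
    return flagged


preferred = """<USSGLDetails>
  <AccountNumberCode>1110</AccountNumberCode>
  <Amount>21159598.29</Amount>
</USSGLDetails>"""

discouraged = """<USSGL>
  <USSGLAccountNumberCode>1110</USSGLAccountNumberCode>
  <USSGLAmount>21159598.29</USSGLAmount>
</USSGL>"""

print(redundant_child_names(preferred))    # []
print(redundant_child_names(discouraged))  # ['USSGLAccountNumberCode', 'USSGLAmount']
```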
Guidelines: Naming of Tags (continued) • All XML tag names should align with commonly used business terms, including: • Registration of business acronyms before use, in accordance with the DFAS Extensible Markup Language Registration Policy, e.g. DFAS = Defense Finance and Accounting Service, DoD = Department of Defense. • Use of abbreviations as registered, e.g. if "Dept" was registered as a short business term for department, then <Dept/> is preferred over <Department/> • The tag name shall be in singular form unless the word exists only in plural form, e.g. singular: <Account/>, not <Accounts/>; plural only: <Scissors/>
Guidelines: Naming of Tags (continued) • For collections of the same item, the tag name must end with 'List', for example <USSGLAccountNumberList> for a generic listing of accounts:
<USSGLAccountNumberList>
  <USSGLAccountNumber>1010</USSGLAccountNumber>
  <USSGLAccountNumber>1110</USSGLAccountNumber>
  <USSGLAccountNumber>1310</USSGLAccountNumber>
  <USSGLAccountNumber>1520</USSGLAccountNumber>
</USSGLAccountNumberList>
• Exception: If the collection is properly named and has a specific, registered business meaning, e.g. United States Standard General Ledger Chart of Accounts, then use <USSGLChartOfAccounts/> instead of <AccountList/>.
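A rough automated check for the 'List' rule treats any element holding two or more children with one and the same tag as a collection. This Python sketch assumes un-namespaced instance data and takes the registered exceptions as a parameter; the function name is hypothetical, and single-item collections go undetected.

```python
import xml.etree.ElementTree as ET


def collections_missing_list_suffix(document: str, registered: frozenset[str] = frozenset()) -> list[str]:
    """Flag elements holding 2+ children of a single tag whose name does not end in 'List'.

    Names in `registered` (collections with a registered business meaning,
    such as USSGLChartOfAccounts) are exempt.
    """
    flagged = []
    for element in ET.fromstring(document).iter():
        child_tags = {child.tag for child in element}
        is_collection = len(element) >= 2 and len(child_tags) == 1
        name = element.tag
        if is_collection and not name.endswith("List") and name not in registered:
            flagged.append(name)
    return flagged


good = ("<USSGLAccountNumberList>"
        "<USSGLAccountNumber>1010</USSGLAccountNumber>"
        "<USSGLAccountNumber>1110</USSGLAccountNumber>"
        "</USSGLAccountNumberList>")
bad = good.replace("USSGLAccountNumberList", "USSGLAccountNumbers")
chart = "<USSGLChartOfAccounts><Account/><Account/></USSGLChartOfAccounts>"

print(collections_missing_list_suffix(good))  # []
print(collections_missing_list_suffix(bad))   # ['USSGLAccountNumbers']
print(collections_missing_list_suffix(chart, frozenset({"USSGLChartOfAccounts"})))  # []
```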
Guidelines: Naming of Tags (continued) • To enforce a consistent capitalization and naming convention across all newly created DFAS XML, the "Upper Camel Case" (UCC) and "Lower Camel Case" (LCC) capitalization styles are preferred. UCC style capitalizes the first character of each word and compounds the name. LCC style capitalizes the first character of each word except the first word. To date, no public standard exists for this convention. These rules do not apply to XML created at DFAS before this guideline was created, nor do they require modification of externally created XML, such as industry consortia XML (for example, HR-XML, XBRL, etc.). • It is preferred that XML element names use the UCC convention, for example: <AnnualReport> • It is preferred that XML attribute names use the LCC convention, for example: <AnnualReport fiscalYear="2001"> • It is preferred that XML named datatypes use the UCC convention, for example: <xs:complexType name="FiscalYearType"> • It is preferred that XML named groups use the UCC convention, for example: <xs:group name="FACTSAccountsGroup">
Guidelines: Naming of Tags (continued) • Where acronyms are used, their capitalization is retained in element and attribute names, for example: <DFASGuidelines/>. • Note that this is an exception to the previously discussed rule concerning word boundaries in UCC and LCC • Underscores (_), periods (.) and dashes (-) should not be used for word boundaries • Don't use: <Header.Manifest/>, <Stock_Quote_5/>, <Commercial-Transaction/> • Use: <HeaderManifest/>, <StockQuote5/>, <CommercialTransaction/> instead • Tag names should be concise but not at the expense of expressiveness.
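Part of these conventions can be checked with regular expressions. This Python sketch (function and pattern names hypothetical) verifies only the case of the first character and the absence of _ . - separators; it cannot confirm that every word boundary is capitalized, which would need a vocabulary, and it deliberately lets all-caps acronyms such as DFASGuidelines through.

```python
import re

UCC = re.compile(r"[A-Z][A-Za-z0-9]*")  # elements, named datatypes, named groups
LCC = re.compile(r"[a-z][A-Za-z0-9]*")  # attributes
SEPARATORS = re.compile(r"[_.\-]")


def naming_problems(name: str, kind: str) -> list[str]:
    """Return rule violations; kind is 'element', 'attribute', 'datatype' or 'group'."""
    problems = []
    if SEPARATORS.search(name):
        problems.append("uses _ . or - as a word boundary")
    style, pattern = ("LCC", LCC) if kind == "attribute" else ("UCC", UCC)
    if not pattern.fullmatch(name):
        problems.append(f"not {style}")
    return problems


for name in ["HeaderManifest", "Header.Manifest", "Stock_Quote_5", "DFASGuidelines"]:
    print(name, naming_problems(name, "element"))
print(naming_problems("fiscalYear", "attribute"))  # []
print(naming_problems("FiscalYear", "attribute"))  # ['not LCC']
```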
Naming of Datatypes and Groups • Problem: How should datatypes and groups be named? • Solution: • The name of the datatype will end in ‘Type’ (even if the business term ends in ‘Type’) • For example: USSGLAccountType • The name of the group will end in ‘Group’ (even if the business term ends in ‘Group’) • For example: USSGLAccountGroup • Consequences: • Pros • Consistent approach to naming • Allows for disambiguation from other major XML Schema components such as elements, attributes, datatypes, groups, etc. • Leads to ease of recognition and identification • Cons • Difficult to enforce guidelines
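Since the suffix rule is hard to enforce by review alone, a schema can be scanned for named datatypes and groups that lack it. A minimal Python sketch (standard library; function and constant names hypothetical) that reports offending names in document order; anonymous types and group references carry no name attribute and are skipped.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
REQUIRED_SUFFIX = {
    XS + "simpleType": "Type",
    XS + "complexType": "Type",
    XS + "group": "Group",
}


def misnamed_components(xsd_text: str) -> list[str]:
    """Return names of named datatypes/groups lacking the 'Type'/'Group' suffix."""
    bad = []
    for element in ET.fromstring(xsd_text).iter():
        suffix = REQUIRED_SUFFIX.get(element.tag)
        name = element.get("name")
        # Anonymous types and group references (ref=...) have no name and are skipped
        if suffix and name and not name.endswith(suffix):
            bad.append(name)
    return bad


schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="USSGLAccountType"><xs:restriction base="xs:string"/></xs:simpleType>
  <xs:complexType name="FiscalYear"><xs:sequence/></xs:complexType>
  <xs:group name="USSGLAccountGroup"><xs:sequence/></xs:group>
  <xs:group name="FACTSAccounts"><xs:sequence/></xs:group>
</xs:schema>"""

print(misnamed_components(schema))  # ['FiscalYear', 'FACTSAccounts']
```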
Multi-field Approaches (XML Schemas) • Problem: How should multi-fields be handled in XML Schemas? • Solution: • Short-term approach • Make use of the XML Schema enumeration mechanism to capture code lists • Use Dublin Core / RDF metadata for capturing relationships or mappings to other code list values • Long-term approach • Involves the use of a registry • Consequences (of short-term approach): • Pros • Makes use of currently available technology • Cons • Can easily get out of date, raising configuration management issues
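Under the short-term approach the code values live in xs:enumeration facets, so they can be read back out, for example to compare two copies of a CodeList artifact or to load it into a future registry. A minimal Python sketch using the CodeList artifact from the file-inclusion example (function name hypothetical). The paired meanings (e.g. 21 = Dept. of Army) would still have to come from the separate Dublin Core / RDF metadata, which this sketch does not read.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def code_list_values(xsd_text: str) -> dict[str, list[str]]:
    """Map each named simpleType in a schema to its enumerated values, in document order."""
    values = {}
    for simple_type in ET.fromstring(xsd_text).iter(XS + "simpleType"):
        name = simple_type.get("name")
        if name:  # skip anonymous simple types
            values[name] = [e.get("value") for e in simple_type.iter(XS + "enumeration")]
    return values


code_list = """<xs:schema targetNamespace="http://www.dfas.mil/DFAS"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="CustodialNoncustodialCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="S"/>
      <xs:enumeration value="A"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

print(code_list_values(code_list))  # {'CustodialNoncustodialCodeType': ['S', 'A']}
```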
Multi-field Approaches (Instance Documents) • Problem: How should multi-fields be handled in instance documents? • Solution: • Preferred approach:
<Organization>
  <IDNumber code="34" IDSource="DNB">10-495-9618</IDNumber>
</Organization>
• Consequences: • Pros • Makes use of currently available technology • Cons • Ties element, attributes, and content together artificially
Multi-field Approaches (Instance Documents) (continued) • Alternative approaches:
<Organization>
  <DNB>10-495-9618</DNB>
</Organization>
or
<Organization>
  <IDCode>34</IDCode>
  <IDSource>DNB</IDSource>
  <IDNumber>10-495-9618</IDNumber>
</Organization>
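Moving between these forms is mechanical. A Python sketch (function name hypothetical; it assumes un-namespaced data with all three child elements present) that rewrites the element-per-field alternative into the preferred attribute form.

```python
import xml.etree.ElementTree as ET


def to_preferred_form(organization_xml: str) -> str:
    """Rewrite <IDCode>/<IDSource>/<IDNumber> children into the preferred
    <IDNumber code=".." IDSource="..">..</IDNumber> form."""
    organization = ET.fromstring(organization_xml)
    fields = {tag: organization.findtext(tag) for tag in ("IDCode", "IDSource", "IDNumber")}
    missing = [tag for tag, value in fields.items() if value is None]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    result = ET.Element("Organization")
    id_number = ET.SubElement(result, "IDNumber", code=fields["IDCode"], IDSource=fields["IDSource"])
    id_number.text = fields["IDNumber"]
    return ET.tostring(result, encoding="unicode")


alternative = """<Organization>
  <IDCode>34</IDCode>
  <IDSource>DNB</IDSource>
  <IDNumber>10-495-9618</IDNumber>
</Organization>"""

print(to_preferred_form(alternative))
```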
XML Schema Content Models • Problem: What are the recommendations concerning XML Schema content models? • Solution: • Recommendations for content models • Mixed • No for data • Yes for documents • Any - Yes (future expansion usage) • Trading Partner data specific • Recursive - Use with caution • Consequences: • Pros • Mixed, Any, and Recursive content models allow for specification of numerous data structures • Cons • All 3 can potentially lead to data management nightmares; caution is advised, especially for recursive models
Default / Fixed Values • Problem: What are the recommendations concerning default and fixed values for elements and attributes? • Solution: • Recommendations • Use as needed • Consequences: • Pros • Can be used for elements or attributes • Can potentially simplify the instance document by shifting the burden to its schema • Cons • Fixed values can be likened to hard-coding data values, a practice unpopular in software engineering
DoD XML Registry: Suggested Refinements • Creation of "Named Datatypes" and "Named Groups" categories • Searchable aliases • For ease of reuse, XSD format should be used for • elements • attributes • code lists / domain values • Configuration Management
Thank you! mike.lubash@dfas.mil amalik@xmlcg.com kit@mitre.org peat@erols.com charlie.clark@dfas.mil