OASIS SET TC: Automating Intra-domain Mappings
Leveraging SET, OWL, CAM and dictionary-based tools to enable automated cross-dictionary domain translations
David Webber, OASIS SET TC / CAM TC
(with excerpts from the iSURF presentation by Prof. Dr. Asuman Dogac, METU-SRDC, Turkey)
Agenda
• Part I: Introduction
  • Intra-domain example use cases
  • Challenges and Opportunities
• Part II: Roadmap
  • CAM templates, OWL, XPath, Dictionaries, CCTS
  • Using Dictionary based approach and SET Tools for aligning structure components across syntax vocabularies within a domain
• Part III: Summary
  • Next Steps
Information Exchange Interoperability
• Many common domains use multiple vocabularies that have arisen historically over time, e.g. banking, healthcare, supply chain [1]
• These may be weakly or strongly aligned, depending on the domain and the fragmentation / marketplaces within it
• All domains share common components such as organisation, person, customer, vehicle, address
[1] X12/EDI, UN/CEFACT, UBL, GS1, xCBL, cXML, FIX, SWIFT, HL7, and more
Dictionary alignment task challenges
• Each domain can be inspected by comparing the vocabulary dictionaries
• Creating dictionaries in a common reference format has previously been a complex and manually intensive process
• Even within a single domain implementation, the vocabulary may be fragmented and inconsistent because information models evolve over time
Opportunities and Potential
• Creating an agnostic set of methods and tools that allow alignment within a domain to facilitate consistent information definitions
• Leverage the approach to also support semi or fully automated mapping patterns and templates
• Use open standards and open source tools
• Provide open public roadmap for tool vendors
• Allow standards groups to publish their exchanges in an open non-proprietary syntax and rule system
• Enable SMBs to build once, exchange to many
Part II: Roadmap – CAM templates, OWL, XPath, Dictionaries and CCTS
CAM templates, OWL and dictionaries
• Information components derive their meaning and semantics from the context of their use pattern, not from the physical name label, e.g.
  • Customer/Account/Number
  • Order/Item/Number
• CAM templates and OWL terms share the ability to express use patterns; software agents that traverse the exchange structure components can inspect these patterns and deduce equivalence
• Matching is based on rules that can be tailored and that reference dictionaries of known properties
• Allows automated generation of domain dictionaries
CAM templates, XPath and dictionaries
• The CAM toolkit contains dictionary analysis tools that can:
  • Create a new dictionary from existing domain exchange transactions
  • Merge dictionaries together
  • Compare exchange transactions to dictionary definitions and produce a spreadsheet of matches and deltas
  • Report XPath location usage patterns of all unique items and exchange transactions
  • Assign unique UID values to each component
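The first and last tool functions above (building a dictionary from an exchange transaction and assigning a UID to each unique item) can be illustrated with a minimal Python sketch. This is not the CAM toolkit, which the slides describe as XSLT-based. The function name `build_dictionary`, the `UID0001` label format and the sample order document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_dictionary(xml_text, dictionary=None):
    """Collect every unique element path and give each one a UID.

    Hypothetical helper: passing an existing dictionary merges new paths
    into a copy of it and keeps the UIDs that were already assigned.
    Assumes existing UIDs were produced by this same function.
    """
    merged = {path: dict(entry) for path, entry in (dictionary or {}).items()}
    root = ET.fromstring(xml_text)

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        if path not in merged:
            # The UID labels the context (full path), not the bare tag name.
            merged[path] = {"uid": f"UID{len(merged) + 1:04d}", "occurrences": 0}
        merged[path]["occurrences"] += 1
        for child in element:
            walk(child, path)

    walk(root, "")
    return merged


# Illustrative exchange transaction (not taken from any real standard).
ORDER = """<Order>
  <Customer><Account><Number>A-17</Number></Account></Customer>
  <Item><Number>1</Number></Item>
  <Item><Number>2</Number></Item>
</Order>"""

for path, entry in build_dictionary(ORDER).items():
    print(entry["uid"], path, entry["occurrences"])
```

Because the key is the full path, `Customer/Account/Number` and `Item/Number` receive different UIDs even though both end in an element called `Number`, which mirrors the point that meaning comes from the use pattern and not from the name label.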
CAM dictionary generation overview
[Diagram, steps 1-3: XSD schemas and CAM templates are each processed by an XSLT script, then compared and merged into the master dictionary. Components recorded: name, description, type, restrictions, relationships, usage occurrences, UID.]
Dictionary Tools
• Generate a dictionary of core components from a set of exchange templates
• Separate dictionary content by namespace
• Merge annotations and type definitions from an exchange template into the dictionary
• Compare each exchange template to the master domain dictionary
• Produce spreadsheet workbooks
• Update the spreadsheet and export it back to dictionary core components
Create Dictionary – CAM process
[Screenshot callouts:]
• Select dictionary: leave empty to create a new one, or pick an existing one to merge into
• Output dictionary filename
• Select the template content namespace to match with
• Merge mode: use true to combine content
Compare to Dictionary
[Screenshot callouts:]
• Pick the dictionary to compare with
• Name of the result cross-reference file
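The compare step can be pictured as a lookup of each template item in the dictionary, producing a row per item marked as a match or a delta. The sketch below is an assumption about what such a cross-reference might contain; the function names, the status labels and the CSV output (standing in for the spreadsheet workbook) are hypothetical and do not reflect the CAM toolkit's actual file format.

```python
import csv
import io


def compare_to_dictionary(template_items, dictionary):
    """Return one cross-reference row per template item (hypothetical layout)."""
    rows = []
    for name, definition in sorted(template_items.items()):
        known = dictionary.get(name)
        if known is None:
            status = "missing from dictionary"
        elif known == definition:
            status = "match"
        else:
            status = "delta"
        rows.append({"item": name, "status": status,
                     "template": definition, "dictionary": known or ""})
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["item", "status", "template", "dictionary"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative data: item path -> declared type.
template = {"Order/Date": "date", "Order/Item/Number": "integer",
            "Order/Note": "string"}
master = {"Order/Date": "date", "Order/Item/Number": "string"}

print(to_csv(compare_to_dictionary(template, master)))
```

Rows marked "delta" or "missing from dictionary" are the ones a reviewer would resolve before the results are merged back into the master dictionary.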
CAM template to OWL exporter
• The CAM toolkit currently contains a variety of exporter tools that generate XSD schema, XML dictionaries and XML test case examples
• Opportunity to write an exporter that generates OWL terms directly from the CAM template patterns in a dictionary
• XSLT is used to accomplish this, so the exporter can be easily adapted, extended and tailored
• Allows an OWL-based reasoner to act with CAM
• The reasoner can then also update the CAM dictionary to complete the semantic mapping
CAM to OWL generation overview
[Diagram, steps 1-4: an XSLT script extracts components (name, description, type, restrictions, relationships) and their UIDs from the master dictionary and generates OWL terms and instances; a reasoner processes these and inserts UID couplet pairings back into the master dictionary.]
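To make the export step concrete, here is a rough Python sketch that turns dictionary entries into OWL statements in Turtle syntax. The slides propose doing this in XSLT; Python is used here only for illustration. Modelling each component as an `owl:Class`, the `urn:example:dictionary#` base IRI, the entry field names and the sample UIDs are all assumptions, and the label is written without escaping, so names containing quotes would need extra handling.

```python
def to_turtle(entries, base="urn:example:dictionary#"):
    """Emit one owl:Class per dictionary entry, keyed by its UID.

    Hypothetical entry layout: {"uid", "name", "equivalent_to": [uids]}.
    """
    lines = [
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for entry in entries:
        subject = f"<{base}{entry['uid']}>"
        lines.append(f"{subject} a owl:Class ;")
        lines.append(f'    rdfs:label "{entry["name"]}" .')
        # A UID couplet marked as equivalent becomes owl:equivalentClass.
        for other_uid in entry.get("equivalent_to", []):
            lines.append(f"{subject} owl:equivalentClass <{base}{other_uid}> .")
    return "\n".join(lines)


# Illustrative entries from two vocabularies in the same domain.
entries = [
    {"uid": "UID0001", "name": "Order/Buyer/Name", "equivalent_to": ["UID0101"]},
    {"uid": "UID0101", "name": "PurchaseOrder/Customer/FullName"},
]
print(to_turtle(entries))
```

Using the UID as the IRI fragment keeps the OWL terms independent of any one vocabulary's element names, which is what lets a reasoner's conclusions be written back to the dictionary as UID pairings.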
Explicate semantics related to the different usages of document data types
• Different document standards use CCTS Data Types differently
• For example, "Code.Type" in one standard is represented by "Text.Type" in another standard and by "Identifier.Type" in yet another
• This real-world knowledge is expressed through class equivalences, so that the reasoner, and not only humans, knows about it:
  • Code.Type ≡ Text.Type
  • Name.Type ≡ Text.Type
  • Identifier.Type ≡ Text.Type
• Can cross-reference via UID as well as type
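Since class equivalence is transitive, the three statements above also imply relationships that are not written down, for instance Code.Type ≡ Identifier.Type. A full OWL reasoner does far more than this, but the grouping itself can be sketched with a small union-find in Python. The function name `equivalence_classes` is hypothetical.

```python
def equivalence_classes(pairs):
    """Group names into the classes implied by pairwise equivalences."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for left, right in pairs:
        parent[find(left)] = find(right)

    classes = {}
    for name in list(parent):
        classes.setdefault(find(name), set()).add(name)
    return list(classes.values())


# The equivalences stated on the slide.
STATED = [
    ("Code.Type", "Text.Type"),
    ("Name.Type", "Text.Type"),
    ("Identifier.Type", "Text.Type"),
]

for group in equivalence_classes(STATED):
    print(sorted(group))
```

All four types end up in a single group, so any pair of them can be treated as aligned when comparing dictionaries.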
Dictionaries, UIDs, and CAM templates
• Within a dictionary each unique context of an item can be assigned a UID label value
• These UID label values can then be inserted as references into a CAM template
• Each UID couplet across exchange formats within a domain can be marked as equivalent or similar (aliases)
• This allows automated mapping across CAM template definitions
• For similar items, CAM supports transform rules in standard XPath syntax
Dictionary Alignment Step
• Human / OWL inspectors
• The dictionary alignment report lists known equivalents (confidence 100%), followed by lesser equivalence rankings based on matching factors
• Compound component relationships are resolved using CAM template structure layouts
• Human inspection then reviews, resolves and updates the dictionary (using Excel spreadsheet workbook format)
• A new dictionary is produced
• Iterative refinement over time can enhance alignment, along with common practices established through industry agreements
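The slides do not say which matching factors drive the lesser rankings. As one possible illustration, the Python sketch below scores known equivalents at 1.0 and ranks every other candidate by plain name similarity using the standard library's `difflib`. The function name, the use of name similarity as the only factor, and the sample component names are assumptions.

```python
from difflib import SequenceMatcher


def rank_candidates(source_name, target_names, known_equivalents=None):
    """Rank target names for one source name, best candidate first.

    Known equivalents get confidence 1.0; all others get a name-similarity
    ratio between 0 and 1 (an assumed stand-in for real matching factors).
    """
    known_equivalents = known_equivalents or {}
    ranked = []
    for target in target_names:
        if known_equivalents.get(source_name) == target:
            score = 1.0
        else:
            score = SequenceMatcher(
                None, source_name.lower(), target.lower()).ratio()
        ranked.append((target, round(score, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical component names from two vocabularies.
report = rank_candidates(
    "PostalCode",
    ["City", "PostCode", "ZipCode"],
    known_equivalents={"PostalCode": "ZipCode"},
)
for target, score in report:
    print(f"{target}: {score}")
```

A reviewer would accept the 1.0 rows as they stand and inspect the lower-ranked rows by hand, which matches the human inspection step described above.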
From Dictionary to Runtime Mapping
• Once a dictionary is available with UID couplets for domain crosswalks, proceed to align
• Take templates of actual exchanges and label these with UID couplets
• Look up the UID couplets in the dictionary and update the target template with the UID from each couplet
• Take the completed templates and use them to drive the actual mapping processes
Create UID driven mapping template
[Diagram, steps 1-3: an XSLT script takes the source CAM template (with UIDs) and the target CAM template, looks up each UID couplet (same or similar) in the domain master dictionary, and writes an updated target CAM template carrying the UIDs plus optional XPath mapping rules.]
Automated UID driven mapping
[Diagram, steps 1-4: an XSLT script reads the source CAM template (UIDs), the target CAM template (UIDs and rules) and an input XML instance, and produces the output XML instance.]
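The runtime flow can be sketched end to end in a few lines of Python. The slides describe an XSLT script driven by CAM templates with XPath rules; in this illustration the templates are reduced to plain path-to-UID tables and the rules are Python callables, both of which are simplifying assumptions. Every path, UID and function name below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Source template: element path -> UID. Target template: UID -> element path.
SOURCE_TEMPLATE = {"Order/Buyer/Name": "UID0001", "Order/Date": "UID0002"}
TARGET_TEMPLATE = {"UID0101": "PurchaseOrder/Customer/FullName",
                   "UID0102": "PurchaseOrder/IssueDate"}

# Couplets: source UID -> (target UID, optional content rule).
# "Equivalent" items carry no rule; "similar" items carry a transform.
COUPLETS = {
    "UID0001": ("UID0101", None),
    "UID0002": ("UID0102", lambda text: text.replace("/", "-")),
}


def ensure_path(root, path):
    """Return the element at `path` below `root`, creating missing steps."""
    node = root
    for step in path.split("/")[1:]:  # first step is the root itself
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    return node


def map_instance(source_xml):
    """Build a target instance from a source instance via UID couplets."""
    source_root = ET.fromstring(source_xml)
    target_root = None
    for source_path, source_uid in SOURCE_TEMPLATE.items():
        root_tag, _, relative = source_path.partition("/")
        if root_tag != source_root.tag or not relative:
            continue
        element = source_root.find(relative)
        if element is None or source_uid not in COUPLETS:
            continue
        target_uid, rule = COUPLETS[source_uid]
        target_path = TARGET_TEMPLATE[target_uid]
        if target_root is None:
            target_root = ET.Element(target_path.split("/")[0])
        value = element.text or ""
        ensure_path(target_root, target_path).text = rule(value) if rule else value
    return target_root


SOURCE = "<Order><Buyer><Name>Ada</Name></Buyer><Date>2010/05/01</Date></Order>"
print(ET.tostring(map_instance(SOURCE), encoding="unicode"))
```

Neither template names the other vocabulary's elements directly: the only link between `Order/Date` and `PurchaseOrder/IssueDate` is the UID couplet, so supporting another format would mean adding its template and couplets and leaving the existing ones unchanged.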
Dictionary approach summary
• If the document components of two different domain standards share the same semantic properties:
  • Use this as an indication that they may be similar
• Some explicitly defined semantic properties may imply further implicit semantic relationships:
  • Use a reasoner to obtain implicit relationships
• Align to dictionary definitions, allowing crosswalk
• Create harmonized dictionary lookup
• Use abstract UID as common reference (linkage between language-specific named types/objects)
• Make explicit the semantics behind the different usages of document data types in different document schemas, so that the desired interpretations can be obtained from these informal semantics
• Determine similar/match relationships and rules for constraint alignment and compound component relationships (e.g. date-time versus date and time)
• Provide dictionary structure format for managing relationships
• Leverage existing OASIS CAM and ebXML Registry TC work
Summary
• Develop crosswalks:
  • Convert XSD schema to CAM templates
  • Leverage template structure and XPath rules to build dictionaries with UID labels
  • Build OWL relationships from dictionaries
  • Compare each dictionary to master dictionary and reference OWL and type knowledge bases to align
  • Produce spreadsheet for manual review
  • Save final results back to master dictionary
• Build runtime templates:
  • Compare individual CAM templates to master dictionary, generate cross-walk section between components
  • Cross-walk can contain alignment rules in XPath for content handling (e.g. code values and re-formatting)
Tools needed
• CAM
  • Schema ingesting
  • Dictionary builder
• OWL
  • Reasoner
  • CAM dictionary to OWL generator
  • Extend CAM dictionary format for couplets / rules
  • Extend reasoner to update dictionary couplets
• Mapping
  • XSLT engine to read input, templates and create output
  • (Can use existing XSLT CAM validator as basis)
Note: the type equivalences (e.g. Code.Type ≡ Text.Type) are labelled as couplets through the UID dictionary cross-references, and can be stored back into the <Extensions> section of CAM templates for runtime crosswalk use.