360 likes | 487 Views
The Role of XML in Cloud Data Integration. Presenter: David RR Webber, Oracle Corporation October 15th, 2010. Introduction.
E N D
The Role of XML in Cloud Data Integration Presenter: David RR Webber, Oracle Corporation October 15th, 2010
Introduction Cloud services introduce new challenges for information sharing. While certain aspects are familiar and yet still unresolved, new techniques can be utilized. A canonical approach underpinning Cloud data exchanges is vital to ensure consistent understanding and enhanced interoperability. Developers face many challenges and complexities in using today’s industry standards for information exchanges. How can we simplify this and rapidly develop consistent and conforming information exchanges in the Cloud? Open source tools will be discussed and government and industry example exchanges presented.
Agenda Challenges, New and Old Why a canonical approach? Adaptive, agile and context aware infrastructure Avoiding the n(n-1)2 dilemma Ensuring Simplicity at the Foundation Never underestimate the ability of engineers to add complexity Open source and open standard solutions Examples from emergency management domain with NIEM* Summary and Q&A * National Information Exchange Model (NIEM) approach
Challenges, New and Old Why a Canonical Approach? Adaptive, agile and context aware infrastructure Avoiding the n(n-1)2 dilemma
Why a Canonical Approach? Traditional XML information exchange has been schema driven; many issues for Cloud integration: W3C XML Schema is inflexible, static, brittle, localized, expensive Canonical dictionaries exploit Cloud approach with distributed availability, flexible collaboration and dynamic updating and referencing Amazon Web Services AWS catalog an example of dynamic approach Canonical dictionaries provide the components that underpin the exchanges while leaving the precise exchange formatting open for implementers; the “what” not the “how”; content can change hourly on AWS! Neutral XML-based syntax is future proof
Canonical XML dictionary A collection of distinct components that represent discreet business information for an application domain Includes singleton components and combinations of related components together as sub-assemblies Information is represented in a simple neutral conceptual data format that captures the critical concepts about the data e.g. name, description, content type, contextual usage pattern, hierarchy Wikipedia definition: http://en.wikipedia.org/wiki/Canonical#Computer_science
Baking in Interoperability Using consistent component definitions dramatically improves interoperability and reuse Having formal design methods makes development faster, easier, predictable and repeatable Aligning local practice to industry domain dictionary can reduce complexity and reinforce best practices Dictionary definitions can be automatically evaluated for common mistakes and this reduces the opportunity for errors during design phase Generating software artifacts from neutral dictionary definitions ensures reliable information exchange results across user communities and their particular systems, platforms and tools
Neutral Content Model Representation Neutral representations allow business stakeholders to participate in dictionary development without technology barriers Concise neutral formats can be viewed as simple spreadsheets as they have no special syntax dependencies Based on open public standard specifications, semantic concepts and leading knowledge domain techniques Neutral representation prevents lock-in by vendor, syntax, tooling or platforms Maximizes flexibility and future proofing of dictionary definitions
Linguistic and Semantic Alignment Formal community domain naming and design rules provide consistency of definitions Consistency of definitions minimizes duplication and overlapping of dictionary components Dictionaries allow collaboration on component development to improve the overall results Formal component content detail drives alignment Design best practices ensure logical self-contained components that can be selected contextually Avoids explosion of complexity and excessive over definition (e.g. “kitchen-sink” schema)
What is a Canonical Approach? There are several flavors of canonical approaches; some more complex than others – e.g. UBL vis OAGi vis CCTS Avoid dependence on W3C XML Schema mechanisms Core Components Technical Specification (CCTS) simple components with basic hierarchy Parent components with child entities, and/or components Associated attributes that denote context and related factors In CCTS parlance these are ABIE, BBIE and ASBIE Parent = Aggregate Business Information Entity Child = Basic Business Information Entity Attribute = Association Business Information Entity
Conceptual Information Model Follows Naming and Design Rule (NDR) principles and guidelines Canonical Components Dictionary XML Each compound component Parent (ABIE) Item Parent (ABIE) Item Parent (ABIE) Item Parent (ABIE) Item . . . . . Each atomic component Attribute (ASBIE) Child (BBIE) Item Child (BBIE) Item Child (BBIE) Item Child (BBIE) Item Attribute (ASBIE) Attribute (ASBIE) ebXML CCTS terms (ABIE, BBIE, ASBIE) Parent = Aggregate Business Information Entity Child = Basic Business Information Entity Attribute = Association Business Information Entity Attribute (ASBIE) Optional attributes of component * CCTS – Core Components Technical Specification
Example – Person Name Person Name (ABIE) Language Code (ASBIE) Verified Details? (ASBIE) Has Alias? (ASBIE) First Name (BBIE) Middle Name (BBIE) Last Name (BBIE) Previous Name? (ASBIE) Language Code may exist independently of Person Name Verified Details and Previous Name are flags that denote additional information about the entity they are associated with There are three component items aspects: structure relationships; content rules; definitions Naming and Design Rules (NDR) also important in ensuring shorter non-specific context names e.g. compare PersonName to IncidentPersonName
Methods for creating Canonical Dictionary Harvest from collection of domain exchange schema Export from SQL database to schema; harvest; rename Export from modelling tool to schema; harvest; rename Create manually in XML or spreadsheet
Sample Dictionary Building Processes LEGEND Automated Manual Analyst Review Apply Naming and Design Rule (NDR) checks and edits Option 1 – From Enterprise Data Model Import XSD and refactor for use with OASIS CAM 1 3 Import EDM Model Components XSD schema OASIS CAM template NDR Evaluation, Refactor, Renaming Tool 4 Export Components in XSD syntax Collection of objects from model Analyst Review Ele Def XML 5 Generate Standard Components Dictionary XML DDL ebXML CCTS compatible (ABIE, BBIE, ASBIE) Dictionary of exchange components Option 2 – Derive from existing exchange XSD schema Import each XSD and merge into CAM dictionary 2 3 4 5 Merge & Generate Dictionary XML NDR Evaluation, Refactor, Renaming Tool CAM template Exchange XSD schema XML Import CAM template Exchange XSD schema OASIS CAM template Import Exchange XSD schema Dictionary of exchange components Import ebXML CCTS compatible (ABIE, BBIE, ASBIE)
Ensuring Simplicity at the Foundation Never underestimate the ability of engineers to add complexity
Adaptive, agile and context aware infrastructure XML validation framework that is configurable dynamically through the use of XML templates and rules. “In today's complex information exchanges with XML and associated large XSD schema, coupled with an array of trading partners, it becomes a significant challenge to support and maintain accurate handling of all incoming transactions”. “With a more adaptive and fault tolerant process, the application is able to handle a wider variation in content and, hence, more easily support a broad set of interaction partners with reduced support and maintenance costs”. http://www.ibm.com/developerworks/library/x-camval/index.html
Avoiding the n(n-1)2 dilemma New XML validation framework Automotive parts repair with STARBOD example Utilizing validation framework with singleton validation templates that are context rule driven Source: http://www.ibm.com/developerworks/library/x-camval/index.html#figure2
Agile Solution Components LEGEND Automated Manual Canonical Dictionaries Pick Components Wantlist Industry dictionary formatted as XML Review Structure Assembly 1 Agile Validation Engine Blueprint toolkit Exchange Designer Tool User Interface Domain dictionary formatted as XML Definitions Repository (XML) Build Ele Def Business Context Rules WSDL actions (optional) 2 Domain applications CAMV engine Content Hints Templates XML Schema 3 4 Test Exchange Structure Schema XML exchange realistic test examples Unit Test Harness Interchanges 19
Leveraging Cloud Deployment strengths Collaboration tools for sharing canonical component dictionaries Repositories of templates and code lists Fault tolerant deployment architectures with redundancy Machine accessible APIs to allow real time updates and propagation of changes Standards based implementations that provide open access Open source resources for shared implementation support
Open Source and Open Standard solutions Examples from Emergency Management domain with NIEM, OASIS EDXL, LEXS
Example Emergency Management Scenario Emergency Response Services Workflow using OASIS EDXL exchanges Haiti demonstrated need for agile exchanges to rapidly cope with unfamiliar scenario and environment changes Cloud-based sharing of open adaptive common infrastructure components
Top Down Solution Approach LEGEND Automated Manual Target applications 3 Pick Components Structure Outline Blueprint Industry dictionaries formatted as XML Exchange generator tools (CAM) 4 Enterprise Data Model Import and refactor for use with CAM 6 Expand Structure Exchange Structure 2 Exchange Components 1 Build EDM Exchange Blueprint Designer User Interface Local domain dictionary formatted as XML Ele 5 Exchange Package Components Definition (XML) 7 Def Ele Def DDL Dictionary Repository
Assembling Components from dictionaries Determine your business information exchange components at conceptual level Search and locate candidate components from appropriate domain dictionary collections Catalogue the parts to be used Dictionary components can be referenced individually or as collections by an assembly blueprint that puts them all together to create a complete information exchange Components can be selected from multiple dictionaries Note any new extension pieces as needed Select components from multiple physical dictionary files Blueprints themselves also have high re-use value Can be sub-assemblies and patterns not just exchange models
Example Assembly Blueprint Outlines • LEXS messaging blueprint Reusable messaging envelope constructs • OASIS EDXL HAVE message Business functional components Message handling, delivery and control Individual component Payload goes here Top level sets of business information components these examples available from CAM editor install package ~ CAMeditor\eclipse\workspace\CAMEditor\dictionary\blueprints\ LEXS – Law Enforcement eXchange System – http://www.lexs.gov
Exchange Development Process Tools Component Definitions Component Definitions Excel Domain dictionary Web tool 1 Blueprint Designer Industry dictionary 2 Search Tools Expander Tool 3 Insert Dictionary Parent Components 4 5 Completed Exchange Template
Summary and Q & A Review Resource links
Summary Canonical XML component dictionaries Neutral representation of components Deployment to target environments and architectures Collaborative development and open source Uses open public standards and government guidelines (NIEM) Available resources and tools Illustrative use cases Leverage strengths of cloud-based collaboration resources
Resources Resource links Supporting supplemental slides
Links and Resources DOWNLOADS - CAM Toolkit download https://sourceforge.net/projects/camprocessor SUPPORTING MATERIALS - NIEM Naming and Design Rules (NDR) 1.3 http://www.niem.gov/pdf/NIEM-NDR-1-3.pdf RESOURCES – UN/CEFACT Core Components Technical Specification http://www.unece.org/cefact/ebxml/CCTS_V2-01_Final.pdf Tutorials - wiki.oasis-open.org/cam/CAM_Tutorials Specifications www.oasis-open.org/committees/cam docs.oasis-open.org/cam www.oasis-open.org/committees/emergency NIEM site - www.niem.gov LEXS site – www.lexs.gov
Available XML Dictionaries LEXS 3.1.4 dictionary OASIS EDXL dictionary OASIS EML dictionary NIEM 2.1 dictionaries CBRN dictionary Emergency dictionary Family dictionary Immigration dictionary Infrastructure dictionary Intelligence dictionary Justice dictionary Maritime dictionary Screening dictionary Trade dictionary NIEM core dictionary Immigration blueprint • Packaged with CAM editor see dictionary folder of install + spreadsheet + blueprint samples XML XML XML Note: Those marked in bold are model style dictionaries with recursive components. Available from download site direct link: http://sourceforge.net/projects/camprocessor/files + includes spreadsheets and sample blueprint XML XML XML XML XML XML
Conceptual Information View Structure Rules CAM Template Definitions DICTIONARY COMPONENTS DOMAIN DATA COMPONENTS Items Item (ABIE, BBIE, ASBIE) CAM toolkit processing Properties Name Unique ID Component Type Cardinality Content Type Content Mask Children Apply tools in desktop CAM toolkit editor Group Structure Context Where from Definition Rules Language, Label, Notes * Required items in Blue
XML View of Dictionary Content Name Unique ID Component Type Cardinality Parent / Child linkage where referenced Items * See slide notes for explanation Content Type Content Mask
Excel Spreadsheet View Type (ABIE, BBIE) properties as columns children An item per row
Mapping to Dictionaries You can compare a template of components to a dictionary check within a domain for alignment to dictionary check between domains for interoperability merge new/existing components with dictionary Matches on physical names Reports matching items and details Reports statistics and percentages of matching Generates crosswalk xml file Compatible with Microsoft Excel Report can be used to do spell checking
Example cross-reference spreadsheet Formatted view in Microsoft Excel of import of cross-reference report details (from generated XML file) Matched details; item and alignment, definition