Semantic Integration Layer for The As-Is Enterprise Data Warehouse Dave Lush, Senior SME Aha! Analytics
Purpose(s) • Communicate Some Observations About the General Data Integration Problem • Cite and Discuss the Semantic Technologies • Propose a Semantic Data Integration Layer for the General Data Warehouse Architecture • Discuss a LexisNexis SSI Data Analytics Supercomputer (DAS) Based Solution • Present Initial Thoughts on the Plan
Topics • Purpose • Background • Givens/Problems/Tasks • Approaches to Data/Info Integration • Semantic Technologies • General Solution Architecture • LNSSI DAS Based Solution Architecture • Thoughts On the Plan
Background • Data Integration Problems • Application and Enterprise Model Based Approaches • Data Integration Problems Persist • Not Adequately Leveraging Available Metadata • Need for Improved Discovery and Semantic Integration • Emergence of Semantic Technologies • Emergence of LNSSI DAS Capability
The Primary Givens/Problem/Task
• Givens:
  • A Collection of Disparate Legacy Databases, Perhaps Already Migrated to an Enterprise Data Warehouse, Each with Its Own Independently Developed Logical Data Model and Query Interface
  • The Requirement to Pose Single Unified Queries Across the Collection of Legacy Databases and Achieve Semantically Consistent (Coherent) Results
• The Problem:
  • Difficulties in Achieving Useful Results Because of Unresolved Semantic Disconnects in the Disparate Logical Models
  • Note: The Problem Is Not Primarily One of Discovering Relevant, Already Existing Product Objects but Rather One of Discovering and Semantically Integrating Requisite Product Content from Multiple Sources
• The Task at Hand:
  • Define, Design, and Implement a Capability for the Semantic Integration and Unified Query of the Collection of Disparate Legacy Databases to Achieve Semantically Coherent Results
Basic Data/Info Integration Approaches
• Application Centric Approach
  • Do It All in the Application Layer Via Ad Hoc Hand Coding
  • This Is Very Expensive And Difficult!
• Enterprise Information Model and Data Warehouse Approach
  • Do It Via EDW ETL Methods/Tools in Context of Strict Conformance with Overarching Enterprise Info Model
  • This Is Also Very Expensive And Difficult And Requires Great Discipline!
• Enterprise Information Integration (EII) Approach
  • Establish Common Single View of Disparate Legacy Sources
  • Process/Parse Common Domain-wide Queries into Individual Legacy Source Queries and Execute Source Queries
  • Integrate Source Query Results into Unified Response to the Domain-wide Query
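The three EII steps above (common view, per-source queries, integrated response) can be sketched in a few lines. This is only an illustration under stated assumptions, not the architecture proposed in these slides: the two in-memory SQLite databases, their table and column names, and the `COMMON_VIEW` mapping are all hypothetical.

```python
import sqlite3

KEY = "employee_id"  # hypothetical unified key shared by both sources

def make_sources():
    """Build two hypothetical legacy databases with independent schemas."""
    hr = sqlite3.connect(":memory:")
    hr.execute("CREATE TABLE staff (emp_no INTEGER, full_name TEXT)")
    hr.executemany("INSERT INTO staff VALUES (?, ?)",
                   [(1, "Ada Park"), (2, "Raj Mehta")])
    payroll = sqlite3.connect(":memory:")
    payroll.execute("CREATE TABLE pay (employee_id INTEGER, annual_usd REAL)")
    payroll.executemany("INSERT INTO pay VALUES (?, ?)",
                        [(1, 82000.0), (2, 91000.0)])
    return {"hr": hr, "payroll": payroll}

# Common single view: unified attribute -> (source, table, legacy column).
COMMON_VIEW = {
    KEY: [("hr", "staff", "emp_no"), ("payroll", "pay", "employee_id")],
    "name": [("hr", "staff", "full_name")],
    "salary": [("payroll", "pay", "annual_usd")],
}

def plan_source_queries(attributes):
    """Parse a domain-wide request (non-key attributes) into one SELECT per source."""
    plans = {}
    for attr in attributes:
        for source, table, column in COMMON_VIEW[attr]:
            plans.setdefault((source, table), []).append(f"{column} AS {attr}")
    queries = {}
    for (source, table), cols in plans.items():
        key_col = next(c for s, t, c in COMMON_VIEW[KEY]
                       if s == source and t == table)
        queries[source] = (f"SELECT {key_col} AS {KEY}, {', '.join(cols)} "
                           f"FROM {table}")
    return queries

def unified_query(sources, attributes):
    """Execute the source queries and merge the rows on the unified key."""
    merged = {}
    for source, sql in plan_source_queries(attributes).items():
        cursor = sources[source].execute(sql)
        names = [d[0] for d in cursor.description]
        for row in cursor:
            record = dict(zip(names, row))
            merged.setdefault(record[KEY], {}).update(record)
    return [merged[k] for k in sorted(merged)]

if __name__ == "__main__":
    for record in unified_query(make_sources(), ["name", "salary"]):
        print(record)
```

The sketch only renames columns. It does nothing about differences in meaning (units, code lists, definitions), which is the gap the semantic layer is meant to close.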
The Basic Data Integration Challenge
The application must process the unified query, formulate and submit associated queries against the disparate databases, and properly integrate the results into a unified response. This requires that the application handle disparate data interfaces. It also requires that the application contain the necessary semantics regarding the problem domain, the relationships/mappings between the problem domain and the legacy data models, and the code that accomplishes the mappings.
[Diagram: Application; three Data Interfaces; Legacy Databases. Note on diagram: Logical models for these databases were generally developed independently of each other.]
The Enterprise Data Warehouse Approach
The legacy databases are migrated to a data warehouse in the context of an overarching enterprise data model so that the logical data models for the individual databases are semantically consistent with the overall model. The application still must process the unified query, formulate and submit associated queries against the disparate warehouse databases, and properly integrate the results into a unified response. In theory, however, this process should not have serious semantic inconsistency problems, because the individual logical databases in the warehouse are supposed to have logical models that are consistent with an overarching enterprise information model.
[Diagram: Application; Data Warehouse Services Layer; Meta-Data Mgt Tool; Enterprise Data Warehouse (target data); Common Enterprise Model & Meta-data; Extract Transform Load (ETL) Services (source data). Note on diagram: Logical models for these databases are consistent with overarching domain model.]
Problems Ensue • The Imperative to Abide by a Standard Global Data Model Does Not Prevail • Stove Piped DBs Abound • Semantics of the Stove Pipes Are Inconsistent • Federated Queries Yield Semantically Inconsistent Results • Cannot Replace/Re-engineer Legacy DBs Housed in the EDW • Cannot Replace the EDW Platform (e.g., Teradata) In Use Today
New Imperative • Must Have Some Effective Way to Semantically Integrate the Information Acquired from the Multiplicity of Databases
Semantic Integration • Use Semantic Technologies in Context of the EII Approach (cited previously) • Unified Ontology of Current Situation/View Is Developed and Expressed in OWL or Appropriate Successor Language • Semantic Relationships Between Legacy Data and Rules for Transformation From Legacy to Current View Are Specified and Captured Via OWL or Appropriate Successor Language • Queries in Terms of Current Unified View Are Parsed and Transformed Into Queries of Legacy Sources by a Semantic Query Engine. • Individual Legacy Source Queries Are Executed. • Results Are Transformed and Processed Into a Unified Response by the Semantic Mash-up Engine.
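The role of the linking rules in the steps above can be illustrated with a small sketch. It assumes two hypothetical sources (`depot_db`, `supply_db`) whose fields disagree on names, units, and status codes. In the approach described here such rules would be captured in OWL or a rule language; plain Python tuples stand in for them only to show the idea.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms

def identity(value):
    return value

# Hypothetical linking rules:
# (source, legacy field, unified property, transformation to the unified view)
LINK_RULES = [
    ("depot_db", "part_no", "partId", identity),
    ("depot_db", "wt_lb", "massKg", lambda lb: round(lb * LB_TO_KG, 3)),
    ("depot_db", "cond", "serviceable", lambda code: code == "A"),
    ("supply_db", "item_id", "partId", identity),
    ("supply_db", "mass_kg", "massKg", identity),
    ("supply_db", "status", "serviceable", lambda s: s.upper() == "OK"),
]

def rewrite(unified_properties, source):
    """Query-engine step: legacy fields to fetch from one source."""
    return [field for src, field, prop, _ in LINK_RULES
            if src == source and prop in unified_properties]

def to_unified(source, legacy_row):
    """Mash-up step: express one legacy row in terms of the unified view."""
    return {prop: transform(legacy_row[field])
            for src, field, prop, transform in LINK_RULES
            if src == source and field in legacy_row}

def mash_up(results_by_source):
    """Combine transformed rows from all sources into one unified response."""
    return [to_unified(source, row)
            for source, rows in results_by_source.items()
            for row in rows]

if __name__ == "__main__":
    print(rewrite({"partId", "massKg"}, "depot_db"))
    print(mash_up({
        "depot_db": [{"part_no": "P-7", "wt_lb": 10.0, "cond": "A"}],
        "supply_db": [{"item_id": "P-9", "mass_kg": 2.5, "status": "ok"}],
    }))
```

Keeping the rules as data rather than as application code is the point of the design: a new legacy source means new rules, not new query logic.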
Semantic Technologies • Rapidly Maturing with Very Noteworthy Applications • Enhanced Knowledge Discovery • Data/Knowledge Integration • Foundational Semantic Technology Constructs • Ontology: Machine Readable Specification of the Essence of a Given Domain • Machine Readable Knowledge/Facts • Machine Readable Rules • Standard Language(s) for Expressing the Above • XML, RDF, RDFS, OWL, RuleML • RDF Triple Store Capabilities for Storing the Above • Standard Query Languages for Searching the Above • SPARQL • Open Source Semantic Application Frameworks • Commercial Capabilities • Oracle Semantic Technologies http://www.oracle.com/technology/tech/semantic_technologies/index.html • TopQuadrant • Metatomix • Ontoprise
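To make the triple store and query language items concrete, here is a toy in-memory sketch of facts stored as subject/predicate/object triples and matched against SPARQL-style patterns. It is not a real triple store or SPARQL engine, and the `ex:` resources are invented for illustration.

```python
# Facts as (subject, predicate, object) triples; the ex: names are hypothetical.
TRIPLES = [
    ("ex:Truck", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:asset42", "rdf:type", "ex:Truck"),
    ("ex:asset42", "ex:locatedAt", "ex:DepotA"),
    ("ex:asset77", "rdf:type", "ex:Truck"),
    ("ex:asset77", "ex:locatedAt", "ex:DepotB"),
]

def match(pattern, triple, bindings):
    """Try to extend the variable bindings so that pattern equals triple."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def query(triples, patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for t in triples
                     if (b := match(pattern, t, s)) is not None]
    return solutions

# Roughly the SPARQL query:
#   SELECT ?a ?where WHERE { ?a rdf:type ex:Truck . ?a ex:locatedAt ?where }
if __name__ == "__main__":
    for solution in query(TRIPLES, [("?a", "rdf:type", "ex:Truck"),
                                    ("?a", "ex:locatedAt", "?where")]):
        print(solution)
```

Each pattern is checked against every stored triple, so this naive evaluation scales poorly. Production triple stores rely on indexes and query planning instead.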
The General Solution Architecture • Semantic Layer Between Apps and Data • Unifying Domain Ontology • Linkage Ontology • OWL/RDF Data Management • Semantic Query Engine • Semantic Mash-up • Semantic Tech Architecture and Building Blocks • RDF(S), OWL, RuleML • Jena • Oracle Semantic Technologies • Semantic Application Development Environment (e.g., TopQuadrant)
The Semantic Approach
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. Linking ontologies and rules relate and map the domain ontology to the underlying logical models. The application captures the unified query and submits it to the semantic engine. The semantic engine processes the query into a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
[Diagram: Application; Semantic Layer (Semantic Query Engine; Semantic Query Results Mash-Up; Ontology & Rule Authoring Tools; OWL/RDF DBMS holding Domain Ontologies & Rules, Linking Ontologies & Rules, and Domain, Legacy, & Derived Facts); Data Warehouse Services Layer; Enterprise Data Warehouse.]
Typical EDW Architecture • Does Not Have a Data Integration Layer • This Is a Problem If Total Discipline in Conforming to Enterprise Model Is Not Exercised • And It Has Not Been Exercised • Legacy Databases Independently Migrated • Accomplishing Data Integration in the Application Layer Is Difficult and Expensive
Current EDW Architecture
Key Observations: The as-is architecture does not include an explicit knowledge management or semantics layer. This is a problem because the effectiveness of the as-is EDW depends on its ability to resolve semantic disconnects between the databases that must be queried. Building these capabilities into the application layer code is very difficult and costly, and it is not responsive to highly dynamic situations.
Figure 4: Layered EDW Architecture
[Diagram: Users and Power Users; Presentation Tier: AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier: Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); AJAX; Pre-Generated Cache; Application Tier: Business Intelligence Tools (Cognos, BOBJ, Other (Siebel, MS)); Data Access Tier (ODBC/JDBC); EDW Data Tier.]
EDW Architecture with Semantic Layer
Key Observations: The to-be architecture includes a semantics layer that mediates between the domain query and the data layer. The semantic layer includes the domain ontology and linkage ontologies, which drive the processing of domain queries and the semantic mash-up of individual query results coming from the source databases.
Figure 5: EDW Architecture with Semantic Layer
[Diagram: Users and Power Users; Presentation Tier: AF Portal / AF COP (Presentation Containers = RIA, iFrame, HTML, WSRP Portlets, other); Presentation Transformation Tier: Web Reports & Charts (Output = HTML, XML, PDF, XLS, DOC, other); AJAX; Pre-Generated Cache; Application Tier: Business Intelligence Tools (Cognos, BOBJ, Other (Siebel, MS)) and Semantic Tools; Semantic Tier / Semantic Layer: Semantic Query Engine, Semantic Query Results Mash-Up, Ontology & Rules Authoring Tools, OWL/RDF DBMS (Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts); Data Access Tier (ODBC/JDBC); EDW Data Tier.]
A Major Obstacle: Computational Complexity • Many Operations of the Semantic Layer Are Computationally Intensive • Complex Queries Across Multiple Large Data Sources Are Computationally Intensive • Some Kind of Specialized Solution to Execution of the Semantic Operations and Multiple Source Queries Is Required
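One reason semantic operations are expensive is inference: derived facts have to be computed from ontologies and rules, often by iterating to a fixpoint. The sketch below naively forward-chains two standard RDFS entailment rules (subclass transitivity and type propagation) over hypothetical `ex:` triples. It is meant only to show where the cost comes from, not how an inference engine such as Jena implements it.

```python
def infer(triples):
    """Forward-chain two RDFS rules until no new facts appear (a fixpoint)."""
    facts = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        new = set()
        for sub, sup in subclass_pairs:
            for s, p, o in facts:
                # rdfs11: subClassOf is transitive.
                if p == "rdfs:subClassOf" and s == sup:
                    new.add((sub, "rdfs:subClassOf", o))
                # rdfs9: an instance of a subclass is an instance of the superclass.
                if p == "rdf:type" and o == sub:
                    new.add((s, "rdf:type", sup))
        if new <= facts:
            return facts
        facts |= new

# Hypothetical ontology fragment and one asserted fact.
BASE = [
    ("ex:Truck", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:Vehicle", "rdfs:subClassOf", "ex:Asset"),
    ("ex:t1", "rdf:type", "ex:Truck"),
]

if __name__ == "__main__":
    closure = infer(BASE)
    for fact in sorted(closure - set(BASE)):
        print("derived:", fact)
```

Every round compares each subclass axiom with every known fact, so the work grows with the product of the two and repeats until nothing new is derived. With large ontologies and large data volumes, this is the kind of workload the slide calls out.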
Hypothesis • The LNSSI DAS Capability Can Be Brought to Bear on Execution of the Semantic Operations and Of Course the Source Queries As Well with Significant Benefits • So the Big Question Is: • Can the LNSSI DAS Be Applied to Large Ontologies, Rule Sets, and Large Data to Provide a Very High Performance Semantic Query Engine?
The LNSSI DAS Based Solution Architecture • Semantic Layer • Federated Domain Ontologies • Transformation Rules • Semantic Engines • LNSSI DAS Used in Two Contexts: • Semantic Ops in the Semantic Layer • Source Queries at the Data Services Level
The LNSSI DAS Based Solution Architecture
The legacy databases have been migrated to the data warehouse independently, each with its own logical model. The overall domain has a robust domain ontology. Linking ontologies and rules relate and map the domain ontology to the underlying logical models. The application captures the unified query and submits it to the semantic engine. The semantic engine processes the query into a standard semantic form and then applies the ontologies and rules to formulate the requisite queries against the individual databases. The individual queries are submitted, and the individual responses are received by the semantic mash-up service, which uses the available semantic data, including the query, to create an integrated, semantically consistent result for the original query.
[Diagram: Query Application; Semantic Layer Services (Semantic Update/Query Engine; Semantic Query Results Mash-Up; Ontology & Rule Authoring Tools; RDBMS Based OWL/RDF DBMS; LNSSI DAS Based OWL/RDF Query; Domain Ontologies & Rules; Linking Ontologies & Rules; Domain, Legacy, & Derived Facts); LNSSI DAS Services Layer; Enterprise Data Warehouse.]
What To Do? • Let's Execute a Prototype Project • To Test the Hypothesis That the LNSSI DAS Can Be Successfully Brought to Bear On Large Data Integration Problems Requiring Semantic Integration
General Approach • Find a Sponsor with Requisite $ • Form Appropriate Team and Agreements • Initiate a Prototype Project • Apply Semantic Technologies • Ontology, RDF(S), OWL • RDF Triple Store, SPARQL • Inference Engine(s) • Jena • Leverage LNSSI DAS Architecture and Capability • Create an LNSSI Semantic Integration Platform • Find a Benchmark Problem for Which There Is Already Data, Associated Semantics, and Existing Query Performance Data
Major Activities • Project Initiation • Initial Analysis, Technology Research, and Knowledge Engineering • CONOPS and System Requirements Development/Specification • Detailed Program/Project Management • Knowledge Engineering • Domain Ontology Acquisition/Development • Ontology Legacy Data Relationship, Mapping, and Rule Acquisition/Development • Acquisition/Development of the Underlying Data • Architecture Development and System Design • Detailed Apps Requirements/Design • Inclusion of RDF/OWL Data Mgt • Inclusion of Semantic Query Engine • Development of Semantic Mashup Capabilities • Implementation Planning • Implementation • Test and Eval