A Metadata Management Framework for Dynamic Information Integration

3rd VLDB Workshop on Data Management in Grids, Vienna, Austria, September 23rd, 2007. Jürgen Göres, Heterogeneous Information Systems Group, University of Kaiserslautern, goeres@informatik.uni-kl.de.


Presentation Transcript


  1. A Metadata Management Framework for Dynamic Information Integration. 3rd VLDB Workshop on Data Management in Grids, Vienna, Austria, September 23rd, 2007. Jürgen Göres, Heterogeneous Information Systems Group, University of Kaiserslautern, goeres@informatik.uni-kl.de

  2. Outline
     • Challenges of (Dynamic) Integration
     • The Key Role of Metadata
     • PALADIN Integration Framework
       - Integrated Model for Data, Metadata, and Operations
       - Component Contract
       - Framework Architecture
     • Using the Framework
     • Conclusion & Outlook

  3. Information Integration Process
     • Phases: Discovery, Analysis, Planning (schema matching, mapping interpretation), Deployment
     • Labels along the process: user/application requirements, candidate data sources, schemas, schemas + data, concrete requirements (target schema), matches, integration plan
     • Dynamic setting: volatile requirements, more sources (10^3–10^6), autonomous sources, higher degree of heterogeneity
     • [Figure: process diagram. Its remaining labels are model management operations, data management operations, the data sources (XML, RDBMS, DWH), the integration system (XML wrapper, XSLT, FDBMS, RDBMS wrapper, ETL), and the queries Q1–Q3.]

  4. Requirements on an Integration Framework
     • Integrated representation for metadata and data
       - Extensible
       - Capture common aspects
       - Lossless
       - Efficient
     • Representation for arbitrary operations on metadata and data
       - Data management operations (DMOs)
       - Model management operations (MMOs)
     • Infrastructure for development of integration services
       - Basic import and export of metadata and data
       - Model repository
       - Create complex integration services by combining existing services

  5. Data/Metadata Representation – Concept
     • Paladin Metamodeling Architecture (PMA): four metalayers, each an instance of the layer above
     • M3, Meta-Metamodel: PMM
     • M2, Metamodels: a Core metamodel (Namespace, Classifier, Feature, …) extended by
       - SQL: Table, Column, ForeignKey, …
       - XML: Element, Attribute, Keyref, …
       - OO: Class, Field, Reference, …
     • M1, Models, e.g.
       - SQL: CREATE SCHEMA HumanResources CREATE TABLE Employee ( EmpNo Integer PRIMARY KEY, Name VARCHAR(50), ...
       - XML: <xsd:schema> <xsd:element name="Employee"> <xsd:complexType> <xsd:sequence> <xsd:element name="Name" ... <xsd:attribute name="EmpNo" ...
     • M0, Data, e.g.
       - Relational: table Employee (EmpNo, Name, Department) with rows (4711, Kara Thrace, Accounting), (2468, Robert Doe, Development), (1313, Marla Singer, Development), (6789, Glen Larson, Sales), …
       - XML: <HumanResources> <Employee EmpNo="4711"> <Name>Kara Thrace</Name> <Dept>Accounting</Dept> </Employee> <Employee EmpNo="2468"> ...
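
The four metalayers and their instanceof links can be illustrated with a minimal Python sketch. It is not the PALADIN implementation; the class `Element`, the helper `instanceof_chain`, and the M3 element name `PMMClass` are hypothetical names chosen for illustration. Only the layer numbering and the Employee example come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """One element on any metalayer; `meta` is its instanceof link one layer up."""
    name: str
    layer: int                              # 3 = meta-metamodel ... 0 = data
    meta: Optional["Element"] = None
    attrs: dict = field(default_factory=dict)

    def __post_init__(self):
        # An element must sit exactly one layer below the element it instantiates.
        if self.meta is not None and self.meta.layer != self.layer + 1:
            raise ValueError(f"{self.name}: meta element must be on layer {self.layer + 1}")


def instanceof_chain(element):
    """Follow the instanceof links from an element up to the top layer."""
    chain = []
    while element is not None:
        chain.append(f"M{element.layer}:{element.name}")
        element = element.meta
    return chain


pmm_class = Element("PMMClass", 3)                 # hypothetical M3 element
sql_table = Element("Table", 2, pmm_class)         # M2: SQL metamodel
employee = Element("Employee", 1, sql_table)       # M1: a concrete table
row = Element("row-4711", 0, employee,             # M0: one row of that table
              {"EmpNo": 4711, "Name": "Kara Thrace", "Department": "Accounting"})

print(instanceof_chain(row))
```

Because data, models, and metamodels all use the same `Element` type here, one traversal works on every layer, which is the point of an integrated representation.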

  6. Data/Metadata Representation – Implementation
     • (Meta)model-specific vs. generic
       - Generic: represent the elements of every metalayer with the same set of generic elements (attributed, typed multigraphs)
       - Model-specific: generate code for each metamodel and model
     • Implementation technology
       - In-memory representations
       - Persistent representations
     • One size does not fit all
       - Provide different physical implementations of the conceptual model
       - Provide conversion services between representations
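
A generic representation based on attributed, typed multigraphs might look like the following Python sketch. The class `TypedMultigraph`, its methods, and the edge type names (`column`, `primaryKey`, `instanceof`) are assumptions made for illustration, not the framework's API.

```python
class TypedMultigraph:
    """Attributed, typed multigraph: typed nodes and edges, parallel edges allowed."""

    def __init__(self):
        self.nodes = {}   # node id -> (node type, attribute dict)
        self.edges = []   # (source, target, edge type); a list keeps parallel edges

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = (node_type, attrs)

    def add_edge(self, source, target, edge_type):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target, edge_type))

    def out_edges(self, source, edge_type=None):
        """Edges leaving `source`, optionally restricted to one edge type."""
        return [e for e in self.edges
                if e[0] == source and edge_type in (None, e[2])]


g = TypedMultigraph()
# Model level: part of the Employee table
g.add_node("Employee", "Table")
g.add_node("EmpNo", "Column", datatype="Integer")
g.add_node("Name", "Column", datatype="VARCHAR(50)")
g.add_edge("Employee", "EmpNo", "column")
g.add_edge("Employee", "EmpNo", "primaryKey")   # parallel edge with a different type
g.add_edge("Employee", "Name", "column")
# Data level: one row, stored with the same node and edge machinery
g.add_node("row1", "Employee", EmpNo=4711, Name="Kara Thrace")
g.add_edge("row1", "Employee", "instanceof")
```

The two edges from `Employee` to `EmpNo` are what makes this a multigraph; a plain graph could not record that the column is both a member of the table and its primary key.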

  7. Graph Transformations to Represent Operations
     • Representing Data Management Operators (DMOs)
       - Language-specific models (for SQL, XQuery, etc.)
       - Drawbacks:
         - No single formalism to describe complete integration plans at an abstract level
         - Coupled to data models and/or specific platforms, hence limited deployment flexibility
     • Use a graph transformation formalism to specify the semantics of data management operators
       - Declarative
       - Expressive
       - Executable (e.g., to prototype and emulate operations)
     • Chain graph transformations to describe entire Abstract Integration Plans
     • Model Management Operators (MMOs) can be “implemented” using graph transformations, too
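
The idea of expressing operators as graph transformations and chaining them into a plan can be sketched in Python as follows. This is a heavily simplified stand-in for a real graph transformation formalism: a rule here matches single nodes by predicate rather than whole subgraph patterns. The names `Rule`, `project`, `rename`, and `run_plan` are hypothetical.

```python
import copy


class Rule:
    """Rewrite rule: `lhs` selects the nodes to match, `rhs` rewrites each match."""

    def __init__(self, name, lhs, rhs):
        self.name = name
        self.lhs = lhs   # predicate over (node id, attributes)
        self.rhs = rhs   # function (graph, node id) that edits the graph

    def apply(self, graph):
        result = copy.deepcopy(graph)          # the input graph is left untouched
        matches = [n for n, attrs in graph["nodes"].items() if self.lhs(n, attrs)]
        for node in matches:
            self.rhs(result, node)
        return result


def project(keep):
    """Projection: delete every Column node whose name is not in `keep`."""
    def drop(graph, node):
        del graph["nodes"][node]
        graph["edges"] = {e for e in graph["edges"] if node not in (e[0], e[2])}
    return Rule("project",
                lambda n, a: a["type"] == "Column" and a["name"] not in keep,
                drop)


def rename(old, new):
    """Renaming: relabel the Column node called `old`."""
    def set_name(graph, node):
        graph["nodes"][node]["name"] = new
    return Rule("rename",
                lambda n, a: a["type"] == "Column" and a["name"] == old,
                set_name)


def run_plan(graph, plan):
    """Chain the rules: each rule consumes the graph produced by the previous one."""
    for rule in plan:
        graph = rule.apply(graph)
    return graph


employee = {
    "nodes": {
        "t": {"type": "Table", "name": "Employee"},
        "c1": {"type": "Column", "name": "EmpNo"},
        "c2": {"type": "Column", "name": "Name"},
        "c3": {"type": "Column", "name": "Department"},
    },
    "edges": {("t", "column", "c1"), ("t", "column", "c2"), ("t", "column", "c3")},
}
plan = [project({"EmpNo", "Name"}), rename("Name", "FullName")]
result = run_plan(employee, plan)
columns = sorted(a["name"] for a in result["nodes"].values() if a["type"] == "Column")
```

The plan is a plain list of rules, so it stays independent of SQL, XQuery, or any other target language until it is deployed.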

  8. Components for Integration Tasks
     • Component Contract
       - Operation(s)
       - Input pin(s): required model type & cardinality, implementation
       - Output pin(s): created model type & cardinality, implementation
       - Parameters
       - Subcomponents (+ parameters)
     • Components are accessible to a scripting facility
       - Wiring of components into complex model management scripts
       - Automatic conversion between different physical representations
     • [Figure: example component labelled ComponentOperation with parameters paramA=42 and paramB=23, the pins InputPin1, InputPin2, AnotherInput, TheOutput, and MoreOutput (each with cardinality 1 or *), and a subcomponent SubComponent with subCompParam=7.]
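
A component contract of this kind, with typed pins, cardinalities, and parameters, could be modelled as in the Python sketch below. The classes `Model`, `Pin`, and `Component`, the `run` method, and the `ignore_case` parameter are hypothetical; the sketch also leaves out subcomponents and the conversion between physical representations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Model:
    model_type: str      # e.g. "Schema" or "Matches"
    content: object


@dataclass(frozen=True)
class Pin:
    name: str
    model_type: str      # model type the pin requires (input) or creates (output)
    many: bool = False   # False = cardinality 1, True = cardinality *


def _check(pin, value):
    """Raise TypeError unless `value` satisfies the pin's type and cardinality."""
    models = value if pin.many else [value]
    if not isinstance(models, list) or any(
            not isinstance(m, Model) or m.model_type != pin.model_type
            for m in models):
        cardinality = "*" if pin.many else "1"
        raise TypeError(f"pin {pin.name!r} expects {cardinality} x {pin.model_type}")


@dataclass
class Component:
    name: str
    inputs: List[Pin]
    outputs: List[Pin]
    operation: Callable[[Dict[str, object], Dict[str, object]], Dict[str, object]]
    params: Dict[str, object] = field(default_factory=dict)

    def run(self, **supplied):
        for pin in self.inputs:                 # enforce the input side of the contract
            _check(pin, supplied.get(pin.name))
        produced = self.operation(supplied, self.params)
        for pin in self.outputs:                # enforce the output side as well
            _check(pin, produced.get(pin.name))
        return produced


def name_match(models, params):
    """Toy operation: pair up element names that are equal."""
    norm = str.lower if params.get("ignore_case") else (lambda s: s)
    a, b = models["SchemaA"].content, models["SchemaB"].content
    pairs = [(x, y) for x in a for y in b if norm(x) == norm(y)]
    return {"Matches": Model("Matches", pairs)}


matcher = Component(
    "NameMatcher",
    inputs=[Pin("SchemaA", "Schema"), Pin("SchemaB", "Schema")],
    outputs=[Pin("Matches", "Matches")],
    operation=name_match,
    params={"ignore_case": True},
)
out = matcher.run(SchemaA=Model("Schema", ["EmpNo", "Name"]),
                  SchemaB=Model("Schema", ["name", "dept"]))
```

Checking pins on both sides of `run` means a scripting layer can wire components together and fail early when an output of one component does not fit the input of the next.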

  9. PALADIN Integration Framework – Architecture
     • Integration Services & Components: Data Source Registry; Schema Matching Framework (lexical, instance, structural, and composite matchers; indirect matching; utility measures; …); Generic Editor for Metadata; Graph Transformation Engine
     • Scripting Engine: create/load models, convert models, run components
     • Metadata/Data Bus (Paladin Metamodel Architecture) carrying the generic metadata/data representation: SQL schemas (& data), XML schemas (& data), stylesheets, scripts, abstract integration plans, concrete integration plans, …
     • Physical representations (model-specific and generic): Java/EMF, XMI, XML, GXL, RDBMS
     • Model Repository (persistent storage): source metadata & statistics, target metadata, matches, domain schemas, integration plans, scripts, stylesheets, …
     • Importers, exporters, and wrappers: DDL, XML, DML, XQuery, and ETL im/exporters; SQL inference; XQuery inference; RDBMS wrappers (JDBC GDS); XML wrappers (DOM GDS)
     • Native metadata/data representations: data sources (relational DBMSs, XML DBMSs, XML documents, files), schemas (SQL DDL, DTD, XSD), and operations (queries, views, scripts, and integration plans in SQL DML, XQuery, XPath, and ETL scripts)
     • [Figure: layered architecture diagram; the arrangement of the boxes is not recoverable from the transcript.]

  10. Using the Framework
     • Schema Matching: [Figure: matcher components wired into a composite matcher. Recoverable labels: SchemaA, SchemaB, Tokenizer, Stemmer, Edit-Distance, Wordnet, Datatype, TreeMatch, One2One, WComposite, MatchEdit; similarity values lsim, wsim, ssim, ssimInit; parameters weights=[0.3,0.7], wStruct=0.6, thHigh=0.8, thLow=0.3; results MatchesA, MatchesB, Matches, MyMatch.]
     • Integration Planning: [Figure: components CreatePlan, GEM, and PIPE. Recoverable pin labels: MyMatch, SchemaA, SchemaB, Sources, Target, Models, Matches, Plan, Stylesheet.]
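
How a weighted composite matcher might combine the scores of simpler matchers is sketched below in Python. Only the weights [0.3, 0.7] are taken from the slide. The functions `lexical`, `datatype`, `weighted_composite`, and `match`, the sample schemas, and the threshold of 0.8 are assumptions for illustration, and `difflib.SequenceMatcher` is swapped in for the edit-distance matcher named in the figure.

```python
from difflib import SequenceMatcher


def lexical(a, b):
    """Name similarity in [0, 1]; a stand-in for an edit-distance matcher."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def datatype(a, b):
    """1.0 if both elements have the same datatype, else 0.0."""
    return 1.0 if a["type"] == b["type"] else 0.0


def weighted_composite(matchers, weights):
    """Combine several matchers into one by a weighted sum of their scores."""
    def combined(a, b):
        return sum(w * m(a, b) for m, w in zip(matchers, weights))
    return combined


def match(schema_a, schema_b, similarity, threshold):
    """Return (name_a, name_b, score) for every pair scoring at least `threshold`."""
    result = []
    for a in schema_a:
        for b in schema_b:
            score = similarity(a, b)
            if score >= threshold:
                result.append((a["name"], b["name"], score))
    return result


schema_a = [{"name": "EmpNo", "type": "int"}, {"name": "Name", "type": "str"}]
schema_b = [{"name": "emp_no", "type": "int"}, {"name": "full_name", "type": "str"}]

composite = weighted_composite([lexical, datatype], [0.3, 0.7])
matches = match(schema_a, schema_b, composite, threshold=0.8)
pairs = [(a, b) for a, b, _ in matches]
```

With these weights the datatype matcher dominates, so pairs with different datatypes can score at most 0.3 and never reach the threshold.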

  11. Conclusion & Outlook
     • Key role of metadata management for information integration
     • Provide a framework to allow “integration of integration approaches”
       - Integrated, extensible representation for data and metadata
       - Abstract representation of data and model management operations
       - Infrastructure for development of integration components
       - Flexible wiring of components to provide integration services
     • Goal: provide a complete tool chain for integration in dynamic environments
       - Analysis
       - Discovery
       - Planning (matching & mapping interpretation)
       - Deployment
