A Metadata Management Framework for Dynamic Information Integration
3rd VLDB Workshop on Data Management in Grids, Vienna, Austria, September 23rd, 2007
Jürgen Göres, Heterogeneous Information Systems Group, University of Kaiserslautern, goeres@informatik.uni-kl.de
Outline
• Challenges of (Dynamic) Integration
• The Key Role of Metadata
• PALADIN Integration Framework
  • Integrated Model for Data, Metadata, and Operations
  • Component Contract
  • Framework Architecture
• Using the Framework
• Conclusion & Outlook
Information Integration Process
[Figure: the integration process, from user/application requirements and autonomous data sources to a deployed integration system. Phases and their artifacts: Discovery (candidate data sources), Analysis (schemas, schemas + data), Planning (schema matching produces matches; mapping interpretation produces an integration plan for the concrete requirements, i.e. the target schema), Deployment (integration system built from, e.g., XML wrappers, XSLT, FDBMS, RDBMS wrappers, ETL, DWH). Labels: Model Management Operations, Data Management Operations. Annotations for the dynamic setting: volatile requirements, more sources (10^3–10^6), higher degree of heterogeneity.]
Requirements on an Integration Framework
• Integrated representation for metadata and data
  • Extensible
  • Capture common aspects
  • Lossless
  • Efficient
• Representation for arbitrary operations on metadata and data
  • Data management operations (DMOs)
  • Model management operations (MMOs)
• Infrastructure for development of integration services
  • Basic import and export of metadata and data
  • Model repository
  • Create complex integration services by combining existing services
Data/Metadata Representation – Concept
• Paladin Metamodeling Architecture (PMA)
[Figure: four metalayers, each an instance of the layer above it.
  M3, meta-metamodel: PMM.
  M2, metamodels: a Core metamodel extended by SQL, XML, OO, … metamodels. Elements shown include Table, Column, ForeignKey (SQL); Element, Attribute, Keyref (XML); Class, Reference, Field, Feature, Namespace, Classifier, …
  M1, models: an SQL schema (CREATE SCHEMA HumanResources; CREATE TABLE Employee (EmpNo Integer PRIMARY KEY, Name VARCHAR(50), ...)) and an XML schema (xsd:element name="Employee" with xsd:attribute name="EmpNo" and a sequence containing xsd:element name="Name", ...).
  M0, data: rows of the Employee table with columns EmpNo, Name, Department (4711, Kara Thrace, Accounting; 2468, Robert Doe, Development; 1313, Marla Singer, Development; 6789, Glen Larson, Sales; …) and the corresponding XML document (<HumanResources> containing <Employee EmpNo="4711"> with <Name>Kara Thrace</Name> and <Dept>Accounting</Dept>, ...).]
Data/Metadata Representation – Implementation
• (Meta)model-specific vs. generic
  • Represent elements of every metalayer with the same set of generic elements (attributed, typed multigraphs)
  • Generate code for each metamodel and model
• Implementation technology
  • In-memory representations
  • Persistent representations
• One size does not fit all
  • Provide different physical implementations of the conceptual model
  • Provide conversion services between representations
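
The generic option (one set of graph elements for every metalayer) can be illustrated with a small in-memory sketch. This is not the PALADIN implementation: the class names `Node`, `Edge`, `Graph`, the helper `instanceof_chain`, and the M3 element `MetaClass` are hypothetical names chosen for illustration. Only the Employee example values come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(eq=False)
class Node:
    """Generic node; `type` points to a node one metalayer up (instanceof)."""
    name: str
    type: Optional["Node"] = None
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass(eq=False)
class Edge:
    """Labelled, attributed edge; parallel edges are allowed (multigraph)."""
    label: str
    source: Node
    target: Node
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name, type=None, **attrs):
        node = Node(name, type, attrs)
        self.nodes.append(node)
        return node

    def add_edge(self, label, source, target, **attrs):
        edge = Edge(label, source, target, attrs)
        self.edges.append(edge)
        return edge

    def instances_of(self, type_node):
        return [n for n in self.nodes if n.type is type_node]


def instanceof_chain(node):
    """Follow the instanceof links from a node up to the top metalayer."""
    chain = []
    while node is not None:
        chain.append(node.name)
        node = node.type
    return chain


g = Graph()
meta_class = g.add_node("MetaClass")                 # M3 (hypothetical name)
table = g.add_node("Table", type=meta_class)         # M2: SQL metamodel
column = g.add_node("Column", type=meta_class)       # M2
employee = g.add_node("Employee", type=table)        # M1: a concrete schema
emp_no = g.add_node("EmpNo", type=column, datatype="Integer", primary_key=True)
name = g.add_node("Name", type=column, datatype="VARCHAR(50)")
g.add_edge("hasColumn", employee, emp_no)
g.add_edge("hasColumn", employee, name)
row = g.add_node("row-4711", type=employee, EmpNo=4711, Name="Kara Thrace")  # M0

print(instanceof_chain(row))
```

Because metamodel, model, and data elements are all plain nodes, the same traversal and storage code works on every layer. The price is that nothing is checked statically, which is the trade-off against generating metamodel-specific code.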
Graph Transformations to Represent Operations
• Representing Data Management Operators (DMOs)
  • Language-specific models (for SQL, XQuery, etc.)
  • Drawbacks:
    • No single formalism to describe complete integration plans at an abstract level
    • Coupled to data models and/or specific platforms, which limits deployment flexibility
• Use a graph transformation formalism to specify the semantics of data management operators
  • Declarative
  • Expressive
  • Executable (e.g., to prototype and emulate operations)
• Chain graph transformations to describe entire Abstract Integration Plans
• Model Management Operators (MMOs) can be “implemented” using graph transformation, too!
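
To make the idea of chaining graph transformations concrete, here is a deliberately simplified sketch. It rewrites a graph stored as a set of (source, label, target) triples and is not the formalism used in PALADIN. The rule format (`lhs`, `delete`, `add`), the `?variable` convention, and the function names are assumptions made for this illustration.

```python
def match(pattern, graph, binding=None):
    """Yield variable bindings under which every pattern triple occurs in graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for p, v in zip(first, triple):
            if p.startswith("?"):              # variable: bind or compare
                if candidate.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:                       # constant: must be equal
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


def substitute(triples, binding):
    """Replace variables in the given triples by their bound values."""
    return {tuple(binding.get(t, t) for t in triple) for triple in triples}


def apply_rule(rule, graph):
    """Return a new graph with the rule applied to every match in the input."""
    result = set(graph)
    for binding in list(match(rule["lhs"], graph)):
        result -= substitute(rule["delete"], binding)
        result |= substitute(rule["add"], binding)
    return result


def run_plan(rules, graph):
    """Chain rules in order: the output graph of one is the input of the next."""
    for rule in rules:
        graph = apply_rule(rule, graph)
    return graph


# Rule 1: drop the column named "Department" (a projection-like step).
drop_department = {
    "lhs": [("?t", "hasColumn", "?c"), ("?c", "name", "Department")],
    "delete": [("?t", "hasColumn", "?c"), ("?c", "name", "Department")],
    "add": [],
}
# Rule 2: rename the table "Employee" to "Staff".
rename_table = {
    "lhs": [("?t", "name", "Employee")],
    "delete": [("?t", "name", "Employee")],
    "add": [("?t", "name", "Staff")],
}

schema = {
    ("t1", "type", "Table"), ("t1", "name", "Employee"),
    ("t1", "hasColumn", "c1"), ("c1", "name", "EmpNo"),
    ("t1", "hasColumn", "c2"), ("c2", "name", "Name"),
    ("t1", "hasColumn", "c3"), ("c3", "name", "Department"),
}
result = run_plan([drop_department, rename_table], schema)
```

Each rule is declarative (a pattern plus what to delete and add) yet directly executable, and the list passed to `run_plan` plays the role of a very small abstract plan that is independent of SQL or XQuery.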
Components for Integration Tasks
• Component Contract
  • Operation(s)
  • Input pin(s): required model type & cardinality, implementation
  • Output pin(s): created model type & cardinality, implementation
  • Parameters
  • Subcomponents (+parameters)
• Components are accessible to scripting facility
  • Wiring of components to complex model management scripts
  • Automatic conversion between different physical representations
[Figure: an example component (labelled ComponentOperation) with parameters paramA=42 and paramB=23; input pins InputPin1, InputPin2, and AnotherInput; output pins TheOutput and MoreOutput; pin cardinalities 1 or *; and a subcomponent SubComponent with subCompParam=7.]
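
The component contract can be sketched as a small data structure that checks pin types and cardinalities before and after running the operation. This is a minimal sketch under stated assumptions: `Model`, `Pin`, `Component`, and `match_names` are hypothetical names, models are reduced to a type tag plus content, and the physical-implementation aspect of pins (with its automatic conversion) is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Model:
    model_type: str      # e.g. "Schema" or "Matches"
    content: Any


@dataclass
class Pin:
    name: str
    model_type: str      # required (input) or created (output) model type
    cardinality: str     # "1" = exactly one model, "*" = any number


@dataclass
class Component:
    name: str
    inputs: List[Pin]
    outputs: List[Pin]
    operation: Callable[[Dict[str, List[Model]], Dict[str, Any]], Dict[str, List[Model]]]
    params: Dict[str, Any] = field(default_factory=dict)

    def _check(self, pins, models, kind):
        for pin in pins:
            bound = models.get(pin.name, [])
            if pin.cardinality == "1" and len(bound) != 1:
                raise ValueError(f"{kind} pin {pin.name} needs exactly one model")
            for model in bound:
                if model.model_type != pin.model_type:
                    raise TypeError(f"{kind} pin {pin.name} expects {pin.model_type}")

    def run(self, **inputs):
        self._check(self.inputs, inputs, "input")
        outputs = self.operation(inputs, self.params)
        self._check(self.outputs, outputs, "output")
        return outputs


def match_names(inputs, params):
    """Toy operation: pair up element names that are equal ignoring case."""
    a = inputs["SchemaA"][0].content
    b = inputs["SchemaB"][0].content
    pairs = [(x, y) for x in a for y in b if x.lower() == y.lower()]
    return {"Matches": [Model("Matches", pairs)]}


matcher = Component(
    name="NameMatcher",
    inputs=[Pin("SchemaA", "Schema", "1"), Pin("SchemaB", "Schema", "1")],
    outputs=[Pin("Matches", "Matches", "1")],
    operation=match_names,
)

schema_a = Model("Schema", ["EmpNo", "Name", "Department"])
schema_b = Model("Schema", ["empno", "name", "Dept"])
out = matcher.run(SchemaA=[schema_a], SchemaB=[schema_b])
```

Since every output is again a list of typed models keyed by pin name, wiring reduces to passing `out["Matches"]` to the input pin of the next component, which is the kind of composition a scripting facility would automate.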
PALADIN Integration Framework – Architecture
[Figure: layered architecture of the PALADIN Integration Framework, top to bottom.
  Integration services & components: Data Source Registry; Schema Matching Framework (lexical, instance, structural, and composite matchers; indirect matching; utility measures; …); Generic Editor for Metadata; Graph Transformation Engine; Scripting Engine (create/load models, convert models, run components). These exchange schemas, models, matches, edited models, stylesheets, transformed models, and model management scripts.
  Metadata/Data Bus (Paladin Metamodel Architecture): generic metadata/data representation for SQL schemas (& data), XML schemas (& data), stylesheets, scripts, abstract integration plans, concrete integration plans, … Model-specific implementations: Java/EMF, XMI, XML, RDBMS. Generic implementations: Java/EMF, GXL, RDBMS.
  Model Repository (persistent storage): source metadata & statistics, target metadata, matches, domain schemas, integration plans, scripts, stylesheets, …
  Importers, exporters, and wrappers: DDL Im/Exporter, XML Im/Exporter, SQL Inference, XQuery Inference, DML Im/Exporter, XQuery Im/Exporter, ETL Im/Exporter, RDBMS wrappers (JDBC GDS), XML wrappers (DOM GDS).
  Native metadata/data representations: metadata/data sources (relational DBMSs, XML DBMSs, XML documents, files in XMI/XML, GXL, …), schemas (SQL DDL, DTD, XSD), and operations, i.e. queries, views, scripts, and integration plans (SQL DML, XQuery, XPath, ETL scripts, …).]
Using the Framework – Schema Matching
[Figure: a matching script MyMatch wired from components. Inputs SchemaA and SchemaB (cardinality 1 each); output Matches. Components shown: Tokenizer, Stemmer, Edit-Distance, Wordnet, Datatype, TreeMatch, WComposite (weights=[0.3,0.7], combining MatchesA and MatchesB), One2One, and MatchEdit. Further parameters and intermediate values shown: wStruct=0.6, thHigh=0.8, thLow=0.3, lsim, ssimInit, ssim, wsim. Numbers 1. to 3. mark the execution order.]
Using the Framework – Integration Planning
[Figure: a planning script CreatePlan that uses MyMatch together with the components GEM and PIPE. Inputs: SchemaA, SchemaB, Sources (models, *), Target (1). Intermediate and output models: Matches, Plan, Stylesheet.]
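
The WComposite component in the matching figure combines two match results with weights=[0.3,0.7]. How it does so is not stated on the slide. The sketch below assumes a plain weighted sum of similarity scores, with a missing pair counted as 0.0. The function names, the example element pairs, their scores, and the selection threshold of 0.5 are all hypothetical.

```python
def weighted_composite(match_sets, weights):
    """Combine similarity maps {(a, b): score} by a weighted sum.

    A pair missing from one map contributes 0.0 for that map (an assumption).
    """
    if len(match_sets) != len(weights):
        raise ValueError("need exactly one weight per match set")
    pairs = set().union(*match_sets)
    return {
        pair: sum(w * m.get(pair, 0.0) for m, w in zip(match_sets, weights))
        for pair in pairs
    }


def select_matches(combined, threshold):
    """Keep the pairs whose combined similarity reaches the threshold."""
    return sorted(pair for pair, score in combined.items() if score >= threshold)


# Hypothetical outputs of two upstream matchers (MatchesA and MatchesB).
matches_a = {("EmpNo", "EmpId"): 0.8, ("Name", "FullName"): 0.5}
matches_b = {("EmpNo", "EmpId"): 0.6}

combined = weighted_composite([matches_a, matches_b], [0.3, 0.7])
selected = select_matches(combined, threshold=0.5)
```

With these inputs the pair (EmpNo, EmpId) scores roughly 0.3 · 0.8 + 0.7 · 0.6 = 0.66 and is kept, while (Name, FullName) scores 0.15 and is dropped.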
Conclusion & Outlook
• Key role of metadata management for information integration
• Provide a framework to allow “integration of integration approaches”
  • Integrated, extensible representation for data and metadata
  • Abstract representation of data and model management operations
  • Infrastructure for development of integration components
  • Flexible wiring of components to provide integration services
• Goal: Provide a complete tool chain for integration in dynamic environments
  • Analysis
  • Discovery
  • Planning (matching & mapping interpretation)
  • Deployment