
A Principled Approach to Data Integration and Reconciliation in Data Warehousing

Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, Riccardo Rosati. Presented by Alan Wessman.


Presentation Transcript


  1. A Principled Approach to Data Integration and Reconciliation in Data Warehousing
     Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, Riccardo Rosati
     Presented by Alan Wessman

  2. Introduction
     • Problem: acquire data from a set of sources for a particular application
     • Typical architecture: wrappers and mediators
     • Core problem: specifying and implementing the mediators
     • Paper focus: data warehouses

  3. Data Warehouse Integration
     • Most sources are internal to the organization
     • A global, corporate-wide view of the data is needed
     • A conceptual model defines both the sources and the data warehouse (local-as-view)
     • Three levels of architecture:
        - Conceptual: the global model
        - Logical: query specifications for the sources and the warehouse
        - Physical: wrappers and mediators that implement those query specifications
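
At the physical level, wrappers present each source in a uniform relational form and a mediator answers a query by combining the wrappers' output. The sketch below illustrates only that layering; the source layouts, the function names (`source1_wrapper`, `source2_wrapper`, `mediator`) and all values are hypothetical and are not taken from the paper.

```python
from typing import Iterator

# Hypothetical raw data: each source stores menu prices in its own layout.
SOURCE1_RAW = [{"day": 2451545, "fish": "Halibut", "lira": 30000}]
SOURCE2_RAW = [(2451545, 1200, 800)]  # (date, tuna price, squid price), in yen

Row = tuple[int, str, int]  # (date, dish, price in the source's own currency)


def source1_wrapper() -> Iterator[Row]:
    """Wrapper: expose Source 1 records as uniform (date, dish, price) rows."""
    for rec in SOURCE1_RAW:
        yield (rec["day"], rec["fish"], rec["lira"])


def source2_wrapper() -> Iterator[Row]:
    """Wrapper: unpivot Source 2's one-row-per-date layout into (date, dish, price) rows."""
    for date, tuna, squid in SOURCE2_RAW:
        yield (date, "Tuna", tuna)
        yield (date, "Squid", squid)


def mediator(dish: str) -> list[Row]:
    """Mediator: answer a query for one dish by combining the wrappers' output.
    Currencies are deliberately left unconverted in this sketch."""
    rows = [*source1_wrapper(), *source2_wrapper()]
    return [row for row in rows if row[1] == dish]


print(mediator("Squid"))  # [(2451545, 'Squid', 800)]
```

Keeping the wrappers free of any cross-source logic means the mediator is the only place where sources are combined, which is the part the paper aims to specify declaratively.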

  4. Architecture
     [Figure: the conceptual model, with Source 1, Source 2, and the data warehouse each described by queries over it (query groups q1, q2; q3, q4, q5; q6, q7).]

  5. Specifying Logical Schemas
     For each table of source S, create an adorned query:
     • Head: the table name and one variable per column
     • Body: the content of the table, expressed as a query over the conceptual model
     • Adornment:
        - Domains (data types) of the columns
        - Key attributes
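
The three parts of an adorned query (head, body, adornment) map naturally onto a small data structure. This is a minimal sketch under assumed names (`Atom`, `AdornedQuery`, `is_safe`, `undeclared_columns`); the safety check (every head variable must occur in the body) is a standard conjunctive-query condition added here for illustration, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    """One atom of a conjunctive query over the conceptual model."""
    predicate: str
    args: tuple[str, ...]  # variable names, or quoted constants such as "'Halibut'"


@dataclass
class AdornedQuery:
    head_name: str                # table name
    head_vars: tuple[str, ...]    # one variable per column
    body: list[Atom]              # content of the table, as a query over the model
    domains: dict[str, str] = field(default_factory=dict)  # adornment: column -> domain
    keys: tuple[str, ...] = ()    # adornment: key attributes

    def arity(self) -> int:
        """Number of columns of the source table."""
        return len(self.head_vars)

    def is_safe(self) -> bool:
        """Every head variable must occur somewhere in the body."""
        body_terms = {t for atom in self.body for t in atom.args}
        return set(self.head_vars) <= body_terms

    def undeclared_columns(self) -> set[str]:
        """Columns whose domain the adornment does not specify."""
        return set(self.head_vars) - set(self.domains)


halibut = AdornedQuery(
    head_name="Halibut",
    head_vars=("Date", "Price"),
    body=[Atom("Menu", ("Date", "'Halibut'", "Price"))],
    domains={"Price": "Lira", "Date": "JulianDate"},
    keys=("Date",),  # assumed key: the slide's example does not state one
)
print(halibut.arity(), halibut.is_safe(), halibut.undeclared_columns())  # 2 True set()
```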

  6. Adorned Query: Example
     [Figure: the conceptual model with Source 1 and Source 2; currency labels Lira, Yen, Euro.]
     Halibut(Date, Price) <- Menu(Date, 'Halibut', Price) | Price :: Lira, Date :: JulianDate
     Swordfish(Date, Price) <- Menu(Date, 'Swordfish', Price) | Price :: Lira, Date :: JulianDate
     SushiMenu(TunaPrice, SquidPrice, Date) <- Menu(Date, 'Tuna', TunaPrice), Menu(Date, 'Squid', SquidPrice) | TunaPrice :: Yen, SquidPrice :: Yen, Date :: JulianDate
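
To see what the bodies of these adorned queries compute, they can be evaluated over a tiny, made-up instance of the conceptual relation Menu(Date, Dish, Price). In SushiMenu the shared variable Date turns the two Menu atoms into a join on the date. The instance values and function names are hypothetical, and the sketch ignores the currency annotations, which constrain domains rather than change which tuples the body returns.

```python
# Made-up instance of the conceptual relation Menu(Date, Dish, Price).
MENU = {
    (2451545, "Tuna", 1200),
    (2451545, "Squid", 800),
    (2451546, "Tuna", 1250),    # no Squid entry on this date, so no SushiMenu row
    (2451545, "Halibut", 30000),
}


def halibut() -> set[tuple[int, int]]:
    """Halibut(Date, Price) <- Menu(Date, 'Halibut', Price)"""
    return {(d, p) for (d, dish, p) in MENU if dish == "Halibut"}


def sushi_menu() -> set[tuple[int, int, int]]:
    """SushiMenu(TunaPrice, SquidPrice, Date) <-
         Menu(Date, 'Tuna', TunaPrice), Menu(Date, 'Squid', SquidPrice)
    The shared variable Date is a join condition between the two atoms."""
    return {
        (tp, sp, d1)
        for (d1, dish1, tp) in MENU if dish1 == "Tuna"
        for (d2, dish2, sp) in MENU if dish2 == "Squid" and d2 == d1
    }


print(halibut())     # {(2451545, 30000)}
print(sushi_menu())  # {(1200, 800, 2451545)}
```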

  7. Query Consistency
     Let Q be an adorned query, B its body, and M the conceptual model.
     • B is inconsistent with respect to M if, for every interpretation of M, the evaluation of B is empty
     • Q is inconsistent with respect to M if either B is inconsistent or the annotations are inconsistent
     • Inference techniques exist for checking query consistency
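
Checking whether B is inconsistent requires reasoning over the conceptual model M and is not attempted here. The sketch below covers only a trivially checkable slice of the annotation side, under the stated assumption that all named domains are pairwise disjoint, so a variable annotated with two different domains can never take a value. The function name `annotations_consistent` is hypothetical.

```python
# Assumption: the domains named in annotations are pairwise disjoint
# (e.g. a value cannot be both a Lira amount and a Yen amount).
def annotations_consistent(annotations: list[tuple[str, str]]) -> bool:
    """Return False if some variable is annotated with two different domains."""
    seen: dict[str, str] = {}
    for var, domain in annotations:
        # setdefault stores the first domain seen and returns the stored one.
        if seen.setdefault(var, domain) != domain:
            return False
    return True


ok = [("Price", "Lira"), ("Date", "JulianDate")]
bad = [("Price", "Lira"), ("Price", "Yen"), ("Date", "JulianDate")]
print(annotations_consistent(ok))   # True
print(annotations_consistent(bad))  # False
```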

  8. Interschema Correspondences
     • Specify how data in different schemas relate
     • Defined as non-materialized relational tables (computed on demand)
     • Written like adorned queries, but the annotations identify helper programs
     • Reusable by other correspondences

  9. Interschema Correspondences
     Three types of correspondence:
     • Conversion: how data from one source is converted into data fitting a different schema
     • Matching: how data from different sources matches
     • Reconciliation: how data from different sources is reconciled to become data in the warehouse

  10. Conversion Correspondence
      How data from one source is converted into data fitting a different schema:
      convert([x], [y]) <- conj(x, y, z) through program(x, y, z)
      • conj: conjunctive query specifying when the conversion applies
      • program: program that performs the conversion
      • x: input tuple of values satisfying the conditions on x in conj
      • y: output tuple of values satisfying the conditions on y in conj
      • z: additional parameters required by program
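
As an illustration, a conversion from Lira prices to Euro prices fits this shape, with x a Lira price, y the Euro price, and z the exchange rate passed to the program. This is a hypothetical sketch: conj is reduced to a plain Python predicate standing in for the conjunctive query, and the names `conj`, `program`, and `convert` are chosen only to mirror the notation. The rate is the fixed ITL/EUR conversion rate of 1936.27 lire per euro.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Hypothetical instance of  convert([x], [y]) <- conj(x, y, z) through program(x, y, z)
#   x = (price in lira,)   y = (price in euro,)   z = (lira per euro,)
LIRA_PER_EURO = Decimal("1936.27")  # fixed ITL/EUR conversion rate


def conj(x: tuple[Decimal], z: tuple[Decimal]) -> bool:
    """When the conversion applies; a plain predicate standing in for the conjunctive query."""
    return x[0] >= 0 and z[0] > 0


def program(x: tuple[Decimal], z: tuple[Decimal]) -> tuple[Decimal]:
    """Compute y from x and the extra parameter z, rounded to whole cents."""
    return ((x[0] / z[0]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),)


def convert(x: tuple[Decimal], z: tuple[Decimal] = (LIRA_PER_EURO,)) -> Optional[tuple[Decimal]]:
    """Return the output tuple y, or None if conj does not hold for x."""
    return program(x, z) if conj(x, z) else None


print(convert((Decimal("30000"),)))  # (Decimal('15.49'),)
```

Using `Decimal` rather than floats keeps the rounding to cents predictable, which matters once converted values are compared by a matching correspondence.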

  11. Matching Correspondence
      How data from different sources matches:
      match([x1], …, [xk]) <- conj(x1, …, xk, z) through program(x1, …, xk, z)
      • Differs from a conversion correspondence in that it takes k tuples that may be matched
      • program returns true if the k tuples match
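
A matching program is simply a predicate over k tuples. In this hypothetical sketch each tuple is a (date, dish) pair from a different source, and two tuples are taken to match when they share the date and their dish names agree after normalizing case and whitespace; both the tuple shape and the normalization rule are assumptions chosen for illustration.

```python
def normalize(name: str) -> str:
    """Hypothetical normalization: lower-case and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def match_program(*tuples: tuple[int, str]) -> bool:
    """Return True if all k (k >= 1) tuples refer to the same dish on the same date."""
    first_date, first_dish = tuples[0]
    return all(
        date == first_date and normalize(dish) == normalize(first_dish)
        for date, dish in tuples[1:]
    )


print(match_program((2451545, "Halibut"), (2451545, "  HALIBUT ")))  # True
print(match_program((2451545, "Halibut"), (2451546, "Halibut")))     # False
```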

  12. Reconciliation Correspondence
      How data from different sources is reconciled into the warehouse:
      reconcile([x1], …, [xk], [z]) <- conj(x1, …, xk, z, w) through program(x1, …, xk, z, w)
      • z: data warehouse tuple, the result of reconciliation
      • w: additional parameters (the role z plays in the conversion and matching correspondences)
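
A reconciliation program takes the already-matched source tuples and produces the warehouse tuple z. In this sketch the extra parameter w is assumed to be one trust weight per source, and the reconciled price is their weighted average; that policy, the tuple shape, and the name `reconcile_program` are hypothetical choices, not the paper's.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical reconciliation program:
#   each x_i = (date, dish, price in euro) from a different source, already matched,
#   w = one trust weight per source, result = the warehouse tuple z.
def reconcile_program(xs: list[tuple[int, str, Decimal]],
                      w: list[Decimal]) -> tuple[int, str, Decimal]:
    if not xs or len(xs) != len(w):
        raise ValueError("need a non-empty input and one weight per tuple")
    date, dish, _ = xs[0]  # matched tuples agree on date and dish
    total = sum((x[2] * wi for x, wi in zip(xs, w)), Decimal(0))
    price = (total / sum(w, Decimal(0))).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return (date, dish, price)


xs = [(2451545, "Halibut", Decimal("15.49")), (2451545, "Halibut", Decimal("16.00"))]
print(reconcile_program(xs, [Decimal(3), Decimal(1)]))  # (2451545, 'Halibut', Decimal('15.62'))
```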

  13. Reusing Correspondences
      • A correspondence may reuse only correspondences that have already been defined
      • Example 1:
        match([x], [y]) <- convert1([x], [z]), convert2([y], [z]), conj(x, y, z, w) through none
      • Example 2:
        reconcile([x], [y], [z]) <- convert1([x], [w1]), convert2([y], [w2]), match1([w1], [w2]), convert3([w1], [z]), conj(x, y, z, w) through none
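
Example 1 reads as: x and y match when converting each of them yields the same value z. Reading "through none" as "no new program is needed", the match can be computed purely from the two previously defined conversions. The sketch below assumes convert1 is Lira to Euro and convert2 is Yen to Euro; the Yen rate of 130 is an arbitrary illustrative value, and conj(x, y, z, w) is omitted.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(v: Decimal) -> Decimal:
    return v.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Previously defined conversion correspondences (defined first, as the reuse rule requires).
def convert1(lira: Decimal) -> Decimal:   # Lira -> Euro, fixed rate
    return to_cents(lira / Decimal("1936.27"))


def convert2(yen: Decimal) -> Decimal:    # Yen -> Euro, assumed illustrative rate
    return to_cents(yen / Decimal("130"))


def match(x: Decimal, y: Decimal) -> bool:
    """Example 1: x and y match when convert1(x) and convert2(y) yield the same z.
    No new program is written; the result comes from the reused conversions."""
    return convert1(x) == convert2(y)


print(match(Decimal("30000"), Decimal("2013.70")))  # True
print(match(Decimal("30000"), Decimal("2000")))     # False
```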

  14. Specifying Mediators
      Aim: specify, for each relation in the warehouse, how its tuples should be constructed from the sources
      Task: materialize a new relation T in the warehouse
      Steps:
      • Specify T as an adorned query q <- q' | c1, …, cn
      • Look for a rewriting of q in terms of queries q1, …, qs corresponding to materialized views in the warehouse
      • Look for a rewriting of what remains of q in terms of queries corresponding to tables in the sources and to the conversion, matching, and reconciliation correspondences
      The resulting query is the specification of the mediator for T
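
The ordering of the two rewriting steps (warehouse views first, then sources and correspondences for what remains) can be illustrated with a deliberately simplified covering procedure. This is not the paper's rewriting algorithm: atoms are plain strings, a view is usable only if all its atoms occur verbatim in q, and variable mappings, containment checks, and the annotations c1, …, cn are all ignored. The names `cover` and `mediator_plan` are hypothetical.

```python
from typing import Optional

Views = dict[str, frozenset[str]]


def cover(query: frozenset[str], remaining: set[str], views: Views) -> list[str]:
    """Greedily pick views whose atoms all occur in the query and that cover at
    least one still-uncovered atom; covered atoms are removed from `remaining`."""
    used = []
    for name, atoms in views.items():
        if atoms <= query and atoms & remaining:
            used.append(name)
            remaining -= atoms  # in-place update of the caller's set
    return used


def mediator_plan(query: frozenset[str], warehouse: Views, sources: Views) -> Optional[list[str]]:
    remaining = set(query)
    plan = cover(query, remaining, warehouse)   # step 2: reuse materialized views
    plan += cover(query, remaining, sources)    # step 3: sources and correspondences
    return plan if not remaining else None      # None: no complete rewriting found


query = frozenset({"p(X)", "q(X,Y)", "r(Y)"})
warehouse = {"V1": frozenset({"p(X)"}), "V2": frozenset({"s(X)"})}  # V2 is not usable
sources = {"S1": frozenset({"p(X)", "q(X,Y)"}), "C1": frozenset({"r(Y)"})}
print(mediator_plan(query, warehouse, sources))  # ['V1', 'S1', 'C1']
```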

  15. Computing the Rewriting
      • A rewriting typically needs to merge the results of several queries
      • The algorithm produces a set of merging clauses of the form:
          merging tuple-spec_1 and … and tuple-spec_n
          such that matching-condition
          into tuple-spec_t1 and … and tuple-spec_tm
      • The algorithm generates a template; the designer specifies the "such that" and "into" parts, or writes custom merging clauses
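
Once the designer has filled in the "such that" and "into" parts, a merging clause can be executed by considering every combination of input tuples, keeping those that satisfy the matching condition, and building the target tuples from each surviving combination. The sketch below does this naively with a Cartesian product; the function name, the prices in euro cents, and the keep-the-lower-price policy are hypothetical.

```python
from itertools import product
from typing import Callable, Iterable


def run_merging_clause(
    inputs: list[Iterable[tuple]],          # one relation per tuple-spec_i (n inputs)
    such_that: Callable[..., bool],         # matching-condition over n tuples
    into: list[Callable[..., tuple]],       # one builder per target tuple-spec_tj (m targets)
) -> list[list[tuple]]:
    """merging t1 and ... and tn such that cond into u1 and ... and um"""
    out: list[list[tuple]] = [[] for _ in into]
    for combo in product(*inputs):
        if such_that(*combo):
            for j, build in enumerate(into):
                out[j].append(build(*combo))
    return out


s1 = [(2451545, 1549), (2451546, 1580)]  # (date, price in euro cents) from Source 1
s2 = [(2451545, 1600)]                    # (date, price in euro cents) from Source 2
prices, audit = run_merging_clause(
    [s1, s2],
    such_that=lambda a, b: a[0] == b[0],
    into=[lambda a, b: (a[0], min(a[1], b[1])),  # hypothetical policy: keep the lower price
          lambda a, b: (a[0], "S1+S2")],
)
print(prices, audit)  # [(2451545, 1549)] [(2451545, 'S1+S2')]
```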

  16. Conclusion
      • Start with a conceptual model and several types of correspondences
      • A query rewriting algorithm generates the mediator specifications
      • The designer fills in any remaining details
      • No empirical results are reported
