1 / 42

Core Schema Mappings

Core Schema Mappings. Gianni Mecca* Paolo Papotti ° Salvatore Raunich* * Università della Basilicata, Italy °Università Roma Tre, Italy Sigmod - Providence - 2009, July 2nd. I just want to move data. Source. Target. Given a minimal abstract specification. IBDBook [0..*].

dalton
Download Presentation

Core Schema Mappings

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Core Schema Mappings Gianni Mecca*Paolo Papotti° Salvatore Raunich* * Università della Basilicata, Italy °Università Roma Tre, Italy Sigmod - Providence - 2009, July 2nd

  2. I just wanttomove data Source Target Given a minimal abstractspecification IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher IBLBook[0..*] name title pubId IBLPublisher[0..*] id name Core Schema Mappings

  3. I just wanttomove data Source Target Given a minimal abstractspecification IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name Core Schema Mappings

  4. I just wanttomove data Source Target Given a minimal abstractspecification With source semanticspreserved in the target instance IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] Target: Book Target: Publisher id name IBDBook LOC [Clio Vldb02,Vldb06,Vldb08] [HepTox Vldb05] [ADO.net Sigmod07] [MapForce09, StylusStudio09] IBLPublisher IBLBook Core Schema Mappings

  5. Mapping generation Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name • A first generation schema mappingtoolgenerates • m1 ∀ t1: IBDBook(t1) → ∃N: Book(t1, N ) Core Schema Mappings

  6. Mapping generation Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name • A first generation schema mappingtoolgenerates • m1∀ t1: IBDBook(t1) → ∃N: Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) Core Schema Mappings

  7. Mapping generation Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name • A first generation schema mappingtoolgenerates • m1 ∀ t1: IBDBook(t1) → ∃N: Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) • m4 IBLPublisher(i’, p’) → Publisher (i’, p’) Core Schema Mappings

  8. I just wanttomove data Source Target Given a minimal abstractspecification With source semanticspreserved in the target instance IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] Target: Book Target: Publisher id name IBDBook LOC IBLPublisher IBLBook Core Schema Mappings

  9. I just wanttomove data Source Target Given a minimal abstractspecification With source semanticspreserved in the target instance With no redundancy in the target instance IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId canonical IBLPublisher[0..*] Target: Book Target: Publisher id name Post processing step [TODS05, JACM08] core

  10. I just wanttomove data Source Target Given a minimal abstractspecification With source semanticspreserved in the target instance With no redundancy in the target instance IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] Target: Book Target: Publisher id name null→ 245 Post processing step [TODS05, JACM08]

  11. Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id Theoretically, the problemissolved name

  12. Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id • Theoretically, the problemissolved • In practice, for a simple schema mappingwith 5000 source tuples • canonicalsolution = 1 sec • corecomputation= 8 hours name

  13. What’s the problem? • Semantics of data exchange is defined using the chase [ICDT03]: chase(I,M)= canonical univ sol (I,M) • chase = SQL scripts • speed • flexibility and reuse • Post processing looksrecursivelyforhomomorphismsbetweeninstances • custom engine Canonicalsolutioncomputation Core Schema Mappings

  14. What’s the solution? Coresolutioncomputation Core Schema Mappings

  15. What’s the solution? • Core schema mappings • Findhomomorphismsbetweenformulas • Correlate mappingstoavoidhomomorphismbetweenfacts • negation • sophisticated skolemizationfornulls • Corecomputationasresultof the (standard) chase • chase(I,M)= core(I,M) • Enables a scalablesolution Coresolutioncomputation Core Schema Mappings

  16. Outline • Howdifficultismymapping? • Subsumptions • Coverages • Self-joins • Howdoesit scale? • Complexity • SQL experiments • Modularity Core Schema Mappings

  17. Subsumptions IBDBook Source Target Target: Book IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name IBLBook title pubId N1 to 245? • m1IBDBook(t1) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) Core Schema Mappings

  18. Subsumptions IBDBook Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name IBLBook title pubId • m1IBDBook(t1) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) Core Schema Mappings

  19. Subsumptions IBDBook Source Target Target: Book IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name IBLBook title pubId • m1IBDBook(t1) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) Core Schema Mappings

  20. Subsumptions IBDBook Source Target Target: Book IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name IBLBook title pubId N1 to 245? • m1IBDBook(t1) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) Core Schema Mappings

  21. Subsumptions Source Target m3subsumes m1 generate target tuples for m3, the “more informative” mapping for m1 generate only those tuples that add new content to the target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId • Ouralgorithmidentifieshomomorphisms in the target side of the tgds • m1IBDBook(t1) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) homomorphism: t1 → t3, N → i Core Schema Mappings

  22. Subsumptions Source Target m3subsumes m1 generate target tuples for m3, the “more informative” mapping for m1 generate only those tuples that add new content to the target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId m’1 : IBDBook(t1) ∧ ¬(IBLBook(t1, i)) → Book(t1, N) • m1IBDBook(t1) →Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i)→ Book(t3, i) Core Schema Mappings

  23. Subsumptions also m2 subsumes m1 Source Target IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId m’1 : IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ ¬(LOC(t1, p)) → Book(t1, N) • m1IBDBook(t1) → Book(t1, N) • m2LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) homomorphism: t1 → t2, N → N’ Core Schema Mappings

  24. Coverages Source Target • m’1IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ • ¬ (LOC(t1, p)) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) • m4 IBLPublisher(i’, p’) → Publisher (i’, p’) IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name IBDBook LOC IBLPublisher IBLBook Core Schema Mappings

  25. Coverages Source Target • m’1IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ • ¬ (LOC(t1, p)) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) • m4 IBLPublisher(i’, p’) → Publisher (i’, p’) IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name IBDBook LOC Target: Book Target: Publisher IBLPublisher IBLBook Core Schema Mappings

  26. Coverages Source Target • m’1IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ • ¬ (LOC(t1, p)) → Book(t1, N) • m2 LOC(t2, p) → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) • m4 IBLPublisher(i’, p’) → Publisher (i’, p’) IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name IBDBook LOC Target: Book Target: Publisher IBLPublisher IBLBook Core Schema Mappings

  27. Coverages Source Target • m’1IBDBook(t1) ∧ ¬(IBLBook(t1, i)) ∧ • ¬ (LOC(t1, p)) → Book(t1, N) • m’2 LOC(t2, p) ∧ • ¬(IBLBook (t2, i) ∧ IBLPublisher(i, p)) • → Book(t2, N’) Publisher (N’, p) • m3 IBLBook (t3, i) → Book(t3, i) • m4 IBLPublisher(i’, p’) → Publisher (i’, p’) IBDBook[0..*] Book [0..*] title title pubId LOC [0..*] title Publisher [0..*] publisher id IBLBook[0..*] name title pubId IBLPublisher[0..*] id name more differences, new joins IBDBook LOC Target: Book Target: Publisher IBLPublisher IBLBook Core Schema Mappings

  28. Self-joins m1 R(a, b, c, d) → S (x5, b, x1, x2, a) ∧ S (x5, c, x3, x4, a) ∧ S (d, c, x3, x4, b) m2 R(a, b, c, d) → S (d, a, a, x1, b) ∧ S (x, a, a, x, a) ∧ S (x, c, x, x, x) Examplefrom [TODS05] techniques discussed so far are of little help: the chase will either generate three tuples or none duplicate symbols: many different ways of satisfying these constraints Core Schema Mappings

  29. Self-joins: two-phase first exchange secondexchange Expansions S1 R S S2 m’1 R(a, b, c, d) → S1 (x5 , b, x1, x2, a) ∧ S2 (x5, c, x3, x4, a) ∧ S3 (d, c, x3, x4, b) m′2 R(e, f, g, h) → S4 (h, e, e, y1, f ) ∧ S5 (y5, e, e, y1, e) ∧ S6 (y5, g, y2, y3, y4) S1 (x5 , b, x1, x2, a) … → S (…) S2 (x5, c, x3, x4, a) … → S (…) S (…) S3 (d, c, x3, x4, b) … → S (…) S4 (h, e, e, y1, f ) … → S (…) S5 (y5, e, e, y1, e) … → S (…) S6 (y5, g, y2, y3, y4) … → S (…) … • we rewrite the tgds using distinct symbols and variables and do a first exchange • second exchange considers all possible combinations and copy the data in the original relation avoiding redundancy Core Schema Mappings

  30. Self-joins m’1 R(a, b, c, d) → S1 (x5, b, x1, x2, a) ∧ S2 (x5, c, x3, x4, a) ∧ S3 (d, c, x3, x4, b) m′2 R(e, f, g, h) → S4 (h, e, e, y1, f ) ∧ S5 (y5, e, e, y1, e) ∧ S6 (y5, g, y2, y3, y4) R(n, n, n, k) Core Schema Mappings

  31. Self-joins: expansions m’1 R(a, b, c, d) → S1 (x5, b, x1, x2, a) ∧ S2 (x5, c, x3, x4, a) ∧ S3 (d, c, x3, x4, b) m′2 R(e, f, g, h) → S4 (h, e, e, y1, f ) ∧ S5 (y5, e, e, y1, e) ∧ S6 (y5, g, y2, y3, y4) R(n, n, n, k) Core Schema Mappings

  32. Self-joins: expansions m’1 R(a, b, c, d) → S1 (x5, b, x1, x2, a) ∧ S2 (x5, c, x3, x4, a) ∧ S3 (d, c, x3, x4, b) m′2 R(e, f, g, h) → S4 (h, e, e, y1, f ) ∧ S5 (y5, e, e, y1, e) ∧ S6 (y5, g, y2, y3, y4) R(n, n, n, k) Core Schema Mappings

  33. Self-joins: twophase first exchange secondexchange Expansions S1 R S S2 m1 R(a, b, c, d) → S1 (x5 , b, x1, x2, a) ∧ S2 (x5, c, x3, x4, a) ∧ S3 (d, c, x3, x4, b) m′2 R(e, f, g, h) → S4 (h, e, e, y1, f ) ∧ S5 (y5, e, e, y1, e) ∧ S6 (y5, g, y2, y3, y4) e12 S2(x5, c, x3, x4, a) ∧ S3(d, c, x3, x4, b) ∧ (S1(x5, b, x1, x2, a) ∧ b = c) → S (…) S (…) e13 … → S (…) … • useexpansionsaspremisesof the secondexchange • rewritethesenewtgdsusingsubsumptionstoavoidredundancy • favor more compact and more informative Core Schema Mappings

  34. Outline • Howdifficultismymapping? • Subsumptions • Coverages • Self-joins • Howdoesit scale? • Complexity • SQL experiments • Modularity Core Schema Mappings

  35. Complexity Realcases O(n) In realisticcases are lower n: numberoftgds d: maximumnumberofdifferenttgdsthatwriteinto a table k: maximumnumberofatoms in a tgdconclusion expression complexity (not data complexity!) Core Schema Mappings

  36. Experimentssettings • Algorithmsimplemented in +Spicy • Scripts in SQL (and XQuery) • PostgreSQL 8.3 on a Intel CoreDuo 2.4Ghz/4GB Ram/Linux • Scenariosfrom the literature, mostlyfromSTBenchmark[Vldb08 – www.stbenchmark.org] • Each SQL test run with 10k, 100k, 250k, 500k, and 1M tuples in the source instance • Time limit = 1 hour • custom engine exceeded the time limit in all scenarios Core Schema Mappings

  37. Experimentsresults Subsumption and coverages Selfjoins Times (sec) Times (sec) #tuples in the source #tuples in the source Core Schema Mappings

  38. Experimentsresults Subsumption and coverages Selfjoins Times (sec) Times (sec) #tuples in the source #tuples in the source Times (sec) Scalabilityexperimentswith up to 100 tables (82 tgds, 51 subsuptions, 12 coverages): rewritingalgorithmran in 6 secs Core Schema Mappings

  39. Modular solution canonicalsolution subsumption-freesolution coverage-freesolution core (self-join-coverage-freesolution) Algorithms allow to produce approximations of the core for expensive computations: “reduced” rewriting wrt to a subset of homomorphisms Core Schema Mappings

  40. Repeatability & Workability Evaluation We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf., http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/) The reviewers were able to repeat all the experiments presented in our paper, yielding results that in most cases match the ones published in our paper, except from insignificant and to be expected variation due to randomness and/or hardware/software differences. In addition, workability experiments confirmed - with few exceptions - the soundness of our results beyond the parameter setting and/or data sets presented in our paper. The detailed reports will shortly be made publicly available by ACM SIGMOD. Core Schema Mappings

  41. Bridging the gap • +Spicy is the first mapping tool which generates core solutions efficiently • Standard solution for mapping generation • Novel algorithms for (natural) mapping rewriting • Execution times orders of magnitude faster than post processing • Twomainresults • Bridge the gap between the practiceofmapping generation [Popaet al. Vldb02] and the theoryof data exchange [Faginet al. TODS05, Gottlob&Nash JACM08] • Enable schema mappingsfor more practicalapplications Core Schema Mappings

  42. Thankyou seeyou in Lyonfor the demo: Mecca, Papotti, Raunich, Buoncristiano “Concise and Expressive Mappings with +Spicy” VLDB 2009 Core Schema Mappings

More Related