1 / 38

Debugging Schema Mappings with Routes

Debugging Schema Mappings with Routes. Laura Chiticariu UC Santa Cruz (joint work with Wang-Chiew Tan). SPIDER : A S chema Map pi ng De bugge r. Demo group B. Today 14:00-15:30 Thursday 11:00-12:30. I. Source instance. Schema Mappings.

juliet
Download Presentation

Debugging Schema Mappings with Routes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Debugging Schema Mappings with Routes Laura Chiticariu UC Santa Cruz (joint work with Wang-Chiew Tan)

  2. SPIDER: A Schema Mapping Debugger Demo group B Today 14:00-15:30 Thursday 11:00-12:30

  3. I Source instance Schema Mappings • A schema mapping is a logical assertion that describes the correspondence between two schemas • Key element in data exchange and data integration systems • Data Exchange [FKMP05] • Translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied M Schema S Schema T J Target instance

  4. I Source instance Debugging a Data Exchange Today M • Debugging at the (low) level of the implementation • Specific to the data exchange engine • Specific to the implementation language: XQuery, SQL, etc • Debugging at the level of schema mappings NO SUPPORT!!! Schema S Schema T XQuery/XSLT/Java J Target instance

  5. I Source instance Debugging Schema Mappings M • Debugging schema mappings: the process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings Schema S Schema T J Target instance

  6. Outline • Overview • Motivation • Debugging schema mappings with routes • Motivating example • What are routes? • Computing routes • Related work • Performance evaluation • Conclusions

  7. Motivation • Schema mappings are good • Higher-level, declarative programming constructs • Hide implementation details, allow for optimization • Typically easier to understand vs. SQL/XSLT/XQuery/Java • Serve a similar goal as model management [Bernstein03, MBHR05] • Uniformity in specifying and debugging • Reduce programming effort by allowing a user to specify and debug at the level of schema mappings • Schema mappings are often generated by schema matching tools • Close to user’s intention, but may need further refinements • Hard to understand without the help of tools

  8. Language for Schema Mappings • Tuple generating dependencies (tgds) • 8x ((x) !9y(x,y)) • Equality generating dependencies (egds) • 8x ((x) ! x1 = x2) • Remarks: • Widely used for relational schema mappings in data exchange and data integration [Kolaitis05,Lenzerini02] • TGDs generalize LAV, GAV and are equivalent to GLAV assertions in the terminology of data integration • Extended to handle XML data exchange [PVMHF02]

  9. I Source instance Relational Schema Mappings [FKMP03] • Schema mapping M = (S, T, st[t) • S, T: relational schemas with no relation symbols in common • Source-to-target dependencies st: • Source-to-target tgds (s-t tgds) S(x)!9y T(x,y) • Target dependencies t: • Target tgds: T(x)!9y T(x,y) • Target egds:  T(x)!x1 = x2 ∑st ∑t Schema S Schema T J Target instance

  10. Example Schema Mapping S: T: MANHATTAN CREDIT CardHolders: cardNo ² limit ² ssn ² name ² Dependents: accNo ² ssn ² name ² Source-to-target dependencies, st: m1: CardHolders(cn,l,s,n) ! 9L (Accounts(cn,L,s) Æ Clients(s,n)) m2: Dependents(an,s,n) ! Clients(s,n) Target dependencies,t: m3: Clients(s,n) !9A 9L (Accounts(A,L,s)) FARGO FINANCE Accounts: ² accNo ² creditLine ² accHolder Clients: ² ssn ² name m1 fk1 m3 m2 Solution for I under the schema mapping Target instance J Source instance I CardHolders Accounts Clients Dependents

  11. Example Debugging Scenario 1 Target instance J Source instance I CardHolders Accounts Clients Dependents Unknown credit limit? A route for the Accounts tuple Accounts CardHolders 123 L1 ID1 m1 123 $15K ID1 Alice Clients ID1 Alice 15K is not copied over to the target m1: CardHolders(cn,l,s,n) ! 9L (Accounts(cn,L,s) ^ Clients(s,n))

  12. Example Debugging Scenario 1 Target instance J Source instance I CardHolders Accounts Clients Dependents Unknown credit limit? A route for the Accounts tuple Accounts CardHolders 123 L1 ID1 m1 123 $15K ID1 Alice Clients ID1 Alice 15K is not copied over to the target m1: CardHolders(cn,l,s,n) ! (Accounts(cn,l,s) ^ Clients(s,n))

  13. Route for Accounts tuple with accNo A2 Dependents Accounts Clients m2 m3 123 ID2 Bob ID2 Bob A2 L2 ID2 Example Debugging Scenario 2 Target instance J Source instance I CardHolders Accounts Clients Dependents Unknown account number? 123 is not copied over to the target as Bob’s account number m2: Dependents(an,s,n) ! Clients(s,n)

  14. Route for Accounts tuple with accNo A2 Dependents Accounts Clients m2 m3 123 ID2 Bob ID2 Bob A2 L2 ID2 Example Debugging Scenario 2 Target instance J Source instance I CardHolders Accounts Clients Dependents Unknown account number? 123 is not copied over to the target as Bob’s account number m’2: CardHolders(an,l,s’,n’)^ Dependents(an,s,n) ! Accounts(an,l,s)^ Clients(s,n)

  15. Debugging Schema Mappings with Routes • Main intuition: routes describe the relationships between source and target data with the schema mapping • Definition: Let: • M be a schema mapping • I be a source instance • J be a solution for I under M and Jsµ J A route for Js with M and (I,J) is a finite non-empty sequence of satisfaction steps (I,;) ! (I,J1) ! … ! (I,Jn) such that: • Jiµ J, mi2st [ t, where 1· i· n • Jsµ Jn mn, hn m1, h1 m2, h2

  16. Example of Satisfaction Step Target instance J Source instance I CardHolders Accounts Clients Dependents Unknown credit limit? Accounts CardHolders m1, h1 Clients m1: CardHolders(cn, l, s, n) !9L (Accounts(cn, L, s ) ^ Clients(s, n )) h1={cn ! ‘123’, l ! $15K, s ! ID1, n ! Alice, L ! L1}

  17. Compute all routes • The schema mapping M is fixed • Input: source instance I, a solution J for I under M, a set of target tuples Jsµ J • Output: a forest representing all routes for Js • Algorithm idea: • For each tuple t in Js, consider every possible 2st[t and h for witnessing t • Do the same for all target tuples encountered during the process until tuples from the source instance are obtained

  18. 6, x  a T4(a) T6(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a)

  19. 4, x  a T3(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a) 6 T4(a) T6(a)

  20. 7 T5(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a) 6 T4(a) T6(a) 4 T3(a)

  21. 5 T4(a) T1(a) 1 S1(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a) 6 T4(a) T6(a) 4 T3(a) 7 T5(a)

  22. 2 S2(a) 3 T2(a) 5 2 T4(a) T1(a) S2(a) 1 S1(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a) 6 T4(a) T6(a) 4 T3(a) 7 T5(a)

  23. 5 T4(a) T1(a) 1 S1(a) Compute all routes: A simple example T7(a) • st: • 1: S1(x) ! T1(x) • 2: S2(x) ! T2(x) Æ T6(x) • t: • 3: T2(x) ! T3(x) • 4: T3(x) ! T4(x) • 5: T4(x) Æ T1(x) ! T5(x) • 6: T4(x) Æ T6(x) ! T7(x) • 7: T5(x) ! T3(x) • Source instance, I: • S1(a), S2(a) • A solution, J: • T1(a), …, T7(a) 6 T4(a) T6(a) 4 8 T3(a) S2(a) 7 3 T5(a) T2(a) 2 S2(a) Route for T7(a): 2, 3, 4, 8, 6

  24. Properties of compute all routes • Completeness: Let F denote the route forest by our algorithm returned on Js. If R is a minimal route for Js, then it is represented in F. • Running time: polynomial in the sizes of I, J and Js • Every “branch of a tuple” once explored, is never explored again • Polynomial number of branches for each tuple since M is fixed • Challenge: • Exponentially many routes, but polynomial-size representation constructed in polynomial time

  25. Compute one route • Our experimental results indicate that compute all routes can be expensive • Generate one route fast and alternative routes as needed? • Our solution: adapt compute all routes to compute only one route • Non-exhaustive: Stops when one witness is found. A witness that uses source tuples is preferred • Inference procedure: to deduce all consequences of a proven tuple and avoid recomputation of “branches” • Key step for polynomial time analysis • Completeness: If there is a route for Js, then our algorithm will produce a route for Js

  26. Related work • Commercial data exchange systems • e.g., Altova MapForce, Stylus Studio • Use “lower-level” languages (e.g., XSLT, XQuery) to specify the exchange • Debugging is done at this low level • Source tuple centric • Data viewer [YMHF01] • Constructs an “example” source instance illustrative for the behavior of the schema mapping • Complementary to our approach • Works only for relational schema mappings

  27. Related work • Computing routes for target data is related to computing provenance (aka lineage) of data

  28. Empirical Evaluation • Implementation: on top of the Clio data exchange system from IBM Almaden Research Center • Scalable: push computation to the database • Handles relational and XML schema mappings [PVMHF02] • Testbed: • Created relational and XML schema mappings based on the TPCH schema • Created schema mappings based on Mondial, DBLP and Amalgam schemas • Methodology - measured the influence of: • The sizes of I, J and Js • The complexity of st[t • i.e., the number of tgds and the number of atoms in each tgd • Setup: P4 2.8GHz, 2Gb RAM, 256MB DB2 buffer pool • Our regret: No benchmark to base our comparisons

  29. ComputeOneRoute with Rel. schema mappingInfluence of the Sizes of I and J

  30. ComputeOneRoute with Rel. schema mappingInfluence of the Complexity of st[t

  31. ComputeOneRoute vs. ComputeAllRoutes

  32. Experimental results with Mondial, DBLP and Amalgam

  33. Experimental results with Mondial, DBLP and Amalgam • Two DBLP schemas and datasets, both XML: • DBLP1, DBLP2 • First relational schema from Amalgam test suite

  34. Experimental results with Mondial, DBLP and Amalgam • Two DBLP schemas and datasets, both XML: • DBLP1, DBLP2 • First relational schema from Amalgam test suite • Two Mondial schemas and datasets: • one relational (Mondial1), the other XML (Mondial2) • Designed st and used the foreign key constraints as t

  35. Experimental results with Mondial, DBLP and Amalgam • Compute one route: under 3 seconds for 1-10 randomly selected tuples • Compute all routes: can take much longer • 18 seconds to construct the route forest for 10 selected tuples in the target instance of Mondial • Compute one route took under 1 second

  36. Conclusions • Debugging schema mappings with routes • Complete, polynomial time algorithms for computing routes • Extension for routes for selected source data • Routes have declarative semantics, based on the logical satisfaction of tgds • What we don’t do: illustrate data merging • Future work: • Illustrate grouping semantics for nested schema mappings • Adapt target instance to changes in the schema mapping and data sources

  37. Compute one/all routes Alternative routes Guided computation of routes Standard debugging features Breakpoints “Watch” windows Schema-level routes SPIDER: A Schema Mappings Debugger Demo group B Today 14:00-15:30 Thursday 11:00-12:30

  38. Thank you!

More Related