Composing Mappings among Data Sources
Jayant Madhavan, Alon Halevy
University of Washington
Mappings in data sharing architectures
• Data Integration System
  • Sources with mappings to a single mediated schema …, [Lenzerini, PODS ’02]
  • [Figure: Mediated Schema over sources ACM, DBLP, CiteSeer]
• Peer Data Management System
  • Network of pair-wise mappings
  • [Piazza, UW], [Hyperion, Toronto], [PeerDB, Singapore], [LRM, Trento], [Edutella, Hannover], [Semantic Gossiping, EPFL], [Raccoon, Irvine], [Orchestra, Penn]
  • [Figure: peer network with Humboldt, ACM, DBLP, UW, CiteSeer]
Peer Data Management System (Piazza)
• Peer schema
  • R_UW: u1(x,y), …
  • R_CiteSeer: c1(x,y,z), …
• Mapping
  • u1(x,y), u2(y,z)   c1(x,y,z) …
• Mapping formula (Q1   Q2)
[Figure: peer network with Humboldt, DBLP, ACM, UW, CiteSeer]
Mapping Composition
[Figure: peer network with Humboldt, ACM, DBLP, UW, CiteSeer]
Query Answering
• Iterative rewriting by chaining mappings
• Transitive closure of all relevant mappings
• Eliminating redundancies from rewritings (optimization)
• [Piazza: ICDE’03, WWW’03, VLDBJ’03]
[Figure: queries Q, Q’, Q1, Q2, Q3, Q4, Q5 at peers Humboldt, ACM, DBLP, UW, CiteSeer]
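The "transitive closure of all relevant mappings" step can be pictured as a graph traversal over the peer network. The following is a minimal sketch only: the mapping edges in `MAPPINGS` and the function name `reachable_peers` are hypothetical, since the slide's arrows are not recoverable, and the sketch tracks only which chain of peers is followed, not the actual query rewriting.

```python
from collections import deque

# Hypothetical pair-wise mappings between peers (edges are illustrative only).
MAPPINGS = {
    "UW": ["DBLP", "CiteSeer"],
    "DBLP": ["ACM"],
    "CiteSeer": ["DBLP"],
    "ACM": ["Humboldt"],
    "Humboldt": [],
}

def reachable_peers(start, mappings):
    """Breadth-first transitive closure of the mappings leaving `start`.

    Returns a dict: peer -> chain of peers used to reach it from `start`.
    """
    chains = {start: [start]}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in mappings.get(peer, []):
            if nxt not in chains:  # visit each peer once, keep the first chain found
                chains[nxt] = chains[peer] + [nxt]
                queue.append(nxt)
    return chains

chains = reachable_peers("UW", MAPPINGS)
for peer, chain in chains.items():
    print(peer, "via", " -> ".join(chain))
```

Each chain is a sequence of mappings that a query posed at the start peer would have to be rewritten through, which is the work the next slide proposes to pre-compute.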
Composition in a PDMS
• Potential inefficiency
  • Expensive rewriting + optimization at runtime for each query
• Optimization
  • Pre-compute or compose paths to relevant peers
• But, composition must be independent of Q
• Side-benefit: robustness to information loss
  • Dead intermediate peers will not semantically partition the network
[Figure: queries Q, Q’, Q1, Q3, Q5 at peers Humboldt, ACM, DBLP, UW, CiteSeer]
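The robustness side-benefit can be illustrated with a toy reachability check. This is a sketch under assumptions: the chain UW -> DBLP -> ACM and the helper `reachable` are hypothetical, and a pre-computed composition is modelled simply as an extra direct edge.

```python
def reachable(start, edges, dead=frozenset()):
    """Peers reachable from `start` along mapping edges, skipping dead peers."""
    seen, stack = {start}, [start]
    while stack:
        peer = stack.pop()
        for src, dst in edges:
            if src == peer and dst not in seen and dst not in dead:
                seen.add(dst)
                stack.append(dst)
    return seen

# Hypothetical chain of pair-wise mappings: UW -> DBLP -> ACM.
pairwise = {("UW", "DBLP"), ("DBLP", "ACM")}
# Same network plus a pre-computed composition UW -> ACM.
with_composition = pairwise | {("UW", "ACM")}

# DBLP goes offline in both scenarios.
without = reachable("UW", pairwise, dead={"DBLP"})
with_cache = reachable("UW", with_composition, dead={"DBLP"})
print(sorted(without), sorted(with_cache))
```

With only pair-wise mappings, losing the intermediate peer cuts UW off from ACM; with the composed mapping stored, ACM stays reachable.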
Composition: Meta-data operation
• Mappings are integral to all data sharing architectures
  • Message passing
  • Data exchange [Fagin, Kolaitis & Miller, ICDT’03]
  • …
• Composition is a natural problem in many of these
• Fundamental operator to meta-data management
  • Model Management operators: Match, Merge, Compose, …
  • [Bernstein, Halevy, Pottinger, SIGMOD Record ’01]
  • [Melnik, Bernstein, Rahm, SIGMOD ’03]
• Formal treatment for a particular mapping language
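Problem Definition
• A, B, C: relational schemas (e.g. relations a1, …, am)
• M_AB: GLAV formulas Q_A1   Q_B1, …, Q_An   Q_Bn
• M_AC: Q’_A1   Q_C1, …, Q’_Ak   Q_Ck
• M_AC as the composition of M_AB and M_BC w.r.t. query language L
• For all queries Q ∈ L, Q given M_AC = Q given M_AB, M_BC
[Figure: schemas A, B, C linked by M_AB and M_BC, with M_AC from A to C and a query Q]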
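The condition on this slide can be written out as a single statement. The notation ans(Q | M) for "the answers to Q given the mappings M" is an assumption introduced here for readability; the slide itself only says "Q given M".

```latex
\[
M_{AC} \text{ is a composition of } M_{AB} \text{ and } M_{BC} \text{ w.r.t. } \mathcal{L}
\iff
\forall Q \in \mathcal{L}:\;
\mathrm{ans}(Q \mid M_{AC}) \;=\; \mathrm{ans}(Q \mid M_{AB} \cup M_{BC})
\]
```

The quantification over all of L is what makes the composition independent of any particular query.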
Overview of Contributions
• Surprising: Composition of finite mappings can be infinite!!!
• Good news: Composition computable for powerful practical query languages
  • CQ_k: finite, or infinite but encoded finitely
• Composition algorithm that
  • on termination computes all the formulas in the composition
  • terminates if composition finite, and also for many infinite compositions
• Rewriting algorithm to exploit infinite formulas
  • Extension of results from answering queries using views
• Complexity results
Outline
• Composition is interesting and important
• Problem definition and results overview
• Finite and infinite composition
• Results and Composition Algorithm
• Summary, current and future work
Composition Example
• Schemas: A with bbba(x,y); B with b(x,y); C with bbc(x,y)
• M_AB: bbba(x,y)   b(x,t1), b(t1,t2), b(t2,y)
• M_BC: b(x,t), b(t,y)   bbc(x,y)
[Figure: Graph G; schemas A, B, C linked by M_AB and M_BC]
Composition Example (2)
• Q(x) :- bbba(x,x1)
• M_AB: b(x,t1), b(t1,t2), b(t2,x1)
• M_BC: bbc(x,y1), bbc(y1,y2)
• bbba(x,x1)   bbc(x,y1), bbc(y1,y2)
[Figure: schemas A, B, C linked by M_AB and M_BC; query Q; path through x, y1, y2]
Composition Example (3)
• bbba(x,x1)   bbc(x,y1), bbc(y1,y2)
• bbba(x1,y)   bbc(y1,y2), bbc(y2,y)
• bbba(x,x1), bbba(x1,y)   bbc(x,y1), bbc(y1,y2), bbc(y2,y)
[Figure: schemas A, B, C linked by M_AB, M_BC and M_AC; paths between x and y]
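The formulas in this example can be sanity-checked on a concrete graph. The sketch below rests on an assumption that should be read loudly: the connective between the two sides of each formula was lost from the slides, so the code adopts one reading, namely that bbba(x,y) holds exactly for 3-edge paths of b in graph G and bbc(x,y) exactly for 2-edge paths. The edge set `b` and the helper `compose` are hypothetical.

```python
def compose(r, s):
    """Relational composition: pairs (x, z) with r(x, y) and s(y, z)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

# Hypothetical edge relation b of graph G.
b = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 5)}

bbc = compose(b, b)               # assumed reading: bbc = 2-edge paths
bbba = compose(compose(b, b), b)  # assumed reading: bbba = 3-edge paths

# Example (2): wherever bbc(x,y1), bbc(y1,y2) matches, some x1 has bbba(x,x1).
two_chain = compose(bbc, bbc)
ok2 = all(any(a == x for (a, _) in bbba) for (x, _) in two_chain)

# Example (3): wherever bbc(x,y1), bbc(y1,y2), bbc(y2,y) matches,
# bbba(x,x1), bbba(x1,y) matches for the same x and y.
three_chain = compose(two_chain, bbc)
ok3 = three_chain <= compose(bbba, bbba)

print(ok2, ok3)
```

Under this reading, a chain of two bbc atoms covers four edges and so contains a bbba path with one edge to spare, while a chain of three covers exactly two bbba paths. That leftover edge is what lets formulas keep growing on the following slides.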
Infinite composition
• Schemas: A with rbba(x,y), bba(x,y); B with r(x,y), b(x,y); C with rbc(x,y), bbc(x,y)
• M_AB:
  • rbba(x,y)   r(x,t1), b(t1,t2), b(t2,y)
  • bba(x,y)   b(x,t), b(t,y)
• M_BC:
  • r(x,t), b(t,y)   rbc(x,y)
  • b(x,t), b(t,y)   bbc(x,y)
[Figure: Graph G with paths x, t1, t2, y and x, t, y]
Infinite Composition (2)
• Q(x) :- rbba(x,x1)
• M_AB: r(x,t1), b(t1,t2), b(t2,x1)
• M_BC: rbc(x,y1), bbc(y1,y2)
• rbba(x,x1)   rbc(x,y1), bbc(y1,y2)
[Figure: schemas A, B, C linked by M_AB, M_BC and M_AC; paths starting at x]
Infinite Composition (3)
• rbba(x,x1)   rbc(x,y1), bbc(y1,y2)
• bba(x,y)   bbc(x,y)
• rbba(x,x1), bba(x1,x2)   rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: paths from x of length 2n and 2n+1]
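The two rbba formulas on this slide suggest a family that grows by one atom on each side. The sketch below generates that family as strings; extending the pattern beyond the two formulas shown is an extrapolation, and the function name `formula` is hypothetical. The two sides are returned separately because the connective between them is not recoverable from the slides.

```python
def formula(n):
    """Left and right sides of the n-th formula in the family (n >= 1)."""
    # Left side: one rbba atom followed by n-1 chained bba atoms.
    left = ["rbba(x,x1)"] + [f"bba(x{i},x{i+1})" for i in range(1, n)]
    # Right side: one rbc atom followed by n chained bbc atoms.
    right = ["rbc(x,y1)"] + [f"bbc(y{i},y{i+1})" for i in range(1, n + 1)]
    return ", ".join(left), ", ".join(right)

for n in range(1, 4):
    lhs, rhs = formula(n)
    print(f"n={n}: {lhs}   |   {rhs}")
```

For n = 1 and n = 2 this reproduces the formulas on the slide; no finite set of such formulas covers every n, which is the sense in which the composition is infinite.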
Main Result
Composition computable for interesting query languages
• CQ_k: queries with localized variable interactions
• Includes most queries in practice, e.g. cycle_n(x) ∈ CQ_3
  cycle_n(x) :- b(x,y), path_{n-1}(y,x)
  path_{n-1}(x,y) :- b(x,z), path_{n-2}(z,y)
  …
  path_1(x,y) :- b(x,y)
Composition w.r.t. CQ_k is computable and is either
• a finite number of GLAV formulas, or
• a finite encoding of infinite GLAV formulas
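The cycle_n rules above can be evaluated directly, which makes the "localized variable interactions" concrete: each rule mentions at most the three variables x, y, z. This is a minimal sketch; the function names `path` and `cycle` and the example graph are hypothetical.

```python
def path(b, n):
    """path_n as a set of pairs, via path_n(x,y) :- b(x,z), path_{n-1}(z,y)."""
    result = set(b)  # base case: path_1(x,y) :- b(x,y)
    for _ in range(n - 1):
        # One rule application: join b(x,z) with the previous level on z.
        result = {(x, y) for (x, z) in b for (z2, y) in result if z == z2}
    return result

def cycle(b, n):
    """cycle_n(x) :- b(x,y), path_{n-1}(y,x); requires n >= 2."""
    back = path(b, n - 1)
    return {x for (x, y) in b if (y, x) in back}

# Hypothetical graph: a directed triangle.
triangle = {(1, 2), (2, 3), (3, 1)}
print(cycle(triangle, 3))  # nodes lying on a cycle of length 3
print(cycle(triangle, 2))  # nodes lying on a cycle of length 2
```

Because every level reuses the same few variables, the query stays in CQ_3 no matter how large n is.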
Composition Algorithm
• Minimal formulas
  • Formulas that have to be present in the composition
  • Larger minimal formulas are extensions of smaller ones
• Residues of minimal formulas
  • Signatures that capture information on extensions
  • Isomorphic residues imply isomorphic extensions
• Query Rewrite Graphs
  • Encoding of all minimal formulas in the composition
  • Cycles can be used to encode an infinite number of formulas
Minimal Mapping Formulas
Formulas that cannot be constructed from smaller formulas
• rbba(x,x1), bba(x1,x2)   rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: variables x, x1, x2 and x, u1, y1, u2, y2, u3, y3; annotations "Join variable" and "Internally existential variable: not visible on right side"]
Theorem: Sufficient to compute all minimal formulas
Incremental Construction
• Lemma: If Q_A   Q_C is a minimal formula, then
  • there is a minimal formula Q’_A   Q’_C
  • Q’_A has one atom less than Q_A
• Incremental algorithm
  • Try all one atom extensions
  • Complete formulas
• Example
  • rbba(x,x1)   rbc(x,y1), bbc(y1,y2)
  • rbba(x,x1), bba(x1,x2)   rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: candidate extensions Q’_A1, …, Q’_Ai, …, Q’_Am of a formula Q_A   Q_C, one of them marked X]
Residues in Formulas
• rbba(x,x1)   rbc(x,y1), bbc(y1,y2)
• Residue: b(u2,y2), {u2}, {y2}, {x1u2}
• Residues capture all extension information
• Null residues: no extensions
[Figure: variables x, x1 and x, u1, y1, u2, y2, extended by x2, u3, y3; annotations "Potential Join variable" and "Internally existential variable"]
Isomorphic Residues
• Isomorphic residues imply isomorphic extensions
• rbba(x,x1)   rbc(x,y1), bbc(y1,y2)
• rbba(x,x1), bba(x1,x2)   rbc(x,y1), bbc(y1,y2), bbc(y2,y3)
[Figure: variables x, x1 and x, u1, y1, u2, y2 for the first formula; x, x1, x2 and x, u1, y1, u2, y2, u3, y3 for the second; residues marked "isomorphic"]
Query Rewrite Graphs
• Paths from roots encode minimal mapping formulas
• Cycles encode infinite formulas
• Query nodes
  • Q1: bba(x,y)
  • Q2: rbba(x,x1)
  • Q3: bba(x1,x2)
• Rewrite nodes
  • R1: bbc(x,y)
  • R2: rbc(x,y1), bbc(y1,y2)
  • R3: bbc(y2,y1)
Theorem: QRG construction on termination encodes the composition
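How a cycle in a graph can stand for infinitely many formulas can be seen by enumerating root paths up to a length bound. This is only a sketch: the slide names the nodes Q1 to Q3 and R1 to R3, but its arrows did not survive, so the edges in `EDGES` (including the R3 -> Q3 cycle) and the choice of roots are guesses made for illustration. The variable renaming a real encoding would need is not modelled.

```python
# Hypothetical edges; the slide names the nodes but its arrows were lost.
EDGES = {
    "Q1": ["R1"], "R1": [],
    "Q2": ["R2"], "R2": ["Q3"],
    "Q3": ["R3"], "R3": ["Q3"],  # assumed: R3 -> Q3 closes a cycle
}
ROOTS = ["Q1", "Q2"]  # assumed roots

def paths_ending_in_rewrite(max_len):
    """All root paths with at most `max_len` nodes that end in a rewrite node."""
    found, frontier = [], [[r] for r in ROOTS]
    while frontier:
        path = frontier.pop()
        if path[-1].startswith("R"):
            found.append(path)  # a query/rewrite alternation that ends on a rewrite
        if len(path) < max_len:
            frontier.extend(path + [n] for n in EDGES[path[-1]])
    return sorted(found, key=len)

for bound in (2, 4, 6):
    print(bound, paths_ending_in_rewrite(bound))
```

Raising the bound by two adds one more trip around the cycle and hence one more path, so the finite graph stands for an unbounded family of formulas.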
Other Results
• Algorithm to exploit infinite formulas
  • Cyclic QRG can be represented by a pair of recursive datalog programs
  • Extension of earlier results in answering queries using infinite views [Levy, Rajaraman & Ullman, PODS’96]
• Complexity Results
  • Upper-bound: composition verification is in
  • Lower-bound: composition verification w.r.t. finite sized query languages is -hard
Related Work
• GLAV
  • [Millstein, Friedman & Halevy, AAAI’99], [Lenzerini, PODS’02], [Fagin, Kolaitis & Miller, ICDT’03]
  • Generalization of LaV and GaV
  • Leads to infinite composition
• Reasoning w.r.t. Query Languages
  • View containment [Li, Ullman & Bawa, ICDT’01]
  • Makes the problem hard
Summary
• Mapping composition
  • Can be infinite for simple GLAV mappings
  • Can be constructed completely for interesting query languages
• QRG encodes valid formulas in composition
  • QRG can also encode infinite formulas
  • Can be exploited for query answering even when infinite
[Figure: schemas A, B, C linked by M_AB and M_BC, with M_AC from A to C and a query Q]
Current and Future Work
• Composition in a PDMS
  • Choosing paths to pre-compute
  • Manipulating infinite compositions
• Semi-automatic construction of mappings
  • Learning from a corpus of related schemas
  • Exploiting past mapping experience
  • [Halevy, Madhavan & Bernstein, DeBull’03 to appear]
  • [Madhavan, Bernstein, Chen, Halevy & Shenoy, IIW@IJCAI ’03]
More information: http://www.cs.washington.edu/homes/jayant