390 likes | 545 Views
Answering Tree Pattern Queries Using Views. Laks V.S. Lakshmanan , Hui (Wendy) Wang , and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com. Outline. Motivation Problems Studied Without schema With schema Recursive schemas Related Work
E N D
Answering Tree Pattern Queries Using Views Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao University of British Columbia Vancouver, BC Amazon.com
Outline • Motivation • Problems Studied • Without schema • With schema • Recursive schemas • Related Work • Summary & Future Work
Motivation 1/3 • Integration of existing data sources. • Local as view (LAV) – one of the well-known approaches. • Each source = a materialized view over some global database. • Answer to query over global DB = answer to query using (materialized) views.
Motivation 2/3 <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> Source = View “//Trials//Trial” over some DB containing clinical data – trials, their status, patient data, etc. Consider query Q: //Trials[//Status]//Trial over [unknown] original DB. How can and should we answer it using above source?
? ? Motivation 3/3 <PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab> One possible original DB //Trials//Trial <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
? ? Motivation 3/3 Q: //Trials[//Status]//Trial <PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab> One possible original DB //Trials//Trial <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
Motivation 3/3 <PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab> One possible original DB Contained rewriting ◦ “●[//Status]” { (3) } //Trials//Trial <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
Problems Studied 1/3 • Equivalent Rewriting: Given Q and views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E Ξ Q, over all possible input DBs. • Appropriate for query optimization. • Contained Rewriting: Given Q and V, find an expression E s.t. V◦E Q overall all possible input DBs, and V◦E is maximal among all such rewritings. • Most appropriate for information integration [Halevy, Lenzerini, Pottinger & Halevy].
Problems Studied 2/3 • No Schema: Given Q and V, find a maximally contained rewriting (MCR) of Q using V. • With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V. • Focus: Tree Pattern Queries (XP/,//, [ ]). Schema without cycles, union, and recursion.
//a //a[//b]/c b c Problems Studied 3/3 • Given Q & V: • RΞ V ◦ E Q. Rewriting query Compensation query • Want MCR in the absence and in the presence of a schema.
/b d Q1 Without Schema 1/6 • Question 1: Does an MCR always exist? /a /a No MCR for Q1 and for Q2. What went wrong? b b c d Q2 V distinguished (answer) node
//Trials Status Patient Without Schema 2/6 (1) f //Trials //Trials (1) V Trial Trial (2) (3) E V Q Status Patient (2) (3) Unfulfilled obligations Clip Away Tree (CAT) f – useful embedding
Without Schema 3/6 Theorem: Q, V – tree pattern queries. Then Q is answerable using V iff there is a useful embedding from Q to V. Testing Existence of MCR: //a 1 1,2 //a a 2 1:{2}, 2:{} 1:{2,3}, 2:{3} a a b 6 a b 2:{6} 2:{6}, 3:{4} 3 b b c 6:{7} c 4 7 c 4:{5}, 6:{7} Q 5 c V d e
Without Schema 4/6 Two embeddings – corresponding irredundant CRs. //a a //a b a a b b c a need for expressing MCR! b c a e c b c d e c d
Without Schema 5/6 • Can test existence of MCR in poly time. • However, MCRs can be exponentially large (closure issue). //a //a a a a How many irredundant CRs are possible? b b V b Q c c c d e
Without Schema 5/6 //a //a //a a a a a b b V b b Q c c c c d e d e
Without Schema 5/6 //a //a //a a a a a b b V b b Q c c c c d e a/b/c/e d
Without Schema 5/6 //a //a //a a a a a b b V b b Q c c c c d e a/b e c e
Without Schema 5/6 //a //a //a a a a a b b V b b Q c c c c d e a/b a/b/c/e MCR = union of exponential # CRs in the worst case! c e
Without Schema 6/6 • Summary: • Can test existence of MCR in poly time. • Exact characterization. • MCR may be union of exponentially many CRs in the worst case. • Algorithm for generating MCR.
With Schema 1/6 • Given Query Q, view V, schema S. • Infer all constraints C implied by S. • Chase V w.r.t. C. • Look for MCR of Q w.r.t. chased view.
With Schema 2/6 Auctions * • E.g. constraints: • c_a has ≤ 1 bids • child • Every Auction having • a person desc also has • an item desc. • every path from • Auction to name • goes via bids. Auction * ? open_auction closed_auction + ? bids + + person item name
With Schema 3/6 //Auction //Auction bids bids o_a c_a person item bids bids name Q V
With Schema 3/6 Auctions //Auction * Auction o_a c_a ? * open_auction closed_auction bids bids + ? bids person item + + p i person item name n name
With Schema 4/6 //Auction //Auction o_a c_a bids bids person item bids bids name person item p i Q MCR = identity query. name n
With Schema 5/6 Another Example: Auctions * Auction //Auction * ? closed_auction item name open_auction Q + ? //Auction item bids buyer + person person V name How to answer Q using V?
With Schema 5/6 Another Example: Auctions * Auction //Auction * ? closed_auction item name open_auction Q + ? //Auction item bids buyer + person item person So what’s the compensation query? name name
With Schema 5/6 Another Example: Auctions * Auction //Auction * ? closed_auction item name open_auction Q + ? //Auction item bids buyer + person item person name MCR = V ◦ “●//name” name
With Schema 6/6 • Challenges and Highlights: • Naïve chase can explode. • Make chase context aware. • Exact characterization of schema w/o recursion and union in terms of constraints. • Efficient algo. for inferring the constraints. • Efficient algo. for chase. • And for finding MCR. • MCR is unique, if it exists.
Recursive Schemas 1/2 a //a //a ? b b b b * * V d c c d Q What is the MCR?
Recursive Schemas 2/2 a //a //a ? b b b b * * V d c c d Q //a b c d
Recursive Schemas 2/2 a //a //a ? b b b b * * V d c c d Q //a b b c d
Recursive Schemas 2/2 a //a //a ? b b b b * * V d c c d Q //a b b d c
Recursive Schemas 2/2 a //a //a ? b b b b * * V d c c d Q //a MCR = union of four CRs. Behavior similar to no schema. b b b c d
Related Work 1/2 • QAV for relational – huge body of work [Halevy 01]. • Regular path queries and semi-structured DBs [Grahne&Thomo 03, Calvenese 00,Papakonstantinou&Vassalos 99]. • Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].
Related Work 2/2 • Key differences b/w equivalent & contained rewriting: • Unique rewriting (even w/o schema). • MCR may involve union of (possibly exponentially many) CRs. • Study of contained rewriting in presence of schema. • Lot of work on semantic caching [Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mine views worth materializing, XPath containment, … .
Summary & Future Work 1/2 • QAV using (maximally) contained rewriting ( information integration). • Without schema: existence, characterization, closure, generation of MCR. • With Schema: extract essence using constraints, chase, similar problems as above. • Impact of recursion. • Experiments.
Summary & Future Work 2/2 • Impact of wildcard, disjunction, order … • Impact of union, recursion, … • Other integration models (e.g., GLAV) • QAV for XQuery.