Containment of Nested XML Queries • Xin Dong, Alon Halevy, Igor Tatarinov • Presented by: Orly Goren
Query Containment • The most fundamental relationship between a pair of queries • Query Q is contained in Q’ if: • For any database D, • Q(D) is a subset of Q’(D)
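The following is a small Python sketch of the definition, using a hypothetical relational example. The (name, dept) table and the functions `q` and `q_prime` are illustrative and do not come from the slides. The sketch also shows why containment cannot be settled by testing: checking finitely many databases can refute Q ⊑ Q’ but never prove it.

```python
# Hypothetical relational example (not from the slides): a database is a set
# of (name, dept) tuples, and queries are Python functions returning sets.
def q(db):
    # Q: names of employees in the "db" department
    return {name for (name, dept) in db if dept == "db"}

def q_prime(db):
    # Q': names of all employees
    return {name for (name, dept) in db}

sample_dbs = [set(), {("alice", "db")}, {("alice", "db"), ("bob", "ai")}]

# Q(D) is a subset of Q'(D) on every sample database ...
print(all(q(d) <= q_prime(d) for d in sample_dbs))  # True
# ... but a finite test can only refute containment, as in the reverse direction.
counterexample = next(d for d in sample_dbs if not q_prime(d) <= q(d))
print(sorted(counterexample))  # [('alice', 'db'), ('bob', 'ai')]
```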
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable • Query containment in practice • Relaxing the assumptions • Conclusions
Applications of Query Containment • Semantic caching • Determining independence of database updates • Query answering using views • Detecting that a reformulated query is redundant • Query minimization • Verification of knowledge bases
Query Processing in PDMS • XML query containment in a Peer Data Management System (PDMS) • Answering queries using views to extract remote data • Removing redundant queries to enhance performance • [Figure: peers Stanford, UW, UPenn and Berkeley linked by mappings MWS, MBW, MSB and MPW; queries such as QB1, QB2, QW, QS and QP are propagated between peers]
Example – An XML Instance • D: <project> <member>Alice</member> </project> <project> <member>Bob</member> </project> • [Figure: D as a tree with two project nodes, each with one member child, holding Alice and Bob respectively]
Example – An XML Query • Q: for $x in /project return <group>{ for $y in $x/member return <name>{ where $y=“Alice” return <Alice/> where $y=“Bob” return <Bob/> }</name> }</group> • [Figure: D and Q(D); Q(D) has two group nodes, each with a single name child, containing Alice and Bob respectively]
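Example – Another XML Query • Q’: for $x in /project return <group>{ for $y in /project/member return <name>{ where $y=“Alice” return <Alice/> where $y=“Bob” return <Bob/> }</name> }</group> • Unlike Q, the inner block iterates over /project/member (all members) rather than $x/member (members of the current project) • [Figure: D and Q’(D); each group node in Q’(D) has two name children, containing Alice and Bob]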
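The following is a minimal Python sketch that evaluates Q and Q’ on D. It assumes a simplified model: D is a list of projects, each project is a list of member texts, and answers are (label, children) tuples. The helper names (`node`, `name_block`, `eval_q`, `eval_q_prime`) are hypothetical. The functions hard-code the two queries rather than interpreting XQuery.

```python
# Trees are (label, children) tuples; text values are modeled as leaf labels.
def node(label, *children):
    return (label, tuple(children))

# D: two project elements whose members are Alice and Bob respectively.
D = [["Alice"], ["Bob"]]

def name_block(member):
    # Innermost blocks: where $y="Alice" return <Alice/>, where $y="Bob" return <Bob/>
    return node("name", *[node(v) for v in ("Alice", "Bob") if member == v])

def eval_q(projects):
    # Q: for $y in $x/member iterates only over the current project's members
    return [node("group", *[name_block(m) for m in members]) for members in projects]

def eval_q_prime(projects):
    # Q': for $y in /project/member iterates over all members, regardless of $x
    everyone = [m for members in projects for m in members]
    return [node("group", *[name_block(m) for m in everyone]) for _ in projects]

print(eval_q(D))
print(eval_q_prime(D))
```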
Tree Embedding • Given two trees T1 and T2, a node mapping ψ from T1 to T2 is an embedding from T1 to T2 if: • ψ maps the root of T1 to the root of T2. • If node n2 is a child of node n1 in T1, then ψ(n2) is a child of ψ(n1), and n1 and n2 have the same labels as ψ(n1) and ψ(n2), respectively. • What is the time complexity of finding an embedding from T1 to T2?
XML Instance Containment • Let e and e’ be two XML instances. e is contained in e’, denoted e ⊑ e’, if the tree of e can be embedded in the tree of e’. • Containment is reflexive and transitive. • Containment is not antisymmetric: e ⊑ e’ and e’ ⊑ e do not imply e = e’. • [Figure: two XML instances with labels a and b that contain each other but are not identical]
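One answer to the complexity question is a memoized recursive check, sketched below in Python. The sketch assumes trees are (label, children) tuples. Because the embedding does not need to be injective, each child of a node can be mapped independently. Memoization caps the number of distinct (subtree, subtree) checks at |T1|·|T2|, so the test runs in polynomial time. The instances a(b) and a(b, b) are our own illustration of non-antisymmetry and may differ from the ones drawn on the slide.

```python
from functools import lru_cache

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    """True if tree t1 embeds into tree t2: the roots have equal labels and every
    child of t1 embeds into some child of t2. The mapping need not be injective."""
    label1, kids1 = t1
    label2, kids2 = t2
    if label1 != label2:
        return False
    return all(any(embeds(c1, c2) for c2 in kids2) for c1 in kids1)

def contained(e1, e2):
    # e1 ⊑ e2 iff the tree of e1 can be embedded in the tree of e2
    return embeds(e1, e2)

a_b = node("a", node("b"))
a_bb = node("a", node("b"), node("b"))
print(contained(a_b, a_bb), contained(a_bb, a_b), a_b == a_bb)  # True True False
```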
XML Query Containment • Let Q and Q’ be two XML queries. Q is contained in Q’, denoted Q ⊑ Q’, if for every input XML instance D, Q(D) ⊑ Q’(D).
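Example – Tree Embedding and Query Containment • Q(D) ⊑ Q’(D): each group of Q(D), which has one name child, embeds into a group of Q’(D), which has name children Alice and Bob • Q’(D) ⋢ Q(D) (marked X): no group of Q(D) has both an Alice and a Bob name child • [Figure: the answer trees Q(D) and Q’(D) with the embedding between them]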
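The same embedding test can be run on the example answers. This sketch hard-codes Q(D) and Q’(D) as (label, children) tuples. As our own assumption, it places each answer sequence under a virtual root labeled `answer` so that each answer forms a single tree.

```python
from functools import lru_cache

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    # roots must match; each child of t1 must embed into some child of t2
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

alice, bob = node("name", node("Alice")), node("name", node("Bob"))
# Assumption: each answer sequence is placed under a virtual root labeled "answer".
q_d = node("answer", node("group", alice), node("group", bob))
q_prime_d = node("answer", node("group", alice, bob), node("group", alice, bob))

print(embeds(q_d, q_prime_d))  # True:  Q(D) ⊑ Q'(D)
print(embeds(q_prime_d, q_d))  # False: Q'(D) ⋢ Q(D)
```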
Query Containment Problem • From answer containment to query containment • Our problems: • Given queries Q and Q’, decide whether Q ⊑ Q’ • The complexity of query containment • [Figure: relating answer containment between Q(D) and Q’(D) to query containment between Q and Q’]
Previous Work (I) • Relational query containment • Conjunctive queries [Chandra and Merlin, STOC 1977] • Acyclic queries [Yannakakis, VLDB 1981] • Queries with union [Sagiv and Yannakakis, JACM 1980] • Queries with negation [Levy and Sagiv, VLDB 1993] • Queries with arithmetic comparisons [Klug, JACM 1988] • Recursive queries [Shmueli, 1993], [Chaudhuri and Vardi, 1992] • Queries over bags [Ioannidis and Ramakrishnan, 1995]
Previous Work (II) • XML query containment – two new challenges • XPath containment • With *, // and […] [Miklau and Suciu, PODS 2002] • With equality testing on tag variables [Deutsch and Tannen, KRDB 2001] • Conjunctive queries over path expressions [Florescu, Levy and Suciu, PODS 1998] • Nested query containment
Containment Cannot be Determined Solely by Comparing XPath Components Q: for $g in /group where $g/gname/text() = “database” return <area>{ for $p in $g/person return <person> <name>{$p/text()}</name> {for $q in $g/paper where $q/author/text() = $p/text() return <paper>{$q/title/text()}</paper>} </person> }</area> Q’: for $g in /group return <area>{ for $p in $g/person return <person> <name>{$p/text()}</name> <group>{$g/gname/text()}</group> {for $q in $g/paper where $q/author/text() = $p/text() return <paper>{$q/title/text()}</paper>} </person> }</area>
Previous Work (II) – cont. • Nested query containment • Complex object query containment [Levy and Suciu, PODS 1997] • Containment of nested XML queries has not been fully studied
Conjunctive XML Queries (c-XQueries) • Returned variables are bound to tag names or text values only. • Conjunctive: no two sibling query blocks return the same tag. • XPath features included: child axis (/), wildcards (*), branches ([…]) • XPath features excluded: descendant axis (//), arithmetic comparisons, union • For this XPath fragment, XPath containment is in PTIME
Conjunctive Queries – cont. • A c-XQuery consists of nested query blocks. • The fan-out of a query block is the number of its immediate sub-blocks. • The nesting depth of a query block is 1 plus the maximal nesting depth of its sub-blocks. • The nesting depth of the query is the depth of its outermost block.
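Below is a small Python sketch of the two measures. It assumes a hypothetical `Block` class that records only each block's returned tag and its immediate sub-blocks. The where clauses of the example query Q count as sub-blocks. This matches the deck's fanout-1 example, where a query with a single where clause has d = 3.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    tag: str                                   # tag returned by this query block
    subs: list = field(default_factory=list)   # immediate sub-blocks

def fanout(b):
    return len(b.subs)

def max_fanout(b):
    return max([fanout(b)] + [max_fanout(s) for s in b.subs])

def nesting_depth(b):
    # 1 plus the maximal nesting depth of the sub-blocks (a leaf block has depth 1)
    return 1 + max((nesting_depth(s) for s in b.subs), default=0)

# Block structure of the example query Q: group > name > {Alice, Bob}
q = Block("group", [Block("name", [Block("Alice"), Block("Bob")])])
print(fanout(q), max_fanout(q), nesting_depth(q))  # 1 2 3
```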
Query Head Tree • The structure of an XML query and its answers can be described using a query head tree. • Edges represent query blocks. The label of a node n in the head tree is the tag returned by the block that corresponds to the incoming edge of n in Q. • A head tree is also an XML instance once its variables are substituted with actual values.
Query Head Tree Example • Q: for $x in /project return <group>{ for $s in $x/title/text() return <projtitle>{$s}</projtitle>} { for $t in $x/member/text() return <name>{$t}</name>} </group> • Query head tree: group has two children, projtitle (with leaf s) and name (with leaf t) • What are the fan-out and the nesting depth of Q?
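A short sketch can compute the head tree and both measures for this Q. The `Block` class and the `head_tree` function are hypothetical. Following the slide's picture, the sketch omits the unlabeled root and draws each returned variable (s, t) as a leaf below its tag. Under this encoding, Q has fan-out 2 and nesting depth 2.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Block:
    tag: str                        # tag of the element the block returns
    var: Optional[str] = None       # variable returned inside that element, if any
    subs: list = field(default_factory=list)

def head_tree(b):
    # Node label = returned tag; a returned variable becomes a leaf below it.
    kids = [head_tree(s) for s in b.subs]
    if b.var is not None:
        kids.append((b.var, ()))
    return (b.tag, tuple(kids))

def max_fanout(b):
    return max([len(b.subs)] + [max_fanout(s) for s in b.subs])

def nesting_depth(b):
    return 1 + max((nesting_depth(s) for s in b.subs), default=0)

q = Block("group", subs=[Block("projtitle", var="s"), Block("name", var="t")])
print(head_tree(q))  # ('group', (('projtitle', (('s', ()),)), ('name', (('t', ()),))))
print(max_fanout(q), nesting_depth(q))  # 2 2
```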
Constant Conjunctive XML Queries (cc-XQueries) • A cc-XQuery is a c-XQuery that does not return tag variables. • The head tree of a cc-XQuery has constant labels only.
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable • Query containment in practice • Relaxing the assumptions • Conclusions
Deciding Q ⊑ Q’? • How can we establish a property for an infinite number of input XML instances? • Standard technique: • Find a finite set of input representatives – canonical databases • Relational query: each canonical database is a minimal input that generates the answer template • XML query answers have an infinite number of shapes • Find a finite set of answer templates – canonical answers
Answer Shapes Determined by the Head Tree • Q’: for $x in /project return <group>{ for $y in /project/member return <name>{ where $y=“Alice” return <Alice/> where $y=“Bob” return <Bob/> }</name> }</group> • Head tree: group → name → {Alice, Bob} • [Figure: answer shapes that are prefix subtrees of the head tree: group, group/name, group/name/Alice, group/name/Bob]
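Prefix subtrees of a head tree are subtrees that contain the root and, for every node they contain, its parent. They can be enumerated recursively: each child is either dropped or replaced by one of its own prefix subtrees. Below is a Python sketch with trees as (label, children) tuples; the function name is hypothetical.

```python
from itertools import product

def node(label, *children):
    return (label, tuple(children))

def prefix_subtrees(t):
    """All subtrees that contain the root of t and, with every node, its parent."""
    label, kids = t
    # For each child: either drop it (None) or keep one of its prefix subtrees.
    options = [[None] + prefix_subtrees(c) for c in kids]
    return [(label, tuple(k for k in combo if k is not None))
            for combo in product(*options)]

head = node("group", node("name", node("Alice"), node("Bob")))
for t in prefix_subtrees(head):
    print(t)
```

Besides the four shapes pictured on the slide, the enumeration also yields group/name with both Alice and Bob, for five prefix subtrees in total.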
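An Additional Candidate Answer • Head tree: group → name → {Alice, Bob} • [Figure: further candidate answers, including a group with two name children, one containing Alice and one containing Bob; such an answer is not a prefix subtree of the head tree]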
Why Consider the Additional Case • D: two project nodes whose members are Alice and Bob • Head tree: group → name → {Alice, Bob} • [Figure: Q(D) and Q’(D) for this D; Q’(D) contains a group with two name children, Alice and Bob, which is exactly the additional case]
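The additional case can be checked mechanically. A group with two separate name children embeds into the head tree, because both name nodes map to the single name node, yet it does not appear among the prefix subtrees. This self-contained Python sketch defines its own hypothetical embedding and prefix-subtree helpers over (label, children) tuples.

```python
from functools import lru_cache
from itertools import product

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    # roots carry the same label; every child of t1 maps to some child of t2
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

def prefix_subtrees(t):
    label, kids = t
    options = [[None] + prefix_subtrees(c) for c in kids]
    return [(label, tuple(k for k in combo if k is not None))
            for combo in product(*options)]

head = node("group", node("name", node("Alice"), node("Bob")))
# A group of Q'(D): two name children, one holding Alice and one holding Bob
answer = node("group", node("name", node("Alice")), node("name", node("Bob")))

print(embeds(answer, head))             # True: contained in the head tree
print(answer in prefix_subtrees(head))  # False: not a prefix subtree
```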
What Can Serve as Canonical Answers? • Prefix subtrees of the head tree? – necessary but not sufficient • Trees contained in the head tree? – necessary and sufficient, but too many and too complex
A Head Tree Can Have Many Trees Contained in It • Head tree: group → name → {Alice, Bob} • [Figure: several distinct trees contained in this head tree, with varying numbers of name children holding Alice, Bob, or both]
What Can Serve as Canonical Answers? (cont.) • Solution: consider only minimal trees that are contained in the head tree
Canonical Answer • A minimal XML instance: no two sibling subtrees such that one is contained in the other • Canonical answer: a minimal XML instance contained in the head tree • Every answer A of query Q corresponds to a unique canonical answer CA such that A ⊑ CA and CA ⊑ A • [Figure: examples of canonical answers of the head tree group → name → {Alice, Bob}]
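Under these definitions, the canonical answers of a cc-XQuery head tree can be enumerated bottom-up. Below each head node, the enumeration keeps an antichain (a set in which no member embeds into another) of the canonical answers of each child. This Python sketch assumes that sibling head nodes carry distinct tags, as the conjunctive restriction guarantees, so subtrees generated by different children can never contain each other. Children are sorted only to give each tree a single ordering. All helper names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

def is_antichain(trees):
    # no member embeds into another member (identical duplicates also fail)
    return not any(embeds(a, b) for i, a in enumerate(trees)
                   for j, b in enumerate(trees) if i != j)

def is_minimal(t):
    # minimal XML instance: no sibling subtree is contained in another, at any level
    return is_antichain(t[1]) and all(is_minimal(c) for c in t[1])

def canonical_answers(head):
    """Minimal instances contained in `head` (sibling head tags assumed distinct)."""
    label, kids = head
    choices = []
    for child in kids:
        cas = canonical_answers(child)
        subsets = (s for r in range(len(cas) + 1) for s in combinations(cas, r))
        choices.append([s for s in subsets if is_antichain(s)])
    return [(label, tuple(sorted(t for part in combo for t in part)))
            for combo in product(*choices)]

head = node("group", node("name", node("Alice"), node("Bob")))
cas = canonical_answers(head)
print(len(cas))  # 6
not_minimal = node("group", node("name", node("Alice")),
                   node("name", node("Alice"), node("Bob")))
print(is_minimal(not_minimal))  # False: name(Alice) ⊑ name(Alice, Bob)
```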
Canonical Database • Canonical database DB_CA: the minimal XML instance that generates CA • Q’: for $x in /project return <group>{ for $y in /project/member return <name>{ where $y=“Alice” return <Alice/> where $y=“Bob” return <Bob/> }</name> }</group> • CA: a group with two name children, Alice and Bob • DB: three project nodes; two of them each have a member child, with values Alice and Bob respectively
Canonical Database – Formal Def. • Canonical database of a cc-XQuery: DB_CA. DB_CA is an XML instance such that, for each node N of CA whose generator query block is q_N, the following holds: • Let p0/p1/…/pn be a path expression in q_N, where p0 is an optional node variable bound in an ancestor query block. • For each pi, i ∈ [1, n], there is a distinct node, labeled pi, that is a child of the node for p(i−1). • If p0 is absent, then the node for p1 is a child of the root of DB_CA.
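The formal definition translates into a recursive construction. This Python sketch uses a hypothetical encoding, not the authors' implementation. Each `Block` records its returned tag, the variable it binds, the optional ancestor variable p0, the path steps p1/…/pn, and an optional equality condition. Each CA node is matched to its generator block by tag. As a simplification, a where condition's text value is stored as a leaf child of the bound node.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Block:
    tag: str                        # tag returned by the block
    var: Optional[str] = None       # variable bound by the block's for clause
    anchor: Optional[str] = None    # p0: variable of an ancestor block (None = root)
    steps: tuple = ()               # path steps p1/.../pn
    cond: Optional[tuple] = None    # (var, value) for a where $var = "value" clause
    subs: list = field(default_factory=list)

def canonical_db(block, ca, root=None, env=None):
    """Build DB_CA as nested [label, children] lists; `ca` is a (label, children)
    tree whose root node was generated by `block`."""
    root = root if root is not None else ["root", []]
    env = dict(env or {})
    if block.steps:
        parent = env[block.anchor] if block.anchor else root
        for step in block.steps:            # a distinct fresh node for every step
            fresh = [step, []]
            parent[1].append(fresh)
            parent = fresh
        env[block.var] = parent
    if block.cond:                          # text value modeled as a leaf child
        var, value = block.cond
        env[var][1].append([value, []])
    by_tag = {s.tag: s for s in block.subs}  # sibling tags are distinct (conjunctive)
    for child in ca[1]:
        canonical_db(by_tag[child[0]], child, root, env)
    return root

def freeze(n):
    return (n[0], tuple(freeze(c) for c in n[1]))

q_prime = Block("group", var="$x", steps=("project",), subs=[
    Block("name", var="$y", steps=("project", "member"), subs=[
        Block("Alice", cond=("$y", "Alice")),
        Block("Bob", cond=("$y", "Bob")),
    ]),
])
ca = ("group", (("name", (("Alice", ()),)), ("name", (("Bob", ()),))))
# root with three project children; two have a member with text Alice / Bob
print(freeze(canonical_db(q_prime, ca)))
```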
Sound and Complete Conditions for Nested Query Containment • Let Q and Q’ be two cc-XQueries. The following three conditions are equivalent: 1. Q ⊑ Q’ 2. For every canonical database DB of Q, Q(DB) ⊑ Q’(DB) 3. For every canonical answer CA of Q: (a) CA is a canonical answer of Q’, and (b) DB’_CA ⊑ DB_CA, where DB_CA and DB’_CA are the canonical databases for CA of Q and of Q’, respectively
Properties of Canonical Answers and Databases • Lemma 1: Let Q be a cc-XQuery and D be an XML instance. There exists a unique canonical answer CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D). • Lemma 2: Let Q be a cc-XQuery, CA be a canonical answer of Q, DB_CA be the canonical database for CA of Q, and D be an XML instance. Then CA ⊑ Q(D) if and only if DB_CA ⊑ D.
Containment of cc-XQueries – Proof (1) • 1) ⇒ 2): follows from the definition. • 2) ⇒ 3): CA ⊑ Q(DB_CA) (Lemma 2) and Q(DB_CA) ⊑ Q’(DB_CA) (by 2)), so CA ⊑ Q’(DB_CA) (containment is transitive); hence (a) holds. • Since CA is a canonical answer of Q’ (a) and CA ⊑ Q’(DB_CA), Lemma 2 gives DB’_CA ⊑ DB_CA; hence (b) holds.
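Containment of cc-XQueries – Proof (2) • 3) ⇒ 1): To show Q ⊑ Q’, we need to show that Q(D) ⊑ Q’(D) for every XML instance D. • There exists a unique CA of Q such that Q(D) ⊑ CA and CA ⊑ Q(D) (Lemma 1). • From CA ⊑ Q(D): DB_CA ⊑ D (Lemma 2). • From DB’_CA ⊑ DB_CA (3b): DB’_CA ⊑ D (containment is transitive). • Hence CA ⊑ Q’(D) (Lemma 2 applied to Q’, which is possible by 3a). • Q(D) ⊑ CA ⊑ Q’(D), so Q(D) ⊑ Q’(D) (containment is transitive).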
Query Containment Algorithm • Algorithm: for every canonical answer CA of Q do: • check whether CA is a canonical answer of Q’ • generate DB_CA and DB’_CA • check whether DB’_CA ⊑ DB_CA
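The following Python sketch puts the pieces together and runs the algorithm on the example queries Q and Q’. Everything here is a hypothetical encoding built on several simplifying assumptions:

- cc-XQueries use only child-axis paths and equality conditions.
- Text values are stored as leaf nodes.
- Contradictory conditions, such as a member equal to both “Alice” and “Bob”, are not pruned.

Canonical answers are enumerated by brute force, so the sketch takes exponential time in general.

```python
from dataclasses import dataclass, field
from functools import lru_cache
from itertools import combinations, product
from typing import Optional

@dataclass
class Block:                       # hypothetical encoding of a cc-XQuery block
    tag: str                       # constant tag returned by the block
    var: Optional[str] = None      # variable bound by the block's for clause
    anchor: Optional[str] = None   # p0: ancestor variable the path starts from
    steps: tuple = ()              # path steps p1/.../pn
    cond: Optional[tuple] = None   # (var, value) for where $var = "value"
    subs: list = field(default_factory=list)

@lru_cache(maxsize=None)
def embeds(t1, t2):                # t1 ⊑ t2 for (label, children) trees
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

def head_tree(b):
    return (b.tag, tuple(head_tree(s) for s in b.subs))

def is_antichain(ts):
    return not any(embeds(a, b) for i, a in enumerate(ts)
                   for j, b in enumerate(ts) if i != j)

def canonical_answers(h):          # minimal instances contained in head tree h
    label, kids = h
    choices = []
    for child in kids:
        cas = canonical_answers(child)
        subsets = (s for r in range(len(cas) + 1) for s in combinations(cas, r))
        choices.append([s for s in subsets if is_antichain(s)])
    return [(label, tuple(sorted(t for part in c for t in part)))
            for c in product(*choices)]

def canonical_db(block, ca, root=None, env=None):
    # Fresh nodes for every path step; where-values become leaf children.
    root = root if root is not None else ["root", []]
    env = dict(env or {})
    if block.steps:
        parent = env[block.anchor] if block.anchor else root
        for step in block.steps:
            fresh = [step, []]
            parent[1].append(fresh)
            parent = fresh
        env[block.var] = parent
    if block.cond:
        var, value = block.cond
        env[var][1].append([value, []])
    by_tag = {s.tag: s for s in block.subs}
    for child in ca[1]:
        canonical_db(by_tag[child[0]], child, root, env)
    return root

def freeze(n):                     # nested lists -> hashable (label, children)
    return (n[0], tuple(freeze(c) for c in n[1]))

def contains(q1, q2):
    """Sketch of the test for q1 ⊑ q2 over all canonical answers of q1."""
    h2 = head_tree(q2)
    for ca in canonical_answers(head_tree(q1)):
        if not embeds(ca, h2):               # CA must be a canonical answer of q2
            return False
        db1 = freeze(canonical_db(q1, ca))   # DB_CA
        db2 = freeze(canonical_db(q2, ca))   # DB'_CA
        if not embeds(db2, db1):             # need DB'_CA ⊑ DB_CA
            return False
    return True

def example(anchor, steps):
    # the example queries Q and Q' differ only in the path of the name block
    return Block("group", var="$x", steps=("project",), subs=[
        Block("name", var="$y", anchor=anchor, steps=steps, subs=[
            Block("Alice", cond=("$y", "Alice")),
            Block("Bob", cond=("$y", "Bob"))])])

q = example("$x", ("member",))                  # for $y in $x/member
q_prime = example(None, ("project", "member"))  # for $y in /project/member
print(contains(q, q_prime), contains(q_prime, q))  # True False
```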
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable • Query containment in practice • Relaxing the assumptions • Conclusions
Query Containment Algorithm – Complexity • The algorithm is polynomial in the size and the number of canonical answers • What are the sizes of canonical answers? • What is the number of canonical answers?
Containment of XML Queries with Fanout 1 • E.g. d = 3 (the depth), m = 1 (the maximum fanout) • Canonical answers and complexity: • Number: the depth of the query • Size: bounded by the depth of the query • Complexity: O(d·|Q|·|Q’|) • Theorem: Testing containment of XML queries with fanout 1 is in PTIME • Example: for $x in /project return <group>{for $y in /project/member return <name>{where $y =“Alice” return <Alice/> }</name> }</group> • [Figure: its canonical answers group, group/name, group/name/Alice] • Nesting with fanout 1 does not increase complexity
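With fanout 1, the head tree is a chain and every canonical answer is one of its prefixes, so the number of canonical answers equals the depth. A brute-force enumeration in Python confirms this for the example query, whose head tree is group → name → Alice. The helpers are hypothetical, and trees are (label, children) tuples.

```python
from functools import lru_cache
from itertools import combinations, product

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

def is_antichain(ts):
    return not any(embeds(a, b) for i, a in enumerate(ts)
                   for j, b in enumerate(ts) if i != j)

def canonical_answers(h):
    label, kids = h
    choices = []
    for child in kids:
        cas = canonical_answers(child)
        subsets = (s for r in range(len(cas) + 1) for s in combinations(cas, r))
        choices.append([s for s in subsets if is_antichain(s)])
    return [(label, tuple(sorted(t for part in c for t in part)))
            for c in product(*choices)]

chain = node("group", node("name", node("Alice")))   # d = 3, m = 1
cas = canonical_answers(chain)
for ca in cas:
    print(ca)
print(len(cas))  # 3
```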
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable • Query containment in practice • Relaxing the assumptions • Conclusions
Containment of XML Queries with Arbitrary Fanout • E.g. d = 4 (the depth), m = 3 (the maximum fanout) • Canonical answers and complexity: • Number: • Size: • Theorem: Testing containment of XML queries with depth 2 and arbitrary fanout is coNP-hard • [Figure: a head tree of depth d whose nodes have children labeled 1, 2, 3, and examples of its canonical answers]
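To show how fanout inflates the number of canonical answers, the sketch below counts them by brute force for a head tree group → name → {1, …, m}. This is our own example, not the d = 4, m = 3 tree on the slide. Because name(S) embeds into name(T) exactly when S ⊆ T, the canonical answers correspond to antichains of subsets of {1, …, m}. The counts are therefore the Dedekind numbers (3, 6, 20, 168, …). The helper names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product

def node(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(t1, t2):
    if t1[0] != t2[0]:
        return False
    return all(any(embeds(c1, c2) for c2 in t2[1]) for c1 in t1[1])

def is_antichain(ts):
    return not any(embeds(a, b) for i, a in enumerate(ts)
                   for j, b in enumerate(ts) if i != j)

def canonical_answers(h):
    label, kids = h
    choices = []
    for child in kids:
        cas = canonical_answers(child)
        subsets = (s for r in range(len(cas) + 1) for s in combinations(cas, r))
        choices.append([s for s in subsets if is_antichain(s)])
    return [(label, tuple(sorted(t for part in c for t in part)))
            for c in product(*choices)]

def fanout_example(m):
    # group > name > {1, ..., m}: depth 3, maximum fanout m
    return node("group", node("name", *[node(str(i)) for i in range(1, m + 1)]))

for m in (1, 2, 3):
    print(m, len(canonical_answers(fanout_example(m))))  # 3, 6, 20
```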
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable (NOT TIGHT) • Query containment in practice • Conclusions
Effect of the Depth on Containment of XML Queries • Insight: kernel canonical answer (KCA) • The root node has a single child • In any subtree, a path pattern is repeated no more than $c \cdot d$ times, where d is the query depth and c is the maximum number of path steps in a query block • The size of kernel canonical answers is: • polynomial in the query size (for fixed nesting depth); • exponential in the query depth (for arbitrary depth). • Theorem: • Testing containment of XML queries with fixed depth is coNP-complete • Testing containment of XML queries with arbitrary depth is in coNEXPTIME
Effect of the Depth on Containment of XML Queries – Cont. • Lemma 3: Let Q and Q’ be two cc-XQueries. Q ⊑ Q’ iff for each KCA of Q: 1. KCA is a canonical answer of Q’; 2. DB’_KCA ⊑ DB_KCA. • The size of a KCA is $O((bcd)^d)$ • The number of KCAs is $O(m^{(bcd)^d})$ • b = number of query blocks in Q • m = maximum fanout in Q
Roadmap • Introduction and problem definition • Containment of a subset of XML queries • Query containment is decidable • Query containment in practice • Relaxing the assumptions • Conclusions