Decidable Containment of Recursive Queries
Diego Calvanese, Giuseppe De Giacomo, Moshe Y. Vardi
presented by Axel Polleres
http://www.dis.uniroma1.it/pub/calvanes/calv-degi-vard-ICDT-2003.pdf
Query Containment
• Checking whether, for every database, the result of one query is necessarily a subset of the result of another.
• Important for information integration, query rewriting, verification, cooperative answering, integrity checking, etc.
Conjunctive Queries vs. full Datalog
• A conjunctive query (CQ) is a query of the form ans(X0) :- r1(X1), r2(X2), …, rn(Xn). where each Xi = (x1i, …, xni) ranges over a set of variables {u1, …, uk}; the variables in X0 are called distinguished variables.
• In SQL, these are often called S(elect)P(roject)J(oin) queries.
• Containment of conjunctive queries is decidable! In fact, it is NP-complete [14].
• Proof sketch (membership in NP): a conjunctive query Q1 is contained in Q2 iff there is a containment mapping from (the variables in) Q2 to (the variables in) Q1, i.e. a homomorphism that maps the head of Q2 to the head of Q1 and every body atom of Q2 to a body atom of Q1. Guessing and checking such a homomorphism is clearly in NP.
• NP-hardness can also be shown (e.g. by reduction from “exact cover”, cf. []).
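The containment-mapping test from the proof sketch can be illustrated with a small brute-force check. This is only a sketch under stated assumptions: the query encoding (a pair of head variables and body atoms), the function name `contained_in`, and the example queries `PATH2` and `LOOSE` are all made up for illustration, queries contain variables only (no constants), and the search simply enumerates every candidate mapping instead of guessing one.

```python
from itertools import product

def contained_in(q1, q2):
    """Return True if conjunctive query q1 is contained in q2.

    A query is (head, body): head is a tuple of variables, body is a list
    of atoms (predicate, tuple_of_variables). Hypothetical encoding.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars1 = sorted({v for _, args in body1 for v in args} | set(head1))
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    atoms1 = {(pred, tuple(args)) for pred, args in body1}
    # Enumerate all mappings h from the variables of q2 to those of q1.
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        # h must send the head of q2 to the head of q1 ...
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        # ... and every body atom of q2 to some body atom of q1.
        if all((pred, tuple(h[v] for v in args)) in atoms1
               for pred, args in body2):
            return True
    return False

# ans(x,y) :- e(x,z), e(z,y).   (x reaches y in exactly two steps)
PATH2 = (("x", "y"), [("e", ("x", "z")), ("e", ("z", "y"))])
# ans(x,y) :- e(x,u), e(v,y).   (x has a successor, y has a predecessor)
LOOSE = (("x", "y"), [("e", ("x", "u")), ("e", ("v", "y"))])

print(contained_in(PATH2, LOOSE))  # mapping u -> z, v -> z exists
print(contained_in(LOOSE, PATH2))  # no mapping from PATH2 into LOOSE
```

The enumeration is exponential in the number of variables of the containing query, which is in line with the NP upper bound: a single mapping is cheap to verify, but there are many candidates.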
Full Datalog vs. CQ
• Full Datalog adds union and recursion to CQs.
• Containment for full Datalog is undecidable.
• Undecidability can be shown by reduction from containment of context-free grammars [22].
So, CQs and full Datalog span two extremes. But… not all is lost! There are interesting classes in between!
Decidable containment problems
• Containment of monadic Datalog (all rule heads use a single variable) is decidable.
• Checking containment of non-recursive Datalog in full Datalog is decidable in exponential time.
• Checking containment of full Datalog in non-recursive Datalog is decidable in triple exponential time, i.e. $O(2^{2^{2^{n}}})$.
• When the non-recursive query is unfolded (into a union of conjunctive queries), then “only” double exponential time is needed.
In this paper: Regular Path Queries
• Query containment in the context of conceptual graphs (e.g. RDF graphs), namely for Regular Path Queries (RPQs).
• An RPQ asks for all pairs of objects in a graph that are connected by a path conforming to a regular expression, i.e. E(x,y), where E is a regular expression over graph edges.
• Refinement, 2RPQs: the “inverse” is allowed in the traversal of edges.
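To make the semantics of RPQs and 2RPQs concrete, the following sketch evaluates such a query by searching the product of the graph and an automaton for the regular expression. Everything here is an assumption made for illustration: the function name `eval_2rpq`, the edge-triple encoding, the convention that a label followed by `-` means "traverse the edge backwards", and the hand-written automata (no regular-expression parser is included).

```python
from collections import deque

def eval_2rpq(edges, nfa, start, finals):
    """Return all pairs (x, y) connected by a path accepted by the NFA.

    edges:  set of (source, label, target) triples.
    nfa:    dict mapping (state, symbol) -> set of successor states, where
            a symbol is a label 'p' (forward) or 'p-' (backward traversal).
    """
    # Adjacency list that also offers every edge in the inverse direction.
    step = {}
    for s, p, t in edges:
        step.setdefault(s, []).append((p, t))
        step.setdefault(t, []).append((p + "-", s))
    nodes = {s for s, _, _ in edges} | {t for _, _, t in edges}

    answers = set()
    for x in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(x, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in finals:
                answers.add((x, node))
            for symbol, next_node in step.get(node, []):
                for next_state in nfa.get((state, symbol), ()):
                    if (next_node, next_state) not in seen:
                        seen.add((next_node, next_state))
                        queue.append((next_node, next_state))
    return answers

# Hypothetical example graph: a -sub-> b -sub-> c <-sub- d
EDGES = {("a", "sub", "b"), ("b", "sub", "c"), ("d", "sub", "c")}

# RPQ  sub+  : one or more forward 'sub' edges.
SUB_PLUS = {(0, "sub"): {1}, (1, "sub"): {1}}
# 2RPQ  sub . sub-  : one step forward, then one step backward.
SIBLING = {(0, "sub"): {1}, (1, "sub-"): {2}}

print(sorted(eval_2rpq(EDGES, SUB_PLUS, 0, {1})))
print(sorted(eval_2rpq(EDGES, SIBLING, 0, {2})))
```

The second query cannot be expressed as a plain RPQ over this graph, because it needs to walk a `sub` edge against its direction; that is exactly what the two-way refinement adds.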
UC2RPQs
• A conjunctive 2-way regular path query (C2RPQ) of arity n is a query of the form ans(x1, …, xn) :- E1(y1, z1), …, Em(ym, zm). where E1, …, Em are 2RPQs.
• UC2RPQs are then unions of C2RPQs with the same arity.
• Here, the answer set to …
• Note that CQs (with only binary body predicates) are just a special case of C2RPQs!
Containment of Datalog in a UC2RPQ
We define, for a Datalog program Π, an IDB predicate Q, and a database G (over the EDB predicates): … i.e. the set of Q-facts that can be obtained by applying the rules in Π to G until a fixpoint is reached. Then: …
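The fixpoint reading of this definition can be sketched with a naive bottom-up evaluator. This is a minimal illustration, not the construction used in the paper: the names `match`, `fixpoint` and `answers`, the atom encoding, and the transitive-closure example program are assumptions, and rules may contain variables only.

```python
def match(body, facts, subst):
    """Yield every substitution that maps all body atoms to known facts."""
    if not body:
        yield subst
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        s = dict(subst)
        # Bind each variable, or check consistency with an earlier binding.
        if all(s.setdefault(v, c) == c for v, c in zip(args, fact_args)):
            yield from match(rest, facts, s)

def fixpoint(rules, edb):
    """Apply all rules repeatedly until no new fact can be derived."""
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            for s in match(body, facts, {}):
                fact = (head[0], tuple(s[v] for v in head[1]))
                if fact not in facts:
                    new.add(fact)
        if not new:
            return facts
        facts |= new

def answers(rules, edb, pred):
    """The tuples derived for IDB predicate `pred` over database `edb`."""
    return {args for p, args in fixpoint(rules, edb) if p == pred}

# Hypothetical program: Q(x,y) :- e(x,y).   Q(x,y) :- e(x,z), Q(z,y).
TC = [
    (("Q", ("x", "y")), [("e", ("x", "y"))]),
    (("Q", ("x", "y")), [("e", ("x", "z")), ("Q", ("z", "y"))]),
]
G = {("e", ("a", "b")), ("e", ("b", "c"))}

print(sorted(answers(TC, G, "Q")))
```

Containment of the program in another query then asks whether, for every database, this answer set is a subset of the other query's answers; the evaluator above only computes one side for one fixed database.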
Containment of Datalog in a Union of Conjunctive Queries
• Idea: reuse of variables is allowed, as long as the occurrences of a reused variable are not “connected” in the tree. So we can build proof trees with a bounded number of variables: num_var(Π), which is twice the maximum of num_var(r), the number of variables occurring in IDB atoms of a rule r of Π.
• A proof tree is then simply an expansion tree that only uses variables from {x1, …, xnum_var(Π)}.
Containment of Datalog in a Union of Conjunctive Queries
• Approach: the notion of a containment mapping is generalized to Datalog and to UC2RPQs via expansions of Datalog programs: … can be defined via an infinite sequence of conjunctive queries.
• Let trees(Q, Π) be the set of trees for predicate Q in which every node is labeled with a rule, such that the children of a node N are labeled with rules whose heads correspond to the IDB atoms of the rule of N, and the leaves are labeled with rules that have only EDB predicates in their bodies. Note that trees(Q, Π) can be infinite.
• Intuition: Π is contained in a union of conjunctive queries if there is a containment mapping from some conjunctive query of the union to each expansion tree in trees(Q, Π).
• … not yet, since the number of variables, and hence the number of node labels, is unbounded.
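How a recursive program unfolds into an infinite sequence of conjunctive queries can be sketched by enumerating the expansions up to a height bound. The function `expansions`, the atom encoding, the fresh-variable naming scheme, and the transitive-closure program are illustrative assumptions; the sketch also assumes that rule heads contain pairwise distinct variables and no constants.

```python
from itertools import count, product

def expansions(rules, goal, depth):
    """Bodies (lists of EDB atoms) of all expansions of `goal` whose
    expansion tree has height at most `depth`."""
    idb = {head[0] for head, _ in rules}
    fresh = count()

    def expand(atom, d):
        pred, args = atom
        if pred not in idb:
            return [[atom]]          # EDB atoms are kept as they are
        if d == 0:
            return []                # height budget exhausted
        results = []
        for head, body in rules:
            if head[0] != pred:
                continue
            n = next(fresh)
            # Head variables take the atom's arguments; every other rule
            # variable gets a fresh name for this rule application.
            ren = dict(zip(head[1], args))
            renamed = [(p, tuple(ren.setdefault(v, f"{v}_{n}") for v in a))
                       for p, a in body]
            # Combine one expansion per body atom.
            for combo in product(*(expand(b, d - 1) for b in renamed)):
                results.append([a for part in combo for a in part])
        return results

    return expand(goal, depth)

# Hypothetical program: Q(x,y) :- e(x,y).   Q(x,y) :- e(x,z), Q(z,y).
TC = [
    (("Q", ("x", "y")), [("e", ("x", "y"))]),
    (("Q", ("x", "y")), [("e", ("x", "z")), ("Q", ("z", "y"))]),
]

for body in expansions(TC, ("Q", ("x", "y")), 3):
    print(body)
```

For this program each additional level of height adds one longer path query, so the number of distinct variables grows without bound as the height increases, which is the obstacle the last bullet above points to.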
Connected variables in proof trees
• To reconstruct an expansion tree from a given proof tree, we need to distinguish among occurrences of variables.
• Let g1, g2 be nodes in a proof tree. We call occurrences x1, x2 of a variable x in the rules labeling g1 and g2, respectively, connected if every rule on the path from g1 to g2 (except possibly the one at the lowest common ancestor g0) has an occurrence of x in its head.
• We say that an occurrence of a variable x in τ is a distinguished occurrence if it is connected to an occurrence of x in the head of the root of τ.
Containment of Datalog in a Union of Conjunctive Queries
A strong containment mapping from a conjunctive query ϕ to a proof tree τ is a containment mapping h from ϕ to τ such that:
– h maps distinguished occurrences in ϕ to distinguished occurrences in τ, and
– if x1 and x2 are two occurrences of a variable x in ϕ, then the occurrences h(x1) and h(x2) in τ are connected.
Then: …
This can be exploited similarly for C2RPQs.
An expansion of a C2RPQ is a CQ of the form: …
In the rest of the paper…
The authors show how to check this condition using tree automata.
• Idea: the set of proof trees for a Datalog program Π with a goal predicate Q can be described by a nondeterministic tree automaton (doubly exponential in the size of Π) that accepts exactly the proof trees.
… concluding:
Conclusions
• Adding transitive closure to CQs does not increase the upper bound for containment of Datalog (2EXPTIME matches the upper bound for containment in unions of conjunctive queries) [25].
• However, whether this upper bound is tight is not clear; the authors conjecture that it is.
• (A lower bound of EXPSPACE follows from containment of UC2RPQs in UC2RPQs [34].)
• Observe: containment in the other direction is already undecidable for RPQs [22].
Questions / interesting for WSMO/L
• How do the proof obligations we need relate to RPQs/2RPQs/UC2RPQs?
• How do RPQs/2RPQs/UC2RPQs relate to OWL DL/Lite/Flight and rule extensions thereof?
• Decidable, yes, but (hardly) scalable, or not? Not necessarily unscalable if queries/programs are of moderate size.
• We need more use cases to show what kinds of containment we need!