The Complexity of XPath Evaluation • Paper By: Georg Gottlob, Christoph Koch, Reinhard Pichler • Presented By: Royi Ronen
Introduction • All major XPath evaluation algorithms run in exponential time. • Paper’s main goals: • Prove that the “XPath problem” is P-complete. • Prove that other related problems are LOGCFL-complete.
XPath – Quick Reminder • XPath is a query language for XML documents. • Navigating through a document: /descendant::a/child::b selects nodes named “b” whose parent is named “a”. • Testing nodes: /descendant::a/child::b[@c=3] additionally requires that b’s attribute c equals 3.
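The two queries on this slide can be tried out with Python's standard library. This is a minimal sketch: the sample document is invented for illustration, and `xml.etree.ElementTree` only supports a subset of XPath, so the abbreviated form `.//a/b` stands in for `/descendant::a/child::b` and the attribute is compared as the string `'3'`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides).
doc = ET.fromstring(
    "<root>"
    "<a><b c='3'/><b c='4'/></a>"
    "<x><b c='3'/></x>"
    "</root>"
)

# Abbreviated equivalent of descendant::a/child::b, relative to the root element.
all_b = doc.findall(".//a/b")

# Same path with a predicate on attribute c (string comparison in ElementTree).
b_with_c3 = doc.findall(".//a/b[@c='3']")

print([b.get("c") for b in all_b])
print([b.get("c") for b in b_with_c3])
```

The `b` element under `x` is not selected by either query, because its parent is not named `a`.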
Sketch: How P-Completeness is proven • To prove that a problem is P-Complete, we have to prove: • Membership in P; • P-Hardness. (Figure: diagram with regions labeled P, P-Hard and P-Complete.)
XPath is P-Complete • Sketch: 1. Membership of XPath in P is already proven (by the same authors). 2. P-Hardness of XPath will be proven by reduction from the monotone circuit problem (which is known to be P-Complete) to Core XPath (a subset of XPath with its main features). • Why is it enough? Core XPath is a fragment of XPath, so any lower bound for Core XPath also holds for full XPath.
Monotone Boolean Circuit Problem • A Monotone Boolean circuit is a circuit with many inputs and one output that uses the following Boolean gates only: • AND • OR • DUMMY • Given a circuit and its inputs, the problem is to determine the value of the output. • The problem is P-Complete.
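As a rough illustration of the problem statement, a monotone circuit can be evaluated gate by gate in index order. The encoding below (a list of `(kind, inputs)` pairs with 0-based indices) is an assumption made for this sketch, and DUMMY is treated as a one-input pass-through gate.

```python
from typing import List, Tuple

# A gate is (kind, inputs): kind is "AND", "OR" or "DUMMY"; inputs are
# 0-based indices of earlier values. Indices 0..M-1 are the circuit inputs.
Gate = Tuple[str, List[int]]

def eval_monotone_circuit(inputs: List[bool], gates: List[Gate]) -> bool:
    """Evaluate the gates in order; the last gate is the output."""
    values = list(inputs)
    for kind, ins in gates:
        operands = [values[i] for i in ins]
        if kind == "AND":
            values.append(all(operands))
        elif kind in ("OR", "DUMMY"):
            # A DUMMY gate has one input, so any() just forwards it.
            values.append(any(operands))
        else:
            raise ValueError(f"not a monotone gate: {kind}")
    return values[-1]

# Hypothetical circuit: (x0 AND x1) OR x2
example_gates = [("AND", [0, 1]), ("OR", [3, 2])]
print(eval_monotone_circuit([True, False, True], example_gates))
print(eval_monotone_circuit([True, False, False], example_gates))
```

Each gate is visited once, which is why the problem is in P; the hard part of the theory is that no efficient parallel evaluation is known.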
A Monotone Boolean Circuit • Item 3 in the handout:
Core XPath - Definition • XPath has many features and is inconvenient for theoretical treatment. Therefore Core XPath, a subset of XPath with its main features, is defined by the following grammar (Item 1 in the handout):
locpath ::= ‘/’ locpath | locpath ‘/’ locpath | locpath ‘|’ locpath | locstep.
locstep ::= axis ‘::’ ntst ‘[’ bexpr ‘]’ . . . ‘[’ bexpr ‘]’.
bexpr ::= bexpr ‘and’ bexpr | bexpr ‘or’ bexpr | ‘not(’ bexpr ‘)’ | locpath.
axis ::= ‘self’ | ‘child’ | ‘parent’ | ‘descendant’ | ‘descendant-or-self’ | ‘ancestor’ | ‘ancestor-or-self’ | ‘following’ | ‘following-sibling’ | ‘preceding’ | ‘preceding-sibling’.
The Corresponding Languages • The paper shows direct reductions between the problems. • We will show the same reduction, but between the corresponding languages, since it is the methodology used in the Technion Computability course. • The proofs are equivalent.
The Corresponding Languages • L-Core XPath: {(Q,D) | Q is a Core XPath query, D is a valid document and Q yields a non-empty result when run on D} • L-Monotone Circuit: {(C,I) | C is a monotone circuit, I is a set of inputs to C and C evaluates to 1 when run on I}
The Reduction • Reduction is our tool to prove that one language is at least as hard as another. • Here we will show: L-Circuit is reducible to L-Core XPath. It proves that L-Core XPath is at least as hard as L-Circuit, therefore P-Hard. • We have to build (Q,D) that yields a nonempty result iff (C,I) evaluates to 1.
The circuit layered • An equivalent monotone circuit, in which only one non-dummy gate exists in every layer (Item 4 in the handout). • The gates are ordered, data can flow from lower to higher indexed gates only.
Q and D • D is built as follows: • M inputs (here M=4) and N non-input gates (here N=5). • Total of 2(M+N)+1 nodes: a root V0, its children V1..VM+N, and one child V’i under each Vi. • Nodes are tagged from the alphabet {0, 1, G, R, Ii, Oi}, where i is from {1,2,…,N}.
Tagging Rules • V1..VM are each tagged with their input value, i.e. 0 or 1. • Every Vi (1 ≤ i ≤ M+N) is tagged G; VM+N is additionally tagged R. • If gate Gi is an input to gate GM+k (i<M+k), Ik is added to Vi and Ok is added to VM+k. • V’1..V’M are tagged Ik and Ok for every k in {1,..,N}. • V’M+i is tagged Ik and Ok for every k in {i,..,N}. These tags will be used by the query.
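The tagging rules can be sketched as a small builder. The representation is an assumption made for illustration: nodes are named `v0`, `v1`, `v'1`, ..., tags are plain strings, and a circuit is given as a list of input bits plus a list of `(kind, inputs)` gates with 1-based gate indices as on the slides.

```python
def build_document(inputs, gates):
    """inputs: 0/1 values of G1..GM.
    gates: (kind, [1-based input indices]) for G(M+1)..G(M+N)."""
    M, N = len(inputs), len(gates)
    labels = {"v0": set()}          # node name -> set of tags
    parent = {}                     # node name -> parent name
    for i in range(1, M + N + 1):
        labels[f"v{i}"] = {"G"}     # every Vi is tagged G
        labels[f"v'{i}"] = set()
        parent[f"v{i}"] = "v0"
        parent[f"v'{i}"] = f"v{i}"
    for i, bit in enumerate(inputs, start=1):
        labels[f"v{i}"].add(str(bit))          # input value, "0" or "1"
    labels[f"v{M + N}"].add("R")               # output gate
    for k, (_, ins) in enumerate(gates, start=1):
        labels[f"v{M + k}"].add(f"O{k}")
        for i in ins:
            labels[f"v{i}"].add(f"I{k}")
    for i in range(1, M + N + 1):
        first = 1 if i <= M else i - M         # V'(M+i) starts at k = i
        for k in range(first, N + 1):
            labels[f"v'{i}"] |= {f"I{k}", f"O{k}"}
    return labels, parent

# Two inputs (1 and 0) feeding a single AND gate: M=2, N=1.
labels, parent = build_document([1, 0], [("AND", [1, 2])])
for name in sorted(labels):
    print(name, sorted(labels[name]))
```

For this two-input, one-gate circuit the builder produces 2(M+N)+1 = 7 nodes, and every V’ node carries exactly I1 and O1.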
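A Simple Example (Figure: a circuit C with a single gate G1, next to the document D built from it. D has root V0 with children V1, V2, V3 and grandchildren V’1, V’2, V’3. V1 and V2 are tagged G, their input value and I1; V3 is tagged G, O1 and R; each V’i is tagged I1 and O1.)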
The Query • The query in the output of the reduction is:
/descendant-or-self::*[T(R) and φ_N]
φ_k := descendant-or-self::*[T(Ok) and parent::*[ψ_k]]
ψ_k := not(child::*[T(Ik) and not(π_k)]) (if GM+k is an AND gate)
ψ_k := child::*[T(Ik) and π_k] (if GM+k is an OR/DUMMY gate)
π_k := ancestor-or-self::*[T(G) and φ_{k-1}] (pushing down results)
φ_0 := T(1) (end of recursion)
• ψ_k evaluates Gk by selecting V0 iff all (one of) the inputs of Gk are (is) 1 and the gate is “AND” (“OR”).
• The reduction can be computed in logarithmic space.
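The recursive definition can be unrolled mechanically, one level per non-input gate. The sketch below builds the query as a string. The names `phi`, `psi` and `pi` for the three nested sub-queries are assumptions of this sketch, and `T(...)` is kept as the slides' tag-test notation rather than real XPath syntax.

```python
def build_query(gate_kinds):
    """gate_kinds: kinds ("AND", "OR" or "DUMMY") of G(M+1)..G(M+N), in order."""
    phi = "T(1)"  # end of recursion
    for k, kind in enumerate(gate_kinds, start=1):
        # Push down results of the previous level.
        pi = f"ancestor-or-self::*[T(G) and {phi}]"
        if kind == "AND":
            # No input of the gate may fail the previous level.
            psi = f"not(child::*[T(I{k}) and not({pi})])"
        else:
            # OR and DUMMY: some input must pass the previous level.
            psi = f"child::*[T(I{k}) and {pi}]"
        phi = f"descendant-or-self::*[T(O{k}) and parent::*[{psi}]]"
    return f"/descendant-or-self::*[T(R) and {phi}]"

query = build_query(["AND", "OR"])
print(query)
```

Each level embeds the previous one exactly once, so the query grows linearly with the number of gates.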
Sub-queries Meaning • The ancestor-or-self sub-query: returns the nodes selected in the previous iteration together with their tagged children, i.e. pushes results “down” by including the children. • The child sub-query: returns the root iff all the inputs to gate k are true (AND gate), or iff at least one of the inputs to gate k is true (OR gate). In both cases, it also returns the nodes that represent gates that previously evaluated to true. • The descendant-or-self sub-query: includes Vk iff the root was returned by the previous sub-query. • The whole query: returns the rightmost node iff the output gate evaluates to true. (No other gate is tagged R.)
The Query - Example (Figure: the tagged tree of the simple example, with root V0, children V1, V2, V3 and grandchildren V’1, V’2, V’3, used to trace the query.)
Discussion • It is enough to show that: Vi ∈ [[φ_k]] iff Gi evaluates to true (for i ≤ M+k, where [[φ_k]] is the set of nodes satisfying the k-th descendant-or-self sub-query). • Reason: T(R) is true for the rightmost node only. If the last gate evaluates to 1, then the result of the query consists of that node, and (Q,D) is in L-Core XPath. Otherwise, the result is empty, and (Q,D) is not in L-Core XPath.
Tagged Tree Example • For C in the handout. (Figure: the tagged tree built for circuit C. Each Vi carries G plus its input value or its I/O tags, and the output node carries O5 and R. The V’ nodes carry tag ranges shrinking from I1-I5, O1-O5 down to I5, O5. The non-input gates are labeled “and” / “or”.)
Discussion • [[φ_k]] (the result of the k-th sub-query) consists of the values of the k nodes in layer k of the circuit. • It can also be viewed as the situation at the k-th tick of a clock in a synchronous system. • Proof (by induction on k): Vi ∈ [[φ_k]] iff Gi evaluates to true.
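The claim can be checked by brute force on a small circuit. The sketch below does not run an XPath engine: it computes the node sets of the three nested sub-queries directly on the tagged tree, level by level, and compares the outcome with ordinary circuit evaluation. The node numbering (`i` for Vi, `-i` for V’i, `0` for V0), the variable names and the example circuit are all assumptions of this sketch.

```python
from itertools import product

def tagged_tree(inputs, gates):
    """inputs are ints 0/1; gates are (kind, [1-based input indices])."""
    M, N = len(inputs), len(gates)
    tags, parent = {0: set()}, {}
    for i in range(1, M + N + 1):
        first = 1 if i <= M else i - M
        tags[i], parent[i] = {"G"}, 0
        tags[-i] = {f"{c}{k}" for k in range(first, N + 1) for c in "IO"}
        parent[-i] = i
    for i, bit in enumerate(inputs, start=1):
        tags[i].add(str(bit))
    tags[M + N].add("R")
    for k, (_, ins) in enumerate(gates, start=1):
        tags[M + k].add(f"O{k}")
        for i in ins:
            tags[i].add(f"I{k}")
    return tags, parent

def ancestor_or_self(n, parent):
    out = [n]
    while n in parent:
        n = parent[n]
        out.append(n)
    return out

def query_is_nonempty(inputs, gates):
    tags, parent = tagged_tree(inputs, gates)
    nodes = list(tags)
    kids = {n: [c for c in nodes if parent.get(c) == n] for n in nodes}
    phi = {n for n in nodes if "1" in tags[n]}             # T(1)
    for k, (kind, _) in enumerate(gates, start=1):
        # ancestor-or-self::*[T(G) and previous level]
        pi = {n for n in nodes
              if any("G" in tags[a] and a in phi
                     for a in ancestor_or_self(n, parent))}
        # AND: all Ik-children pass; OR/DUMMY: some Ik-child passes
        quantifier = all if kind == "AND" else any
        psi = {n for n in nodes
               if quantifier(c in pi for c in kids[n] if f"I{k}" in tags[c])}
        # descendant-or-self::*[T(Ok) and parent::*[psi]]
        hits = [d for d in nodes if f"O{k}" in tags[d] and parent.get(d) in psi]
        phi = {a for d in hits for a in ancestor_or_self(d, parent)}
    return any("R" in tags[n] for n in phi)

def circuit_value(inputs, gates):
    values = [bool(b) for b in inputs]
    for kind, ins in gates:
        operands = [values[i - 1] for i in ins]
        values.append(all(operands) if kind == "AND" else any(operands))
    return values[-1]

# Hypothetical circuit: G4 = G1 AND G2, G5 = G4 OR G3 (M=3, N=2).
gates = [("AND", [1, 2]), ("OR", [4, 3])]
agree = all(query_is_nonempty(list(bits), gates) == circuit_value(bits, gates)
            for bits in product([0, 1], repeat=3))
print(agree)
```

Agreement on all eight input assignments is only a sanity check for one circuit, not a substitute for the inductive proof.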
Despite P-Completeness • Problems that are P-Complete are considered inherently sequential: unless P = NC, they cannot be parallelized efficiently. • However, for real-world use, it may be very useful to identify restricted subsets of the problem and place them in lower complexity classes (easier problems). • Does anyone recall a well known problem that can benefit from such manipulation? • The paper continues by looking for restrictions of the problem that lower its complexity.
First Modification Trial • Only usage of the axes: child, parent and descendant-or-self is allowed. • The modification doesn’t yield lower complexity. The same reduction will work after changing: ancestor-or-self::* to descendant-or-self::*/parent::*
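One way to see why the substitution is harmless, offered here as our reading rather than as the paper's argument: in the query the ancestor-or-self step is always filtered by T(G) and is only evaluated at nodes tagged Ik, which are never the root. On the depth-two tree of the reduction, both expressions then select the same G-tagged nodes. The sketch below checks this on a small tree, with node numbering (`i` for Vi, `-i` for V’i, `0` for V0) assumed for illustration.

```python
def star_tree(width):
    """Root 0 with children 1..width; child i has one child -i."""
    parent = {}
    for i in range(1, width + 1):
        parent[i] = 0
        parent[-i] = i
    return parent

def ancestor_or_self(n, parent):
    out = {n}
    while n in parent:
        n = parent[n]
        out.add(n)
    return out

def descendant_or_self_then_parent(n, parent):
    """Nodes reached by descendant-or-self::*/parent::* from n."""
    nodes = set(parent) | {0}
    below = {d for d in nodes if n in ancestor_or_self(d, parent)}
    return {parent[d] for d in below if d in parent}

parent = star_tree(3)
tagged_g = {1, 2, 3}            # only V1..V3 carry the tag G
non_root = list(parent)         # every node except the root V0
same = all(
    ancestor_or_self(n, parent) & tagged_g
    == descendant_or_self_then_parent(n, parent) & tagged_g
    for n in non_root
)
print(same)
```

At the root the two expressions do differ, so the equivalence relies on the shape of D and on where the step is evaluated; it is not a general XPath identity.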
Second Modification Trial • Let Positive Core-XPath be Core-XPath without the queries that use negation. • This problem is a member of LOGCFL. • LOGCFL is the class of problems that can be reduced in logarithmic space to a context free language. • LOGCFL is contained in NC2, so problems in it can be parallelized well: segments do not depend on each other. • The hardness proof is very similar; it reduces from the evaluation problem for semi-unbounded circuits.
WF and Positive WF • WF is a subset of XPath that allows Core-XPath, arithmetic operations, and conditions using position(), last() and constants. • Where is WF? • Positive WF is LOGCFL-Complete. The proof of hardness resembles the proof we have just seen.
BACKUP
PF is NL-Complete • PF is the problem of navigating through an XML document, with no conditions allowed. • NL is the class of problems solvable by a non-deterministic Turing machine using logarithmic space. • Proof that PF is NL-Complete: • Membership in NL (by non-deterministic guessing) • NL-Hardness