XPath, the best known modal logic ever. And . . . made in Amsterdam! Maarten Marx Information and Language Processing Systems (ILPS) Informatics Institute, University of Amsterdam, The Netherlands
XPath, what is that? • A standard language proposed by the W3C in November 1999. • XPath is a language for addressing parts of an XML document. • XPath beats temporal logic as the best known modal logic: • Google: results 1 - 10 of about 1,870,000 for XPath • Google: results 1 - 10 of about 242,000 for “temporal logic”
Research aim behind this talk • Create an expressively complete navigational query language for XML documents. Aim of this talk • Show that modal logic is the right paradigm for such a task. • One can get remarkable results with (for modal logicians) simple proofs. • The modal logic literature is full of hints and almost-results.
FO logic of “XML documents” • An XML document can be seen as a finite, node labeled, sibling ordered, unbounded tree. • Nb. We abstract away from the “data details” and only focus on the skeleton of an XML document. • The first order language for these models has 1. a descendant relation; 2. a following sibling relation; 3. unary predicates corresponding to node labels and attribute–value pairs. • Nb. We cannot express joins on attribute values!
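To make the signature concrete, here is a minimal Python sketch of such a tree model. The nested-tuple input format and the helper names `build`, `descendant` and `following_sibling` are assumptions made for illustration; they do not come from the talk. Nodes are numbered in document order, and the two binary relations are materialized as sets of pairs.

```python
def build(tree):
    """Flatten a nested (label, [children]) tree.

    Returns (labels, kids): labels[n] is the label of node n, and kids[n]
    is the ordered list of its children. Ids follow document order.
    """
    labels, kids = [], []

    def visit(t):
        label, children = t
        n = len(labels)
        labels.append(label)
        kids.append([])
        for c in children:
            kids[n].append(visit(c))
        return n

    visit(tree)
    return labels, kids


def descendant(kids):
    """All pairs (x, y) such that y is a proper descendant of x."""
    rel = set()

    def visit(n):
        below = []
        for c in kids[n]:
            below.append(c)
            below.extend(visit(c))
        rel.update((n, d) for d in below)
        return below

    visit(0)
    return rel


def following_sibling(kids):
    """All pairs (x, y) such that y is a later sibling of x."""
    return {(cs[i], cs[j])
            for cs in kids
            for i in range(len(cs))
            for j in range(i + 1, len(cs))}


# Skeleton of <a><b><d/></b><c/></a>; ids: a=0, b=1, d=2, c=3.
doc = ("a", [("b", [("d", [])]), ("c", [])])
labels, kids = build(doc)
print(sorted(descendant(kids)))         # [(0, 1), (0, 2), (0, 3), (1, 2)]
print(sorted(following_sibling(kids)))  # [(1, 3)]
```

The unary predicates are simply the sets of node ids carrying a given label. Attribute values are left out, in line with the abstraction to the skeleton.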
Known results • For binary relations, very little is known. Immerman and Kozen: 1. strings have the 3 variable property; 2. bounded trees have a k variable property. • For unary relations, more is known: 1. Kamp’s theorem, strings have H-dimension 3; 2. unbounded unordered trees have no finite H-dimension (Schlingloff); 3. unbounded ordered trees have H-dimension 3 (PODS 2004). • Nb. 1. The k-variable property is stronger than H-dimension; 2. the k-variable property is independent of the “finite complete set of operators property” (Hodkinson–Simon, JPhL). • Thus the known results do not achieve our research goal.
Conditional XPath • The syntax is based on 1. XPath 1.0 (W3C) 2. Kleene algebras (= regular path queries) with tests (Kozen) 3. Propositional Dynamic Logic (Pratt, Harel)
step ::= child | parent | right | left.
path wff ::= step | (step?node wff)+ | ?node wff | path wff/path wff | path wff ∪ path wff.
node wff ::= p | ⟨path wff⟩ | ¬node wff | node wff ∧ node wff.
Semantics • Given an ordered tree, – each path wff denotes a set of pairs of nodes, and – each node wff denotes a set of nodes. • All set theoretical operations have their standard meaning. • ⟨path wff⟩ is true at a node n iff n is in the domain of the relation denoted by path wff. Note! Every path wff (node wff) defines a first order definable binary (unary) relation.
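The semantics above can be sketched as a small evaluator. This is an illustrative Python sketch, not the author's implementation: the tuple encoding of formulas, the `Tree` class, and the function names `path` and `node` are all assumptions. One deliberate simplification is labeled in the code: the `"plus"` case takes the transitive closure of any path wff, which is more liberal than the grammar's restricted closure operator.

```python
class Tree:
    """Ordered tree built from nested (label, [children]) tuples."""

    def __init__(self, nested):
        self.labels, self.kids = [], []
        self._add(nested)
        self.nodes = set(range(len(self.labels)))
        child = {(n, c) for n in self.nodes for c in self.kids[n]}
        # "right" relates a node to its immediate right sibling.
        right = {(cs[i], cs[i + 1])
                 for cs in self.kids for i in range(len(cs) - 1)}
        self.step = {
            "child": child,
            "parent": {(y, x) for x, y in child},
            "right": right,
            "left": {(y, x) for x, y in right},
        }

    def _add(self, nested):
        label, children = nested
        n = len(self.labels)
        self.labels.append(label)
        self.kids.append([])
        for c in children:
            self.kids[n].append(self._add(c))
        return n


def compose(r, s):
    """Relational composition r ; s."""
    return {(x, z) for x, y in r for y2, z in s if y == y2}


def plus(r):
    """Transitive closure of r."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


def path(t, e):
    """Set of node pairs denoted by path wff e in tree t."""
    op = e[0]
    if op == "step":
        return t.step[e[1]]
    if op == "plus":  # simplification: closure of an arbitrary path wff
        return plus(path(t, e[1]))
    if op == "test":  # ?node wff: identity restricted to the node set
        return {(n, n) for n in node(t, e[1])}
    if op == "seq":
        return compose(path(t, e[1]), path(t, e[2]))
    if op == "union":
        return path(t, e[1]) | path(t, e[2])
    raise ValueError(op)


def node(t, e):
    """Set of nodes denoted by node wff e in tree t."""
    op = e[0]
    if op == "label":
        return {n for n in t.nodes if t.labels[n] == e[1]}
    if op == "dom":  # <path wff>: the domain of the path relation
        return {x for x, _ in path(t, e[1])}
    if op == "not":
        return t.nodes - node(t, e[1])
    if op == "and":
        return node(t, e[1]) & node(t, e[2])
    raise ValueError(op)


# Skeleton of <a><b><d/></b><c/></a>; ids: a=0, b=1, d=2, c=3.
tree = Tree(("a", [("b", [("d", [])]), ("c", [])]))
leaf = ("not", ("dom", ("step", "child")))
print(sorted(node(tree, leaf)))  # [2, 3]
```

Because every operator is evaluated as a plain set operation on a finite tree, the evaluator mirrors the two clauses of the semantics directly: `path` returns a binary relation and `node` returns a set of nodes.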
Example expressions • Equivalent XPath 1.0 and Conditional XPath expressions (XPath 1.0 ≡ Conditional XPath):
• child::p_i ≡ child/?p_i
• child::p_i[descendant::*] ≡ child/?p_i/?⟨child+⟩
• /descendant::p_i ≡ ?¬⟨parent⟩/child+/?p_i
• child::* ≡ child
• self::p_i[child] ≡ ?(p_i ∧ ⟨child⟩)
• preceding::p_i ≡ parent*/left+/child*/?p_i
Conditional XPath fulfills our research goal • Theorem 1 (Kamp/PODS 2004) Every FO definable set of nodes is definable by a Conditional XPath node wff. • Theorem 2 Every first order definable binary relation is definable by a Conditional XPath path wff. • Corollary Every FO relation γ(x_1, ..., x_n) is equivalent to a union of conjunctive queries consisting of atoms of the form x_i path wff x_j.
Difference between the two theorems • Theorem 1 is about node wffs and unary relations. Theorem 2 is about path wffs and binary relations. • Theorem 2 implies Theorem 1, but not conversely. • Node wffs have much stronger operators (and, not, bounded quantification). • Path wffs only have “until”, concatenation and union.
XML document An XML document can be seen as a finite, node labelled, sibling ordered unbounded tree. (Nb. We abstract away from the “data details” and only focus on the skeleton of an XML document.)
Design Constraints • Stay as close as possible to the existing W3C standard XPath. • This means: – no (first or second order) variables. – express sets of nodes (answer sets) and relations between nodes (paths). – relations should be “drawable” (use only the regular expression operators)
Navigational XPath • We can give W3C XPath 1.0 a PDL-like definition:
step ::= child | parent | right | left.
path wff ::= step | step+ | ?node wff | path wff ; path wff | path wff ∪ path wff.
node wff ::= p | ⟨path wff⟩ | ¬node wff | node wff ∧ node wff.
• Note the very restricted use of (·)+! • We use ⟨path wff⟩ to mean “I start a path wff”.
Modal Logic, XPath and XML, ten Cate Workshop, February 2005.
Examples of Navigational XPath expressions • (1) ¬⟨parent⟩ • (2) ¬⟨child⟩ • (3) ⟨child ; ?first ; right ; ?last⟩ • (4) root ∧ ¬⟨child ; ?(¬leaf ∧ ¬⟨child ; ?first ; right ; ?last⟩)⟩ • Not expressible in this version of XPath are • (child ; ?q)* ; child ; ?p: until p, q holds (as a relation) • (child ; child)* ; ?leaf: the relation of being an even number of steps above a leaf
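The second inexpressible relation, being an even number of steps above a leaf, needs unrestricted iteration of a composite path (two child steps at a time), which the restricted closure of Navigational XPath does not provide. Outside that fragment the relation is easy to compute. The Python sketch below does so by reflexive-transitive closure on a hand-built chain; the helper names `compose` and `star` and the example tree are assumptions made for illustration.

```python
def compose(r, s):
    """Relational composition r ; s."""
    return {(x, z) for x, y in r for y2, z in s if y == y2}


def star(r, nodes):
    """Reflexive-transitive closure of r over the given node set."""
    closure = {(n, n) for n in nodes}
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger


# A chain 0 -> 1 -> 2 -> 3 -> 4, so node 4 is the only leaf.
nodes = {0, 1, 2, 3, 4}
child = {(0, 1), (1, 2), (2, 3), (3, 4)}
leaves = nodes - {x for x, _ in child}

# Iterate two child steps at a time, then keep pairs that end in a leaf.
two_steps = compose(child, child)
even_above_leaf = {(x, y) for x, y in star(two_steps, nodes) if y in leaves}
print(sorted(even_above_leaf))  # [(0, 4), (2, 4), (4, 4)]
```

Nodes 0, 2 and 4 sit an even number of steps (4, 2 and 0) above the leaf, while nodes 1 and 3 are correctly excluded.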
Results (from the modal logic literature) • 1. The node wffs form a modal language, created by Blackburn, Meyer-Viol, de Rijke in the 90’s. The logic is finitely axiomatizable. • 2. SAT problem is hard for EXPTIME (from Fischer-Ladner 79 for PDL). • 3. SAT problem is decidable in EXPTIME (from Vardi, Wolper 86 for PDL with converse). • 4. This language is expressively complete w.r.t. first order logic in two variables (use the result for the line of Etessami, Vardi, Wilke 97). • 5. Cf. ACM SIGMOD Record March 2005.
Conclusion • The W3C standard is a well designed language. They have reinvented a wheel which has been shown to possess very good properties. • Still, the expressive completeness is not completely satisfactory. (Note that W3C XPath is not complete for two variable FO for paths.) • Conditional XPath is excellent for expressing first order queries. • Implementing the conditional axis is still open (special staircase joins?)