
Spatial tree logics to reason about Semistructured Data

SEBD 2003. Spatial tree logics to reason about Semistructured Data. Speaker: Giovanni Conforti. Joint work with: Giorgio Ghelli. Dipartimento di Informatica – Università di Pisa.


Presentation Transcript


  1. SEBD 2003. Spatial tree logics to reason about Semistructured Data. Speaker: Giovanni Conforti. Joint work with: Giorgio Ghelli. Dipartimento di Informatica – Università di Pisa

  2. What I’m going to talk about … • A gentle introduction to Spatial Tree Logics (STL) • STL and Semistructured Data (SSD) • Properties of SSD (constraints, types, queries) → Spatial Tree Logic (STL) formulas • Decision problems for SSD → validity/satisfiability of STL formulas • Presentation of a decidable fragment of the TQL logic

  3. Background: Spatial Logics • Modal logics to describe properties of structured worlds • Many applications: Ambient Calculus, π-calculus, tree structured data, shared data structures, … • Spatial (and temporal) modal operators describe structure (and behavior) • The equivalence, model checking and validity problems have already been studied for many spatial logics • Many works involving Cardelli, Gordon, Caires, Ghelli, Gardner, …

  4. A Simple Ground Spatial Tree Logic • Worlds = information trees (i.t.): unordered labeled trees, i.e. multisets of labeled edges. F, F’ ::= 0 (the empty tree, just a root) | n[F] (an edge labelled n leading to the i.t. F) | F | F’ (the i.t. F “next to” the i.t. F’) • Logic = propositional connectives + modal operators describing the structure. A, B ::= True | Not A | A And B | 0 | n[A] | A | B

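  5. Examples • An information tree (a tree labelled book with 3 subtrees): F = book[ title[Databases[0]] | author[Ghelli[0]] | author[Albano[0]] ] • Some formulas describing trees: A = book[ author[Ghelli[0]] ], B = book[ author[Ghelli[0]] | True ], C = book[ Not (editor[True] | True) ], D = book[ title[True] And author[True] ] • F |= B and F |= C hold; F |= A and F |= D do not. A describes a book whose only child is author[Ghelli[0]], and D can never hold because no tree is at once a single title edge and a single author edge.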
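
The four judgements on this slide can be checked mechanically. What follows is a minimal sketch of a model checker for the ground logic, not the TQL implementation: the tuple encoding of trees and formulas and the names `edge`, `Loc`, `Par`, `splits` and `sat` are assumptions made for illustration. Composition `A | B` is decided by trying every division of the multiset of edges into two parts, which is exponential and only suitable for small examples.

```python
from itertools import combinations

# An information tree is a tuple of edges; an edge is (label, subtree).
def edge(label, *children):
    return (label, tuple(children))

# Formula constructors (illustrative names, not TQL syntax).
TRUE = ("true",)
ZERO = ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)      # n[A]
def Par(a, b): return ("par", a, b)      # A | B

def splits(tree):
    """Yield every division of the multiset `tree` into two parts."""
    positions = range(len(tree))
    for k in range(len(tree) + 1):
        for chosen in combinations(positions, k):
            left = tuple(tree[i] for i in chosen)
            right = tuple(tree[i] for i in positions if i not in chosen)
            yield left, right

def sat(tree, f):
    """Decide tree |= f for the ground logic."""
    tag = f[0]
    if tag == "true":
        return True
    if tag == "zero":
        return len(tree) == 0
    if tag == "not":
        return not sat(tree, f[1])
    if tag == "and":
        return sat(tree, f[1]) and sat(tree, f[2])
    if tag == "loc":  # exactly one edge, with the right label and content
        return len(tree) == 1 and tree[0][0] == f[1] and sat(tree[0][1], f[2])
    if tag == "par":  # some split satisfies both sides
        return any(sat(l, f[1]) and sat(r, f[2]) for l, r in splits(tree))
    raise ValueError(f"unknown formula tag: {tag}")

F = (edge("book",
          edge("title", edge("Databases")),
          edge("author", edge("Ghelli")),
          edge("author", edge("Albano"))),)

ghelli = Loc("author", Loc("Ghelli", ZERO))
A = Loc("book", ghelli)
B = Loc("book", Par(ghelli, TRUE))
C = Loc("book", Not(Par(Loc("editor", TRUE), TRUE)))
D = Loc("book", And(Loc("title", TRUE), Loc("author", TRUE)))

results = [sat(F, x) for x in (A, B, C, D)]
```

Under this encoding `results` mirrors the slide: only B and C are satisfied by F.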

  6. First order and modal recursion • The full TQL logic extends the ground fragment with: • X tree variables • x[A] locations with label variables • Exists x. A quantification over labels (and trees) • μξ. A fixpoint (ξ positive in A)

  7. Decision Problems. Given a formula A and a model F: • Model checking: does F |= A hold? • Query answering: find the values of x such that F |= A(x) • Satisfiability sat(A): is there an F’ such that F’ |= A? • Validity vld(A): does F’ |= A hold for every model F’? • Negation in the logic: sat(A) ⇔ Not vld(Not A) • Implication: (∀F. F |= A implies F |= B) ⇔ vld(Not A Or B) • With the simple ground STL all these problems are decidable, but this no longer holds for satisfiability/validity if we introduce variables and quantification (or fixpoint)
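
The two equivalences on this slide follow directly from the classical reading of negation (F |= Not A exactly when F does not satisfy A). A short derivation, written in LaTeX with sat and vld as on the slide:

```latex
\begin{align*}
\mathit{sat}(A)
  &\iff \exists F.\; F \models A \\
  &\iff \neg\,\forall F.\; F \not\models A \\
  &\iff \neg\,\forall F.\; F \models \neg A \\
  &\iff \neg\,\mathit{vld}(\neg A) \\[4pt]
\forall F.\,\bigl(F \models A \Rightarrow F \models B\bigr)
  &\iff \forall F.\; F \models \neg A \lor B \\
  &\iff \mathit{vld}(\neg A \lor B)
\end{align*}
```

So a decision procedure for either satisfiability or validity yields one for the other, and implication between formulas reduces to validity.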

  8. An SSD data model: labeled trees (information trees) • articles[ article[ author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000] ] | article[ author[Ghelli] | title[TQL] | conf[ETAPS] | date[ month[Feb] | year[2001] ] ] ] • (Figure: the same data drawn as a tree rooted at articles, with two article subtrees.)

  9. SSD Schema and Types • Schemas and types constrain the structure of SSD: • DTDs; • XML Schema; • Regular Expression Types; • A schema: Article = article[ title[String], author[String]*, date[True]? ] • A recursive type: Section = section[ init[String], Section*, conc[String] ]

  10. Types in STL • Regular type expressions and DTDs can be expressed (up to document order) in STL extended with modal recursion • A schema: article[ title[String], author[String]*, date[True]? ] • In STL: article[ title[True] | (μξ. 0 Or author[True] | ξ) | (date[True] Or 0) ]
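
To see how the fixpoint encodes `author*`, here is a small sketch that extends a ground-logic checker with a single `Star` node standing for μξ. 0 Or (A | ξ). It is an illustration under assumptions: the tuple encoding and all function names are invented here, and `Star` unfolds only this one recursion pattern rather than implementing general fixpoints. Following the slide, String is approximated by True.

```python
from itertools import combinations

def edge(label, *children):
    return (label, tuple(children))

TRUE, ZERO = ("true",), ("zero",)
def Or(a, b): return ("or", a, b)
def Loc(n, a): return ("loc", n, a)       # n[A]
def Par(a, b): return ("par", a, b)       # A | B
def Star(a): return ("star", a)           # shorthand for μξ. 0 Or (A | ξ)

def splits(tree):
    """Yield every division of the multiset `tree` into two parts."""
    positions = range(len(tree))
    for k in range(len(tree) + 1):
        for chosen in combinations(positions, k):
            yield (tuple(tree[i] for i in chosen),
                   tuple(tree[i] for i in positions if i not in chosen))

def sat(tree, f):
    tag = f[0]
    if tag == "true":
        return True
    if tag == "zero":
        return len(tree) == 0
    if tag == "or":
        return sat(tree, f[1]) or sat(tree, f[2])
    if tag == "loc":
        return len(tree) == 1 and tree[0][0] == f[1] and sat(tree[0][1], f[2])
    if tag == "par":
        return any(sat(l, f[1]) and sat(r, f[2]) for l, r in splits(tree))
    if tag == "star":
        # Least fixpoint: empty, or a non-empty part satisfying A next to
        # a strictly smaller remainder that again satisfies the star.
        if len(tree) == 0:
            return True
        return any(len(l) > 0 and sat(l, f[1]) and sat(r, f)
                   for l, r in splits(tree))
    raise ValueError(f"unknown formula tag: {tag}")

# article[ title[True] | (μξ. 0 Or author[True] | ξ) | (date[True] Or 0) ]
schema = Loc("article",
             Par(Loc("title", TRUE),
                 Par(Star(Loc("author", TRUE)),
                     Or(Loc("date", TRUE), ZERO))))

first = (edge("article",
              edge("author", edge("Cardelli")), edge("author", edge("Gordon")),
              edge("title", edge("Anywhere")), edge("date", edge("Apr, 2000"))),)
second = (edge("article",
               edge("author", edge("Ghelli")), edge("title", edge("TQL")),
               edge("conf", edge("ETAPS")), edge("date", edge("Feb 2001"))),)

first_ok = sat(first, schema)
second_ok = sat(second, schema)
```

The first article of the data-model example matches the schema in any child order, which is what "up to document order" means here; the second does not, because the schema has no place for the `conf` edge.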

  11. SSD Constraints • Integrity constraints on the values of SSD: inclusion constraints; inverse relationship constraints; key constraints • Path expressions to navigate SSD: articles.article.title(x), root.section*.init(x) • Integrity constraints as inclusion of paths: student.takes => course.cno, student.takes ⇔ course.taken_by • Key constraints (first order logic with paths): ∀x,y. article.title(x) And article.title(y) And (x=y) => (x == y)

  12. Constraints in STL • Integrity constraints over SSD are easily expressed using STL with variables and quantification • Examples using the path abbreviation .a[A] = a[A] | True: • An inclusion constraint: ∀$X. .student.taking[$X] => .course.cno[$X] • A key constraint for SSD: ∀$X. Not (.article.title[$X] | .article.title[$X]) • Combining quantification with recursion we can express complex types and constraints (e.g. binary trees)
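
Read operationally, the two example constraints become simple checks over a data tree. The sketch below evaluates them directly rather than through the logic; the encoding (sorted tuples of edges, so that subtrees compare as multisets) and the names `edge`, `forest`, `reach`, `key_holds` and `inclusion_holds` are assumptions for illustration only.

```python
from collections import Counter

def edge(label, *children):
    # Children are sorted so that equal multisets compare equal.
    return (label, tuple(sorted(children)))

def forest(*edges):
    return tuple(sorted(edges))

def reach(tree, path):
    """Yield the subtrees reached by following the labels in `path`."""
    if not path:
        yield tree
        return
    for label, sub in tree:
        if label == path[0]:
            yield from reach(sub, path[1:])

def key_holds(tree):
    """∀$X. Not(.article.title[$X] | .article.title[$X]):
    no title value occurs under two distinct article edges."""
    seen = Counter()
    for label, sub in tree:
        if label == "article":
            for title in set(reach(sub, ("title",))):
                seen[title] += 1
    return all(count < 2 for count in seen.values())

def inclusion_holds(tree):
    """∀$X. .student.taking[$X] => .course.cno[$X]."""
    taken = set(reach(tree, ("student", "taking")))
    offered = set(reach(tree, ("course", "cno")))
    return taken <= offered

good = forest(edge("article", edge("title", edge("Anywhere"))),
              edge("article", edge("title", edge("TQL"))))
bad = forest(edge("article", edge("title", edge("TQL")), edge("conf", edge("ETAPS"))),
             edge("article", edge("title", edge("TQL"))))
school = forest(edge("student", edge("taking", edge("cs101"))),
                edge("course", edge("cno", edge("cs101"))))
dangling = forest(edge("student", edge("taking", edge("cs999"))),
                  edge("course", edge("cno", edge("cs101"))))

checks = (key_holds(good), key_holds(bad),
          inclusion_holds(school), inclusion_holds(dangling))
```

Because `|` splits the top-level multiset, the key formula is violated only when two different `article` edges carry the same title, which is why `key_holds` counts each title at most once per article.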

  13. SSD Queries • There are many query languages (XQuery, Lorel, YATL, …); essentially, queries are expressions that select data reachable through paths and construct new results • TQL is a query language based on spatial tree logic, where selection is done by pattern matching over STL formulas • The TQL logic expresses all regular path expressions • Query answering is implemented for the full TQL logic

  14. SSD Decision Problems with STL • Given a data source F, a formula A representing a schema, and formulas B, B’ representing sets of integrity constraints: • Validation: F |= A, F |= B, F |= A And B • Schema/constraint consistency: sat(A), sat(B), sat(A And B) • Constraint implication (inference): vld(B => B’) • Constraint implication in the presence of a schema: vld(A And B => B’)

  15. A decidable TQL sublogic • STL are good at expressing types, constraints and queries over SSD, but: • Validity in the full TQL logic is undecidable • The ground logic is decidable, but it is not enough to express all interesting types and constraints • We are looking for a decidable fragment of TQL expressive enough to reason about SSD • A first step in this direction is the following logic…

  16. A decidable TQL sublogic • A, B ::= True | A And B | Not A | 0 | %[A] | n[A] | A | B • We can define useful operators to describe types and constraints in this decidable logic: • String =def %[0] • Tree =def %[True] • A Or B =def Not (Not A And Not B) • A => B =def Not A Or B • A^exists =def A | True • A^foreach =def Not (Not A | True) • A^foreachTree =def (Tree => A)^foreach • Note: if A => Tree we can use A^foreachTree to express A*
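
The derived operators are plain abbreviations, so they can be written as functions that build formulas of the core logic. The sketch below does that on top of a small checker, reading %[A] as "a single edge with any label whose content satisfies A"; that reading, the tuple encoding and every name in the code are assumptions made for illustration.

```python
from itertools import combinations

def edge(label, *children):
    return (label, tuple(children))

# Core logic of the slide: True | A And B | Not A | 0 | %[A] | n[A] | A | B
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def AnyLoc(a): return ("any", a)          # %[A]
def Loc(n, a): return ("loc", n, a)       # n[A]
def Par(a, b): return ("par", a, b)       # A | B

# Derived operators, defined exactly as on the slide.
STRING = AnyLoc(ZERO)
TREE = AnyLoc(TRUE)
def Or(a, b): return Not(And(Not(a), Not(b)))
def Implies(a, b): return Or(Not(a), b)
def Exists(a): return Par(a, TRUE)
def Foreach(a): return Not(Par(Not(a), TRUE))
def ForeachTree(a): return Foreach(Implies(TREE, a))

def splits(tree):
    """Yield every division of the multiset `tree` into two parts."""
    positions = range(len(tree))
    for k in range(len(tree) + 1):
        for chosen in combinations(positions, k):
            yield (tuple(tree[i] for i in chosen),
                   tuple(tree[i] for i in positions if i not in chosen))

def sat(tree, f):
    tag = f[0]
    if tag == "true":
        return True
    if tag == "zero":
        return len(tree) == 0
    if tag == "not":
        return not sat(tree, f[1])
    if tag == "and":
        return sat(tree, f[1]) and sat(tree, f[2])
    if tag == "any":
        return len(tree) == 1 and sat(tree[0][1], f[1])
    if tag == "loc":
        return len(tree) == 1 and tree[0][0] == f[1] and sat(tree[0][1], f[2])
    if tag == "par":
        return any(sat(l, f[1]) and sat(r, f[2]) for l, r in splits(tree))
    raise ValueError(f"unknown formula tag: {tag}")

# author[String]* expressed without recursion, as in the slide's note.
authors_star = ForeachTree(Loc("author", STRING))

two_authors = (edge("author", edge("Ghelli")), edge("author", edge("Albano")))
mixed = two_authors + (edge("title", edge("Databases")),)

outcome = (sat(two_authors, authors_star), sat(mixed, authors_star),
           sat((), authors_star), sat(mixed, Exists(Loc("title", STRING))))
```

`ForeachTree(A)` holds when every single edge of the tree satisfies A, so for an A that only describes single edges it behaves like A* while staying inside the decidable fragment.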

  17. Conclusions and Future Directions • STL provide a powerful unified framework for types, constraints, and queries over SSD and XML • This framework is worth studying; it may lead to: • A good formalization of “SSD reasoning” in terms of model checking and validity • Generalization of results on reasoning about types and constraints • Query optimization strategies guided by types/constraints • (Some) future steps: • Extend the decidable logic to express integrity constraints • Model ordered trees

  18. Università di Pisa: Ph.D. Proposal. Spatial tree logics to reason about Constraints and Types. Speaker: Giovanni Conforti. Supervisor: Giorgio Ghelli

  19. SSD Query Optimization • The TQL pattern clause uses STL formulas… • We can use validated constraints C and types T as information to optimize queries (e.g. to detect statically that a result is empty) • A query “from Q |= A select Q’” can be rewritten as “from Q |= B select Q’” for each B such that (C And T) => (A <=> B)

  20. Research Plan: planning • The challenge is ambitious and must be understood as a long-term direction of our work • We identify some initial tasks we expect to accomplish: • Comparison of STL with other formalisms for types and constraints • Find a “satisfactory” decidable logic fragment to express types (and constraints) • Write a preliminary formal system for constraint (and type) implication • We plan two stages: • (2nd year) in-depth study of the basic theories (tree automata, modal logics, description logics) and investigation of the initial tasks • (3rd year) completion of the initial tasks and integration of the results in a unified formal framework

  21. Research Plan: directions • Main directions of investigation: • Expressivity of Spatial Tree Logics (in particular for standard type and constraint specifications) • Decidability and complexity of model checking and validity for fragments (or extensions) of the TQL logic • Reformulation (or generalization) of known results about reasoning and optimization over SSD • Other interesting directions: • Implementation of a query rewriter guided by constraints and types • Extensions of the logic to model order, data updates, private names

  22. Background: Semi-structured data (SSD) • Semi-structured data (SSD) are used to: • model and query the web (HTML, XML, …); • store experimental data; • integrate heterogeneous databases; • … • SSD are: • Self-describing (structure is implicit); • Irregular; • Always evolving
