SEBD 2003. Spatial tree logics to reason about Semistructured Data. Speaker: Giovanni Conforti. Joint work with: Giorgio Ghelli. Dipartimento di Informatica, Università di Pisa.
What I'm going to talk about … • A gentle introduction to Spatial Tree Logics (STL) • STL and Semistructured Data (SSD) • Properties of SSD (Constraints, Types, Queries) expressed as Spatial Tree Logic (STL) formulas • Decision problems for SSD expressed as validity/satisfiability of STL formulas • Presentation of a decidable fragment of the TQL logic
Background: Spatial Logics • Modal logics to describe properties of structured worlds • Many applications: Ambient Calculus, π-calculus, tree-structured data, shared data structures, … • Spatial (and temporal) modal operators to describe structure (and behavior) • The equivalence, model checking and validity problems have already been studied for many spatial logics • Many works involving Cardelli, Gordon, Caires, Ghelli, Gardner, …
A Simple Ground Spatial Tree Logic
• Worlds = information trees (i.t.): unordered (multisets of) labeled trees
  F, F' ::= 0 (empty root) | n[F] (an edge labelled n leading to the i.t. F) | F | F' (the i.t. F "next to" the i.t. F')
• Logic = propositional logic connectives + modal operators describing the structure
  A, B ::= True | Not A | A And B | 0 | n[A] | A | B
Examples
• An information tree: a tree labelled book with 3 subtrees
  F = book[ title[Databases[0]] | author[Ghelli[0]] | author[Albano[0]] ]
• Some formulas describing trees
  A = book[ author[Ghelli[0]] ]
  B = book[ author[Ghelli[0]] | True ]
  C = book[ Not (editor[True] | True) ]
  D = book[ title[True] And author[True] ]
• F ⊭ A (A allows nothing next to the Ghelli author), F |= B, F |= C, F ⊭ D (no tree is a single title edge and a single author edge at once)
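The satisfaction relation for the ground logic can be sketched as a small recursive checker. This is a minimal illustration, not the TQL implementation: the tuple encoding of trees and formulas and the names `edge`, `par`, `Loc`, `Par` and `sat` are assumptions made here. It follows the grammar of the previous slide, where n[A] matches a tree made of exactly one edge and A | B asks for some split of the multiset of edges, so A and D fail on F while B and C hold.

```python
# An information tree is a tuple of (label, subtree) edges; order is irrelevant.
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    # F | F': put the edges of all arguments side by side
    return tuple(e for t in trees for e in t)

# Formulas are tagged tuples (hypothetical encoding).
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)   # n[A]
def Par(a, b): return ("par", a, b)   # A | B

def sat(f, a):
    """Return True when the information tree f satisfies the formula a."""
    tag = a[0]
    if tag == "true":
        return True
    if tag == "zero":
        return f == ()
    if tag == "not":
        return not sat(f, a[1])
    if tag == "and":
        return sat(f, a[1]) and sat(f, a[2])
    if tag == "loc":
        # exactly one edge, with the right label, whose content satisfies A
        return len(f) == 1 and f[0][0] == a[1] and sat(f[0][1], a[2])
    if tag == "par":
        # try every way of splitting the edges into two parts (exponential)
        n = len(f)
        for mask in range(1 << n):
            left = tuple(f[i] for i in range(n) if mask >> i & 1)
            right = tuple(f[i] for i in range(n) if not mask >> i & 1)
            if sat(left, a[1]) and sat(right, a[2]):
                return True
        return False
    raise ValueError(tag)

F = edge("book", par(edge("title", edge("Databases")),
                     edge("author", edge("Ghelli")),
                     edge("author", edge("Albano"))))
ghelli = Loc("author", Loc("Ghelli", ZERO))
A = Loc("book", ghelli)
B = Loc("book", Par(ghelli, TRUE))
C = Loc("book", Not(Par(Loc("editor", TRUE), TRUE)))
D = Loc("book", And(Loc("title", TRUE), Loc("author", TRUE)))

results = [sat(F, x) for x in (A, B, C, D)]
print(results)  # [False, True, True, False]
```

The split enumeration makes the cost exponential in the number of sibling edges, which is acceptable for a demonstration but not for real data.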
First order and modal recursion • The full TQL logic extends the ground fragment with: • X tree variables • x[A] locations with label variables • Exists x. A quantification over labels (and trees) • μξ. A fixpoint (ξ positive in A)
Decision Problems. Given a formula A and a model F: • Model checking: F |= A ? • Query answering: find values of x such that F |= A(x) • Satisfiability sat(A): does there exist an F' such that F' |= A ? • Validity vld(A): is it true that, for each model F', F' |= A ? • Negation in the logic: sat(A) ⟺ Not vld(Not A) • Implication: (∀F. F |= A implies F |= B) ⟺ vld(Not A Or B) • With the simple ground STL all these problems are decidable, but that is no longer true for satisfiability/validity if we introduce variables and quantification (or fixpoints)
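The two equivalences on this slide follow directly from the definitions of sat and vld and from the classical meaning of Not and Or. A short derivation, written in LaTeX with $\models$ for the satisfaction relation:

```latex
\begin{align*}
\mathrm{sat}(A)
  &\iff \exists F.\; F \models A \\
  &\iff \neg\,\forall F.\; F \not\models A \\
  &\iff \neg\,\forall F.\; F \models \mathrm{Not}\ A \\
  &\iff \neg\,\mathrm{vld}(\mathrm{Not}\ A) \\[6pt]
\forall F.\;(F \models A \Rightarrow F \models B)
  &\iff \forall F.\;(F \not\models A \ \text{or}\ F \models B) \\
  &\iff \forall F.\; F \models (\mathrm{Not}\ A)\ \mathrm{Or}\ B \\
  &\iff \mathrm{vld}(\mathrm{Not}\ A\ \mathrm{Or}\ B)
\end{align*}
```

Because the logic is closed under negation, a decision procedure for either satisfiability or validity yields one for the other.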
An SSD data model: labeled trees as information trees
  articles[
    article[
      author[Cardelli] |
      author[Gordon] |
      title[Anywhere] |
      date[Apr, 2000] ]
    article[
      author[Ghelli] |
      title[TQL] |
      conf[ETAPS] |
      date[
        month[Feb] |
        year[2001] ] ]
  ]
[Figure: the same data drawn as a labeled tree rooted at articles, with two article subtrees]
SSD Schema and Types • Schemas and types constrain the structure of SSD: • DTDs; • XML Schema; • Regular Expression Types; • A schema: Article = article[ title[String], author[String]*, date[True]? ] • A recursive type: Section = section[ init[String], Section*, conc[String] ]
Types in STL • Regular type expressions and DTDs can be expressed (up to document order) in STL extended with modal recursion • A schema: article[ title[String], author[String]*, date[True]? ] • In STL: article[ title[True] | (μξ. 0 Or author[True] | ξ) | (date[True] Or 0) ]
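To see how the fixpoint encodes the Kleene star, the sketch below checks trees against the STL form of the article schema. It is an illustration under assumptions: the tuple encoding and the names `edge`, `par`, `splits`, `sat` and `ARTICLE` are invented here, and the fixpoint μξ. 0 Or (A | ξ) is handled by a dedicated `star` case rather than by a general fixpoint operator.

```python
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    return tuple(e for t in trees for e in t)

def splits(f):
    # every way of dividing the multiset of edges f into (left, right)
    n = len(f)
    for mask in range(1 << n):
        yield (tuple(f[i] for i in range(n) if mask >> i & 1),
               tuple(f[i] for i in range(n) if not mask >> i & 1))

TRUE, ZERO = ("true",), ("zero",)

def sat(f, a):
    tag = a[0]
    if tag == "true":
        return True
    if tag == "zero":
        return f == ()
    if tag == "or":
        return sat(f, a[1]) or sat(f, a[2])
    if tag == "loc":
        return len(f) == 1 and f[0][0] == a[1] and sat(f[0][1], a[2])
    if tag == "par":
        return any(sat(l, a[1]) and sat(r, a[2]) for l, r in splits(f))
    if tag == "star":
        # unfolding of  mu xi. 0 Or (A | xi): either empty, or a non-empty
        # part satisfying A next to a smaller tree satisfying the star again
        return f == () or any(l and sat(l, a[1]) and sat(r, a)
                              for l, r in splits(f))
    raise ValueError(tag)

# article[ title[True] | (mu xi. 0 Or author[True] | xi) | (date[True] Or 0) ]
ARTICLE = ("loc", "article",
           ("par", ("loc", "title", TRUE),
            ("par", ("star", ("loc", "author", TRUE)),
             ("or", ("loc", "date", TRUE), ZERO))))

full = edge("article", par(edge("title", edge("TQL")),
                           edge("author", edge("Ghelli")),
                           edge("author", edge("Cardelli")),
                           edge("date", edge("2001"))))
minimal = edge("article", edge("title", edge("TQL")))
two_titles = edge("article", par(edge("title", edge("TQL")),
                                 edge("title", edge("Anywhere"))))
no_title = edge("article", edge("author", edge("Ghelli")))

checks = [sat(t, ARTICLE) for t in (full, minimal, two_titles, no_title)]
print(checks)  # [True, True, False, False]
```

Requiring the left part to be non-empty in the `star` case keeps the recursion terminating, since the remaining tree shrinks at each unfolding.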
SSD Constraints • Integrity constraints on the values of SSD: • Inclusion constraints; • Inverse relationship constraints; • Key constraints; • Path expressions to navigate SSD: articles.article.title(x), root.section*.init(x) • Integrity constraints as relations between paths: student.takes => course.cno (inclusion), student.takes ⇄ course.taken_by (inverse relationship) • Key constraints (first-order logic with paths): ∀x,y. article.title(x) And article.title(y) And (x = y) => (x == y)
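A simple path expression such as articles.article.title(x) can be read as "follow these labels from the root and bind x to whatever is found at the end". The sketch below shows that reading on a cut-down version of the articles data. The names `edge`, `par` and `follow` and the tuple encoding are assumptions of this example, and it covers only plain label sequences, not regular paths such as root.section*.init(x).

```python
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    return tuple(e for t in trees for e in t)

def follow(tree, path):
    """Return the subtrees reached by following the labels in path."""
    frontier = [tree]
    for label in path:
        # keep the content of every edge carrying the next label
        frontier = [sub for t in frontier for (l, sub) in t if l == label]
    return frontier

doc = edge("articles", par(
    edge("article", par(edge("author", edge("Cardelli")),
                        edge("author", edge("Gordon")),
                        edge("title", edge("Anywhere")))),
    edge("article", par(edge("author", edge("Ghelli")),
                        edge("title", edge("TQL"))))))

titles = follow(doc, ["articles", "article", "title"])
print(titles)  # [(('Anywhere', ()),), (('TQL', ()),)]
```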
Constraints in STL • Integrity constraints over SSD are easily expressed using STL with variables and quantification. • Examples using the path abbreviation .a[A] = a[A] | True: • An inclusion constraint: ∀$X. .student.taking[$X] => .course.cno[$X] • A key constraint for SSD: ∀$X. Not (.article.title[$X] | .article.title[$X]) • Combining quantification with recursion we can express complex types and constraints (e.g. binary trees)
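The key constraint says that no tree $X occurs as the title of two different article edges sitting side by side. The sketch below checks that property directly on a forest of articles instead of evaluating the quantified formula. It is an illustration only: the names `edge`, `par`, `canon`, `key_holds` and `article` are assumptions, the constraint is evaluated on the content of articles[...], and the author name "Someone" is made-up sample data.

```python
from collections import Counter

def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    return tuple(e for t in trees for e in t)

def canon(tree):
    # canonical form of an unordered tree, so equal multisets compare equal
    return tuple(sorted((label, canon(sub)) for label, sub in tree))

def key_holds(forest):
    """True when no title value is shared by two distinct article edges."""
    seen = Counter()
    for label, content in forest:
        if label != "article":
            continue
        # a set: the same title repeated inside one article is counted once,
        # because the formula needs two disjoint article edges to be violated
        titles = {canon(sub) for l, sub in content if l == "title"}
        seen.update(titles)
    return all(n <= 1 for n in seen.values())

def article(title, *authors):
    return edge("article", par(edge("title", edge(title)),
                               *(edge("author", edge(a)) for a in authors)))

ok = par(article("Anywhere", "Cardelli", "Gordon"), article("TQL", "Ghelli"))
clash = par(ok, article("TQL", "Someone"))

print(key_holds(ok), key_holds(clash))  # True False
```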
SSD Queries • There are many query languages (XQuery, Lorel, YATL, …); essentially, queries are expressions that select data reachable through paths and construct new results • TQL is a query language based on spatial tree logic: selection is done by pattern matching against STL formulas • The TQL logic expresses all regular path expressions • Query answering is implemented for the full TQL logic
SSD Decision Problems with STL • Given a data source F, a formula A representing a schema, and formulas B, B' representing sets of integrity constraints: • Validation: F |= A, F |= B, F |= A And B • Schema/constraint consistency: sat(A), sat(B), sat(A And B) • Constraint implication (inference): vld(B => B') • Constraint implication in the presence of a schema: vld(A And B => B')
A decidable TQL sublogic • STL are well suited to expressing types, constraints and queries over SSD, but: • Validity in the full TQL logic is undecidable • The ground logic is decidable, but it is not expressive enough for all interesting types and constraints • We are looking for a decidable fragment of TQL expressive enough to reason about SSD • A first step in this direction is the following logic…
A decidable TQL sublogic
  A, B ::= True | A And B | Not A | 0 | %[A] | n[A] | A | B
We can define useful operators to describe types and constraints in this decidable logic:
  String =def %[0]
  Tree =def %[True]
  A Or B =def Not (Not A And Not B)
  A => B =def Not A Or B
  A^exists =def A | True
  A^foreach =def Not (Not A | True)
  A^foreachTree =def (Tree => A)^foreach
Note: if A => Tree, we can use A^foreachTree to express A*
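The derived operators can be tried out by expanding them into the core connectives and running a naive checker. The sketch below is an illustration under assumptions: the tuple encoding, the constructor names (`Any` for %[A], `Exists`, `Foreach`, `ForeachTree`) and the brute-force `sat` function are invented here and say nothing about how decidability of validity is established. It shows author[String]* written as (author[String])^foreachTree.

```python
def edge(label, sub=()):
    return ((label, sub),)

def par(*trees):
    return tuple(e for t in trees for e in t)

# Core connectives of the sublogic (hypothetical encoding).
TRUE, ZERO = ("true",), ("zero",)
def Not(a): return ("not", a)
def And(a, b): return ("and", a, b)
def Loc(n, a): return ("loc", n, a)   # n[A]
def Any(a): return ("any", a)         # %[A]: one edge with any label
def Par(a, b): return ("par", a, b)   # A | B

# Derived operators, expanded exactly as on the slide.
STRING = Any(ZERO)
TREE = Any(TRUE)
def Or(a, b): return Not(And(Not(a), Not(b)))
def Implies(a, b): return Or(Not(a), b)
def Exists(a): return Par(a, TRUE)
def Foreach(a): return Not(Par(Not(a), TRUE))
def ForeachTree(a): return Foreach(Implies(TREE, a))

def sat(f, a):
    tag = a[0]
    if tag == "true":
        return True
    if tag == "zero":
        return f == ()
    if tag == "not":
        return not sat(f, a[1])
    if tag == "and":
        return sat(f, a[1]) and sat(f, a[2])
    if tag == "loc":
        return len(f) == 1 and f[0][0] == a[1] and sat(f[0][1], a[2])
    if tag == "any":
        return len(f) == 1 and sat(f[0][1], a[1])
    if tag == "par":
        n = len(f)
        for mask in range(1 << n):
            left = tuple(f[i] for i in range(n) if mask >> i & 1)
            right = tuple(f[i] for i in range(n) if not mask >> i & 1)
            if sat(left, a[1]) and sat(right, a[2]):
                return True
        return False
    raise ValueError(tag)

AUTHORS = ForeachTree(Loc("author", STRING))   # author[String]*

only_authors = par(edge("author", edge("Ghelli")), edge("author", edge("Albano")))
mixed = par(only_authors, edge("title", edge("Databases")))

print(sat(only_authors, AUTHORS), sat(mixed, AUTHORS), sat((), AUTHORS))
# True False True
print(sat(mixed, Exists(Loc("author", STRING))))  # True
```

The side condition A => Tree matters: A^foreachTree only constrains the single-edge parts of a tree, so it coincides with A* when A itself describes single-edge trees.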
Conclusions and Future Directions • STL provide a powerful unified framework for types, constraints, and queries over SSD and XML • This framework is worth studying; it may lead to: • A good formalization of "SSD reasoning" in terms of model checking and validity • Generalization of results on reasoning about types and constraints • Query optimization strategies guided by types/constraints • (Some) future steps: • Extend the decidable logic to express integrity constraints • Model ordered trees
Università di Pisa: Ph.D. Proposal Spatial tree logics to reason about Constraints and Types Speaker: Giovanni Conforti Supervisor: Giorgio Ghelli
SSD Query Optimization • The TQL pattern clause uses STL formulas… • We can use validated constraints C and types T as information to optimize queries (e.g. static detection of empty results) • A query from Q |= A select Q' can be rewritten as from Q |= B select Q' for any B such that (C And T) => (A <=> B) is valid
Research Plan: planning • The challenge is ambitious; it should be understood as a long-term direction of our work • We address some initial tasks we expect to accomplish: • Comparison of STL with other formalisms for types and constraints • Find a "satisfactory" decidable logic fragment to express types (and constraints) • Write a preliminary formal system for constraint (and type) implication • We plan two stages: • (2nd year) in-depth study of basic theories (tree automata, modal logics, description logics) and investigation of the initial tasks • (3rd year) completion of the initial tasks and integration of the results in a unified formal framework
Research Plan: directions • Main directions to investigate: • Expressivity of Spatial Tree Logics (in particular for standard type and constraint specifications) • Decidability and complexity of model checking and validity for fragments (or extensions) of the TQL logic • Reformulation (or generalization) of known results about reasoning and optimization over SSD • Other interesting directions: • Implementation of a query rewriter guided by constraints and types • Extensions to the logic to model order, data updates, private names
Background: Semi-structured data (SSD) • Semi-structured data (SSD) are used to: • model and query the web (HTML, XML, …); • store experimental data; • integrate heterogeneous databases; • … • SSD are: • Self-describing (structure is implicit); • Irregular; • Always evolving