TQL Algebra and its Implementation
Speaker: Giovanni Conforti
Joint work with: Orlando Ferrara and Giorgio Ghelli
Università degli Studi di Pisa
IFIP TCS @ 2002 Montreal, 28th August
What I’m going to talk about…
• Short introduction to SSD and SSD query languages.
• Tree logic and TQL overview.
• TQL Algebra motivations.
• TQL Algebra presentation.
• Translation algorithm.
• Translation correctness.
• Our implementation model.
• Conclusions and future work.
Semi-structured Data
• Semi-Structured Data (SSD) are used to:
  • model and query the web (HTML, XML, …);
  • store experimental data;
  • integrate heterogeneous databases;
  • …
• The structure of Semi-Structured Data (SSD) is:
  • irregular;
  • implicit;
  • always evolving;
  • …
Data model: SSD as labelled trees (Example)
articles[
  article[ author[Cardelli] | author[Gordon] | title[Anywhere] | date[Apr, 2000] ]
  | article[ author[Ghelli] | title[TQL] | conf[ETAPS] | date[ month[Feb] | year[2001] ] ]
]
[Figure: the same data drawn as a labelled tree rooted at articles, with one subtree per article.]
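The bracket notation above maps directly onto nested tuples. The following is a minimal sketch, not the data structure of the TQL system: the helper names `tree`, `canon` and `show` are hypothetical, and a leaf such as Cardelli is assumed to be a label with an empty forest below it.

```python
def tree(label, *children):
    """A tree is a (label, forest) pair; a forest is a tuple of trees."""
    return (label, tuple(children))

def canon(forest):
    """Sort siblings recursively, so forests that are equal as multisets compare equal."""
    return tuple(sorted((label, canon(kids)) for label, kids in forest))

def show(forest):
    """Render a forest in the bracket notation used on the slide (0 is the empty forest)."""
    if not forest:
        return "0"
    return " | ".join(
        label if not kids else f"{label}[{show(kids)}]" for label, kids in forest
    )

# The example from the slide, written as a one-tree forest.
articles = (
    tree("articles",
         tree("article",
              tree("author", tree("Cardelli")),
              tree("author", tree("Gordon")),
              tree("title", tree("Anywhere")),
              tree("date", tree("Apr, 2000"))),
         tree("article",
              tree("author", tree("Ghelli")),
              tree("title", tree("TQL")),
              tree("conf", tree("ETAPS")),
              tree("date", tree("month", tree("Feb")), tree("year", tree("2001"))))),
)

print(show(articles))
```

Sibling order carries no meaning in this model, which is why `canon` is used whenever two forests have to be compared.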
SSD query languages
Just as tabular data have SQL and relational algebra, we want to define a query language and an algebra for SSD.
Specifying and developing a good query language for SSD (in particular for XML) is one of the main current challenges for the database and web research communities.
After several proposals (Lorel, YATL, XMLQL, XDuce, etc.), the W3C has introduced the XQuery standard, whose specification and implementation are work in progress.
TQL – the idea
• Extend the ambient logic to describe properties of SSD, obtaining a tree logic.
• The tree logic is a modal logic well suited to expressing:
  • properties of the horizontal and vertical structure of SSD;
  • properties whose specification requires negation, recursion or universal quantification;
  • constraints and types of SSD.
• Introduce free variables inside tree logic formulas; use a pattern-matching approach to bind these variables to values inside a given data source.
• Result: a new SSD query strategy, TQL.
TQL – the language
• Based on three clauses:
  • matching;
  • filtering;
  • reconstruction.
• Matching and filtering are fused in the binding operator: |=
• Integrating logic expressions and queries inside the same language gives several advantages in terms of expressivity and optimization (e.g. rewriting based on types).
• This talk is not about the TQL language but about the TQL Algebra, so I will introduce only the aspects of TQL needed to understand our work on the algebra. To learn more about TQL, see [WebDB2002] and [ETAPS2000].
Tree Logics - syntax
A, B ::= T | ¬A | A ∧ B | ∃x. A | ∃X. A | 0 | L[A] | A | B | L ~ L′ | X | ξ | μξ. A
Negation allows the definition of derived operators: F, A ∨ B, ∀x. A, ∀X. A, L[⇒ A], A || B
Path expressions:
• regular expressions;
• a compact way to express constraints on paths over trees;
• can be defined using Tree Logic formulas.
E.g. .m.n[A] stands for m[ n[A] | T ] | T
Tree Logics – describing sets of trees (forests)
F |=σ,δ 0 iff F = 0
F |=σ,δ A ∧ B iff F |=σ,δ A and F |=σ,δ B
F |=σ,δ m[A] iff F = m[F′] and F′ |=σ,δ A
F |=σ,δ A | B iff ∃F′, F″. F = F′ | F″ and F′ |=σ,δ A and F″ |=σ,δ B
F |=σ,δ m[⇒ A] iff ∀F′. F = m[F′] ⇒ F′ |=σ,δ A
F |=σ,δ A || B iff ∀F′, F″. F = F′ | F″ ⇒ F′ |=σ,δ A or F″ |=σ,δ B
F |=σ,δ T always
F |=σ,δ X iff F = σ(X)
F |=σ,δ ¬A iff ¬(F |=σ,δ A)
…
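The satisfaction clauses can be read as a recursive checker. The sketch below covers only closed formulas built from 0, T, ¬, ∧, m[A] and A | B (no variables, quantifiers or recursion), so σ and δ are dropped; the tuple encoding and all function names are assumptions made for illustration. Composition is checked naively by trying every split of the forest, which is exponential and only meant to mirror the definition.

```python
from itertools import combinations

# Formulas are tagged tuples; forests are tuples of (label, forest) trees.
ZERO, TRUE = ("zero",), ("true",)
def neg(a):          return ("not", a)
def conj(a, b):      return ("and", a, b)
def loc(m, a):       return ("loc", m, a)       # m[A]
def par(a, b):       return ("par", a, b)       # A | B
def loc_imp(m, a):   return neg(loc(m, neg(a)))        # derived: m[=> A]
def dpar(a, b):      return neg(par(neg(a), neg(b)))   # derived: A || B

def splits(forest):
    """Yield every pair (F1, F2) with forest = F1 | F2."""
    idx = range(len(forest))
    for k in range(len(forest) + 1):
        for left in combinations(idx, k):
            chosen = set(left)
            yield (tuple(forest[i] for i in idx if i in chosen),
                   tuple(forest[i] for i in idx if i not in chosen))

def sat(forest, formula):
    """Decide whether the forest satisfies the closed formula."""
    kind = formula[0]
    if kind == "zero":
        return len(forest) == 0
    if kind == "true":
        return True
    if kind == "not":
        return not sat(forest, formula[1])
    if kind == "and":
        return sat(forest, formula[1]) and sat(forest, formula[2])
    if kind == "loc":
        return (len(forest) == 1 and forest[0][0] == formula[1]
                and sat(forest[0][1], formula[2]))
    if kind == "par":
        return any(sat(l, formula[1]) and sat(r, formula[2])
                   for l, r in splits(forest))
    raise ValueError(f"unknown formula: {kind}")

# a[b] | c satisfies the path expression .a[T], i.e. a[T] | T
example = (("a", (("b", ()),)), ("c", ()))
print(sat(example, par(loc("a", TRUE), TRUE)))
```

The two derived operators are defined through negation exactly as the syntax slide suggests, so they need no clauses of their own.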
TQL Queries
Syntax: Q, Q′ ::= 0 | X | L[Q] | f(Q) | Q | Q′ | from Q |= A select Q′
Example:
result[ from $articles |= articles[ article[ title[$T] | date[$D] | T ] | T ]
        select article[ title[$T] | date[$D] ] ]
Bindings:
$T          | $D
{Anywhere}  | {Apr, 2000}
{TQL}       | { month[Feb] | year[2001] }
Result:
result[ article[ title[Anywhere] | date[Apr, 2000] ] | article[ title[TQL] | date[ month[Feb] | year[2001] ] ] ]
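To make the binding step concrete, here is a hand-written matcher for this one pattern only. It is a sketch under stated assumptions, not a general TQL evaluator: forests are tuples of (label, forest) pairs, and `contents` and `bind` are hypothetical helper names.

```python
def leaf(text):
    return (text, ())

def contents(forest, label):
    """Contents of every top-level tree in the forest that carries the given label."""
    return [kids for lbl, kids in forest if lbl == label]

source = (("articles", (
    ("article", (("author", (leaf("Cardelli"),)),
                 ("author", (leaf("Gordon"),)),
                 ("title", (leaf("Anywhere"),)),
                 ("date", (leaf("Apr, 2000"),)))),
    ("article", (("author", (leaf("Ghelli"),)),
                 ("title", (leaf("TQL"),)),
                 ("conf", (leaf("ETAPS"),)),
                 ("date", (("month", (leaf("Feb"),)),
                           ("year", (leaf("2001"),)))))),
)),)

def bind(forest):
    """Rows for: forest |= articles[ article[ title[$T] | date[$D] | T ] | T ]."""
    rows = []
    for arts in contents(forest, "articles"):
        for art in contents(arts, "article"):
            for title in contents(art, "title"):
                for date in contents(art, "date"):
                    row = {"$T": title, "$D": date}
                    if row not in rows:       # a table is a set of rows
                        rows.append(row)
    return rows

for row in bind(source):
    print(row)
```

Each `| T` in the pattern is what lets the loops ignore the remaining siblings (authors, conf), so only the title and date subtrees are inspected.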
TQL Algebra motivations – in general
In general an intermediate algebra ensures:
• transformability;
• executability.
[Diagram: TQL query → Parser → Translation → Algebraic expression → Execution, annotated with three optimization stages: TQL rewriting, TQL Algebra rewriting, physical optimization.]
TQL Algebra motivations – TQL case
• No current algebra for XML supports the TQL operators (negation, quantification, horizontal navigation, etc.), so we define a new one.
• Due to negation and derived operators, this algebra must support infinite bindings (a variable bound to an infinite number of values).
• We want an algebra whose semantics is formally specified, in order to prove its correctness w.r.t. TQL semantics.
• We want a running prototype, so we have to implement data structures and translation and evaluation algorithms for the TQL Algebra.
TQL Algebra – sorts and their semantics
• It is an algebra of tables and trees, defined on four sorts:
  • label expressions L: denoting labels;
  • tree expressions Q: denoting forests (sets of trees);
  • row expressions RV: denoting rows over V (tuples with type V);
  • table expressions TV: denoting finite or infinite tables (sets of rows) with schema V.
• The basic sort is the table sort, which is used to represent the evaluation of a Q |= A TQL binding operation.
• SSD and TQL query results are naturally represented by tree expressions.
TQL Algebra – table expressions
• One-row tables: {RV} | {(x ↦ L)} | {(x ↦ Q)}
• Relational operators (union, cartesian product, projection and restriction): T ∪V,V′ T | T ×V,V′ T | ΠV T | σL~L′ T
• Universe and complement: 1V | CoV(T)
• Vertical test and horizontal iterator of trees: if Q = y[Y] then TY,y else T | ⋃{Q = Y | Y′} TY|Y′
• Recursion: letrec M = Y. TM,Y in TM | M(Q)
TQL Algebra – tree expressions
Q ::= R(X) | Y | 0 | Q | Q′ | L[Q] | f(Q) | Par r∈T Qr
The tree algebra reflects the TQL operators used to build trees (queries). The differences are:
• X does not denote a variable, but the name of a row field;
• we have a new metavariable Y ranging over tree variables;
• the from-select clause is replaced by the tree construction (multiset union) Par r∈T Qr, whose informal semantics is: “Compute the union of all Qr where r is a row belonging to T”.
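The informal semantics of Par can be sketched in a few lines. This is an illustration under assumptions, not the system's implementation: a finite table is modelled as a Python set of rows, each row a tuple of (variable, value) pairs, and the names `par` and `title_of` are hypothetical.

```python
def par(table, q):
    """Par over r in T of Q_r: multiset union of the forests q(r), one per row r."""
    result = ()
    for row in sorted(table):      # a table is a set; sorting only fixes the output order
        result += q(dict(row))     # tuple concatenation keeps duplicates
    return result

# Two distinct rows that agree on $T.
table = {
    (("$D", "x"), ("$T", "a")),
    (("$D", "y"), ("$T", "a")),
}

def title_of(row):
    """The tree expression title[$T], evaluated in one row."""
    return (("title", ((row["$T"], ()),)),)

forest = par(table, title_of)
print(forest)
```

Because the union is a multiset union, the two rows contribute two copies of title[a], even though the table itself never contains the same row twice.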
TQL Algebra – derived table expressions
• We can define by translation several useful table expressions:
  • intersection, join, extension;
  • co-projection (the dual of projection);
  • other structural tests on the tree.
• These operators are very useful for translating the derived operators of the tree logic.
• All of them are implemented in the current system.
Translation from TQL to TQL Algebra
• The core of the translation is the binder translation. We perform a semantic inversion, transforming a formula (a function from substitutions to sets of trees) into a function that, given a tree, returns a set of substitutions (a table expression).
  ⟦A⟧Q, RV, ε
• The translation is defined by structural recursion on A.
• It actually depends on the current schema V; Q and R are only plugged in somewhere inside the expression.
• ε is an environment mapping logical recursive variables to algebraic ones.
Translation from TQL to TQL Algebra - example
• Example: ⟦from Q |= A ∧ ∃x. x[$Z] select Q′⟧RV = Par r∈T ⟦Q′⟧RV;r
  where T = ⟦A ∧ ∃x. x[$Z]⟧Q, RV, ε
[Diagram: T expanded into an algebraic expression built from ⟦A⟧, Π{$Z} and ⟦x[$Z]⟧; intermediate steps shown as ……]
Translation – operators
Translation correctness
The formal approach we have taken allows us to prove the correctness of the translation. That is:
Theorem: FV(RV) ⊆ dom(ε), FV(Q) ⊆ V ⇒ [[ Q ]]ε(RV) = [[ ⟦Q⟧RV ]]ε
The semantics of the query Q in ε(RV) is equivalent to the semantics of the translation of Q in RV.
The core of the proof is the from-select case, in which we prove the correctness of the binder translation.
Implementing the algebra – model description
• Goal: represent possibly infinite tables in a finite space.
• We use disjunctive constraints (closely related to proposals in constraint databases).
• For each algebraic operator we define and implement a corresponding operator that works on disjunctive constraints.
• New algorithms for complex operators (complement, co-projection, tree navigation).
[Example constraint table on the slide: columns $X and $Y, entries { a }, NotIn { a, b }, { b }, { a }.]
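One way to picture a finite representation of infinite tables is sketched below. It is a textbook-style illustration, not the algorithms of the TQL system: each column constraint is either "in S" or "not in S" for a finite set S, a row is a conjunction of column constraints, and a table is a disjunction of rows. All names are hypothetical, and the example table assumes that the slide's entries pair up row by row as ({a}, NotIn {a, b}) and ({b}, {a}).

```python
from itertools import product

# A column constraint is (positive, labels):
#   (True, S)  = "value in S"      (finite)
#   (False, S) = "value not in S"  (cofinite, hence infinite)
ANY = (False, frozenset())

def c_in(*labels):     return (True, frozenset(labels))
def c_not_in(*labels): return (False, frozenset(labels))
def c_neg(c):          return (not c[0], c[1])
def c_empty(c):        return c[0] and not c[1]
def c_holds(c, value): return (value in c[1]) == c[0]

def c_and(a, b):
    (pa, sa), (pb, sb) = a, b
    if pa and pb:
        return (True, sa & sb)
    if pa:
        return (True, sa - sb)
    if pb:
        return (True, sb - sa)
    return (False, sa | sb)

def t_and(t1, t2, schema):
    """Intersection of two tables: pairwise row conjunction, dropping empty rows."""
    rows = []
    for r1, r2 in product(t1, t2):
        row = {v: c_and(r1[v], r2[v]) for v in schema}
        if not any(c_empty(c) for c in row.values()):
            rows.append(row)
    return rows

def t_complement(table, schema):
    """Complement by De Morgan: intersect the complements of the single rows."""
    result = [{v: ANY for v in schema}]            # the universe table
    for row in table:
        negated = []
        for v in schema:                           # not(row) = some column violated
            flipped = c_neg(row[v])
            if not c_empty(flipped):
                negated.append({**{u: ANY for u in schema}, v: flipped})
        result = t_and(result, negated, schema)
    return result

def t_contains(table, assignment):
    return any(all(c_holds(row[v], assignment[v]) for v in row) for row in table)

schema = ("$X", "$Y")
table = [
    {"$X": c_in("a"), "$Y": c_not_in("a", "b")},   # infinitely many values for $Y
    {"$X": c_in("b"), "$Y": c_in("a")},
]
co_table = t_complement(table, schema)
print(len(co_table), "constraint rows describe the complement")
```

The first row alone stands for infinitely many concrete rows, yet both the table and its complement stay finite as lists of constraints; the price is that complement can multiply the number of constraint rows.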
Implementing the algebra – The TQL System
• Implemented in Java and ported to C#.
• Some stats:
  • ~20,000 LoC;
  • 182 classes.
• Download at: http://tql.di.unipi.it/tql
[Architecture diagram with components: Tql Applet, World Wide Web, Tql Servlet, Tql GUI, Sys Interface, Tql Engine, DB, File system, XML.]
Conclusions
• TQL Algebra:
  • designed as a tool to execute TQL queries;
  • seems to be quite general;
  • is implemented (with some restrictions);
  • deals with infinite tables.
• Future work:
  • rewritings (with types and constraints);
  • static safety analysis;
  • cost model and physical optimizations;
  • extension to the graph model (graph logic).
The End