Query Languages Aswin Yedlapalli
XML Query data model
• A document is viewed as a labeled tree of nodes.
• The successors of a node may be:
  - an ordered sequence of nodes (e.g., for sub-elements);
  - an unordered set of nodes (e.g., for attributes).
• Compatible with XML schemas.
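The ordered/unordered distinction in this data model can be illustrated with a minimal Python sketch using the standard `xml.etree.ElementTree` module. The sample document is invented for illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
doc = ET.fromstring(
    '<book year="1999" lang="en">'
    "<title>Data on the Web</title>"
    "<author>Abiteboul</author>"
    "<author>Buneman</author>"
    "<author>Suciu</author>"
    "</book>"
)

# Sub-elements form an ordered sequence, so position is meaningful.
child_tags = [child.tag for child in doc]
authors = [a.text for a in doc.findall("author")]

# Attributes form an unordered set of name/value pairs,
# so they are compared as a set rather than as a list.
attributes = set(doc.attrib.items())
```

Comparing `authors` as a list and `attributes` as a set mirrors the model: reordering the authors changes the document, reordering the attributes does not.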
Comparison of XML and semistructured data
• Similarities:
  - both are best described by a labeled graph;
  - both are schema-less and self-describing.
• Differences:
  - XML is ordered; semistructured data is unordered;
  - XML can mix text and elements.
Required features for a query language
• Expressive power
  - The query language must be at least as expressive as SQL on relational data.
  - It should be able to restructure data.
  - It should be able to navigate data with arbitrary nesting.
• Semantics
  - A precise semantics is essential for query transformation and optimization.
• Compositionality
  - Queries must stay within one data model: they cannot take data in one model and produce output in another.
• Schema
  - When structure is defined, the query language should exploit it for optimization, type checking, etc.
Query languages
• For semistructured data
  - Lorel (Lightweight Object REpository Language)
  - UnQL (Unstructured Query Language)
  - StruQL, MSL, W3QL, WebSQL, Weblog, etc.
• For XML
  - XML-QL (XML Query Language)
  - XSLT and structural recursion
  - XML Query Algebra
Formal Semantics
• Given a query Q = SELECT E[X1, …, Xn] FROM F WHERE C and a database DB, Answer(Q, DB) is defined in two steps:
  - Step 1: compute all bindings of (X1, …, Xn), i.e. the tuples (Ci1, …, Cin) for i = 1, …, m, where
    • each Cij is a node oid or an atomic value;
    • each tuple must satisfy the paths in F;
    • each tuple must satisfy the conditions in C.
  - Step 2: the answer is E[C11, …, C1n] ∪ … ∪ E[Cm1, …, Cmn].
• When E has nested subqueries, apply the semantics recursively.
• Note: so far we have dealt with an unordered model.
  - What do we need to do for order?
• Complexity: PTIME in |DB| (not in |Q|).
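The two-step semantics can be sketched in Python as follows. This is only an illustration under stated assumptions: the database is modelled as nested dicts mapping each label to a list of children, the FROM clause as hypothetical `(variable, source, path)` triples, and the SELECT and WHERE clauses as plain functions. None of these names or the sample data come from the original slides.

```python
def follow(node, path):
    """Return all nodes reachable from `node` along a sequence of labels."""
    frontier = [node]
    for label in path:
        frontier = [kid for n in frontier if isinstance(n, dict)
                    for kid in n.get(label, [])]
    return frontier


def bindings(db, from_clause, condition):
    """Step 1: all bindings that satisfy the paths in F and the conditions in C."""
    rows = [{}]
    for var, source, path in from_clause:
        new_rows = []
        for row in rows:
            # A source of None means the path starts at the database root.
            start = db if source is None else row[source]
            for node in follow(start, path):
                new_rows.append({**row, var: node})
        rows = new_rows
    return [row for row in rows if condition(row)]


def answer(db, select, from_clause, condition):
    """Step 2: union of E[...] over all bindings (duplicates are dropped)."""
    result = []
    for row in bindings(db, from_clause, condition):
        value = select(row)
        if value not in result:
            result.append(value)
    return result


# Hypothetical sample database, invented for illustration.
db = {"biblio": [{"book": [
    {"author": ["Smith"], "year": [1999]},
    {"author": ["Jones", "Lee"], "year": [2001]},
    {"author": ["Smith"], "year": [1999]},
]}]}

from_clause = [("X", None, ["biblio", "book"]),
               ("A", "X", ["author"]),
               ("Y", "X", ["year"])]
condition = lambda row: row["Y"] == 1999
select = lambda row: {"author": row["A"]}

rows = bindings(db, from_clause, condition)
result = answer(db, select, from_clause, condition)
```

For a fixed query with n variables, the nested loops in `bindings` enumerate at most |DB|^n candidate tuples, which is consistent with the slide's claim of polynomial time in |DB| but not in |Q|.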
LOREL
• Minor syntactic differences in regular path expressions (% instead of _, # instead of _*).
• Common path convention:
    SELECT biblio.book.author
    FROM biblio.book
    WHERE biblio.book.year = 1999
  becomes
    SELECT X.author
    FROM biblio.book X
    WHERE X.year = 1999
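To make the two wildcards concrete, here is a minimal Python sketch of path evaluation in which `%` matches any single label and `#` matches any sequence of labels (including the empty one). It assumes a tree-shaped database modelled as nested dicts of child lists; the helper names and sample data are hypothetical, and this is not how the LORE system itself evaluates paths.

```python
def children(node):
    """Yield (label, child) pairs; atomic values have no children."""
    if isinstance(node, dict):
        for label, kids in node.items():
            for kid in kids:
                yield label, kid


def descendants_or_self(node):
    """Yield every node reachable from `node` by zero or more edges."""
    yield node
    for _, kid in children(node):
        yield from descendants_or_self(kid)


def eval_path(node, steps):
    """Evaluate a path where '%' is any one label and '#' is any label sequence."""
    if not steps:
        return [node]
    step, rest = steps[0], steps[1:]
    out = []
    if step == "#":
        for d in descendants_or_self(node):
            out.extend(eval_path(d, rest))
    else:
        for label, kid in children(node):
            if step == "%" or step == label:
                out.extend(eval_path(kid, rest))
    return out


# Hypothetical sample database, invented for illustration.
db = {"biblio": [{
    "book": [{"author": ["Smith"], "year": [1999]}],
    "paper": [{"author": ["Lee"], "section": [{"author": ["Kim"]}]}],
}]}

one_level = eval_path(db, ["biblio", "%", "author"])   # biblio.%.author
any_depth = eval_path(db, ["#", "author"])             # #.author
```

The sketch assumes acyclic data; on a graph with cycles, `descendants_or_self` would need a visited set to terminate.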
Lorel
• The query language of the LORE system; it adapts OQL to semistructured data.
    select X.title
    from bib.article X
    where "tova milo" in X.author
  returns {title: "type inf…"}
Features of Lorel
• Differences from typed query languages:
  - performs implicit coercions;
  - deals with missing attributes;
  - deals with set-valued attributes, e.g., in X.year > 1998, X may have several years.
• The select clause creates new nodes.
• Allows nested queries.
• Allows regular path expressions.
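The three differences from typed languages can be approximated in a few lines of Python. This is a sketch of the behaviour the slide describes, not the LORE implementation: the function names and sample articles are hypothetical, and the coercion rule (try to read strings as numbers) is a simplifying assumption.

```python
def coerce(value):
    """Implicit coercion (assumed rule): read numeric strings as numbers."""
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value
    return value


def exists_greater(obj, label, bound):
    """True if SOME value under `label` exceeds `bound` after coercion."""
    for raw in obj.get(label, []):      # missing attribute: empty set, no error
        value = coerce(raw)
        if isinstance(value, (int, float)) and value > bound:
            return True
    return False


# Hypothetical sample articles, invented for illustration.
articles = [
    {"title": ["A"], "year": ["1999"]},        # year stored as a string
    {"title": ["B"], "year": [1997, 2000]},    # set-valued year
    {"title": ["C"]},                          # year missing
    {"title": ["D"], "year": ["unknown"]},     # year not coercible
]

selected = [a["title"][0] for a in articles if exists_greater(a, "year", 1998)]
```

The comparison is existential over set-valued attributes, and a missing or non-comparable value simply makes the condition false instead of raising a type error.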
UnQL (Unstructured Query Language)
• UnQL is an extension of basic LOREL.
• Unlike LOREL, UnQL does not use coercion.
• The "where" clause contains two kinds of constructs:
  - generators: variables are bound via patterns;
  - conditions: as in LOREL.
• A "from" clause is not needed, since variables are bound in patterns.
UnQL Queries
• E.g., select title:T where {bib:article:{title:T, year:Y}} in db, Y > 1998
• The root of the database is represented explicitly: db.
• UnQL queries can be rewritten in LOREL. The LOREL equivalent of the query above is:
    select title:T from bib.article A, A.title T, A.year Y where Y > 1998
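How a pattern acts as a generator can be sketched in Python. The `Var` class, the `match` function, and the sample database are hypothetical names introduced for illustration; the sketch assumes tree-shaped data stored as nested dicts of child lists and does not handle a variable that occurs twice in one pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A pattern variable such as T or Y."""
    name: str


def match(pattern, node, env):
    """Yield every extension of `env` under which `pattern` matches `node`."""
    if isinstance(pattern, Var):
        yield {**env, pattern.name: node}
        return
    if not isinstance(node, dict):
        return
    # Match each labelled sub-pattern in turn, threading bindings through.
    envs = [env]
    for label, sub in pattern.items():
        envs = [e2 for e in envs
                for kid in node.get(label, [])
                for e2 in match(sub, kid, e)]
    yield from envs


# Hypothetical sample database, invented for illustration.
db = {"bib": [{"article": [
    {"title": ["t1"], "year": [1999]},
    {"title": ["t2"], "year": [1995]},
    {"title": ["t3"]},                      # no year: the pattern does not match
]}]}

# Generator: {bib:article:{title:T, year:Y}} in db
pattern = {"bib": {"article": {"title": Var("T"), "year": Var("Y")}}}

# Condition: Y > 1998; result constructor: title:T
result = [{"title": b["T"]} for b in match(pattern, db, {}) if b["Y"] > 1998]
```

The generator produces the bindings and the condition filters them, which is why no separate "from" clause is needed.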
Additional features of LOREL
• Label variables
  - can combine "schema" and "data" information;
  - can turn tables to data and vice versa;
  - can perform group-by operations.
• Can match variables with regular expressions.
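As a rough illustration of label variables, the Python sketch below binds a variable `L` to edge labels, so that schema information (the label) appears as ordinary data in the result, and then groups on it. The helper `edges` and the sample data are hypothetical and only meant to convey the idea; this is not LOREL syntax.

```python
from collections import defaultdict


def edges(node):
    """Yield (label, child) pairs, so the label itself can be bound to a variable."""
    for label, kids in node.items():
        for kid in kids:
            yield label, kid


# Hypothetical sample data, invented for illustration.
bib = {
    "article": [{"title": ["t1"]}, {"title": ["t2"]}],
    "book": [{"title": ["t3"]}],
}

# Schema as data: the label bound to L becomes a value in each result row.
as_data = [{"kind": L, "title": X["title"][0]} for L, X in edges(bib)]

# Group-by: rows are grouped on that value, which is used as a label again.
grouped = defaultdict(list)
for row in as_data:
    grouped[row["kind"]].append(row["title"])
```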
References
• Managing XML and Semistructured Data, lecture series by Prof. Dan Suciu.
• Website: www.cs.washington.edu/homes/suciu/COURSES/590DS/