XTree for Declarative XML Querying

Zhuo Chen, Tok Wang Ling, Mengchi Liu, and Gillian Dobbie, January 2004

Presentation Transcript


  1. XTree for Declarative XML Querying Zhuo Chen, Tok Wang Ling, Mengchi Liu, and Gillian Dobbie January 2004

  2. Outlines • Introduction • Preliminaries • XTree • Algorithm to transform XTree query to XQuery • Conclusion and future works

  4. Introduction • How to query XML documents is an important issue in XML research • Various query languages have been proposed: • XPath, XQuery, Lorel, XML-GL, XQL, XML-QL, XSLT, YATL, XDuce, rule-based semantic querying, declarative XML querying, etc. • XQuery, which is based on XPath, has been selected as the basis for the official W3C query language for XML

  5. Introduction • In this paper, we will • Analyze the limitations of XPath • Propose a new set of syntax rules called XTree, which is a generalization of XPath • Show how XTree can efficiently replace the notations of XPath • Give algorithms to convert queries based on XTree expressions to standard XQuery queries

  6. Outlines • Introduction • Preliminaries • Background on XPath • Limitations of XPath • XTree • Algorithm to transform XTree query to XQuery • Conclusion and future works

  7. Preliminaries • XPath • A W3C standard • A set of syntax rules for addressing parts of an XML document • It uses path expressions to identify nodes (elements and attributes) in XML documents • These path expressions look very much like paths in a computer file system

  8. Background on XPath <bib name="IT"> <book id="b001" year="1994"> <title>TCP/IP Illustrated</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id="b002" year="1992"> <title>Advanced Programming in the Unix Environment</title> <author><last>Stevens</last><first>W.</first></author> <publisher>Addison-Wesley</publisher> </book> <book id="b003" year="2000"> <title>Data on the Web</title> <edition>3</edition> <author><last>Abiteboul</last><first>Serge</first></author> <author><last>Buneman</last><first>Peter</first></author> <author><last>Suciu</last><first>Dan</first></author> <publisher>Morgan Kaufmann</publisher> </book> <journal id="j001" year="1998"> <title>XML</title> <editor><last>Date</last><first>C.</first></editor> <editor><last>Gerbarg</last><first>M.</first></editor> <publisher>Morgan Kaufmann</publisher> </journal> </bib> • Sample XML document of a bibliography

  9. Background on XPath • XPath examples • /bib/book/@year • Get attribute “year” of each book • /bib/book/author • Get element “author” of each book • //author • Get all elements named “author”, regardless of their absolute paths • /bib/book/* • Get all sub-elements of each book • /bib/book/@* • Get all attributes of each book • /bib/book[2] • Get the second book element • /bib/book[last()] • Get the last book element
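
To try the XPath examples above, here is a rough Python sketch using the standard xml.etree.ElementTree module on the slide 8 bibliography (re-embedded so the block runs on its own). ElementTree supports only a limited XPath subset: it has no attribute step such as /@year, so attributes are read through `get()` and `attrib`. It is an approximation for experimenting, not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8 (whitespace between elements is insignificant here)
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)  # the <bib> element, so the paths below are relative to /bib

years = [b.get("year") for b in root.findall("book")]        # /bib/book/@year
authors = root.findall("book/author")                         # /bib/book/author
all_authors = list(root.iter("author"))                       # //author
children = [c.tag for c in root.findall("book/*")]           # /bib/book/*
attributes = [dict(b.attrib) for b in root.findall("book")]   # /bib/book/@*
second = root.find("book[2]/title").text                       # /bib/book[2] (1-based)
last = root.find("book[last()]/title").text                    # /bib/book[last()]

print(years, second, last)
```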

  10. Background on XQuery • XQuery • An XML query language for searching XML documents • Based on XPath • FLWOR statements • For – Let – Where – Order by – Return • A for clause iterates the variable over the result of its expression • A let clause binds the variable to the whole result of its expression • Complex queries (nested clauses) • Complex result constructions • User-defined functions

  11. Background on XQuery • XQuery example • List the year and title of all books published after 1995 XQuery: for $book in /bib/book where $book/@year > 1995 return <book> { $book/@year } { $book/title } </book> Result: <book year="2000"> <title>Data on the Web</title> </book>
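
The for/where/return steps of this query can be mirrored in a few lines of Python to make the evaluation order concrete. This is an illustrative emulation with xml.etree.ElementTree, not an XQuery processor; the `<result>` wrapper is an assumption added only so the returned `<book>` elements form one tree (XQuery itself returns a plain sequence).

```python
import copy
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)
result = ET.Element("result")  # hypothetical wrapper for the returned sequence

for book in root.findall("book"):                                  # for $book in /bib/book
    if int(book.get("year")) > 1995:                               # where $book/@year > 1995
        out = ET.SubElement(result, "book", year=book.get("year"))  # <book> { $book/@year }
        out.append(copy.deepcopy(book.find("title")))             # { $book/title } </book>

print(ET.tostring(result, encoding="unicode"))
```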

  12. Limitations of XPath • XPath has some limitations: 1. Only one variable can be bound per XPath expression • An XPath expression is just a linear path, which does not match the tree structure of XML • Inefficient • If a query needs values from several places, it has to use several paths 2. It is difficult to reveal the relationship among correlated XPaths • This may cause mistakes if a user is not careful when writing a query • E.g., if we want to output the title and authors of each book: XPath 1: /bib/book/title, XPath 2: /bib/book/author Wrong! The two paths are not correlated: nothing ties an author to the title of the same book
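
The correlation problem in limitation 2 is easy to see by evaluating the two paths independently. In the Python sketch below (xml.etree.ElementTree, sample document from slide 8), two independent iterations produce a cross product of 3 titles and 5 authors, while going through the common ancestor `book` keeps each author with its own title. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)

titles = [t.text for t in root.findall("book/title")]                 # XPath 1
authors = [a.find("last").text for a in root.findall("book/author")]  # XPath 2 (last names)

# Two independent for clauses form a cross product: 3 x 5 = 15 pairs, most of them
# wrong, because nothing links an author to the book it belongs to.
naive = [(t, a) for t in titles for a in authors]

# Navigating through the common ancestor <book> keeps the two branches correlated.
correlated = [
    (book.find("title").text, [a.find("last").text for a in book.findall("author")])
    for book in root.findall("book")
]
```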

  13. Limitations of XPath • XPath has some limitations: 3. XPath is clumsy for expressing a query that returns elements at path A while the condition is on a distant path B • It is difficult to distinguish the condition branch from the target branch • Especially for multiple conditions and nested conditions • E.g., find the publisher id of a book which has an author with last name "Stevens" and first name "W.": /bib/book/author[last="Stevens" and first="W."]/../publisher/@pubid 4. XPath expressions are only used in the querying part of XQuery, not in the result construction part • In XQuery, the result construction part mixes literal text, variable evaluation, and even nested sub-queries • The whole query is difficult to read and comprehend

  14. Limitations of XPath • XPath has some limitations: 5. XPath can only bind a variable to a whole node (element or attribute), which is a name-value pair • To get part of a node, we have to invoke built-in functions • local-name() to get the node name • string() to get the string value • This makes it difficult to query XML documents with unknown structure, or to rename nodes in the result • E.g., suppose we do not know the sub-structure of the book element, and we want to restructure books this way: keep text nodes and sub-elements unchanged, but convert attributes to sub-elements: for $book in /bib/book return <book> { $book/text(), $book/* } { for $attrib in $book/@* return <attribute name="{ local-name($attrib) }" value="{ string($attrib) }"/> } </book> • The attributes are iterated with a for clause, since local-name() and string() each accept only a single node, and each book has several attributes
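
The restructuring described on slide 14 can be sketched in Python with xml.etree.ElementTree to show what the result looks like. This is an illustration under stated assumptions: the function name `restructure` is hypothetical, and ElementTree stores text differently from the XPath data model (leading text sits in `.text`, later text in each child's `.tail`, which `deepcopy` carries along).

```python
import copy
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)


def restructure(book):
    """Keep a book's text and sub-elements; turn each attribute into an <attribute> element."""
    out = ET.Element("book")
    out.text = book.text                      # leading text ($book/text())
    for child in book:                        # $book/*, with any trailing text in .tail
        out.append(copy.deepcopy(child))
    for name, value in book.attrib.items():   # for $attrib in $book/@*
        # name plays the role of local-name($attrib), value of string($attrib);
        # namespaced attributes would appear here as "{uri}local"
        ET.SubElement(out, "attribute", name=name, value=value)
    return out


books = [restructure(b) for b in root.findall("book")]
```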

  15. Outlines • Introduction • Preliminaries • XTree • Basic syntax • XTree for querying • XTree for result construction • Algorithm to transform XTree query to XQuery • Conclusion and future works

  16. XTree • XTree is a generalization of XPath • XTree has a tree structure like XML • XTree is more efficient than XPath • In the querying part, one XTree expression can bind multiple variables • In XQuery, one XPath expression can only bind one variable • In the result construction part, one XTree expression can be used to define the result format • Avoid nested structure in the query • Make the whole query easier to read and understand • Supports list-valued variables explicitly, and determines their values uniquely

  17. XTree syntax • Similar to that of XPath • / means parent-child hierarchy • // means any number of levels down (ancestor-descendant) • ( ) at the front encloses the URL of the document • Sibling tree nodes are enclosed by { } and separated by commas • { } can be nested • In XTree, conditions are written directly without { } • Logic variables are used as placeholders to bind/match the values at their places • → to assign variables in the querying part • ← to get values from variables in the result construction part • Only the sub-trees of interest are written in XTree, not the whole XML tree structure

  18. XTree for querying • Symbol → will assign values of nodes on the left side to the variable on the right side • Example. For the sample bibliography document, suppose we want to get the year and title of each book, and its authors’ last names and first names • We can use the variables $y, $t, $first, $last to bind them respectively as in the following XTree expression: • /bib/book/{@year→$y, title→$t, author/{last→$last, first→$first}} • We can instantiate many variables in one XTree expression • The above XTree expression corresponds to the following 6 XPath expressions in XQuery: for $book in /bib/book, $y in $book/@year, $t in $book/title, $author in $book/author, $last in $author/last, $first in $author/first
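
To see which variable bindings the expression /bib/book/{@year→$y, title→$t, author/{last→$last, first→$first}} produces, the six nested for clauses above can be mirrored as nested Python loops over the slide 8 document. This is a sketch using xml.etree.ElementTree; each tuple in `rows` is one combination of ($y, $t, $last, $first).

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)
rows = []

for book in root.findall("book"):                      # for $book in /bib/book
    y = book.get("year")                               # $y in $book/@year
    if y is None:
        continue                                       # no @year: the for clause yields nothing
    for t in book.findall("title"):                    # $t in $book/title
        for author in book.findall("author"):          # $author in $book/author
            for last in author.findall("last"):        # $last in $author/last
                for first in author.findall("first"):  # $first in $author/first
                    rows.append((y, t.text, last.text, first.text))
```

Because $author is shared by $last and $first, each tuple pairs a last name with the first name of the same author, which is exactly the correlation one XTree expression captures.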

  19. XTree for querying • Example. Suppose we want to get the last name and first name elements at whatever depth in the document; we can write the following XTree expression: /bib//{last→$last, first→$first} • The curly braces enclosing the two elements last and first specify that these two elements are siblings • According to the XML document, the parent of the sibling elements last and first is /bib/book/author or /bib/journal/editor • XTree allows a user to use path abbreviations as in XPath
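
A small Python sketch of what /bib//{last→$last, first→$first} matches: any element at any depth below (or at) /bib that has both a last and a first child. It uses xml.etree.ElementTree's `iter()` to visit every element; recording the parent's tag is an addition for illustration, showing that both author and editor parents qualify.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)
pairs = []

for parent in root.iter():                        # every element at any depth, starting at /bib
    for last in parent.findall("last"):           # last→$last
        for first in parent.findall("first"):     # first→$first, a sibling of last
            pairs.append((parent.tag, last.text, first.text))
```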

  20. XTree for querying • Example. Suppose we want to find an attribute with value "2000" in some book element, and bind variable $b to that book: /bib/book→$b/@$attr="2000" • According to the sample document, $b will bind to the third book, and $attr will bind to the attribute name "year" • XTree allows a user to bind variables to the structure of an XML document • A user can write a variable $var in place of a node name, on the left side of the → symbol • Here $var will bind to the name of the corresponding node
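
Binding a variable to a node *name*, as @$attr does, amounts to iterating over attribute name/value pairs and keeping the name. The Python sketch below (xml.etree.ElementTree, slide 8 document) reports each matching book by its id so the result is easy to read; that choice is only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)

matches = [
    (book.get("id"), name)                    # ($b identified by its id, $attr)
    for book in root.findall("book")          # /bib/book→$b
    for name, value in book.attrib.items()    # @$attr binds the attribute *name*
    if value == "2000"                        # ="2000"
]
```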

  21. XTree • Two types of variables • Single-valued variables • $X • An element instance of the specified path • List-valued variables • {$X} • A list of all $X instances • Explicitly indicated by a pair of curly braces • Note that both sibling nodes and list-valued variables are enclosed by curly braces • Sibling nodes have commas as separators inside the braces • List-valued variables do not have commas inside the braces

  22. List-valued variables • Object-oriented functions of list-valued variables: • Aggregate functions Suppose list-valued variable {$nums} binds to a list of numbers • {$nums}.count() returns the number of items in the list • {$nums}.avg() returns the average value of items in the list • {$nums}.min() returns the minimum value in the list • {$nums}.max() returns the maximum value in the list • {$nums}.sum() returns the sum of values in the list
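
As a rough model of these aggregate functions, here is a hypothetical Python class `XList` standing in for a list-valued variable. The class and its method names simply mirror the slide's notation; they are not an existing library. Returning `None` for the average, minimum, or maximum of an empty list is an assumption (XQuery's avg(()) returns an empty sequence).

```python
import builtins
from statistics import mean


class XList:
    """Hypothetical stand-in for a list-valued XTree variable such as {$nums}."""

    def __init__(self, items):
        self.items = list(items)

    def count(self):
        return len(self.items)

    def avg(self):
        return mean(self.items) if self.items else None

    def min(self):
        return builtins.min(self.items) if self.items else None

    def max(self):
        return builtins.max(self.items) if self.items else None

    def sum(self):
        return builtins.sum(self.items)


# {$nums} bound to the book years of the sample document
nums = XList([1994, 1992, 2000])
print(nums.count(), nums.sum(), nums.min(), nums.max())  # 3 5986 1992 2000
```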

  23. List-valued variables • Object-oriented functions of list-valued variables: • List operations Suppose list-valued variable {$names} binds to a list of name elements • {$names}.[1-3, 6] returns a sublist of the 1st to 3rd items and the 6th item • {$names}.last() returns the last item in the list • {$names}.sort() sorts the items in the list in ascending order • {$names}.sort_desc() sorts the items in the list in descending order • {$names}.distinct() eliminates duplicate items in the list • {$names}.random(3) picks out 3 items randomly • $name ∈ {$names} checks whether an item is in the list • {$names’} ⊆ {$names} checks whether the first list is a sub-list of the second list
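
The list operations can be modeled the same way. In this hypothetical Python `XList` (names follow the slide, not a real API), positions are 1-based, `sub((1, 3), 6)` stands for `.[1-3, 6]`, and membership uses Python's `in`. Reading "sub-list" as "every item also occurs in the other list" is an assumption; the slide does not say whether order or contiguity matters.

```python
import random as rnd


class XList:
    """Hypothetical stand-in for a list-valued variable such as {$names}; 1-based positions."""

    def __init__(self, items):
        self.items = list(items)

    def sub(self, *specs):
        """.[1-3, 6] is written sub((1, 3), 6): inclusive ranges or single positions."""
        out = []
        for spec in specs:
            lo, hi = spec if isinstance(spec, tuple) else (spec, spec)
            out.extend(self.items[lo - 1:hi])
        return out

    def last(self):
        return self.items[-1] if self.items else None

    def sort(self):
        return sorted(self.items)

    def sort_desc(self):
        return sorted(self.items, reverse=True)

    def distinct(self):
        return list(dict.fromkeys(self.items))  # keeps first occurrences, in order

    def random(self, n, rng=None):
        """Pick n items at random (without replacement)."""
        return (rng or rnd).sample(self.items, n)

    def __contains__(self, item):  # $name ∈ {$names}
        return item in self.items

    def is_sublist_of(self, other):  # {$names'} ⊆ {$names}
        # assumption: every item of this list also occurs in the other list
        return all(x in other.items for x in self.items)


# Last names of all authors and editors in the sample document, in document order
names = XList(["Stevens", "Stevens", "Abiteboul", "Buneman", "Suciu", "Date", "Gerbarg"])
print(names.sub((1, 3), 6))  # ['Stevens', 'Stevens', 'Abiteboul', 'Date']
```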

  24. Semantics of list-valued variables • Definition 1. The associated path of variable $a (or {$a}) is the absolute path expression from root to the nodes represented by $a (or {$a}). • /bib/book→$b/title→$t • the associated path of $t is /bib/book/title. • Definition 2. Variable $a is an ancestor variable of $b if $a and $b are defined in the same XTree expression, and the associated path of $a is a prefix of the associated path of $b. • /bib/book→$b/{title→$t, author→$a} • $b is an ancestor variable of $t and $a, but $t is not an ancestor variable of $a.
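
Definition 2 reduces to a prefix test on associated paths. The Python sketch below compares paths step by step (so /bib/book is not treated as a prefix of /bib/bookstore) and reads "prefix" as a proper prefix; both are interpretation choices on our part. The associated paths are written out by hand for /bib/book→$b/{title→$t, author→$a}, since the sketch does not parse XTree, and it only handles child steps (no //).

```python
def steps(path):
    """Split an absolute child-axis path such as /bib/book/title into its steps."""
    return [s for s in path.split("/") if s]


def is_ancestor_var(assoc, a, b):
    """True if a's associated path is a proper, step-wise prefix of b's."""
    pa, pb = steps(assoc[a]), steps(assoc[b])
    return len(pa) < len(pb) and pb[:len(pa)] == pa


# Associated paths for /bib/book→$b/{title→$t, author→$a}
assoc = {"$b": "/bib/book", "$t": "/bib/book/title", "$a": "/bib/book/author"}
print(is_ancestor_var(assoc, "$b", "$t"), is_ancestor_var(assoc, "$t", "$a"))  # True False
```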

  25. Semantics of list-valued variables • Definition 3. In an XTree expression, when a variable is bound to a value during query evaluation, the variable is instantiated. • /bib/book/{author→$a/first→$first, title→$t} • During evaluation, when we have reached /bib/book/author, $a is instantiated; when we reach /bib/book/author/first, $first is instantiated. • Definition 4. The value of list-valued variable {$a} is a list of all instances of $a with all its ancestor variables instantiated. • /bib/book/author→{$a}: {$a} means all the author elements of all the books • /bib/book→$b/author→{$a}: {$a} means all the authors of a certain book $b
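
Definition 4 can be checked against the sample document: without an ancestor variable, {$a} is one list of every author; with $b above it, {$a} is recomputed for each book. The Python sketch below (xml.etree.ElementTree) keys the per-book lists by book id purely for readability.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)

# /bib/book/author→{$a}: no ancestor variable, so {$a} is a single list of every author
all_authors = [a.find("last").text for a in root.findall("book/author")]

# /bib/book→$b/author→{$a}: {$a} is re-evaluated for each instance of its ancestor $b
per_book = {
    b.get("id"): [a.find("last").text for a in b.findall("author")]
    for b in root.findall("book")
}
```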

  26. XTree for result construction • An XTree expression can also be used to define the result format • Symbol ← gets values of variables from the right side and assigns them to the expression on the left side • The result construction part is just one XTree expression • No nested structure, unlike the return clause of XQuery • Since XTree already has a tree structure • Easy to read and understand • Must be concrete • No condition checking or uncertainty in the structure • Unlike XTree expressions in the querying part

  27. XTree for result construction • Example. We want to list the titles and publishers of books published after 1993; suppose we have bound the variables by the following XTree expression: /bib/book/{@year>1993, title→$t, publisher→$p} We can write the following XTree expression to define the result format: /result/recentbook/{title←$t, publisher←$p} • The result format is defined as: under the root result, each recentbook element will store the title and publisher of that book <result> <recentbook> <title>TCP/IP Illustrated</title> <publisher>Addison-Wesley</publisher> </recentbook> <recentbook> <title>Data on the Web</title> <publisher>Morgan Kaufmann</publisher> </recentbook> </result>
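
The querying expression and the construct expression in this example can be emulated together in Python with xml.etree.ElementTree: the loop and the year test play the role of /bib/book/{@year>1993, title→$t, publisher→$p}, and the element building follows /result/recentbook/{title←$t, publisher←$p}. Copying only the text of title and publisher is a simplification that happens to be equivalent here, since both are text-only elements.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)
result = ET.Element("result")                                  # /result

for book in root.findall("book"):
    if int(book.get("year")) > 1993:                           # @year>1993
        t, p = book.find("title"), book.find("publisher")      # title→$t, publisher→$p
        rb = ET.SubElement(result, "recentbook")               # /result/recentbook
        ET.SubElement(rb, "title").text = t.text               # title←$t
        ET.SubElement(rb, "publisher").text = p.text           # publisher←$p
```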

  28. XTree for result construction • Example. For each book, show the title, the number of authors, and the first author; suppose the variable bindings are defined in the following XTree expression: /bib/book/{title→$t, author→{$a}} We can write the following XTree expression to return the result: /result/book/{title←$t, authNum←{$a}.count(), author←{$a}[1]} • {$a}.count() counts the number of items in the {$a} list • {$a}[1] returns the first item in the {$a} list • Output: <result> <book> <title>TCP/IP Illustrated</title> <authNum>1</authNum> <author><last>Stevens</last><first>W.</first></author> </book> <book> <title>Advanced Programming in the Unix Environment</title> <authNum>1</authNum> <author><last>Stevens</last><first>W.</first></author> </book> <book> <title>Data on the Web</title> <authNum>3</authNum> <author><last>Abiteboul</last><first>Serge</first></author> </book> </result>
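
The same pattern extends to list-valued variables. In this Python sketch (xml.etree.ElementTree, slide 8 document), {$a} is the per-book author list, `len` stands in for .count(), and index 0 stands in for the 1-based [1]. Skipping the author element for a book with no authors is an assumption; the slide does not cover that case.

```python
import copy
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""

root = ET.fromstring(BIB)
result = ET.Element("result")

for book in root.findall("book"):
    t = book.find("title")                                  # title→$t
    a = book.findall("author")                              # author→{$a}: one list per book
    out = ET.SubElement(result, "book")
    ET.SubElement(out, "title").text = t.text               # title←$t
    ET.SubElement(out, "authNum").text = str(len(a))        # authNum←{$a}.count()
    if a:                                                   # author←{$a}[1]
        first_author = copy.deepcopy(a[0])
        first_author.tail = None                            # drop the source indentation
        out.append(first_author)
```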

  29. XTree for result construction • The right side of the ← symbol can be: • A pre-defined variable or an invocation of functions on variables • Literal text, indicating static content • Omitted, indicating an empty value • Example. Suppose we want to return a book whose title is "Computer Architecture" and which does not have a specified author; we can write the following XTree expression: /bib/book/{title←"Computer Architecture", no-author} It will output the following XML segment: <bib> <book> <title>Computer Architecture</title> <no-author/> </book> </bib>
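
Literal text and omitted right-hand sides map directly onto element construction. A minimal Python sketch with xml.etree.ElementTree follows; note that ElementTree serializes an empty element as `<no-author />`, with a space, which is equivalent XML.

```python
import xml.etree.ElementTree as ET

bib = ET.Element("bib")
book = ET.SubElement(bib, "book")
ET.SubElement(book, "title").text = "Computer Architecture"  # title←"Computer Architecture"
ET.SubElement(book, "no-author")                               # right side omitted: empty element

xml_text = ET.tostring(bib, encoding="unicode")
print(xml_text)
```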

  30. XTree for result construction • A query based on XTree expressions consists of QWOC (Query-Where-Order by-Construct) statements • The Query clause contains one or more XTree expressions for selection and variable binding • The Where clause is optional; it defines constraints • The Order by clause is optional; it defines the ordering • The Construct clause contains one XTree expression that defines the output format
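
The four QWOC clauses form a simple pipeline: bind, filter, sort, build. The Python sketch below models that flow with a hypothetical `qwoc` function taking the clauses as plain functions, with each binding represented as a dict keyed by variable name. The function and its signature are our own illustration, not part of XTree.

```python
import xml.etree.ElementTree as ET

# Sample bibliography from slide 8
BIB = """<bib name="IT">
  <book id="b001" year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b002" year="1992"><title>Advanced Programming in the Unix Environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher></book>
  <book id="b003" year="2000"><title>Data on the Web</title><edition>3</edition>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann</publisher></book>
  <journal id="j001" year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor>
    <editor><last>Gerbarg</last><first>M.</first></editor>
    <publisher>Morgan Kaufmann</publisher></journal>
</bib>"""


def qwoc(root, query, construct, where=None, order_by=None):
    """Hypothetical QWOC pipeline: Query, then Where, then Order by, then Construct."""
    rows = list(query(root))                    # Query: one dict of bindings per match
    if where is not None:
        rows = [r for r in rows if where(r)]    # Where (optional): keep rows meeting constraints
    if order_by is not None:
        rows.sort(key=order_by)                 # Order by (optional)
    return construct(rows)                      # Construct: build the output tree


def query(root):
    # /bib/book/{@year→$y, title→$t}
    for book in root.findall("book"):
        yield {"$y": int(book.get("year")), "$t": book.find("title").text}


def construct(rows):
    # /result/recentbook/{title←$t}
    result = ET.Element("result")
    for r in rows:
        ET.SubElement(ET.SubElement(result, "recentbook"), "title").text = r["$t"]
    return result


out = qwoc(ET.fromstring(BIB), query, construct,
           where=lambda r: r["$y"] > 1993, order_by=lambda r: r["$t"])
```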

  31. Outlines • Introduction • Preliminaries • XTree • Algorithm to transform XTree query to XQuery • An algorithm to transform an XTree expression in the query part to a set of XPath expressions • An algorithm to transform an XTree expression in the result construction part to some nested XQuery expressions • Conclusion and future works

  32. Transformation algorithm for querying part • Transform an XTree expression in the querying part into a set of XPath expressions • Not as trivial as just extracting the path associated with each variable into a separate XPath expression • Variables may be correlated with each other through common ancestors • We have to use such common ancestors to constrain the descendant variables • The common ancestors we want are exactly the branching nodes (the node just before each pair of curly braces used for branching) • Use a stack to store such common ancestors for later use

  33. Transformation algorithm for querying part • Process the XTree expression from left to right; for each common ancestor of variables (except the root), assign a single-valued variable to it if it is not already bound to a variable • Translate each single-valued variable into an XPath expression in a for clause; translate each list-valued variable into an XPath expression in a let clause • Try to write the path expression of a variable as a path relative to its nearest ancestor variable (using the stack) • If it has such an ancestor variable, write its path expression as the relative path from that ancestor variable • If it does not have any ancestor variable, write its path expression as the absolute path from the root • The output paths will be in depth-first order of the XTree

  34. Transformation algorithm for querying part • Example: • Input XTree expression: /bib/{book/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}} • After assigning single-valued variables to the common ancestors book, journal, and editor: /bib/{book→$b/{title→$t, author→{$a}}, journal→$j/{title→$jt, editor→$e/{last→$last, first→$first}}} • XPaths generated: for $b in /bib/book for $t in $b/title let $a := $b/author for $j in /bib/journal for $jt in $j/title for $e in $j/editor for $last in $e/last for $first in $e/first
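
The procedure on slides 32 to 34 can be sketched in Python to check that it reproduces the example's output. Several assumptions apply: the XTree is given as an already-parsed tree of hypothetical `XNode` objects (no XTree parser is shown), a branching node is any non-root node with more than one child, fresh variables are named from the first letter of the tag (adding a digit on collision), and the recursion's call stack plays the role of the explicit stack the slides mention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class XNode:
    name: str                      # element name in the XTree
    var: Optional[str] = None      # variable bound with →, e.g. "$t"
    is_list: bool = False          # True for a list-valued binding such as {$a}
    children: List["XNode"] = field(default_factory=list)


def all_vars(node, acc=None):
    acc = set() if acc is None else acc
    if node.var:
        acc.add(node.var)
    for child in node.children:
        all_vars(child, acc)
    return acc


def assign_branch_vars(node, used, is_root=True):
    """Step 1: give each unbound branching node (except the root) a fresh single-valued variable."""
    if not is_root and len(node.children) > 1 and node.var is None:
        base, n = "$" + node.name[0], 2          # hypothetical naming scheme
        candidate = base
        while candidate in used:
            candidate, n = f"{base}{n}", n + 1
        node.var = candidate
        used.add(candidate)
    for child in node.children:
        assign_branch_vars(child, used, is_root=False)


def emit(node, prefix, out):
    """Step 2: depth-first walk; prefix is the nearest ancestor variable, or the absolute path so far."""
    path = f"{prefix}/{node.name}"
    if node.var:
        out.append(f"let {node.var} := {path}" if node.is_list else f"for {node.var} in {path}")
        prefix = node.var                        # descendants are written relative to this variable
    else:
        prefix = path
    for child in node.children:
        emit(child, prefix, out)
    return out


def xtree_to_xpaths(root):
    assign_branch_vars(root, all_vars(root))
    return emit(root, "", [])


# /bib/{book/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}}
query = XNode("bib", children=[
    XNode("book", children=[XNode("title", "$t"), XNode("author", "$a", is_list=True)]),
    XNode("journal", children=[
        XNode("title", "$jt"),
        XNode("editor", children=[XNode("last", "$last"), XNode("first", "$first")]),
    ]),
])
clauses = xtree_to_xpaths(query)
print("\n".join(clauses))
```

The printed clauses match the "XPaths generated" list on slide 34. A node that binds a variable but does not branch (such as title→$t) is written relative to its nearest ancestor variable, which is what keeps $t tied to its own $b.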

  35. Transformation algorithm for result construction part • Transform an XTree expression in the result construction part to some XQuery expressions • More complicated • We will often encounter nested sub-queries in XQuery • Consider the case that the node name to get the variable value is different from the node name where the variable was bound in the querying part • Process the XTree expression step by step • Find the corresponding XPath expression of each variable in the XPaths generated from last algorithm • Translate each variable value substitution to some XQuery statement • Use curly braces { } to form sub-query blocks according to the structure of the XTree expression in construct clause
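
One step of this algorithm, translating a single leaf substitution such as name←$t into an XQuery sub-query, can be sketched as string generation in Python. The function `substitute` is hypothetical and covers only single-valued variables bound to elements: if the target name equals the name where the variable was bound, the node is returned as-is; otherwise a new element is built that copies the attributes and then the child nodes (attributes must come first in XQuery element content).

```python
def substitute(var, source_path, target_tag):
    """Translate target_tag←var, where var was bound by `for var in source_path`."""
    bound_tag = source_path.rsplit("/", 1)[-1]    # node name where var was bound
    if bound_tag == target_tag:
        body = var                                 # same name: return the bound node unchanged
    else:
        # renamed: rebuild the element, copying attributes first, then child nodes
        body = f"<{target_tag}>{{ {var}/@*, {var}/node() }}</{target_tag}>"
    return f"{{ for {var} in {source_path} return {body} }}"


print(substitute("$t", "$b/title", "name"))
print(substitute("$jt", "$j/title", "title"))
```

The first call corresponds to name←$t in the construct clause of the following example (the name differs from title), the second to title←$jt (same name).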

  36. Transformation algorithm for result construction part • Example: query /bib/{book/{title→$t, author→{$a}}, journal/{title→$jt, editor/{last→$last, first→$first}}} construct /result/{book/{name←$t, authors/{@count←{$a}.count( ), au←{$a}}}, journal/{title←$jt, editor/{first←$first, last←$last}}} • Generated XPath expressions of the querying part: for $b in /bib/book for $t in $b/title let $a := $b/author for $j in /bib/journal for $jt in $j/title for $e in $j/editor for $last in $e/last for $first in $e/first

  37. Transformation algorithm for result construction part • Output: <result> { for $b in /bib/book return <book> { for $t in $b/title return <name> { $t/@*, $t/node() } </name> } { let $a := $b/author return <authors count="{ count($a) }"> { for $x in $a return <au> { $x/@*, $x/node() } </au> } </authors> } </book> } { for $j in /bib/journal return <journal> { for $jt in $j/title return $jt } { for $e in $j/editor return <editor> { for $first in $e/first return $first } { for $last in $e/last return $last } </editor> } </journal> } </result>

  38. Outlines • Introduction • Preliminaries • XTree • Algorithm to transform XTree query to XQuery • Conclusion and future works • Conclusion • Future works

  39. Conclusion • Discussed the limitations of XPath • Proposed a new set of syntax rules called XTree • XTree has a tree structure • In the querying part, one XTree expression can bind multiple variables • In the result construction part, one XTree expression can define the result format • List-valued variables are explicitly indicated, and their values are uniquely determined • XTree is more compact and convenient to use than XPath • Designed algorithms to transform a query based on XTree expressions to a standard XQuery query

  40. Future works • Implement an XTree query parser • Queries based on XTree expressions can be executed directly • The query evaluation will be more efficient with this approach, since we will have a global view of the whole query tree • Extend the transformation algorithms to support queries with join, negation, grouping, and recursion • Optimize the output XQuery queries of our transformation algorithms according to the schema of the XML document • Follow the ongoing development of XPath to continuously enhance XTree

  41. References • S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L. Wiener. The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries 1(1):68-99, 1997. • S. Ceri, S. Comai, E. Damiani, P. Fraternali, S. Paraboschi, and L. Tanca. XML-GL: a Graphical Language for Querying and Restructuring WWW Data. In Proceedings of the 8th International World Wide Web Conference, Toronto, Canada, 1999. • S. Cluet and J. Siméon. YATL: a Functional and Declarative Language for XML. Draft manuscript, March 2000. • H. Hosoya and B. Pierce. XDuce: A Typed XML Processing Language (Preliminary Report). In Proceedings of the WebDB Workshop, 2000. • M. Liu and T. W. Ling. Towards Declarative XML Querying. In Proceedings of WISE 2002, pages 127-138, Singapore, 2002. • P. Chippimolchai, V. Wuwongse, and C. Anutariya. Semantic Query Formulation and Evaluation for XML Databases. In Proceedings of WISE 2002, pages 205-214, Singapore, 2002. • D. Chamberlin, P. Fankhauser, M. Marchiori, and J. Robie. XML Query Requirements. W3C Working Draft, June 2003. http://www.w3.org/TR/xquery-requirements/ • J. Clark and S. DeRose. XML Path Language (XPath) Version 1.0. W3C Recommendation, November 1999. http://www.w3.org/TR/xpath • D. Chamberlin, D. Florescu, J. Robie, J. Siméon, and M. Stefanescu. XQuery 1.0: A Query Language for XML. W3C Working Draft, May 2003. http://www.w3.org/TR/xquery/ • J. Robie, J. Lapp, and D. Schach. XML Query Language (XQL). 1998. http://www.w3.org/TandS/QL/QL98/pp/xql.html • A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. XML-QL: A Query Language for XML. August 1998. http://www.w3.org/TR/NOTE-xml-ql/ • J. Clark. XSL Transformations (XSLT) Version 1.0. W3C Recommendation, November 1999. http://www.w3.org/TR/xslt

  42. Thank you
