1 / 28

On the Path to Efficient XML Queries

On the Path to Efficient XML Queries. Andrey Balmin, Kevin Beyer , Fatma Özcan IBM Almaden Research Center Matthias Nicola IBM Silicon Valley Lab. New languages = new abilities + new pitfalls. XQuery A new query language designed specifically for XML data SQL / XML

decker
Download Presentation

On the Path to Efficient XML Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On the Path to Efficient XML Queries Andrey Balmin, Kevin Beyer, Fatma Özcan IBM Almaden Research Center Matthias Nicola IBM Silicon Valley Lab

  2. New languages = new abilities + new pitfalls • XQuery • A new query language designed specifically for XML data • SQL / XML • Added XML as a data type, including XQuery sequences • Added XQuery as a sublanguage

  3. Purpose • Teach users of the new XML query languages • Teach the teachers • Share our users’ experiences • Influence languages

  4. Focus • Large databases with many moderate XML documents • Schema flexibility is required • Many schemas in one collection • No schema validation used • Schemas with xs:any • Documents like Atom Syndication and RSS that allow any extension • Therefore • Document filtering is primary concern • Limited type inference • Any data is possible

  5. Index eligibility Index Eligibility: We say that an index I is eligible to answer predicate P of query Q, if for any collection of XML documents D, the following holds: Q(D) = Q( I( P,D )). Where I( P,D ) is the set of XML documents produced, by probing index I with predicate P. This is not as obvious as it is in relational databases.

  6. XML indexes in DB2 • Index a linear XPath pattern over a column as a particular datatype CREATE INDEX index-name ON table(xml-column) USING 'pattern' AS type pattern ::= namespace-decls? (( / | // ) axis? ( name-test | kind-test ))+ axis ::= @ | child:: | attribute:: | self:: | descendant:: | descendant-or-self:: name-test ::= qname | * | ncname:* | *:ncname kind-test ::= node() | text() | comment() | processing-instruction() type ::= varchar | double | date | timestamp

  7. Query pattern  index pattern CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double • Can use the index: more restrictive for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i • Cannot use the index: less restrictive for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@* > 100 ]return $i

  8. Match index and query predicate data type CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double • Can use the index: numeric predicate and index for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i • Cannot use the index: string predicate for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[lineitem/@price > "100" ]return $i

  9. Data Types for Joins CREATE INDEX o_custid ON orders(orddoc) USING XMLPATTERN '//custid' AS double CREATE INDEX c_custid ON customer(cdoc) USING XMLPATTERN '/customer/id' AS double • Cannot use the indexes: unknown comparison type for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/orderfor $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customerwhere $i/custid = $j/idreturn $i • Can use the indexes: at least one cast required for $i in db2-fn:xmlcolumn("ORDERS.ORDDOC")/orderfor $j in db2-fn:xmlcolumn("CUSTOMER.CDOC")/customerwhere $i/custid/xs:double(.) = $j/id/xs:double(.)return $i

  10. SQL/XML Query Functions • XMLQuery Scalar function that returns an (possibly empty) XQuery sequence for every row • XMLExists Predicate that returns true iff the XQuery sequence produced is not empty • XMLTable Produces a table with one row for each item in the row-producing XQuery sequence, and with one column per column-producing XQuery expression. The columns may be XQuery sequences or cast to simple SQL types.

  11. Result XMLQuery does not filter rows (usually) • Cannot use the index: SELECT XMLQuery(‘ $order//lineitem[ @price > 100 ] ‘ passing orddoc as "order")FROM orders • Can use the index: VALUES (XMLQuery(’ db2-fn:xmlcolumn("ORDERS.ORDDOC") //lineitem[ @price > 100 ] ')) • Can use the index: db2-fn:xmlcolumn('ORDERS.ORDDOC') //lineitem[ @price > 100 ]

  12. XMLExists filter rows (usually) • Can use the index SELECT ordid, orddocFROM ordersWHERE XMLExists(‘ $order//lineitem[ @price > 100 ]‘ passing orddoc as "order") • Cannot use the index: false exists SELECT ordid, orddocFROM ordersWHERE XMLExists(‘ $order//lineitem/@price > 100‘ passing orddoc as "order") Need XMLTest which uses XQuery’s Effective Boolean Value

  13. XMLQuery + XMLExists vs. XMLTable • Can use the index SELECT ordid, XMLQuery(‘$order//lineitem[@price > 100] ’ passing orddoc as "order")FROM ordersWHERE XMLExists(‘$order//lineitem[@price > 100] ’ passing orddoc as "order") • XMLTable: More efficient and less redundant SELECT o.ordid, t.lineitemFROM orders o,XMLTable(‘$order//lineitem[@price > 100] ’ passing o.orddoc as "order“ COLUMNS "lineitem" XML BY REF PATH '.')as t(lineitem)

  14. Predicates in XMLTable column expressions • Can use the index SELECT o.ordid, t.lineitemFROM orders o,XMLTable(‘$order//lineitem[@price > 100] ’ passing o.orddoc as "order“ COLUMNS "lineitem" XML BY REF PATH '.')as t(lineitem) • Cannot use the index SELECT o.ordid, t.lineitem, t.priceFROM orders o, XMLTable(‘ $order//lineitem ’ passing o.orddoc as "order" COLUMNS "lineitem" XML BY REF PATH '.', "price" DECIMAL(6,3) PATH '@price[. > 100]‘ )as t(lineitem, price)

  15. Joining XML Values in SQL/XML • Can use index on product/id, but not p.id SELECT p.name, o.orddocFROM products p, orders oWHERE XMLExists(‘$order//lineitem/product[ id eq $pid ] ‘ passing o. orddoc as "order", p.id as "pid") • Can use index on p.id, but not product/id SELECT p.name, o.orddocFROM products p, orders oWHERE p.id = XMLCast( XMLQuery(‘ $order//lineitem/product/id ‘ passing o. orddoc as "order") as VARCHAR(13)) Need to unify XQuery and SQL data types

  16. Joining XML Values in SQL/XML • Probably cannot use XML indexes: SQL types differ from XML SELECT c.name, o.orddocFROM orders o, customer cWHERE XMLCast( XMLQuery(‘ $order/order/custid ’ passing o.orddoc as "order") as DOUBLE) =XMLCast( XMLQuery(‘ $cust/customer/id ’ passing c.cdoc as "cust") as DOUBLE) • Can use XML indexes SELECT c.name, o.orddocFROM orders o, customer cWHERE XMLExists(‘ $order/order[ custid/xs:double(.) = $cust/customer/id/xs:double(.) ] ‘ passing o.orddoc as "order", c.cdoc as "cust")

  17. XQuery Let Clauses • Can use the index for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')for $item in $doc//lineitem[ @price > 100 ]return <result>{ $item }</result>for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderreturn $ord/lineitem[ @price > 100 ] • Cannot use the index for $doc in db2-fn:xmlcolumn('ORDERS.ORDDOC')let $item:= $doc//lineitem[ @price > 100 ]return <result>{ $item }</result>for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderreturn <result>{ $ord/lineitem[ @price > 100 ]}</result>

  18. XQuery Let Clauses • Can use the index for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderwhere $ord/lineitem/@price > 100return <result>{ $ord/lineitem }</result> • Same as above for $ord in db2-fn:xmlcolumn('ORDERS.ORDDOC')/orderlet $price := $ord/lineitem/@pricewhere $price > 100return <result>{ $ord/lineitem }</result>

  19. Context is everything • $i is bound to the document node for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') return $i/order/lineitem • $j is bound to <my_order> for $j in ( for $o in db2-fn:xmlcolumn('ORDERS.ORDDOC')/order return <my_order>{ $o/* }</my_order> )return $j/my_order/lineitem

  20. Remember the dot • Produces a type error: no document node at root let $order := <new_order>{ db2-fn:xmlcolumn('ORDERS.ORDDOC') /order[custid > 1001] } </new_order>return $order[ //customer/name ] • Absolute path expressions is a shorthand for fn:root(.) treat as document-node(). • Absolute path expressions are bad style

  21. Construction and View Composition • Want to rewrite this… let $view := for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC')/ order/lineitem return <ordered>{ $i/@quantity, $i/product/@price, <pid>{ $i/product/id/data(.) }</pid> }</ordered>for $j in $viewwhere $j/pid = '17‘return $j/@price

  22. Construction and View Composition • … into this for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') /order/lineitemwhere $i/product/id/data(.) = '17‘return $i/product/@price • but…

  23. Construction and View Composition • Data type changed to untypedAtomic • id is string: comparison is now an error • id is long: comparison is now as double instead of long • List types are concatenated • Error for duplicate @price attributes lost • New node identity lost • Parent axis is broken Any sequence should live in tree without change.Separate identity from construction?

  24. Remember the namespaces • Index definition and query must match namespaces CREATE INDEX li_price ON orders(orddoc)USING XMLPATTERN '//lineitem/@price' AS double • Cannot use the index. Which is right? declare default element namespace "http://ournamespaces.com/order";for $i in db2-fn:xmlcolumn('ORDERS.ORDDOC') //order[ lineitem/@price > 100 ]return $i

  25. Elements and text nodes differ CREATE INDEX PRICE_TEXT ON orders.orddocUSING XMLPATTERN '//price' AS varchar • Can not use index for $ord in db2-fn:xmlcolumn(“ORDERS.ORDDOC”) /order[ lineitem/price/text() = “99.50” ]return $ord • Element might have more data than just text <price>99.50<unit>USD</unit></price>

  26. Attributes are shy • No attributes //*//node() • Only attributes //@*//attribute::node() • Empty result due to “principle node kind” //@*/self:*

  27. Between predicates are not obvious • Might not be between: multiple prices lineitem[ price > 100 and price < 200 ] • Between or error lineitem[ price gt 100 and price lt 200 ] • Always between lineitem/price/data()[ . > 100 and . < 200 ] • Between if not list type lineitem/price[ . > 100 and . < 200 ] • Between if not list type lineitem[ @price > 100 and @price < 200 ]

  28. Conclusions • Easy to make mistakes without schema constraints • Many subtle differences in expressions • Improve construction composition • Unify SQL and XQuery type systems • Add XMLTest to SQL/XML

More Related