Introduction to XQuery
Bob DuCharme, www.snee.com/bob, bob@snee.com
These slides: www.snee.com/xml
What is XQuery?
“A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.”
(“XQuery 1.0: An XML Query Language”, W3C Working Draft)
History
February 1998: XML (Recommendation)
November 1999: XSLT 1.0, XPath 1.0 (Recommendations)
As of 8 June 2005: XPath 2.0, XSLT 2.0, and XQuery 1.0 in “Last Call Working Draft” status
Steps for a W3C “standard”:
1. Working Draft
2. Last Call Working Draft
3. Candidate Recommendation
4. Proposed Recommendation
5. Recommendation
input1.xml sample document
    <doc>
      <p>This is a sample file.</p>
      <p>This line <emph>really</emph> has an inline element.</p>
      <p>This line doesn't.</p>
      <p>Do <emph>you</emph> like inline elements?</p>
    </doc>
Our first query
Querying from the command line:
    java net.sf.saxon.Query "{doc('input1.xml')//p[emph]}"
Result:
    <?xml version="1.0" encoding="UTF-8"?>
    <p>This line <emph>really</emph> has an inline element.</p>
    <p>Do <emph>you</emph> like inline elements?</p>
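For readers who want to try the same selection without Saxon, here is a rough Python analogue (a sketch, not part of the original slides). It inlines input1.xml and uses the standard library's xml.etree.ElementTree. That module's findall() accepts only a small XPath subset, but the subset happens to include both `.//` and the `[emph]` predicate ("has a child element named emph"). The function name paras_with_emph is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The input1.xml sample document from the previous slide, inlined as a string.
INPUT1 = """<doc>
<p>This is a sample file.</p>
<p>This line <emph>really</emph> has an inline element.</p>
<p>This line doesn't.</p>
<p>Do <emph>you</emph> like inline elements?</p>
</doc>"""

def paras_with_emph(xml_text):
    """Rough analogue of doc('input1.xml')//p[emph]."""
    root = ET.fromstring(xml_text)
    # .//p finds p descendants; [emph] keeps those with an emph child element.
    return root.findall(".//p[emph]")

for p in paras_with_emph(INPUT1):
    # itertext() concatenates the text of p and its inline children.
    print("".join(p.itertext()))
```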
Query stored in a file
xq1.xqy:
    (: Here is an XQuery comment. :)
    doc('input1.xml')//p[emph]
Executing it:
    java net.sf.saxon.Query xq1.xqy
Simplifying the command line
Linux shell script xquery:
    #!/bin/sh
    java net.sf.saxon.Query "$@"
Windows batch file xquery.bat:
    java net.sf.saxon.Query %*
(Assuming saxon8.jar is in the classpath. "$@" and %* pass along all arguments, however many there are.)
Executing either:
    xquery xq1.xqy
Data for more serious examples
RecipeML DTD and documentation: http://www.formatdata.com/recipeml
Squirrel's RecipeML Archive: http://dsquirrel.tripod.com/recipeml/indexrecipes2.html
My sample: 294 files
RecipeML: typical structure
    <recipeml version="0.5">
      <recipe>
        <head>
          <title>Walnut Vinaigrette</title>
          <categories><cat>Dressings</cat></categories>
          <yield>1</yield>
        </head>
        <ingredients>
          <ing>
            <amt><qty>1</qty><unit>cup</unit></amt>
            <item>Canned No Salt Chicken</item>
          </ing>
          <!-- more ing elements -->
        </ingredients>
        <directions>
          <step>Bring chicken broth to a boil.</step>
          <!-- more step elements -->
        </directions>
      </recipe>
    </recipeml>
Saxon and the collection() function
The argument to collection() names a document in this format:
    <collection>
      <doc href="_Band__Sloppy_Joes.xml"/>
      <doc href="_Cheese__Fricadelle.xml"/>
      <!-- more doc elements... -->
      <doc href="Walton_Mountain_Coffee_Cake.xml"/>
      <doc href="Walty's_Dressing.xml"/>
      <doc href="Wan_Tan_(Wonton).xml"/>
    </collection>
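The catalog idea can be sketched in Python as follows. This is an illustration under stated assumptions, not Saxon's code: the hypothetical load_collection() helper reads a <collection> file and parses each <doc href="..."/> it lists, resolving each href against the catalog's own directory. (The XQuery specifications leave the mapping from a collection() argument to actual documents up to the implementation, so other processors handle it differently.)

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def load_collection(catalog_path):
    """Hypothetical helper mimicking a catalog-driven collection():
    parse the <collection> file, then parse every <doc href="..."/> it lists,
    resolving each href relative to the catalog's directory."""
    base_dir = os.path.dirname(os.path.abspath(catalog_path))
    catalog = ET.parse(catalog_path).getroot()
    docs = []
    for entry in catalog.findall("doc"):
        path = os.path.join(base_dir, entry.get("href"))
        docs.append((path, ET.parse(path).getroot()))
    return docs

# Demo: build a tiny two-recipe collection in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, title in [("a.xml", "Walnut Vinaigrette"), ("b.xml", "Walty's Dressing")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(f"<recipeml><recipe><head><title>{title}</title></head></recipe></recipeml>")
    with open(os.path.join(tmp, "docs.xml"), "w", encoding="utf-8") as f:
        f.write('<collection><doc href="a.xml"/><doc href="b.xml"/></collection>')
    titles = [root.findtext("recipe/head/title")
              for _, root in load_collection(os.path.join(tmp, "docs.xml"))]
    print(titles)
```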
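Looking for some sugar
    collection('recipeml/docs.xml')/recipeml/recipe/head/title[//ingredients/ing/item[contains(.,'sugar')]]
Inside the predicate, the leading // starts from the root of the document containing the current title. Since each recipe is its own document, the test checks only that recipe's ingredients.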
A more SQL-like approach
    for $ingredient in collection('recipeml/docs.xml')//ingredients/ing/item[contains(.,'sugar')]
    return $ingredient/../../../head/title
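One behavioral difference between the path query and the for query is easy to miss: the path version returns each matching title once, while the for version returns one title per matching ingredient, so a recipe that lists both "sugar" and "brown sugar" shows up twice. The Python sketch below (hypothetical data and function names, standard-library ElementTree only) mimics both forms. It also shows that, as with contains(), the substring test is case-sensitive, so "Sugar" is not matched.

```python
import xml.etree.ElementTree as ET

def recipe(title, *items):
    """Build a minimal RecipeML-shaped document (hypothetical test data)."""
    ings = "".join(f"<ing><item>{i}</item></ing>" for i in items)
    return ET.fromstring(
        f"<recipeml><recipe><head><title>{title}</title></head>"
        f"<ingredients>{ings}</ingredients></recipe></recipeml>")

COLLECTION = [
    recipe("Fudge", "sugar", "brown sugar", "butter"),
    recipe("Biscuits", "flour", "Sugar"),  # capital S: the test is case-sensitive
    recipe("Vinaigrette", "walnut oil"),
]

def titles_path_style(docs, target):
    """Like title[//ingredients/ing/item[contains(., target)]]: one title per recipe."""
    return [d.findtext("recipe/head/title") for d in docs
            if any(target in (i.text or "") for i in d.iter("item"))]

def titles_per_ingredient(docs, target):
    """Like for $ingredient in ...item[contains(., target)] return .../title:
    one title per matching ingredient, so a recipe can appear more than once."""
    return [d.findtext("recipe/head/title") for d in docs
            for i in d.iter("item") if target in (i.text or "")]

print(titles_path_style(COLLECTION, "sugar"))      # ['Fudge']
print(titles_per_ingredient(COLLECTION, "sugar"))  # ['Fudge', 'Fudge']
```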
Outputting well-formed XML
    <sweets>
    {
      let $target := 'sugar'
      for $ingredient in collection('recipeml/docs.xml')//ingredients/ing/item[contains(., $target)]
      return $ingredient/../../../head/title
    }
    </sweets>
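Why the wrapper matters: a bare sequence of title elements has several top-level elements, which an XML parser rejects as a document. Here is a small Python check of that idea, assuming two made-up title elements and using only xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

titles = [ET.fromstring("<title>Fudge</title>"), ET.fromstring("<title>Pralines</title>")]

# Serialized back to back, the bare result sequence has two top-level
# elements, so it is not a well-formed XML document on its own.
bare = "".join(ET.tostring(t, encoding="unicode") for t in titles)
try:
    ET.fromstring(bare)
    bare_ok = True
except ET.ParseError:
    bare_ok = False

# Wrapping the sequence in one element, as <sweets>{ ... }</sweets> does,
# gives the output a single document element.
sweets = ET.Element("sweets")
sweets.extend(titles)
wrapped = ET.tostring(sweets, encoding="unicode")
print(bare_ok, wrapped)
```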
FLWOR expressions
for, let, where, order by, return
"a FLWOR expression ... supports iteration and binding of variables to intermediate results. This kind of expression is often useful for computing joins between two or more documents and for restructuring data."
Extracting subsets: XPath vs. FLWOR approach
Get the title element for each recipe whose yield is greater than 20:
    collection('recipeml/docs.xml')/recipeml/recipe/head/title[../yield > 20]
Go through all the documents in the collection, and for any with a yield of more than 20, get the title:
    for $doc in collection('recipeml/docs.xml')/recipeml
    where $doc/recipe/head/yield > 20
    return $doc/recipe/head/title
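Both queries compare yield with the number 20. Because the RecipeML documents are untyped, XQuery casts the yield value to xs:double for that comparison. A Python rendering (hypothetical data; the big_batches name is invented) has to make the conversion explicit, and the last lines show what goes wrong if the raw strings are compared instead. A non-numeric yield would raise an error here, just as a failed cast raises one in XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipes: (title, yield) pairs wrapped in RecipeML-shaped markup.
DOCS = [ET.fromstring(f"<recipeml><recipe><head><title>{t}</title><yield>{y}</yield>"
                      f"</head></recipe></recipeml>")
        for t, y in [("Chili for a Crowd", "24"), ("Scones", "4"), ("Punch", "100")]]

def big_batches(docs, minimum=20):
    """Analogue of: where $doc/recipe/head/yield > 20 return $doc/recipe/head/title.
    The yield text is converted to a number before comparing, as XQuery does."""
    return [d.findtext("recipe/head/title") for d in docs
            if float(d.findtext("recipe/head/yield")) > minimum]

print(big_batches(DOCS))  # ['Chili for a Crowd', 'Punch']

# Comparing the raw strings is lexicographic instead: "4" > "20" but "100" < "20".
lexical = [d.findtext("recipe/head/title") for d in DOCS
           if d.findtext("recipe/head/yield") > "20"]
print(lexical)  # ['Chili for a Crowd', 'Scones']
```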
Doing more with the for clause variable
    (: Create an HTML page linking to recipes that serve more than 20 people. :)
    <html><head><title>Food for a Crowd</title></head>
    <body>
    <h1>Food for a Crowd</h1>
    {
      for $doc in collection('recipeml/docs.xml')
      where $doc/recipeml/recipe/head/yield > 20
      return <p><a href="{document-uri($doc)}">
               {$doc/recipeml/recipe/head/title/text()}
             </a></p>
    }
    </body></html>
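The href="{document-uri($doc)}" attribute is computed from each document rather than written literally. A comparable Python sketch builds the same page with ElementTree; the (uri, title, yield) tuples and the crowd_page name are hypothetical stand-ins for what the query reads from each recipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical (uri, title, yield) data standing in for document-uri($doc),
# head/title, and head/yield from each recipe document.
RECIPES = [("file:/c:/dat/recipeml/chili.xml", "Chili for a Crowd", "24"),
           ("file:/c:/dat/recipeml/scones.xml", "Scones", "4")]

def crowd_page(recipes, minimum=20):
    html = ET.Element("html")
    ET.SubElement(ET.SubElement(html, "head"), "title").text = "Food for a Crowd"
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = "Food for a Crowd"
    for uri, title, yld in recipes:
        if float(yld) > minimum:
            # Like <a href="{document-uri($doc)}">: the attribute value
            # comes from the data rather than being written literally.
            a = ET.SubElement(ET.SubElement(body, "p"), "a", href=uri)
            a.text = title
    return ET.tostring(html, encoding="unicode")

page = crowd_page(RECIPES)
print(page)
```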
Calling functions from a let clause
    (: Which recipe(s) serves the most people? :)
    let $maxYield := max(collection('recipeml/docs.xml')/recipeml/recipe/head/yield)
    return collection('recipeml/docs.xml')/recipeml/recipe[head/yield = $maxYield]
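The let clause computes the maximum once and then keeps every recipe whose yield equals it, which is why the comment says "recipe(s)": ties all come back. A Python sketch of the same two-step logic, with made-up data that includes a tie (largest_yield is an invented name):

```python
import xml.etree.ElementTree as ET

# Hypothetical recipes, two of which share the largest yield.
DOCS = [ET.fromstring(f"<recipeml><recipe><head><title>{t}</title><yield>{y}</yield>"
                      f"</head></recipe></recipeml>")
        for t, y in [("Punch", "100"), ("Scones", "4"), ("Jambalaya", "100")]]

def largest_yield(docs):
    """Analogue of: let $maxYield := max(...yield) return ...recipe[head/yield = $maxYield]."""
    yields = [float(d.findtext("recipe/head/yield")) for d in docs]
    max_yield = max(yields)  # computed once, like the let binding
    return [d.findtext("recipe/head/title")
            for d, y in zip(docs, yields) if y == max_yield]

print(largest_yield(DOCS))  # ['Punch', 'Jambalaya']
```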
distinct-values and order by
    (: A sorted list of all unique ingredients in the recipe collection,
       with URLs to link to the recipes. :)
    <ingredients>
    {
      for $ingr in distinct-values(collection('recipeml/docs.xml')/recipeml/recipe/ingredients/ing/item)
      order by $ingr
      return
        <item name="{$ingr}">
        {
          for $doc in collection('recipeml/docs.xml')
          where $doc/recipeml/recipe/ingredients/ing/item = $ingr
distinct-values and order by, continued
          return <title url="{document-uri($doc)}">
                   {$doc/recipeml/recipe/head/title/text()}
                 </title>
        }
        </item>
    }
    </ingredients>
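Here is a Python sketch of the same grouping (hypothetical data; ingredient_index is an invented name). It keys on the exact item strings, as distinct-values() does, sorts by them, and lists each recipe at most once per ingredient, matching the nested for/where. Keys that differ only in case or spacing stay separate.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def recipe(title, *items):
    """Build a minimal RecipeML-shaped document (hypothetical test data)."""
    ings = "".join(f"<ing><item>{i}</item></ing>" for i in items)
    return ET.fromstring(f"<recipeml><recipe><head><title>{title}</title></head>"
                         f"<ingredients>{ings}</ingredients></recipe></recipeml>")

DOCS = [recipe("Blondie Brownies", " Baking Powder"),
        recipe("Apple Nut Pudding", " Baking powder ", "sugar"),
        recipe("Scones", " Baking powder ")]

def ingredient_index(docs):
    """Map each distinct ingredient string, exactly as written, to the titles
    that use it, sorted by ingredient (like distinct-values + order by)."""
    index = defaultdict(list)
    for d in docs:
        title = d.findtext("recipe/head/title")
        # dict.fromkeys drops repeats within one recipe, keeping document order.
        for item in dict.fromkeys((i.text or "") for i in d.iter("item")):
            index[item].append(title)
    return dict(sorted(index.items()))

for name, titles in ingredient_index(DOCS).items():
    print(repr(name), titles)
```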
"Gold Room" Scones</title> <title url="file:/c:/dat/recipeml/ _Outrageous_Chocolate_Chipper.xml"> "Outrageous" Chocolate-Oatmeal Chipper (Cooki</title> </item> <item name="Baking soda"> <title url="file:/c:/dat/recipeml/ _First__Ginger_Cookies.xml"> "First" Ginger Molasses Cookies</title> <title url="file:/c:/dat/recipeml/ _Foot_in_the_Cake.xml"> "Foot in the Fire" Chocolate Cake</title> </item> <item name="Tomato paste"> <title url="file:/C:/dat/recipeml/ Crawfish_Etouff'ee.xml"> "Frank's Place" Crawfish Etouff'ee </title> <title url="file:/C:/dat/recipeml/ Hamburger____Ground_Meat_Balti.xml"> "Hamburger" / Ground Meat Balti </title> <title url="file:/C:/dat/recipeml/ Indian_Chili_.xml"> "Indian Chili"</title> </item> <!-- some item elements removed --> </ingredients> Excerpt from output <ingredients> <!-- some item elements removed --> <item name=" (12-oz) tomato paste "> <title url="file:/C:/dat/recipeml/ _Best_Ever__Pizza_Sauce.xml"> "Best Ever" Pizza Sauce</title> </item> <item name=" Baking Powder"> <title url="file:/c:/dat/recipeml/ _Blondie__Brownies.xml"> "Blondie" Brownies</title> <title url="file:/c:/dat/recipeml/ Walnut_Pound_Cake.xml"> Walnut Pound Cake</title> </item> <item name=" Baking Soda "> <title url="file:/c:/dat/recipeml/ _Faux__Sourdough.xml"> "Faux" Sourdough</title> </item> <item name=" Baking potatoes "> <title url="file:/c:/dat/recipeml/ _Indian_Chili_.xml"> "Indian Chili"</title> </item> <item name=" Baking powder "> <title url="file:/c:/dat/recipeml/ _Best__Apple_Nut_Pudding.xml"> "Best" Apple Nut Pudding</title> <title url="file:/c:/dat/recipeml/ _Gold_Room__Scones.xml">
RecipeML: varying markup richness
One way to do it:
    <ing><item> (12-oz) tomato paste </item></ing>
Another way:
    <ing>
      <amt>
        <qty>12</qty>
        <unit>oz</unit>
      </amt>
      <item>tomato paste</item>
    </ing>
Normalizing data with declared functions, part 1 of 3
    (: A sorted list of all unique ingredients in the recipe collection,
       with URLs to link to the recipes that use them. Ingredient names
       get normalized by functions declared in the query prolog. :)
    declare namespace sn = "http://www.snee.com/ns/misc/";
    declare function sn:normIngName($ingName) as xs:string
    {
      (: Normalize an ingredient name. :)
      (: Remove a parenthesized expression that may begin the string, after
         optional leading whitespace, e.g. in "(10 ozs) Rotel diced tomatoes". :)
      let $normedName := replace($ingName, "^\s*\(.*?\)\s*", "")
      (: Convert to all lower-case. :)
      let $normedName := lower-case($normedName)
      (: Trim leading and trailing whitespace and collapse internal runs to one space. :)
      let $normedName := normalize-space($normedName)
      return $normedName
    };
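The same three normalization steps in Python, as a sketch for checking the regular expression against sample names. norm_ing_name is a made-up counterpart of sn:normIngName; it allows whitespace before the parenthesis, since item text such as " (12-oz) tomato paste " starts with a space. Python's str.split() treats a somewhat broader set of characters as whitespace than normalize-space() does.

```python
import re

def norm_ing_name(name):
    """Sketch of sn:normIngName:
    1. drop a parenthesized prefix such as "(10 ozs) ", allowing leading whitespace,
    2. lower-case,
    3. trim and collapse runs of whitespace, approximating normalize-space()."""
    name = re.sub(r"^\s*\(.*?\)\s*", "", name)
    name = name.lower()
    return " ".join(name.split())

print(norm_ing_name("(10 ozs) Rotel diced tomatoes"))  # rotel diced tomatoes
print(norm_ing_name(" (12-oz) tomato paste "))         # tomato paste
print(norm_ing_name(" Baking  Powder"))                # baking powder
```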
Normalizing data with functions, part 2 of 3
    declare function sn:normIngList($ingList) as item()*
    {
      (: Normalize a list of ingredient names. :)
      for $ingName in $ingList
      return sn:normIngName($ingName)
    };
    <ingredients>
    {
      let $normIngNames := sn:normIngList(collection('recipeml/docs.xml')//ing/item)
Normalizing data with functions, part 3 of 3
      for $ingr in distinct-values($normIngNames)
      order by $ingr
      return
        <item name="{$ingr}">
        {
          for $doc in collection('recipeml/docs.xml'),
              $i in $doc/recipeml/recipe/ingredients/ing/item
          where sn:normIngName($i) = $ingr
          return <title url="{document-uri($doc)}">
                   {$doc/recipeml/recipe/head/title/text()}
                 </title>
        }
        </item>
    }
    </ingredients>
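A Python sketch of the full normalized grouping, with hypothetical data and function names. One design difference is deliberate: the query iterates over ($doc, $i) pairs, so a recipe with two items that normalize to the same name would be listed twice under that name; the sketch removes such duplicates with dict.fromkeys().

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def norm_ing_name(name):
    """Sketch of sn:normIngName: drop a leading "(...)" prefix, lower-case, collapse whitespace."""
    name = re.sub(r"^\s*\(.*?\)\s*", "", name)
    return " ".join(name.lower().split())

def recipe(title, *items):
    """Build a minimal RecipeML-shaped document (hypothetical test data)."""
    ings = "".join(f"<ing><item>{i}</item></ing>" for i in items)
    return ET.fromstring(f"<recipeml><recipe><head><title>{title}</title></head>"
                         f"<ingredients>{ings}</ingredients></recipe></recipeml>")

DOCS = [recipe("Blondie Brownies", " Baking Powder"),
        recipe("Apple Nut Pudding", " Baking powder ", "baking powder"),
        recipe("Pizza Sauce", " (12-oz) tomato paste "),
        recipe("Indian Chili", "Tomato paste")]

def normalized_index(docs):
    """Group recipe titles under normalized ingredient names, sorted by name.
    Each title is listed at most once per normalized name."""
    index = defaultdict(list)
    for d in docs:
        title = d.findtext("recipe/head/title")
        for name in dict.fromkeys(norm_ing_name(i.text or "") for i in d.iter("item")):
            index[name].append(title)
    return dict(sorted(index.items()))

for name, titles in normalized_index(DOCS).items():
    print(name, titles)
```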
Specs at http://www.w3.org/TR
XQuery 1.0: An XML Query Language
XQuery 1.0 and XPath 2.0 Formal Semantics
XQuery 1.0 and XPath 2.0 Data Model
XSLT 2.0 and XQuery 1.0 Serialization
XQuery 1.0 and XPath 2.0 Functions and Operators
XML Query Use Cases
Other resources
eXist: http://www.exist-db.org
MarkLogic: http://www.marklogic.com
Mike Kay, “Comparing XSLT and XQuery”: http://idealliance.org/proceedings/xtech05/papers/02-03-01/
At http://www.w3.org/TR: XQuery Update Requirements; XQuery 1.0 and XPath 2.0 Full-Text