XML-QL: A Query Language for XML

Charuta Nakhe
charuta@cse.iitb.ernet.in

Querying XML documents
• What is a query language?
• Why not adapt SQL or OQL to query XML data?
• What is an XML query?
• What is the database? XML documents
• What is the input to the query? An XML document
• What is the output of the query? An XML document

Requirements of an XML query language
• Query operations:
  • Selection: e.g. find books with “S. Sudarshan” as author
  • Extraction: e.g. extract the publisher field of the above books
  • Restructuring: restructuring of elements
  • Combination: queries over more than one document
• Must be able to transform and create XML structures
• Must be able to query even in the absence of a schema

The XML-QL language
• XML-QL is designed with the following features:
  • It is declarative, like SQL.
  • It is relationally complete, e.g. it can express joins.
  • It can be implemented with known database techniques.
  • It can extract data from existing XML documents and construct new XML documents.
• XML-QL is implemented as a prototype and is freely available in a Java version.

Example XML document
  <bib>
    <book year="1997">
      <title>Inside COM</title>
      <author>Dale Rogerson</author>
      <publisher><name>Microsoft</name></publisher>
    </book>
    <book year="1998">
      <title>Database system concepts</title>
      <author>S. Sudarshan</author>
      <author>H. Korth</author>
      <publisher><name>McGrawHill</name></publisher>
    </book>
  </bib>

Matching data using patterns
Find the authors who have published books with McGraw Hill, together with the book titles:
  WHERE <bib><book>
          <publisher><name>McGrawHill</></>
          <title>$t</>
          <author>$a</>
        </book></bib> IN "bib.xml"
  CONSTRUCT <result><title>$t</><author>$a</></>
• $t and $a are variables that pick out element contents.
• </> is shorthand for the closing tag of the nearest open element.
• The output is a collection of <result> elements, one for each matching title and author pair.
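To make the matching semantics concrete, here is a minimal Python sketch that emulates the query with the standard library's xml.etree.ElementTree. It is not XML-QL and not how the XML-QL prototype works; the function names match_books and construct_results are hypothetical, and the sample data copies the example document, using the publisher name exactly as it is spelled there.

```python
import xml.etree.ElementTree as ET

# Sample data mirroring the example document on the earlier slide.
BIB_XML = """
<bib>
  <book year="1997">
    <title>Inside COM</title>
    <author>Dale Rogerson</author>
    <publisher><name>Microsoft</name></publisher>
  </book>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
    <publisher><name>McGrawHill</name></publisher>
  </book>
</bib>
"""


def match_books(bib_root, publisher_name):
    """Emulate the WHERE clause: one (title, author) binding per match."""
    bindings = []
    for book in bib_root.findall("book"):
        names = [n.text for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue  # the publisher pattern does not match this book
        for title in book.findall("title"):        # binds $t
            for author in book.findall("author"):  # binds $a
                bindings.append((title.text, author.text))
    return bindings


def construct_results(bindings):
    """Emulate the CONSTRUCT clause: one <result> element per binding."""
    results = []
    for title, author in bindings:
        result = ET.Element("result")
        ET.SubElement(result, "title").text = title
        ET.SubElement(result, "author").text = author
        results.append(result)
    return results


root = ET.fromstring(BIB_XML)
pairs = match_books(root, "McGrawHill")
for element in construct_results(pairs):
    print(ET.tostring(element, encoding="unicode"))
```

The nested loops show why a book with two authors yields two results: the pattern is matched once per combination of variable bindings.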

Result XML document
  <result>
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
  </result>
  <result>
    <title>Database system concepts</title>
    <author>H. Korth</author>
  </result>

Grouping with nested queries
Group results by book title:
  WHERE <bib.book>$p</> IN "bib.xml",
        <title>$t</>
        <publisher><name>McGrawHill</></> IN $p
  CONSTRUCT <result>
              <title>$t</>
              WHERE <author>$a</> IN $p
              CONSTRUCT <author>$a</>
            </>
• Produces one result for each title, containing a list of all its authors.
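The effect of the nested query can be sketched in Python with xml.etree.ElementTree. This is only an illustration under stated assumptions: the function name group_authors_by_title is hypothetical, the sample data is a cut-down copy of the example document, and each matching book is treated as one group, which coincides with grouping by title only when titles are unique.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example document (assumed data for illustration).
BIB_XML = """
<bib>
  <book year="1997">
    <title>Inside COM</title>
    <author>Dale Rogerson</author>
    <publisher><name>Microsoft</name></publisher>
  </book>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
    <publisher><name>McGrawHill</name></publisher>
  </book>
</bib>
"""


def group_authors_by_title(bib_root, publisher_name):
    """Build one <result> per matching title, listing all of its authors."""
    results = []
    for book in bib_root.findall("book"):          # outer WHERE: binds $p
        names = [n.text for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        for title in book.findall("title"):        # binds $t
            result = ET.Element("result")
            ET.SubElement(result, "title").text = title.text
            # Nested WHERE ... IN $p: iterate over the authors of this book only.
            for author in book.findall("author"):  # binds $a
                ET.SubElement(result, "author").text = author.text
            results.append(result)
    return results


grouped = group_authors_by_title(ET.fromstring(BIB_XML), "McGrawHill")
for element in grouped:
    print(ET.tostring(element, encoding="unicode"))
```

Compared with the flat pattern, the author loop has moved inside the construction of each result, which is what the nested WHERE/CONSTRUCT expresses.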

Result XML document
  <result>
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
  </result>
  ...

Constructing XML data
• Results of a query can be wrapped in XML:
  WHERE <bib.book>
          <publisher><name>McGrawHill</></>
          <title>$t</>
          <author>$a</>
        </> IN "bib.xml"
  CONSTRUCT <result><author>$a</><title>$t</></>
• Results are grouped in <result> elements.
• The pattern matches once for each author, so a book with several authors appears in several results.

Joining elements by value
Find all articles that have at least one author who has also written a book since 1995:
  WHERE <bib.article>
          <author>$n</>
        </> CONTENT_AS $a IN "bib.xml",
        <book year=$y>
          <author>$n</>
        </> IN "bib.xml",
        $y > 1995
  CONSTRUCT <article>$a</>
• CONTENT_AS $a following a pattern binds the content of the matching element to the variable $a.
• Using the same variable $n in both patterns joins articles and books on the author's value.
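The join on the shared author variable can be sketched in Python as follows. The sample data is invented for illustration (the example document contains no articles), the function name articles_by_recent_book_authors is hypothetical, and the sketch returns each qualifying article once, whereas the XML-QL query may emit an article once per matching binding.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two books and two articles in one bibliography.
BIB_XML = """
<bib>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
  </book>
  <book year="1990">
    <title>An older book</title>
    <author>A. Writer</author>
  </book>
  <article>
    <title>A hypothetical article</title>
    <author>H. Korth</author>
  </article>
  <article>
    <title>Another hypothetical article</title>
    <author>A. Writer</author>
  </article>
</bib>
"""


def articles_by_recent_book_authors(bib_root, min_year):
    """Return articles sharing an author with a book published after min_year."""
    # Second pattern plus the year predicate: collect $n values from recent books.
    recent_authors = set()
    for book in bib_root.findall("book"):
        year = book.get("year")
        if year is not None and int(year) > min_year:
            recent_authors.update(a.text for a in book.findall("author"))
    # First pattern: keep articles whose author value joins with that set.
    return [
        article
        for article in bib_root.findall("article")
        if any(a.text in recent_authors for a in article.findall("author"))
    ]


matched = articles_by_recent_book_authors(ET.fromstring(BIB_XML), 1995)
for article in matched:
    print(article.findtext("title"))
```

Building the set of author names first is a design choice of this sketch (a hash join); a nested loop over articles and books would express the same join more literally.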

Tag variables
Find all publications in 1995 where Smith is either an author or an editor:
  WHERE <bib.$p>
          <title>$t</>
          <year>1995</>
          <$e>Smith</>
        </> IN "bib.xml",
        $e IN {author, editor}
  CONSTRUCT <$p><title>$t</><$e>Smith</></>
• $p matches book and article.
• $e matches author and editor.
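A tag variable corresponds to treating the element name itself as data. The Python sketch below shows the idea with xml.etree.ElementTree; the sample data and the function name publications_by are hypothetical, and the data assumes that year is stored as a child element, as the query pattern does.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in which <year> is a child element of each publication.
BIB_XML = """
<bib>
  <book><title>Book A</title><year>1995</year><author>Smith</author></book>
  <article><title>Article B</title><year>1995</year><editor>Smith</editor></article>
  <article><title>Article C</title><year>1996</year><author>Smith</author></article>
  <book><title>Book D</title><year>1995</year><author>Jones</author></book>
</bib>
"""


def publications_by(bib_root, person, year, roles=("author", "editor")):
    """Rebuild each matching publication, reusing the matched tag names."""
    results = []
    for pub in bib_root:                       # $p ranges over any child tag
        if year not in [y.text for y in pub.findall("year")]:
            continue
        for child in pub:                      # $e ranges over child tags
            if child.tag in roles and child.text == person:
                for title in pub.findall("title"):   # binds $t
                    out = ET.Element(pub.tag)        # <$p> in CONSTRUCT
                    ET.SubElement(out, "title").text = title.text
                    ET.SubElement(out, child.tag).text = person  # <$e>
                    results.append(out)
    return results


found = publications_by(ET.fromstring(BIB_XML), "Smith", "1995")
for element in found:
    print(ET.tostring(element, encoding="unicode"))
```

The output elements take their names from pub.tag and child.tag, mirroring how $p and $e are reused in the CONSTRUCT clause.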

Regular-path expressions
Find the name of every part element that contains a brand element equal to “Ford”, regardless of the nesting level at which the part occurs:
  WHERE <part*>
          <name>$r</>
          <brand>Ford</>
        </> IN "bib.xml"
  CONSTRUCT <result>$r</>
• Regular path expressions can specify element paths of arbitrary depth.
• part* denotes a path of any number of nested part elements.
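Matching along a path of arbitrary depth amounts to a recursive walk. The Python sketch below is one possible reading of the part* path, not the XML-QL evaluation algorithm: it follows only part edges and handles paths of one or more edges, and the catalog data and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts catalog with parts nested inside parts.
PARTS_XML = """
<catalog>
  <part>
    <name>engine</name><brand>Ford</brand>
    <part><name>piston</name><brand>Ford</brand></part>
    <part><name>belt</name><brand>Acme</brand></part>
  </part>
  <part>
    <name>wheel</name><brand>Acme</brand>
    <part><name>hubcap</name><brand>Ford</brand></part>
  </part>
</catalog>
"""


def parts_along_part_path(node):
    """Yield every element reachable from node via one or more <part> edges."""
    for child in node.findall("part"):
        yield child
        yield from parts_along_part_path(child)


def names_of_branded_parts(root, brand):
    """Return a <result> element for the name of each part with the given brand."""
    results = []
    for part in parts_along_part_path(root):
        if brand in [b.text for b in part.findall("brand")]:
            for name in part.findall("name"):   # binds $r
                result = ET.Element("result")
                result.text = name.text
                results.append(result)
    return results


ford_parts = names_of_branded_parts(ET.fromstring(PARTS_XML), "Ford")
for element in ford_parts:
    print(ET.tostring(element, encoding="unicode"))
```

The recursion visits parts in document order, so a nested part is reported right after the part that contains it.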

Other interesting features
• Constructing an explicit root element
• Grouping of data
• Transforming XML data
• Integrating data from different XML sources

Links for more information
• www.w3.org/TR/NOTE-xml-ql : The XML-QL W3C Note
• www.research.att.com/~mff/xmlql/doc : The XML-QL home page
• www.w3.org/XML/Activity.html#query-wg : The XML Query Working Group
• www.w3.org/TR/xmlquery-req : XML Query Requirements (W3C Working Draft)
• www.oasis-open.org/cover/xmlQuery.html : Robin Cover's page on XML query languages

Example DTD
  <!ELEMENT book (author+, title, publisher)>
  <!ATTLIST book year CDATA>
  <!ELEMENT article (author+, title, year?)>
  <!ATTLIST article type CDATA>
  <!ELEMENT publisher (name, address)>
  <!ELEMENT author (firstname?, lastname)>

Creating an explicit root element
• Every XML document must have a single root. XML-QL supplies an <XML> element by default, but another root may be specified:
  CONSTRUCT <results> {
    WHERE <bib.book>
            <publisher><name>McGrawHill</></>
            <title>$t</>
            <author>$a</>
          </> IN "bib.xml"
    CONSTRUCT <result><author>$a</><title>$t</></>
  } </results>
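Wrapping all results under one root can be sketched in Python by attaching each constructed element to a single parent. The function name construct_with_root and its root_tag parameter are hypothetical, and the sample data is a cut-down copy of the example document.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example document (assumed data for illustration).
BIB_XML = """
<bib>
  <book year="1998">
    <title>Database system concepts</title>
    <author>S. Sudarshan</author>
    <author>H. Korth</author>
    <publisher><name>McGrawHill</name></publisher>
  </book>
</bib>
"""


def construct_with_root(bib_root, publisher_name, root_tag="results"):
    """Return a single root element containing one <result> per match."""
    results_root = ET.Element(root_tag)        # the explicit root element
    for book in bib_root.findall("book"):
        names = [n.text for n in book.findall("publisher/name")]
        if publisher_name not in names:
            continue
        for title in book.findall("title"):        # binds $t
            for author in book.findall("author"):  # binds $a
                result = ET.SubElement(results_root, "result")
                ET.SubElement(result, "author").text = author.text
                ET.SubElement(result, "title").text = title.text
    return results_root


wrapped = construct_with_root(ET.fromstring(BIB_XML), "McGrawHill")
print(ET.tostring(wrapped, encoding="unicode"))
```

Because every result is a child of one element, the output is a well-formed XML document on its own, even when no book matches and the root is empty.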
