Query Processing with XML CSE 350 – Advanced Database Topics Jeffrey R. Ellis
Query Processing Topics • Why? • Java and Other Programming Languages • XPath/XSLT • XQuery (W3C-sponsored Query Language) • Current Research • Other Query Languages • XISS (XML Indexing and Storage System)
FIRST – Distinction between XML and HTML/Web Technologies • The spotlight on XML is analogous to the one on Java • Immediate benefits applied to the World Wide Web • Long-range, more exciting benefits in applications • XML IS NOT AN HTML REPLACEMENT • HTML marks pages up for presentation on the Web • XML marks text up with semantic information • XML can encode HTML pages, but HTML already works well on the Web
XML Data Storage • XML Documents • Data is delineated semantically • Schemas/DTDs control contents of elements • Semi-structured model allows flexibility • Text is human-readable and machine-parsable • Open standards work with common tools • File-based storage allows easy sharing • Can queries control access to data?
Traditional Database Storage • Databases • Data is delineated semantically • Schemas control contents of rows • Rigid structure; no semi-structured flexibility • Data is machine-parsable but not human-readable • Proprietary standards prevent interoperability • Proprietary storage prevents data sharing • Queries control access to data
XML for Query Processing • If query processing can be made efficient, XML document storage provides many benefits over traditional database storage • Sample application • Employee database document • XML Schema assumed to exist • Employee information queried as in standard HR processing
<?xml version="1.0"?>
<employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="employee.xsd">
  <emp gender='m'>
    <name>
      <last>Bissell</last>
      <first>Brian</first>
    </name>
    <position>IT Specialist</position>
    <salary>35,000</salary>
    <location>CT</location>
  </emp>
  <emp gender='m'>
    <name>
      <last>Pham</last>
      <first>Hung</first>
      <mi>Q</mi>
    </name>
    <position>Senior IT Specialist</position>
    <salary>45,000</salary>
    <location>CT</location>
  </emp>
  …
</employees>
Tree Structure of XML Document • Remember that XML documents are trees
emp
  @gender
  name
    last
    first
    mi
  position
  salary
  location
Query Processing – Programming Languages • XML documents are plain text files • Any language with file I/O can read an XML document • Any language with string parsing capabilities can use XML data • Query processing is done through the language's own syntax • "Obvious" result: querying looks quite different from querying a traditional database
Query Processing – Programming Languages • Strategy • Basic file I/O through the language • Basic string matching to identify elements • Processing possible, but not necessarily efficient • Languages have collected XML processing tools into libraries • Xerces – Apache parser library for Java and C++ • Two methods for parsing XML data • DOM • SAX
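The file I/O plus string-matching strategy can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions: the class name NaiveLastNames is hypothetical, and the document is the two records from the sample, inlined as a string instead of read from a file. It also shows why the approach is fragile: the pattern knows nothing about XML syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: "querying" XML with plain string matching.
class NaiveLastNames {
    // The two employee records from the sample, inlined instead of read from a file.
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // Collect the text between <last> and </last>. No parser is involved, so
    // comments, CDATA sections, entities and attributes on <last> are not
    // understood, and every query rescans the whole string.
    static List<String> lastNames(String xml) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("<last>(.*?)</last>").matcher(xml);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastNames(SAMPLE));
        // The commented-out element is matched too; a real parser would skip it.
        System.out.println(lastNames("<last>A</last><!-- <last>B</last> -->"));
    }
}
```

The first call returns [Bissell, Pham]; the second returns [A, B], because the pattern also matches inside the comment.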
DOM • Document Object Model • Defined by W3C for XML, HTML, and stylesheets • Provides a hierarchical, object view of the document • DOMParser parses the entire file, then provides access to its nodes • Key: Every item in an XML document is a node
DOM Example
Node (Element) name="emp"
  attribute1: Node (Attr) name="gender" value="m" (ownerElement: emp)
  child1: Node (Element) name="name" (parent: emp)
    child1: Node (Element) name="last" (parent: name)
      child1: Node (Text) value="Bissell" (parent: last)
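A DOM traversal that reaches the same nodes as the example above might look like this in Java. It uses the JDK's standard javax.xml.parsers and org.w3c.dom interfaces rather than Xerces' own DOMParser class; the class and method names and the inlined two-record sample are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: walking the DOM tree of the sample document.
class DomQuery {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // Build the whole tree in memory; elements, attributes and text all become Nodes.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // For each <emp>, read its gender Attr node and the Text node under <last>.
    static List<String> describe(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList emps = doc.getElementsByTagName("emp");
        for (int i = 0; i < emps.getLength(); i++) {
            Element emp = (Element) emps.item(i);
            Node gender = emp.getAttributeNode("gender");      // Node (Attr)
            Element last = (Element) emp.getElementsByTagName("last").item(0);
            Node text = last.getFirstChild();                  // Node (Text)
            out.add(text.getNodeValue() + " (" + gender.getNodeValue() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(parse(SAMPLE)));  // [Bissell (m), Pham (m)]
    }
}
```

The whole tree exists in memory before the loop starts, which is what allows the code to revisit any node in any order.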
SAX • Simple API for XML • Defined by XML-DEV mailing list • Provides event-driven processing of the document • XMLReader parses through the file and calls user-supplied methods as it encounters elements, text, and other items • Key: Methods are declared in an interface (ContentHandler) and implemented in user code
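A SAX handler for the same data might look like the sketch below. It uses the JDK's javax.xml.parsers.SAXParser, which wraps an XMLReader, and extends org.xml.sax.helpers.DefaultHandler, which implements ContentHandler with empty methods. The class and handler names are hypothetical; the handler keeps only the first names it collects, never the whole document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: event-driven extraction of <first> values.
class SaxQuery {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // The parser calls these methods as it reads; the handler decides what to keep.
    static class FirstNameHandler extends DefaultHandler {
        final List<String> firstNames = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inFirst = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("first")) {
                inFirst = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() may be called more than once for a single text run.
            if (inFirst) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("first")) {
                firstNames.add(text.toString());
                inFirst = false;
            }
        }
    }

    static List<String> firstNames(String xml) {
        try {
            FirstNameHandler handler = new FirstNameHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.firstNames;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNames(SAMPLE));  // [Brian, Hung]
    }
}
```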
DOM versus SAX • SAX is primarily Java-based; DOM defined for most languages • DOM requires storage of entire document in memory; SAX processes as it reads • DOM mirrors a document that can be revisited; suited for document processing • SAX mirrors object lifecycles; suited for data processing
Query Processing - XPath/XSLT • Standard XML technologies XPath and XSLT provide a ready-made querying infrastructure • XPath identifies the location of various document elements • XSL stylesheets provide methods for transforming data from one format to another • Combining XPath and XSLT makes it easy to generate result sets from queries
XPath • Provides element, value, and attribute identification
employees/emp/name/first = "Brian", "Hung", "Sara", "Brian"
//salary = "35,000", "40,000", "35,000", "60,000"
count(/employees/emp) = 4
//mi = "Q"
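The XPath expressions above can be evaluated from Java with the standard javax.xml.xpath API (XPath 1.0). This sketch assumes only the two records shown in the sample document, so its results are shorter than the four-employee lists above; the class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: running XPath 1.0 expressions over the sample.
class XPathQuery {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluate an expression that selects nodes; return each node's string value.
    static List<String> values(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluate an expression that yields a number, such as count(...).
    static double number(Document doc, String expr) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(values(doc, "employees/emp/name/first"));
        System.out.println(values(doc, "//salary"));
        System.out.println(number(doc, "count(/employees/emp)"));
        System.out.println(values(doc, "//mi"));
    }
}
```

On the two-record sample, these print [Brian, Hung], [35,000, 45,000], 2.0 and [Q].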
XSLT • Stylesheet transforms data from one form into another
<xsl:template match="emp">
  <xsl:apply-templates select="name"/>
</xsl:template>
<xsl:template match="name">
  <xsl:value-of select="first"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="last"/>
</xsl:template>
= Brian Bissell, Hung Pham, Sara Menillo, Brian Chicos
Combine XPath and XSLT for Queries • Query: Find the last name and position of each employee named Brian
<xsl:template match='employees'>
  <xsl:for-each select='emp'>
    <xsl:if test='name/first="Brian"'>
      <xsl:value-of select='name/last'/>
      <xsl:text>:</xsl:text>
      <xsl:value-of select='position'/>
      <xsl:text>; </xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
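The template above can be run with the JDK's XSLT 1.0 processor through javax.xml.transform. This sketch wraps the slide's template in the xsl:stylesheet element it needs and requests text output; the class name, constant names and inlined two-record sample are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: running the "employees named Brian" stylesheet.
class XsltQuery {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // The slide's template inside a complete XSLT 1.0 stylesheet.
    static final String BRIAN_QUERY =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='employees'>"
        + "<xsl:for-each select='emp'>"
        + "<xsl:if test='name/first=\"Brian\"'>"
        + "<xsl:value-of select='name/last'/>"
        + "<xsl:text>:</xsl:text>"
        + "<xsl:value-of select='position'/>"
        + "<xsl:text>; </xsl:text>"
        + "</xsl:if>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compile the stylesheet, apply it to the XML, and return the output as a string.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(BRIAN_QUERY, SAMPLE));
    }
}
```

On the two-record sample only Brian Bissell matches, so the output is Bissell:IT Specialist; followed by a space.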
Combine XPath and XSLT for Queries • Query: Find the average salary of all non-managers
<xsl:template match='employees'>
  <xsl:variable name='running_sum'>
    <xsl:value-of select='sum(emp/salary[../position!="Manager"])'/>
  </xsl:variable>
  <xsl:variable name='running_count'>
    <xsl:value-of select='count(emp[position!="Manager"])'/>
  </xsl:variable>
  <xsl:value-of select='$running_sum div $running_count'/>
</xsl:template>
• Note: sum() and div operate on numbers; XPath converts a salary written as "35,000" to NaN, so salaries must be stored without separators (e.g., 35000) for this query to return a meaningful average
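The NaN behaviour can be checked directly. This sketch evaluates the same sum-divided-by-count computation as a single XPath 1.0 expression through javax.xml.xpath, once against the sample as written and once against a hypothetical copy of it whose salaries have the commas removed. Class and constant names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: average non-manager salary, with and without commas.
class AverageSalary {
    // The sample as written, with thousands separators in <salary>.
    static final String COMMA_SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // Hypothetical variant with plain numeric salaries.
    static final String NUMERIC_SAMPLE =
        COMMA_SAMPLE.replace("35,000", "35000").replace("45,000", "45000");

    // Same computation as the stylesheet: sum of salaries divided by count.
    static final String AVG_NON_MANAGERS =
        "sum(/employees/emp/salary[../position!='Manager'])"
        + " div count(/employees/emp[position!='Manager'])";

    static double evaluate(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(COMMA_SAMPLE, AVG_NON_MANAGERS));    // NaN
        System.out.println(evaluate(NUMERIC_SAMPLE, AVG_NON_MANAGERS));  // 40000.0
    }
}
```

Neither sample employee is a manager, so the numeric version averages 35000 and 45000.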
Results XSLT/XPath • Many SQL queries can be accomplished • XPath provides element (data) access • XPath provides basic functions (e.g., sum()) • XPath provides WHERE functionality (predicates) • XSLT provides SELECT functionality • XSLT provides ORDER BY functionality (xsl:sort) • XSLT provides result set formatting • UNION functionality? Partly: XPath's | operator forms the union of node-sets
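As an illustration of ORDER BY, the following sketch runs a small stylesheet whose xsl:sort orders employees by last name, descending, roughly what ORDER BY last DESC would do in SQL. The stylesheet, class name and inlined two-record sample are assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: ORDER BY expressed with xsl:sort.
class XsltSort {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // xsl:sort must be the first child of xsl:for-each; it reorders the selected emps.
    static final String SORT_QUERY =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/employees'>"
        + "<xsl:for-each select='emp'>"
        + "<xsl:sort select='name/last' order='descending'/>"
        + "<xsl:value-of select='name/last'/>"
        + "<xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SORT_QUERY, SAMPLE));  // Pham;Bissell;
    }
}
```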
Querying with XPath and XSLT • Important questions • Is it sufficient? • Is it efficient? • Is there a better way? • The XML community needs a full query language • XQuery – Working draft published 7 June 2001
Query Processing - XQuery • XML provides flexibility in representing many kinds of information • Good query language must be likewise flexible • Pre-XQuery languages are good for specific types of data • Goal: “[S]mall, easily implementable language in which queries are concise and easily understood.”
XQuery Forms • Path expressions • Element constructors • FLWR expressions • Operator/Function expressions • Conditional expressions • Quantified expressions • Data Type expressions
XQuery – Path Expressions • Contribution of XPath • XQuery 1.0 and XPath 2.0 Data Model
document("sample1.xml")//emp/salary
/employees/emp/name[../@gender='f']
//emp[1 TO 3]/name/first
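An XPath 1.0 processor such as the JDK's javax.xml.xpath cannot run XQuery, but the path expressions above have close XPath 1.0 counterparts. In this sketch, document("sample1.xml") is replaced by an already-parsed document, and the draft range predicate [1 TO 3] is rewritten as [position() <= 3]. That selects each emp that is among the first three emp children of its parent, which is equivalent here because all emp elements share one parent. Class and helper names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: XPath 1.0 versions of the XQuery path expressions.
class PathExpressions {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    static final String SALARIES = "//emp/salary";                   // document(...) dropped
    static final String FEMALE_NAMES = "/employees/emp/name[../@gender='f']";
    static final String FIRST_THREE = "//emp[position() <= 3]/name/first";  // [1 TO 3]

    // Parse the sample and return the string value of every selected node.
    static List<String> values(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(values(SALARIES));      // [35,000, 45,000]
        System.out.println(values(FEMALE_NAMES));  // [] (the sample has no women)
        System.out.println(values(FIRST_THREE));   // [Brian, Hung]
    }
}
```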
XQuery – Element Constructors • Queries can generate new elements • Similar to XSLT abilities
<worker>
  {$name/last}
  {$position}
</worker>
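Without an XQuery engine, the effect of an element constructor can be imitated with the DOM API: create new elements in a fresh document and copy existing nodes into them. This sketch assumes $name and $position are bound to each employee's name and position in turn (the enclosing query is not shown on the slide). It wraps the results in a hypothetical workers element so the output is a single well-formed document. All class and method names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: building <worker> elements from existing data.
class ConstructWorkers {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // For each emp: <worker>{last}{position}</worker>, collected under <workers>.
    static Document workers(Document src) {
        try {
            Document out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = out.createElement("workers");
            out.appendChild(root);
            NodeList emps = src.getElementsByTagName("emp");
            for (int i = 0; i < emps.getLength(); i++) {
                Element e = (Element) emps.item(i);
                Element worker = out.createElement("worker");
                // importNode(node, true) deep-copies a node into the new document.
                worker.appendChild(out.importNode(e.getElementsByTagName("last").item(0), true));
                worker.appendChild(out.importNode(e.getElementsByTagName("position").item(0), true));
                root.appendChild(worker);
            }
            return out;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    // Serialize with an identity transform, omitting the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter w = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(w));
            return w.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(workers(parse(SAMPLE))));
    }
}
```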
XQuery – FLWR Expressions • For clause/Let clause/Where clause/Return clause • Similar to SQL
FOR $e IN document("sample1.xml")//emp
WHERE $e/salary > 38000 AND $e/@gender = 'f'
RETURN $e/name
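The FLWR clauses map naturally onto a loop in a host language. This hedged Java sketch spells out the query above over a DOM tree: iteration plays the role of FOR, an if statement plays WHERE, and adding to a list plays RETURN. It strips the thousands separator before comparing salaries, an assumption needed because the sample stores values like "35,000". The sample contains only men, so the usage also queries with gender "m". Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: a FLWR query written as an explicit loop.
class FlwrSketch {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // FOR $e IN //emp WHERE $e/salary > minSalary AND $e/@gender = gender RETURN name
    static List<String> names(Document doc, double minSalary, String gender) {
        List<String> result = new ArrayList<>();
        NodeList emps = doc.getElementsByTagName("emp");       // FOR: bind each emp in turn
        for (int i = 0; i < emps.getLength(); i++) {
            Element e = (Element) emps.item(i);
            String raw = e.getElementsByTagName("salary").item(0).getTextContent();
            double salary = Double.parseDouble(raw.replace(",", ""));  // "35,000" -> 35000
            if (salary > minSalary && gender.equals(e.getAttribute("gender"))) {  // WHERE
                String first = e.getElementsByTagName("first").item(0).getTextContent();
                String last = e.getElementsByTagName("last").item(0).getTextContent();
                result.add(first + " " + last);                // RETURN
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(names(doc, 38000, "f"));  // []
        System.out.println(names(doc, 38000, "m"));  // [Hung Pham]
    }
}
```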
XQuery – Operator/Function Expressions • Pre-defined and user-defined operators and functions • Still under development: Union, Intersect, Except
FOR $e IN //employees/emp
WHERE not(empty($e//mi))
RETURN $e/name
XQuery – Conditional Expressions • If-then-else expressions: whether the test must be boolean is not yet settled (ongoing discussion)
FOR $e IN /employees/emp
RETURN <worker>
         {$e/name}
         {IF ($e/position = "Manager") THEN <manager/> ELSE ()}
       </worker>
Quantified Expressions • Some/Every conditions • Some/Every evaluates to True or False
FOR $e IN //employees
WHERE SOME $p IN $e//emp/position SATISFIES $p = "Manager"
RETURN $e
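XPath 1.0 has no SOME/EVERY keywords, but its general comparisons are already existential: comparing a node-set with a string is true if any node in the set matches. EVERY can be written as the absence of a counterexample. This sketch checks both through javax.xml.xpath on the two-record sample; the expressions and class name are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: SOME and EVERY expressed in XPath 1.0.
class Quantified {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // SOME: true if any position node equals the string.
    static final String SOME_MANAGER = "/employees/emp/position = 'Manager'";
    static final String SOME_SENIOR = "/employees/emp/position = 'Senior IT Specialist'";
    // EVERY: no emp exists whose location is not CT.
    static final String EVERY_IN_CT = "not(/employees/emp[not(location = 'CT')])";

    static boolean test(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(test(SOME_MANAGER));  // false
        System.out.println(test(SOME_SENIOR));   // true
        System.out.println(test(EVERY_IN_CT));   // true
    }
}
```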
Data Types • Data types based on those available from XML Schema • Data types can be literal ("Brian"), from constructor functions (date("2001-10-11")), or from casting (CAST AS xsd:integer(24)) • User-defined data types are also allowable and parsable
XQuery • More choices than the XSLT/XPath combination • Work in progress • Focus of current W3C query-language efforts • Influencing the future design of core XML technologies (XPath) • Aims to be flexible enough for all future XML applications
Query Processing – Research • XQuery specification continues to undergo review and change • 6 of 7 specification documents released since June • All specifications released in 2001 • Other avenues of research • Other Query languages • Indexing strategies • Implementation
Query Processing – Other Query Languages • Many query languages exist • Quilt (basis for XQuery) • W3C early languages (XML-QL, XQL) • Adapted traditional languages (OQL, XSQL) • Research papers (XML-GL, YATL, Lorel) • Other query languages are often optimized for a particular subset of XML documents • The query language field *MAY* be standardizing on XQuery
Query Processing – Indexing Strategy • The query language matters less; better indexing techniques lead to efficiency • XISS (XML Indexing and Storage System) • Published September 19, 2001 • Builds sets of indexes on XML elements and attributes during the initial parse of the XML document • Lookups become constant-time through the various indexes • Demonstrated successes in test runs
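The general idea of indexing element names during one parse, so that lookups avoid a tree walk, can be sketched as below. This is a hypothetical simplification, not the XISS implementation. It keeps a hash map from tag name to entries and numbers elements in preorder with a descendant count, in the spirit of the order/size numbering described for XISS, so that an ancestor test becomes an interval check. Class and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of a name index with interval numbering (not XISS itself).
class NameIndex {
    static final String SAMPLE =
        "<employees>"
        + "<emp gender='m'><name><last>Bissell</last><first>Brian</first></name>"
        + "<position>IT Specialist</position><salary>35,000</salary>"
        + "<location>CT</location></emp>"
        + "<emp gender='m'><name><last>Pham</last><first>Hung</first><mi>Q</mi></name>"
        + "<position>Senior IT Specialist</position><salary>45,000</salary>"
        + "<location>CT</location></emp>"
        + "</employees>";

    // One entry per element: preorder rank, number of descendant elements, text.
    static final class Entry {
        final int order;
        int size;
        final String text;
        Entry(int order, String text) { this.order = order; this.text = text; }
    }

    private final Map<String, List<Entry>> byName = new HashMap<>();
    private int next = 0;

    NameIndex(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            visit(doc.getDocumentElement());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Single preorder pass: number the element, index it by name, count descendants.
    private Entry visit(Element e) {
        Entry entry = new Entry(next++, e.getTextContent());
        byName.computeIfAbsent(e.getTagName(), k -> new ArrayList<>()).add(entry);
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                Entry child = visit((Element) kid);
                entry.size += 1 + child.size;
            }
        }
        return entry;
    }

    // Hash lookup by element name: expected constant time, no tree walk.
    List<Entry> lookup(String name) {
        return byName.getOrDefault(name, Collections.emptyList());
    }

    // a is an ancestor of d iff d's rank lies in the interval (a.order, a.order + a.size].
    static boolean isAncestor(Entry a, Entry d) {
        return a.order < d.order && d.order <= a.order + a.size;
    }

    // Answer //anc//desc by joining two index lists with the interval test.
    List<String> descendants(String anc, String desc) {
        List<String> out = new ArrayList<>();
        for (Entry d : lookup(desc)) {
            for (Entry a : lookup(anc)) {
                if (isAncestor(a, d)) {
                    out.add(d.text);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex(SAMPLE);
        System.out.println(index.lookup("emp").size());            // 2
        System.out.println(index.descendants("name", "last"));     // [Bissell, Pham]
        System.out.println(index.descendants("position", "last")); // []
    }
}
```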
Query Processing - Implementation • XML is currently in a state of flux • Standards are still being revised • Industry is cautious about embracing a new technology • Economic slowdown may prevent new research and development efforts • XML is still waiting for its "Killer App", an application that forces immediate acceptance
XML Query Processing • XML is a functional database storage language • Efficient query language needed to turn XML into a viable database • Query language solutions are being developed • Java/C++ hooks first developed – OK • XSLT/XPath implemented – GOOD • XQuery being designed – GREAT? • Future additions – ????