XPath Processor
MQP Presentation, April 15, 2003
Tammy Worthington
Advisor: Elke Rundensteiner
Computer Science Department, Worcester Polytechnic Institute

Introduction
[Architecture diagram] Components: XQuery Engine (future development), which takes an XQuery as input; XPath Processor; VAMANA (XPath Query Engine); MASS (A Multi-Axis Storage Structure for Large XML Documents), which holds the XML document. Connection labels: XPath Expression, Execution Tree, MASS Interface, Node Set.

Processor Requirements
• Must conform to the XPath grammar as specified by the W3C
• Implementation in C++
  • Performance needed since an XQuery can contain several XPath expressions
  • MASS and VAMANA are implemented in C++
• Parser interface that is independent of the query engine
  • Existing parsers are coupled to a query engine (Xalan, Pathan, LibXPath, Sablotron)
  • Facilitates efficient development and testing
• Extensibility
  • Allow for user transformations of the parse tree
  • Allow for XPath grammar changes
• Error handling
  • Show a useful error message
  • Parse tree should be destroyed automatically if a parse error occurs

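The last requirement (automatic destruction of the parse tree on a parse error) can be illustrated with owning pointers. This is only a sketch in modern C++, not the processor's actual code: `ParseNode`, `ParseError`, `buildTree` and the `g_liveNodes` counter are hypothetical names introduced for the example.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Demo-only counter of live nodes, used to observe destruction.
int g_liveNodes = 0;

// Hypothetical parse node: it owns its children, so destroying the
// root destroys the whole subtree.
struct ParseNode {
    std::string label;
    std::vector<std::unique_ptr<ParseNode>> children;

    explicit ParseNode(std::string l) : label(std::move(l)) { ++g_liveNodes; }
    ~ParseNode() { --g_liveNodes; }
    ParseNode(const ParseNode&) = delete;
    ParseNode& operator=(const ParseNode&) = delete;
};

struct ParseError : std::runtime_error {
    explicit ParseError(const std::string& msg) : std::runtime_error(msg) {}
};

// Stand-in for the parser: builds part of a tree, then optionally fails.
std::unique_ptr<ParseNode> buildTree(bool failMidway) {
    auto root = std::make_unique<ParseNode>("Step employee");
    root->children.push_back(std::make_unique<ParseNode>("Step company"));
    if (failMidway) {
        // Stack unwinding runs root's destructor, which frees both nodes.
        throw ParseError("unexpected character");
    }
    root->children.push_back(
        std::make_unique<ParseNode>("Binary Predicate Equals"));
    return root;
}
```

Because ownership is expressed in the types, no explicit cleanup code is needed on the error path; the caller only has to catch `ParseError`.
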
Processor Overview
• Generates an independent parse tree
  • Parse tree is a structure of in-memory C++ objects
  • Small footprint due to compact parse tree representation
• Built-in transformations
  • Predicate transformation needed to facilitate data flow in query engines
• User-implemented callback interface
  • Completely separate from query execution
  • Generates execution tree from parse tree
  • User derives from the parser class and provides the implementation
  • Implementation is a simple 1-to-1 transformation
Pipeline: XPath expression → Parser (productions) → parse tree → Transformer → transformed parse tree → Generator (callbacks) → execution tree

Parse Tree Node Hierarchy
[Class hierarchy diagram] Node classes shown: Parse Node, Step Node, Axis Type Node, Name Test Node, Function Node, Function Arg Node, Filter Node, Predicate Node, Unary Predicate Node, Negative Node, Binary Predicate Node, Set Operator Node, Literal Node, Number Operator Node

Running Example
• XPath syntax
  • AxisName::NodeTest[predicate]
  • /child::company/descendant-or-self::employee[position() = 1]
  • Abbreviated notation: /company//employee[1]
XML Document example:
<company>
  <branch>
    <location>Boston</location>
    <employee>
      <name>JohnDoe</name>
      <id>471</id>
      <salary>54250</salary>
    </employee>
    <employee>
      <name>JaneDoe</name>
    </employee>
  </branch>
</company>

Productions to Parse Tree
/child::company/descendant-or-self::employee[position() = 1]
Productions (diagram labels): Expr, LocPath, Step, Axis, Nodetest, Pred, Function, Literal
Parse Tree:
Step employee
  Step company
  Binary Predicate Equals
    Function Position
    Literal 1

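Abbreviated Notation
• Abbreviated expression
  • /company//employee[1]
• Normalized expression
  • /child::company/descendant-or-self::employee[position() = 1]
  • Note: strictly, XPath 1.0 expands // to /descendant-or-self::node()/; the form above is the simplified one used throughout this example
[Diagram: parse tree of the abbreviated expression beside the normalized parse tree. Node labels: Step employee, Step company, Binary Predicate Equals, Function Position, Literal 1]
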
Transformed Parse Tree
• Data flow query engines require data to flow from the step node up through its predicate node, which filters the data
• Predicate nodes must be placed above their corresponding step node
• When could this be done?
  • While parsing
    • Too complicated because productions expect certain inputs
  • While generating the execution tree
    • Would make the user interface too complex
  • In post-processing of the parse tree
    • Simple recursive transformation

Transformed Parse Tree
/child::company/descendant-or-self::employee[position() = 1]
Parse Tree:
Step employee
  Step company
  Binary Predicate Equals
    Function Position
    Literal 1
Transformed Parse Tree:
Binary Predicate Equals
  Step employee
    Step company
  Function Position
  Literal 1

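The post-processing step can be sketched as a short recursive function that lifts each predicate above the step it filters. This is an illustration under assumptions, not the processor's real code: the `Node` layout, the `Kind` and `Role` enums, and the names `hoistPredicates`, `buildExample` and `toString` are all hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Step, BinaryPredicate, Function, Literal };
enum class Role { Root, Context, Predicate, Operand };

// Simplified parse-tree node (hypothetical layout).
struct Node {
    Kind kind;
    std::string value;
    Role role;
    std::vector<std::unique_ptr<Node>> children;
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeNode(Kind kind, std::string value, Role role) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->value = std::move(value);
    n->role = role;
    return n;
}

// Recursively moves every predicate above the step it filters.
NodePtr hoistPredicates(NodePtr node) {
    std::vector<NodePtr> kept;
    std::vector<NodePtr> predicates;
    for (auto& child : node->children) {
        NodePtr t = hoistPredicates(std::move(child));
        if (node->kind == Kind::Step && t->role == Role::Predicate) {
            predicates.push_back(std::move(t));
        } else {
            kept.push_back(std::move(t));
        }
    }
    node->children = std::move(kept);
    for (auto& pred : predicates) {
        pred->role = node->role;     // predicate takes over the step's place
        node->role = Role::Context;  // step now feeds data to the predicate
        pred->children.insert(pred->children.begin(), std::move(node));
        node = std::move(pred);
    }
    return node;
}

// Compact text form, e.g. "employee(company,=(position,1))".
std::string toString(const Node& n) {
    std::string s = n.value;
    if (!n.children.empty()) {
        s += "(";
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) s += ",";
            s += toString(*n.children[i]);
        }
        s += ")";
    }
    return s;
}

// Parse tree of the running example before the transformation.
NodePtr buildExample() {
    auto employee = makeNode(Kind::Step, "employee", Role::Root);
    employee->children.push_back(makeNode(Kind::Step, "company", Role::Context));
    auto equals = makeNode(Kind::BinaryPredicate, "=", Role::Predicate);
    equals->children.push_back(makeNode(Kind::Function, "position", Role::Operand));
    equals->children.push_back(makeNode(Kind::Literal, "1", Role::Operand));
    employee->children.push_back(std::move(equals));
    return employee;
}
```

Applied to the running example, `hoistPredicates(buildExample())` turns `employee(company,=(position,1))` into `=(employee(company),position,1)`, which matches the transformed tree on the slide.
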
User Implementation of Callbacks
• Interface defines a callback for each node type
• Callbacks supply parse tree parameters
  • Value – axis, nodetest, literal, etc.
  • Role – relationship to parent
• User implements callbacks to generate the execution tree
• The tree is traversed depth-first: each node's Start callback is invoked before its children are visited and its End callback after

Callbacks
/child::company/descendant-or-self::employee[position() = 1]
Parse Tree:
Binary Predicate equals
  Step employee
    Step company
  Function position
  Literal 1
Corresponding Callbacks Invoked:
Start Predicate (equals, root)
  • Start Step (employee, context)
    • Start Step (company, context)
    • End Step
  • End Step
  • Start Function (position, operand)
  • End Function
  • Start Literal (1, operand)
  • End Literal
End Predicate

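The callback sequence above can be reproduced with a small sketch. The interface below is an assumption made for illustration: `ParseCallbacks`, `traverse`, `TraceCallbacks` and the simplified `Node` struct are hypothetical names, and only four node types are covered.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Step, Predicate, Function, Literal };

// Simplified node of the transformed parse tree (hypothetical layout).
struct Node {
    Kind kind;
    std::string value;  // node test, operator, function name, or literal
    std::string role;   // relationship to parent: root, context, operand
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(Kind kind, std::string value, std::string role) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->value = std::move(value);
    n->role = std::move(role);
    return n;
}

// Hypothetical callback interface: one Start/End pair per node type.
class ParseCallbacks {
public:
    virtual ~ParseCallbacks() = default;
    virtual void startStep(const std::string& value, const std::string& role) = 0;
    virtual void endStep() = 0;
    virtual void startPredicate(const std::string& value, const std::string& role) = 0;
    virtual void endPredicate() = 0;
    virtual void startFunction(const std::string& value, const std::string& role) = 0;
    virtual void endFunction() = 0;
    virtual void startLiteral(const std::string& value, const std::string& role) = 0;
    virtual void endLiteral() = 0;
};

// Depth-first walk: Start before the children, End after them.
void traverse(const Node& n, ParseCallbacks& cb) {
    switch (n.kind) {
        case Kind::Step:      cb.startStep(n.value, n.role); break;
        case Kind::Predicate: cb.startPredicate(n.value, n.role); break;
        case Kind::Function:  cb.startFunction(n.value, n.role); break;
        case Kind::Literal:   cb.startLiteral(n.value, n.role); break;
    }
    for (const auto& child : n.children) {
        traverse(*child, cb);
    }
    switch (n.kind) {
        case Kind::Step:      cb.endStep(); break;
        case Kind::Predicate: cb.endPredicate(); break;
        case Kind::Function:  cb.endFunction(); break;
        case Kind::Literal:   cb.endLiteral(); break;
    }
}

// Example user implementation: records each callback as a line of text.
class TraceCallbacks : public ParseCallbacks {
public:
    std::vector<std::string> log;
    void startStep(const std::string& v, const std::string& r) override { add("Start Step", v, r); }
    void endStep() override { log.push_back("End Step"); }
    void startPredicate(const std::string& v, const std::string& r) override { add("Start Predicate", v, r); }
    void endPredicate() override { log.push_back("End Predicate"); }
    void startFunction(const std::string& v, const std::string& r) override { add("Start Function", v, r); }
    void endFunction() override { log.push_back("End Function"); }
    void startLiteral(const std::string& v, const std::string& r) override { add("Start Literal", v, r); }
    void endLiteral() override { log.push_back("End Literal"); }
private:
    void add(const std::string& what, const std::string& v, const std::string& r) {
        log.push_back(what + " (" + v + ", " + r + ")");
    }
};

// Transformed parse tree of the running example.
std::unique_ptr<Node> buildTransformedExample() {
    auto pred = makeNode(Kind::Predicate, "equals", "root");
    auto employee = makeNode(Kind::Step, "employee", "context");
    employee->children.push_back(makeNode(Kind::Step, "company", "context"));
    pred->children.push_back(std::move(employee));
    pred->children.push_back(makeNode(Kind::Function, "position", "operand"));
    pred->children.push_back(makeNode(Kind::Literal, "1", "operand"));
    return pred;
}
```

Calling `traverse(*buildTransformedExample(), trace)` with a `TraceCallbacks trace` fills `trace.log` with the ten callbacks in the order listed on the slide. A query engine would derive its own class from the interface instead of logging.
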
Execution Tree Generation
• User derives from the parser class and provides an implementation of the callback interfaces
  • Implementation specific to the execution tree is stored in the derived class
• Stack required to maintain context in the execution tree
  • Each Start callback creates a node, attaches it to its parent (top of stack) and pushes it onto the stack
  • Each End callback pops a node off the stack
• Implemented parser interface for the VAMANA query engine
  • 1-to-1 mapping makes execution tree generation simple

Execution Tree Generation
/child::company/descendant-or-self::employee[position() = 1]
Callbacks:
Start Predicate (equals, root)
  • Start Step (employee, context)
    • Start Step (company, context)
    • End Step
  • End Step
  • Start Function (position, operand)
  • End Function
  • Start Literal (1, operand)
  • End Literal
End Predicate
Stack (entries pushed during the traversal): Predicate (equals, root); Step (employee, context); Step (company, context); Function (position, expr); Literal (1, expr)
Execution Tree:
V Binary Predicate equals
  Mass Node employee
    Mass Node company
  V Function position
  V Literal 1

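The stack discipline (create, attach to the top of the stack, push on Start; pop on End) can be sketched as follows. This is a simplified illustration, not VAMANA's implementation: `ExecNode` and `ExecTreeBuilder` are hypothetical names, and a single `start`/`end` pair stands in for the per-node-type callbacks.

```cpp
#include <memory>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical execution-tree node.
struct ExecNode {
    std::string label;
    std::vector<std::unique_ptr<ExecNode>> children;
};

// Builds an execution tree from a stream of Start/End callbacks.
class ExecTreeBuilder {
public:
    // Start: create a node, attach it to the node on top of the stack
    // (its parent), then push it so later nodes become its children.
    void start(std::string label) {
        auto node = std::make_unique<ExecNode>();
        node->label = std::move(label);
        ExecNode* raw = node.get();  // stays valid after ownership moves
        if (stack_.empty()) {
            root_ = std::move(node);
        } else {
            stack_.top()->children.push_back(std::move(node));
        }
        stack_.push(raw);
    }

    // End: the current node is complete, so its parent becomes current.
    void end() {
        if (!stack_.empty()) {
            stack_.pop();
        }
    }

    // True once every Start has been matched by an End.
    bool balanced() const { return stack_.empty(); }

    std::unique_ptr<ExecNode> takeRoot() { return std::move(root_); }

private:
    std::unique_ptr<ExecNode> root_;
    std::stack<ExecNode*> stack_;
};
```

Replaying the callback sequence of the running example (start predicate, start employee, start company, end, end, start function, end, start literal, end, end) leaves the stack empty and yields a root with three children, the first of which has `company` as its only child.
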
Evaluation
• Numerous XPath expressions were tested, including all of the examples in the W3C XPath specification
• Each tree was printed, enabling visual evaluation (example to follow)
• Error messages are helpful in locating the parse error
  • Example (using \ instead of /)
    • child::company\descendant-or-self::employee[position() =1]
  • Output
    child::company\descendant-or-self::employee[position() =1]
    -------------------^
    Parse Error!

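A caret-style message like the one above takes only a few lines to produce. The helper below is a sketch: `formatParseError` is a hypothetical name, and the error offset is assumed to be supplied by the parser (where exactly the real processor places the caret is not specified here).

```cpp
#include <cstddef>
#include <string>

// Returns the expression, a line of dashes ending in a caret under the
// given offset, and a final "Parse Error!" line.
std::string formatParseError(const std::string& expr, std::size_t errorPos) {
    if (errorPos > expr.size()) {
        errorPos = expr.size();  // clamp so the caret never runs past the end
    }
    return expr + "\n" + std::string(errorPos, '-') + "^\n" + "Parse Error!";
}
```

For the expression on the slide, passing the offset of the backslash (14) puts the caret directly under it.
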
Printed Trees
/child::company/descendant-or-self::employee[position() = 1]
Parse Tree:
[descendant-or-self, employee, ROOT]
  [child, company, CONTEXT]
  [=, BIPRED, PRED]
    [position, FUNCTION, OPERAND]
    [1, LITERAL, OPERAND]
Transformed Parse Tree:
[=, BIPRED, ROOT]
  [descendant-or-self, employee, CONTEXT]
    [child, company, CONTEXT]
  [position, FUNCTION, OPERAND]
  [1, LITERAL, OPERAND]
Execution Tree:
[=, VBIPRED, ROOT]
  [descendant-or-self, employee, CONTEXT]
    [child, company, CONTEXT]
  [position, VFUNCTION, OPERAND]
  [1, VLITERAL, OPERAND]

Conclusion
• Successful implementation of an XPath processor that is completely independent of the query engine
• Successful integration with VAMANA and MASS
• Successful MQP overall

Thanks
I would like to thank Elke Rundensteiner, Kurt Deschler and Venkatesh Raghavan. This XPath processor was a contribution to both the MASS and VAMANA projects and is the result of a collaborative effort.

References
1. Tim Bray, Jean Paoli, C. M. Sperberg-McQueen and Eve Maler. Extensible Markup Language (XML), Version 1.0, Second Edition, W3C Recommendation, October 6, 2000. http://www.w3.org/TR/REC-xml
2. Jim Clark and Steve DeRose. XML Path Language (XPath), Version 1.0, W3C Recommendation, November 16, 1999. http://www.w3.org/TR/xpath.html
3. Don Chamberlin, Peter Fankhauser, Massimo Marchiori and Jonathan Robie. XML Query Requirements, W3C Working Draft, February 15, 2001. http://www.w3.org/TR/xmlquery-req
4. Kurt W. Deschler and Elke Rundensteiner. MASS: A Multi-Axis Storage Structure for Large XML Documents, 2002, Technical Report in progress.
5. Venkatesh Raghavan. VAMANA – Efficient XPath Query Engine Exploiting the MASS Index, October 23, 2002, Master's Thesis Proposal.