VAMANA (Talk 2) (vǎ-mǎ-nǎ)
An Efficient XPath Query Engine Exploiting the MASS Index
Venkatesh Raghavan & Prof. Elke Rundensteiner
DSRG Talk, 1st May 2003
Introduction
• Purpose of the talk
• Generation of Execution Tree
• Execution
• Running Example 1
• Running Example 2
• XPath Expression Execution
• Cost Estimation
• Heuristics and Transformation
Running Examples
E.g. 1: //name/parent::person/descendant::watch
  <people>
    <person id="person1">
      <name> Hayato Cappelletti </name>
      <watches>
        <watch open_auction="open_auction82" />
E.g. 2: //name[ text() = "Klemens Pelz" ]/parent::person
  <people>
    <person id="person1">
      <name> Klemens Pelz </name>
Bigger Picture
[Figure: layered architecture. From top to bottom: XQuery Engine (future development); XPath Processor; VAMANA (XPath Query Engine); MASS (A Multi-Axis Storage Structure for Large XML Documents). Arrow labels: XPath Expression, Execution Tree, MASS Interface, Node Set.]
How many "ROOT(s)" are there?
• Root of the document: we call it the "Document Root".
• Root of the expression, e.g. in //name/parent::person/descendant::watch: we call it the "First Location Step".
• Root of the execution tree: we call it "ROOT".
XPath Processor, Phase I: Parse Tree
E.g. 2: //name[ text() = "Klemens Pelz" ]/parent::person
[Figure: parse tree. ROOT: person (parent axis). CONTEXT: name (//). PRED: BIPRED "=" with two OPERANDs, text (child axis) and the LITERAL "Klemens Pelz".]
XPath Processor, Contd.: Phase II: Transformed Parse Tree
[Figure: the Phase I parse tree (ROOT: person via parent; CONTEXT: name via //) shown next to the Phase II transformed parse tree. In both, the predicate subtree is PRED: BIPRED "=" with OPERANDs text (child axis) and LITERAL "Klemens Pelz".]
XPath Processor, Phase III: Execution Tree Generation
[Figure: the Phase II transformed parse tree (ROOT: person via parent; CONTEXT: name via //; PRED: BIPRED "=" with OPERANDs text via child and LITERAL "Klemens Pelz") shown next to the Phase III VAMANA execution tree. Execution tree nodes: "person" (X: Parent), "name" (X: //), "" (X: child), "Klemens Pelz", BI_PREDICATE "EQ".]
VAMANA Nodes (VNode)
[Figure: VNode class hierarchy. Boxes: Node Base, VRootNode, MassNode, VJoinNode, VBinaryPredicateNode, VExistPredicateNode, VLiteralNode.]
VNode Structure
[Figure: structure of a VNode. Labels: Root Node, Expression Side, child, Context Side.]
VNode Flow Structure
• Data-flow style of querying.
  • Used by most commercial relational database systems.
  • Nodes are arranged so that data "flows" from one node to the next in a producer-consumer fashion.
• Correctness.
  • Each node performs some operation on the data that flows through it.
  • The result is produced by the last node in the dataflow chain.
• In short:
  • Data flows upwards.
  • Control flows downwards.
  • Iterative.
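The producer-consumer arrangement described on this slide can be illustrated with a small pull-based operator chain. This is only a sketch of the general technique, not VAMANA code: the class names `Source`, `Filter` and `Map` and the integer data are made up for illustration.

```python
from typing import Callable, Iterable, Iterator, List


class Source:
    """Leaf operator (hypothetical): produces items from an in-memory list."""

    def __init__(self, items: List[int]) -> None:
        self.items = items

    def __iter__(self) -> Iterator[int]:
        for item in self.items:
            yield item


class Filter:
    """Consumes its child one item at a time and passes on the matches."""

    def __init__(self, child: Iterable[int], keep: Callable[[int], bool]) -> None:
        self.child = child
        self.keep = keep

    def __iter__(self) -> Iterator[int]:
        for item in self.child:      # control flows down: ask the child for more
            if self.keep(item):
                yield item           # data flows up: hand the item to the consumer


class Map:
    """Applies a function to every item pulled from its child."""

    def __init__(self, child: Iterable[int], fn: Callable[[int], int]) -> None:
        self.child = child
        self.fn = fn

    def __iter__(self) -> Iterator[int]:
        for item in self.child:
            yield self.fn(item)


# The last operator in the chain produces the result; nothing is
# materialized in between, each item is pulled through on demand.
chain = Map(Filter(Source([1, 2, 3, 4, 5, 6]), lambda x: x % 2 == 0), lambda x: x * 10)
result = list(chain)
```

Iterating over `chain` drives the whole pipeline: the request travels down to `Source`, and each item travels back up through `Filter` and `Map`.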
Contd.
• Iterative.
  • Currently VAMANA executes nodes iteratively.
  • So no copies of the data are made.
• Is it a problem?
  • MASS produces nodes in document order, so it is not a problem.
  • But some expressions are in sibling order rather than document order.
  • Work in progress.
Execution Tree
E.g. 1: //name/parent::person/descendant::watch
[Figure: execution tree on the Context Side, below the Root Node: "watch" (X: AXIS_DESCENDANT), "person" (X: AXIS_PARENT), "name" (X: //).]
How Do We Execute?
//name/parent::person/descendant::watch
• Step 1: Set the context node of the root of the expression.
  • In this example the root of the expression is the root of the document.
• Step 2: Ask the VAMANA Root Node for nodes.
Step 1: Setting Context for the "First Location Step"
//name/parent::person/descendant::watch
[Figure: execution tree "watch" (X: AXIS_DESCENDANT), "person" (X: AXIS_PARENT), "name" (X: //).]
Initial fetching
[Figure: same tree with keys flowing upward. "name" supplies b.i.c.c, "person" supplies b.i.c, "watch" supplies b.i.c.m.c, which leaves the tree as the first result. Callouts: INITIAL FETCHING, OUT OF NODE.]
Next fetch
[Figure: same tree. With "name" still at b.i.c.c and "person" at b.i.c, "watch" moves from b.i.c.m.c to b.i.c.m.e, which is the next result.]
Next fetch
[Figure: same tree. "name" advances to b.i.i.c, "person" to b.i.i, and "watch" moves from b.i.c.m.e to b.i.i.m.c.]
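The fetch sequence in these figures can be replayed with a toy key table. This sketch assumes dot-separated hierarchical keys like the ones shown on the slides; the element names attached to `b`, `b.i`, `b.i.c.m` and `b.i.i.m`, and the helper names `scan`, `parent` and `descendant`, are illustrative guesses and not the MASS interface.

```python
from typing import Dict, Iterator, List

# Hypothetical toy document: key -> element name. Only the person, name
# and watch keys appear on the slides; the other entries are assumed.
DOC: Dict[str, str] = {
    "b": "site",
    "b.i": "people",
    "b.i.c": "person",
    "b.i.c.c": "name",
    "b.i.c.m": "watches",
    "b.i.c.m.c": "watch",
    "b.i.c.m.e": "watch",
    "b.i.i": "person",
    "b.i.i.c": "name",
    "b.i.i.m": "watches",
    "b.i.i.m.c": "watch",
}

# Sorting by key components gives document order for this toy table.
ORDERED: List[str] = sorted(DOC, key=lambda k: k.split("."))


def scan(name: str) -> Iterator[str]:
    """The // step: every node with this name, in document order."""
    return (k for k in ORDERED if DOC[k] == name)


def parent(key: str, name: str) -> Iterator[str]:
    """Parent axis: drop the last key component and check the name."""
    head, sep, _ = key.rpartition(".")
    if sep and DOC.get(head) == name:
        yield head


def descendant(key: str, name: str) -> Iterator[str]:
    """Descendant axis: every key that extends the context key."""
    prefix = key + "."
    return (k for k in ORDERED if k.startswith(prefix) and DOC[k] == name)


def evaluate() -> Iterator[str]:
    """//name/parent::person/descendant::watch, one result at a time."""
    for name_key in scan("name"):
        for person_key in parent(name_key, "person"):
            yield from descendant(person_key, "watch")


results = list(evaluate())
```

For this table the results come out as b.i.c.m.c, b.i.c.m.e, b.i.i.m.c, the same order as in the figures. The sketch does not remove duplicates, which a general evaluator would need when several context nodes reach the same result.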
IO Operation (please see handout)
• //y yields a.a, a.b, a.c
• /z yields a.a.a, a.b.a, a.b.b, a.c.a, a.c.a, a.c.b
Example 2
//name[ text() = "Klemens Pelz" ]/parent::person
[Figure: execution tree. Context Side: "person" (X: AXIS_PARENT) above "name" (X: //). Expression Side: BI_PREDICATE EQ with operands "" (X: AXIS_CHILD) and the literal "Klemens Pelz".]
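//name[ text() = "Klemens Pelz" ]/parent::person
[Figure: execution trace for Example 2. "name" (X: //) supplies b.i.e.c, which is passed as context to BI_PREDICATE EQ; "" (X: AXIS_CHILD) supplies the text node b.i.e.c.b with value "Klemens Pelz", which is compared with the literal "Klemens Pelz"; "person" (X: AXIS_PARENT) then supplies b.i.e.]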
Determining Selectivity
Fields kept per VNode: NodeType, NodeTest, X (axis), Count, IN, OUT, I_Tuples.
• Count: the exact number of nodes in the MASS storage structure for that particular node test.
• IN: the number of tuples fetched by the child VNode.
• OUT: the number of tuples produced by the VNode.
• I_Tuples: the total number of tuples processed up to that VNode, including the current node.
Example 1: //name/parent::person/emailaddress
• NodeType: MASS, NodeTest: person, X: AXIS_PARENT, Count: 255, IN: 482, OUT: ?
• NodeType: MASS, NodeTest: name, X: //, Count: 482, IN: 482, OUT: 482
Worst Case – Costing
• Categorize the axes into three divisions.
• Division 1: child | descendant | descendant-or-self
  • Cases: #X > #Y and #Y > #X.
  • Worst-case output: #X.
[Figure: two VNode boxes (NodeType, NodeTest, X, Count, IN, OUT), X and Y, with the output labeled #X.]
Contd.
• Division 2: parent, ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling
  • Cases: #X > #Y and #Y > #X.
  • Worst-case output: #Y.
[Figure: two VNode boxes (NodeType, NodeTest, X, Count, IN, OUT), X and Y, with the output labeled #Y.]
Contd.
• Division 3: self
  • For example: //*/self::X and Y/self::*
  • Case #X > #Y: worst-case output #Y.
  • Case #Y > #X: worst-case output #X.
[Figure: two VNode boxes (NodeType, NodeTest, X, Count, IN, OUT), X and Y.]
Worked example: //name/parent::person/descendant::watch
• NodeType: MASS, NodeTest: watch, X: AXIS_DESCENDANT, Count: 488, IN: 482, OUT: 488, I_Tuples: 1225
• NodeType: MASS, NodeTest: person, X: AXIS_PARENT, Count: 255, IN: 482, OUT: 482, I_Tuples: 737
• NodeType: MASS, NodeTest: name, X: //, Count: 482, IN: 482, OUT: 482, I_Tuples: 482
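The numbers in these three boxes can be reproduced with a short worst-case estimator. This is a sketch under stated assumptions rather than VAMANA's costing code: it reads #X as the Count of the current node and #Y as the OUT of its child, treats the first location step as a full scan (IN = OUT = Count), and takes I_Tuples as the child's I_Tuples plus the current Count, which is what the figures on the slide add up to. The names `Estimate`, `worst_case_out` and `estimate_chain` are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple

DIVISION_1 = {"child", "descendant", "descendant-or-self"}
DIVISION_2 = {
    "parent", "ancestor", "ancestor-or-self",
    "following", "following-sibling", "preceding", "preceding-sibling",
}
DIVISION_3 = {"self"}


@dataclass
class Estimate:
    node_test: str
    axis: str
    count: int
    tuples_in: int
    tuples_out: int
    i_tuples: int


def worst_case_out(axis: str, count_x: int, count_y: int) -> int:
    """Worst-case OUT of node X given its child Y, by axis division."""
    if axis in DIVISION_1:
        return count_x
    if axis in DIVISION_2:
        return count_y
    if axis in DIVISION_3:
        return min(count_x, count_y)
    raise ValueError(f"unknown axis: {axis}")


def estimate_chain(steps: List[Tuple[str, str, int]]) -> List[Estimate]:
    """steps: (node_test, axis, count), first location step first."""
    estimates: List[Estimate] = []
    for node_test, axis, count in steps:
        if not estimates:
            # Assumption: the first step is scanned in full.
            estimates.append(Estimate(node_test, axis, count, count, count, count))
            continue
        child = estimates[-1]
        out = worst_case_out(axis, count, child.tuples_out)
        estimates.append(
            Estimate(node_test, axis, count, child.tuples_out, out, child.i_tuples + count)
        )
    return estimates


EXAMPLE = estimate_chain([
    ("name", "descendant-or-self", 482),   # the // step
    ("person", "parent", 255),
    ("watch", "descendant", 488),
])
```

With the counts from the slide, `EXAMPLE` holds IN/OUT/I_Tuples of 482/482/482 for name, 482/482/737 for person and 482/488/1225 for watch.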
What about Binary Operators?
• Cost the expression sides with respect to the child.
• Operator = AND | OR | EQ: all tuples go out.
• Arithmetic operators: all tuples go out.
• Reason: the outcome cannot be predicted before execution.
Heuristics
• The higher the ratio, the better the selectivity.
  • Ratio = IN/OUT
  • Scaled Ratio = IN/OUT scaled to the range 0..1
• Generate a multimap <scaled(IN/OUT), VNode>.
• Each optimizable node can then have the rules that apply to it applied.
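One way to picture the multimap is the sketch below. The slide does not say how the ratio is scaled to 0..1, so min-max scaling is assumed here, and OUT is assumed to be non-zero; the function names and the use of node-test strings in place of VNode objects are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scaled_ratios(stats: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each node to IN/OUT, min-max scaled to 0..1 (assumed scaling)."""
    ratios = {node: tuples_in / tuples_out for node, (tuples_in, tuples_out) in stats.items()}
    lowest, highest = min(ratios.values()), max(ratios.values())
    if highest == lowest:
        # All nodes are equally selective; put them all at the top.
        return {node: 1.0 for node in ratios}
    return {node: (r - lowest) / (highest - lowest) for node, r in ratios.items()}


def build_multimap(stats: Dict[str, Tuple[int, int]]) -> Dict[float, List[str]]:
    """Group nodes by scaled ratio, best selectivity first."""
    groups: Dict[float, List[str]] = defaultdict(list)
    for node, scaled in scaled_ratios(stats).items():
        groups[scaled].append(node)
    return dict(sorted(groups.items(), reverse=True))


# (IN, OUT) per node, taken from the worst-case costing example.
multimap = build_multimap({
    "name": (482, 482),
    "person": (482, 482),
    "watch": (482, 488),
})
```

Several nodes can share a key, which is why a multimap rather than a plain map is needed: here "name" and "person" both land on 1.0 and "watch" on 0.0.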
Transformation Rule 1: Binary Predicate with Text Comparison
[Figure: tree rewrite using the Value Index. Node labels in the original tree: "person" (X: AXIS_PARENT), "name" (X: //), BI_PREDICATE EQ, "" (X: AXIS_CHILD), "Klemens Pelz". Node labels in the rewritten tree: "name" (X: //), "name" (X: AXIS_PARENT), "Klemens Pelz" (X: AXIS_VALUE).]
Transformation Rule 2
//name/parent::person/descendant::watch
• Mass Node to Join Root Node
[Figure: tree rewrite. Original chain: "watch" (X: AXIS_DESCENDANT), "person" (X: AXIS_PARENT), "name" (X: //). Rewritten tree: JOIN (X: AXIS_DESCENDANT) combining "name" (X: //), "person" (X: AXIS_PARENT) and "watch" (X: AXIS_DESCENDANT).]
* Removal
• Rule: p/descendant::*/child::n ≡ p/descendant::n, where p is a path expression.
• Need for this rule: nodes with "*" as the node test can spoil the cost estimation.
"Axis::self" Removal
• Rule: p/descendant::*/self::m ≡ p/descendant::m
• Rule: p/descendant-or-self::*/self::m ≡ p/descendant-or-self::m
• Need for the rule: a "self" step combined with * or a node test is not necessary.
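The two self-removal rules can be tried out as a plain string rewrite. This is a deliberately naive sketch: it works on the XPath text with a regular expression instead of on the execution tree, expects no whitespace around "::", and ignores predicates and string literals. The function name `remove_self_steps` is made up.

```python
import re

# axis::*/self::name -> axis::name, for descendant and descendant-or-self only.
_SELF_STEP = re.compile(r"(descendant(?:-or-self)?)::\*/self::([A-Za-z_][\w.-]*)")


def remove_self_steps(xpath: str) -> str:
    """Fold a self step into the preceding wildcard descendant step."""
    return _SELF_STEP.sub(r"\1::\2", xpath)


rewritten = remove_self_steps("p/descendant::*/self::m")
```

Steps on other axes, such as `p/child::*/self::m`, are left untouched because the slide states the rule only for descendant and descendant-or-self.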
Reverse Axes Rules
• Rule: p/descendant::n/parent::m ≡ p/descendant-or-self::m[child::n]
• Rule: p/descendant::n/m ≡ p/descendant::m[parent::n]
• Rule: /descendant::m/preceding::n ≡ /descendant::n[following::m]
From the paper "Symmetry in XPath" by Dan Olteanu, Holger Meuss, Tim Furche, Francois Br
Predicate Axis Rules
• Rule: p/descendant::*[child::n] ≡ p[descendant::n]/descendant::*
• Predicate Node to Join.
Conclusion
• Work in progress in three main areas:
  • Framework for XPath expression execution.
  • Selectivity determination.
  • Transformation rules.
References
1. James Clark and Steve DeRose. XML Path Language (XPath). http://www.w3.org/TR/xpath, 2002.
2. S. Boag, D. Chamberlin, Mary F. Fernandez, D. Florescu, J. Robie and J. Siméon. XQuery 1.0: An XML Query Language. W3C Working Draft, http://www.w3.org/TR/xquery/, 2002.
3. Kurt W. Deschler and Elke Rundensteiner. MASS: Multi Axis Storage Structure. Technical report in progress, 2002.
4. T. Milo and D. Suciu. Index structure for path expression. In Proceedings of 7th International Conference on Database Theory, 1999, pages 277-295.
5. Flavio Rizzolo and Alberto Mendelzon. Indexing XML Data with ToXin. WebDB, pages 49-54, Santa Barbara, USA, 2001.
6. Q. Li and B. Moon. Indexing and Querying XML Data for Regular Path Expressions. Proceedings of 27th International Conference on Very Large Database (VLDB'2001), Rome, Italy, September 2001, pages 361-370.
7. XMark - The XML Benchmark project. http://monetdb.cwi.nl/xml/.