Transforming XPath Queries for Bottom-Up Query Processing Yoshiharu Ishikawa Takaaki Nagai Hiroyuki Kitagawa University of Tsukuba {ishikawa,kitagawa}@is.tsukuba.ac.jp ISDB’02
Presentation Overview
• Background
• Motivation and Our Approach
• The Proximal Nodes Model
• Query Translation
• Translation Example
• Related Work
• Conclusions and Future Work
Background
• XML: content-description language on the Web
• XPath
  • pattern-based query language for XML
  • extracts XML nodes based on the specified pattern
  • has navigational semantics
  • XSLT uses XPath for the node specification
  • XQuery also uses XPath
XML Example
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
  ...
</itemlist>
XPath Query
• Sample query Q: retrieve prices of CD players
  /itemlist/item[@category = "audio equipment"][catalog-info/type = "CD player"]/sales-info/price
• XPath sentence
  • contains location steps separated by "/"
  • a location step has the format axis::node_test[predicate]...[predicate]
  • location steps can be abbreviated
    • e.g., /descendant::foo → //foo, /attribute::bar → @bar
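As a point of comparison, query Q can be evaluated top-down, step by step, with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree supports a limited XPath subset, so the predicate [catalog-info/type = "CD player"] is checked in Python code rather than inside the path. The second item (an amplifier) is an assumption added so that the predicate has something to filter out; it does not appear in the original slides.

```python
import xml.etree.ElementTree as ET

# Sample data from the "XML Example" slide, plus one hypothetical extra item.
XML = """
<itemlist>
  <item category="audio equipment">
    <catalog-info>
      <type>CD player</type>
      <manufacturer>Star Electronics</manufacturer>
      <catalog-no>CDP-R55N</catalog-no>
    </catalog-info>
    <sales-info>
      <prod-year>2001</prod-year>
      <price>125.00</price>
    </sales-info>
  </item>
  <item category="audio equipment">
    <catalog-info><type>Amplifier</type></catalog-info>
    <sales-info><price>300.00</price></sales-info>
  </item>
</itemlist>
"""


def prices_of_cd_players(root):
    """Evaluate query Q by navigating from the root downwards."""
    prices = []
    # Location steps /itemlist/item[@category = "audio equipment"]
    for item in root.findall("./item[@category='audio equipment']"):
        # Predicate [catalog-info/type = "CD player"], checked in Python
        types = [t.text for t in item.findall("./catalog-info/type")]
        if "CD player" in types:
            # Location steps /sales-info/price
            prices.extend(p.text for p in item.findall("./sales-info/price"))
    return prices


root = ET.fromstring(XML)   # root is the <itemlist> element
result = prices_of_cd_players(root)
print(result)               # ['125.00']
```

Every item element is visited here, whether or not it is a CD player, which is the cost that bottom-up processing tries to avoid.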
XPath Semantics
• XPath assumes top-down query processing
• Not efficient for large XML databases
• Bottom-up processing is better in some cases
query: /article/authors[author = "Miller"]
[Figure: article/authors/author trees with author values "Smith", "White", "Miller", "Chen", comparing top-down and bottom-up evaluation of the query]
Bottom-Up Query Processing
[Figure: article/authors/author trees with author values "Smith", "White", "Miller", "Chen", "Miller"]
• We can process the example query when
  • we can determine the specified leaf elements (i.e., "Miller") with the help of an index, and
  • we can select the parent for a specific author node.
• We do not need to access all the authors/author elements
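The two conditions above can be sketched in Python for the query /article/authors[author = "Miller"]. This is a minimal illustration, not the authors' implementation: the value index and the parent map are plain dictionaries built in memory, the wrapper element db is a hypothetical container for several articles, and the sample data only mimics the figure.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical collection of articles; <db> is an assumed wrapper element.
XML = """
<db>
  <article><authors><author>Smith</author><author>White</author></authors></article>
  <article><authors><author>Miller</author><author>Chen</author></authors></article>
  <article><authors><author>Miller</author></authors></article>
</db>
"""

root = ET.fromstring(XML)

# Built once, ahead of query time:
#   parent:      child element -> parent element
#   value_index: (tag, text)   -> elements carrying that text
parent = {child: node for node in root.iter() for child in node}
value_index = defaultdict(list)
for node in root.iter():
    if node.text and node.text.strip():
        value_index[(node.tag, node.text.strip())].append(node)


def authors_with(name):
    """Bottom-up evaluation of /article/authors[author = name]."""
    result = []
    # Start from the matching leaves found through the index.
    for author in value_index[("author", name)]:
        authors = parent.get(author)
        if authors is None or authors.tag != "authors":
            continue
        article = parent.get(authors)
        # Check the remaining steps upwards and avoid duplicates.
        if article is not None and article.tag == "article" and authors not in result:
            result.append(authors)
    return result


hits = authors_with("Miller")
print([[a.text for a in h] for h in hits])   # [['Miller', 'Chen'], ['Miller']]
```

Only the author elements returned by the index and their ancestors are touched; the first article is never inspected at query time.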
Our Objective and Approach
• Our Objective
  • Efficient bottom-up processing of XPath queries with the help of index structures
• Our Approach
  • Use of the proximal nodes model as the underlying retrieval model
  • The model enables bottom-up query evaluation
  • Development of transformation rules from XPath queries to proximal nodes expressions
The Proximal Nodes Model (1)
• Proposed by Navarro and Baeza-Yates [7] as a structured document retrieval model
• Uses a bottom-up query processing approach
• XML data can be treated as nested nodes:
  • a node corresponds to an element or attribute in XML
  • each node has an associated text region (called the segment); segments can be nested
• Expressive power and efficiency are well-balanced
  • evaluation cost is almost O(n), where n is the number of nodes
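One way to picture nested segments is to number the opening and closing of each element during a document-order walk. The sketch below does this with Python's xml.etree.ElementTree; the (start, end) numbering scheme and the function name assign_segments are assumptions for illustration (the slides do not specify how segments are encoded), and attributes are left out for brevity.

```python
import xml.etree.ElementTree as ET


def assign_segments(root):
    """Return (tag, start, end) triples in document order.

    A counter is advanced when an element opens and again when it closes,
    so a descendant's interval always lies inside its ancestor's interval.
    """
    nodes = []
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        slot = len(nodes)
        nodes.append(None)            # reserve the document-order position
        for child in elem:
            visit(child)
        nodes[slot] = (elem.tag, start, counter)
        counter += 1

    visit(root)
    return nodes


doc = ET.fromstring("<item><type>CD player</type><price>125.00</price></item>")
segments = assign_segments(doc)
print(segments)   # [('item', 0, 5), ('type', 1, 2), ('price', 3, 4)]
```

With this encoding, "node a contains node b" reduces to two integer comparisons on the segment bounds.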
The Proximal Nodes Model (2)
• The model consists of three components
• Text pattern matching language
  • specifies pattern matching conditions
  • implementation dependent
  • returns a set of the matched nodes
  • example: "ABC Corporation"
• Retrieval operators based on document structures
  • returns a set of nodes for a given element or attribute name
  • example: chapter, price
• Operators to integrate partial retrieval results
  • calculates the result node set from the given node sets
  • efficient computation based on segment relationships
Proximal Nodes Operators
P and Q are nodes with associated segments
Example of Proximal Nodes Expression
• Example expression of proximal nodes model
  item with (type same "CD player")
• Query processing steps
  • 1. determine the node sets that correspond to the elements "item" and "type" using indexes
  • 2. determine the node set that corresponds to the pattern "CD player" using an index
  • 3. compute the result of the "same" operator
  • 4. compute the result of the "with" operator
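The four processing steps can be sketched in Python over segments represented as integer intervals. The operator semantics used here are assumptions inferred from how the operators are used in this deck: "P with Q" keeps nodes of P whose segment contains a node of Q, "P in Q" keeps nodes of P contained in a node of Q, and "P same Q" keeps nodes of P whose segment equals that of a node of Q. The indexes are hand-made dictionaries with made-up segment numbers, and the nested loops are quadratic, so this does not reflect the near-linear cost of the actual model.

```python
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    start: int
    end: int


def contains(outer, inner):
    return outer.start <= inner.start and inner.end <= outer.end


def op_with(P, Q):
    """Nodes of P whose segment contains the segment of some node of Q."""
    return [p for p in P if any(contains(p, q) for q in Q)]


def op_in(P, Q):
    """Nodes of P whose segment is contained in some node of Q."""
    return [p for p in P if any(contains(q, p) for q in Q)]


def op_same(P, Q):
    """Nodes of P whose segment coincides with the segment of some node of Q."""
    return [p for p in P if any((p.start, p.end) == (q.start, q.end) for q in Q)]


# Hypothetical indexes for two items; only the first one is a CD player.
element_index = {
    "item": [Node("item", 0, 9), Node("item", 10, 19)],
    "type": [Node("type", 1, 2), Node("type", 11, 12)],
}
text_index = {"CD player": [Node("#text", 1, 2)]}

items = element_index["item"]             # step 1
types = element_index["type"]             # step 1
matches = text_index["CD player"]         # step 2
cd_types = op_same(types, matches)        # step 3
result = op_with(items, cd_types)         # step 4
print(result)   # [Node(name='item', start=0, end=9)]
```

No step navigates the document tree; each one combines node sets obtained from indexes.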
Translation Rules (1)
• Supports major XPath patterns
• Based on the XPath semantic description by Wadler [10]
  • Use of denotational semantics
Translation Rules (2)
Translation Rules (3)
Auxiliary Functions
Simplification Using the Knowledge of Document Structure
• If we know the DTD of the target XML, we can derive simpler translation results
Translation Example
• Original query Q
  /itemlist/item[@category = "audio equipment"][catalog-info/type = "CD player"]/sales-info/price
• Translation result:
  • t1 = item with (item with (category same "audio equipment"))
  • t2 = catalog-info child t1
  • t3 = t1 with (t1 with (((type child t2) child t2) same "CD player"))
  • t4 = sales-info child t3
  • ans = (((price child t4) child t4) child t3) child itemlist
Simplification of Query Plan (1)
• The translated result contains repeated applications of the same operator
• We can delete redundant operators by considering the operator semantics
• Example:
  • t1 = item with (item with (category same "audio equipment")) → item with (category same "audio equipment")
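The rewrite in the example above can be sketched as a small pass over an expression tree. This is an illustrative sketch only: expressions are modeled as nested (operator, left, right) tuples with strings as leaves, and the single rule implemented, P with (P with X) → P with X, is the one shown in the example. The function names simplify and show are hypothetical.

```python
def simplify(expr):
    """Rewrite P with (P with X) to P with X, working from the leaves upwards."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if (op == "with" and isinstance(right, tuple)
            and right[0] == "with" and right[1] == left):
        return right                  # right is already "P with X"
    return (op, left, right)


def show(expr, top=True):
    """Render an expression tree in the notation used on the slides."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    text = f"{show(left, False)} {op} {show(right, False)}"
    return text if top else f"({text})"


t1 = ("with", "item",
      ("with", "item", ("same", "category", '"audio equipment"')))
simplified = simplify(t1)
print(show(t1))          # item with (item with (category same "audio equipment"))
print(show(simplified))  # item with (category same "audio equipment")
```

Because the pass simplifies subexpressions first, deeper repetitions of the same pattern collapse in a single call.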
Simplification of Query Plan (2)
• If we can use the DTD information, we can further simplify the expressions
• Example:
  • t3 = t1 with ((type child (catalog-info child t1)) same "CD player") → t1 with ((type in t1) same "CD player")
• Simplified query plan for query Q
  • t1 = item with (category same "audio equipment")
  • ans = price in (t1 with ((type in t1) same "CD player"))
Related Work
• Translation of XQL queries into proximal nodes expressions (Baeza-Yates & Navarro [2])
• Rewriting techniques for XQL queries (Wood [13])
• Use of document structure for the query optimization [3,11,12,13]
• Optimization of regular path expressions in the context of semistructured DBs [4,8]
Conclusions and Future Work
• Conclusions
  • Bottom-up processing approach for XPath queries
  • Support of major XPath query patterns
  • Translation to proximal nodes expressions
  • Simplification and optimization techniques
• Future work
  • Support of more complete XPath semantics
  • Application of hybrid approach (top-down and bottom-up)