CSE 636 Data Integration. XML Distributed Query Processing Slides by Yannis Papakonstantinou. Overview. The Virtual XML View Approach towards Data Integration Query Processing in XML Mediators Issues Overview An Algebra-Based Architecture Navigation-driven Evaluation.
CSE 636 Data Integration: XML Distributed Query Processing. Slides by Yannis Papakonstantinou
Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation
Data Integration Requirements in eBusiness Applications • It starts with… “Provide to customers, partners, employees Application X”, where X may be in Business Intelligence, Customer Support, … • Then the problem comes up… “The application uses information assets widely distributed across my enterprise” • If only… “Give the application a single place to go to access all the information required. Requirements are evolving, so make sure the system can be easily maintained and upgraded”
View-Based Approach: Wrappers Export Basic Source Views
[Figure: a Client Application queries the Integrated (XML) View exported by the Mediator; the Mediator sits on the (XML) Views exported by two Wrappers, one over the Customers Rel. DB and one over the Orders Rel. DB]
Customers source view, as a tree:
  customer_table
    customer
      name John
      id 56
      city Chicago
    customer
      name George
      id 58
      city Chicago
    …
The same view as XML:
  <customer_table>
    <customer>
      <name>John</name>
      <id>56</id>
      <city>Chicago</city>
    </customer>
    <customer>
      <name>George</name>
      <id>58</id>
      <city>Chicago</city>
    </customer>
    …
  </customer_table>
Wrappers Export Basic Source Views
[Figure: Client Application, Integrated (XML) View, Mediator, (XML) Views, Wrappers, Customers Rel. DB, Orders Rel. DB]
Orders source view, as a tree:
  order_table
    order
      id 1034
      cid 56
      item chips
    order
      id 1567
      cid 56
      item salsa
    …
Mediators Export Integrated Views, Tailored to Application Needs
[Figure: Client Application, Integrated (XML) View, Mediator, (XML) Views, Wrappers, Customers Rel. DB, Orders Rel. DB]
Integrated view exported by the mediator:
  customers
    customer
      name John
      id 56
      city Chicago
      orders
        order
          id 1034
          item chips
        order …
    customer …
Source views exported by the wrappers:
  order_table
    order: id 1034, cid 56, item chips
    order: id 1567, cid 56, item salsa
    …
  customer_table
    customer: name John, id 56, city Chicago
    customer: name George, id 58, city Chicago
    …
Virtual Views: Query-Driven Mediator Operation • Application to Mediator: “Find all Chicago customer names, along with their ordered items” • Mediator to the Customers Database wrapper: “Retrieve Chicago customer names and ids” • Mediator to the Orders Database wrapper: “Retrieve all cids and item names of orders”
On-Demand (Query-Driven) Mediator Operation
Returned by the Customers Database wrapper:
  customer: name John, id 56
  …
Returned by the Orders Database wrapper:
  order: cid 56, item chips
  order: cid 56, item salsa
  …
Returned by the Mediator to the Application:
  customers
    customer
      name John
      ordered_items
        item chips
        item salsa
    customer …
Multiple Plans are Possible • Retrieve customers • For each customer find matching orders
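To make the choice between plans concrete, here is a minimal sketch of two ways a mediator could answer the "Chicago customers with their ordered items" query. The in-memory lists stand in for the two wrapped databases, and the function names (`customers_in`, `all_orders`, `orders_of`, `plan_bulk_join`, `plan_per_customer`) are hypothetical; they are not part of any system described in the slides.

```python
# Hypothetical in-memory stand-ins for the two wrapped sources.
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]

def customers_in(city):
    """Assumed wrapper query: customers of one city."""
    return [c for c in CUSTOMERS if c["city"] == city]

def all_orders():
    """Assumed wrapper query: every order."""
    return list(ORDERS)

def orders_of(cid):
    """Assumed wrapper query: orders of one customer id."""
    return [o for o in ORDERS if o["cid"] == cid]

def plan_bulk_join(city):
    """Plan A: two bulk source queries, joined at the mediator."""
    items_by_cid = {}
    for o in all_orders():
        items_by_cid.setdefault(o["cid"], []).append(o["item"])
    return [{"name": c["name"], "ordered_items": items_by_cid.get(c["id"], [])}
            for c in customers_in(city)]

def plan_per_customer(city):
    """Plan B: retrieve customers, then one orders query per customer."""
    return [{"name": c["name"],
             "ordered_items": [o["item"] for o in orders_of(c["id"])]}
            for c in customers_in(city)]
```

Both plans return the same answer; they differ in how many queries reach the sources and how much data is shipped, which is the kind of trade-off an optimizer would have to weigh.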
A New Kind of Query Processing Problem • Build and Run “Optimal” Plan • Consisting of operators that • Collect source info using supported queries and commands • Combine info into XML result
Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • Compose Queries/Views Efficiently • Schema inference & optimization • Combine navigation & querying
From Limited Wrappers to Efficient Plans for Extended Query Sets Queries supported by mediator Queries supported by wrapper • Answering Queries Using Views • But with Infinite Sets of Views • Increasing Relevance due to Web Services all queries over schema Source Data & Schema Source Data & Schema
Challenges in Query Processing & Optimization • Operate within the Limited and Different Capabilities of the Sources • Describe sets of supported queries • Use most efficient supported queries • Optimize plans/queries sent to sources • Estimate Costs of Plans • Adapt Plans Along the Way • Beyond Conjunctive Queries • XQuery processing • Schema inference & optimization • Combine navigation & querying • Build iterator models for low memory footprint
Navigation-Driven Evaluation of Query Result customers customer name John id 56 city Chicago orders order id 1034 item chips order … customer … order_table order id 1034 cid 56 item chips order id 1567 cid 56 item salsa … customer_table customer name John id 56 city Chicago customer name George id 58 city Chicago …
Navigation-Driven Evaluation • The client navigates the result with commands such as down(p) and right(p) from a node p • Input to the Lazy Mediator: client navigations and the view definition ans = q( s1 … sn ) • Output of the Lazy Mediator: source navigations on the XML sources s1 … sn
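The following is a minimal sketch of the idea, not the mediator's actual design. It assumes a single source whose tree is reachable only through `down` and `right`, and a hard-coded view that keeps just the name of each customer. The classes `Source` and `LazyMediator`, the path-tuple pointers, and the `navigations` counter are all illustrative assumptions.

```python
class Source:
    """An XML-like source that can only be navigated, never bulk-read."""
    def __init__(self, tree):
        self.tree = tree          # node = (label, [children])
        self.navigations = 0      # counts source navigations issued

    def _node(self, path):
        node = self.tree
        for i in path:
            node = node[1][i]
        return node

    def label(self, path):
        return self._node(path)[0]

    def down(self, path):
        self.navigations += 1
        return path + (0,) if self._node(path)[1] else None

    def right(self, path):
        self.navigations += 1
        if not path:
            return None
        siblings = self._node(path[:-1])[1]
        nxt = path[-1] + 1
        return path[:-1] + (nxt,) if nxt < len(siblings) else None


class LazyMediator:
    """View customers/customer/name, computed only as the client navigates."""
    def __init__(self, source):
        self.src = source

    def root(self):
        return ("customers", ())

    def label(self, p):
        kind, path = p
        return kind if kind in ("customers", "customer") else self.src.label(path)

    def down(self, p):
        kind, path = p
        child = self.src.down(path)
        if kind == "customers":
            return ("customer", child) if child is not None else None
        if kind == "customer":
            # Skip source children that the view does not export.
            while child is not None and self.src.label(child) != "name":
                child = self.src.right(child)
        return ("node", child) if child is not None else None

    def right(self, p):
        kind, path = p
        if kind != "customer":
            return None           # the view exports a single child below customer
        sibling = self.src.right(path)
        return ("customer", sibling) if sibling is not None else None


def leaf(label, text):
    return (label, [(text, [])])

SRC = Source(("customer_table", [
    ("customer", [leaf("name", "John"), leaf("id", "56"), leaf("city", "Chicago")]),
    ("customer", [leaf("name", "George"), leaf("id", "58"), leaf("city", "Chicago")]),
]))
```

Each client navigation triggers only the source navigations needed to answer it, so a client that inspects the first customer never causes the second one to be fetched.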
Mixing Querying & Navigation customers customer name John id 56 city Chicago orders order id 1034 item chips order … customer … Find details of all salsa orders below visited node
Challenges in Mixing Querying & Navigation • Two-dimensional navigation • Reminiscent of cursors, but with multiple continuation points • Controlling the size and shape of results • Contextualizing queries by navigation
Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation
An Algebra-Based Query Processor Architecture • Client: sends XQuery and Navigation Requests, receives Results • Translation to Algebra: takes the client XQuery and the XQuery Views, produces an Algebra Plan • Rewriter/Optimizer: uses Source Schemas & Types and the Source Description, produces a Physical Algebra Plan • Plan Execution Engine: uses Functions and the Function Description, issues Queries & Fetch Requests to Sources
Query Processing on Tuple-Oriented Algebra Enables… • Well-known efficient physical implementations of the operators • Join optimization • Nested data by nested plans or group-by • Efficient iterator model
XQuery: Queries & Views for XML
  <customers> {
    for $cust in document("db")/customer
    return
      <customer> {
        $cust/id,
        for $order in document("db")/order
        where $order/cid = $cust/id
        return <order> { $order/id } </order>
      } </customer>
  } </customers>
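Access and Navigation
Source db:
  customer_table
    customer: name John, id 56
    customer: name George, id 58
Plan, bottom-up, with the tuples each operator produces (ct = the customer_table element, c1 and c2 = the customer elements, i1 and i2 = their id elements):
  source db, [$db1]             tuples ($db1): (ct)
  getD $db1, customer → $cust   tuples ($db1, $cust): (ct, c1), (ct, c2)
  getD $cust, id → $cust_id     tuples ($db1, $cust, $cust_id): (ct, c1, i1), (ct, c2, i2)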
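The access operators can be read as tuple-at-a-time iterators. The sketch below models `source` and `getD` as Python generators over dictionaries that map variable names to nodes. The node representation, the helpers `el` and `follow`, and the exact argument order are assumptions made for illustration; the slides only name the operators.

```python
def el(label, *children, text=None):
    """Hypothetical helper that builds a tiny XML-like node."""
    return {"label": label, "children": list(children), "text": text}

DB = el("customer_table",
        el("customer", el("name", text="John"), el("id", text="56")),
        el("customer", el("name", text="George"), el("id", text="58")))

def follow(node, labels):
    """Yield every node reached from `node` along the child path `labels`."""
    if not labels:
        yield node
        return
    for child in node["children"]:
        if child["label"] == labels[0]:
            yield from follow(child, labels[1:])

def source(doc, var):
    """source db, [$var]: a single tuple binding the variable to the document."""
    yield {var: doc}

def getD(tuples, in_var, path, out_var):
    """getD $in, path -> $out: extend each tuple once per node found on the path."""
    for t in tuples:
        for node in follow(t[in_var], path.split("/")):
            yield {**t, out_var: node}

# The plan of the slide: source, then customer, then id.
plan = getD(getD(source(DB, "$db1"), "$db1", "customer", "$cust"),
            "$cust", "id", "$cust_id")
ids = [t["$cust_id"]["text"] for t in plan]
```

Because every operator is a generator, no intermediate result is materialized; a tuple flows up the plan only when the consumer asks for it.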
Simplification Using Schema Inference $db1 $cust_id ct i1 ct i2 ct $db1 ct Since $cust_id $cust and $cust is “useless” otherwise db customer_table customer name John id 56 customer name George id 58 getD $db1, customer/id $cust_id i1 i2 source db, [$db1]
Nested Plans Plan p … $db1 $cust_id $orders ct i1 [o11…] nestedSrc $part $db1 $cust_id ct i1 $db1 $cust_id ct i2 $db1 $cust_id ct i1 $db1 $cust_id ct i2 $db1 $cust_id $part ct i1 ct i2 $db1 $cust_id ct i1 ct i2 ct i2 [o21…] apply $part, p $orders for $part
Joins and Selections $db1 $cust_id ct i1 $cust_id $db1 $cust_id $db2 $order $cust_id2 $order_id … $cust_id2=? $db2 $order $cust_id2 $order_id … getD $order, id $order_id getD $order, cid $cust_id2 getD $db2, order $order nestedSrc $part source db, [$db2]
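Constructors
Plan, bottom-up, with the tuples each operator produces (e1 and e2 = the new order elements containing o1 and o2):
  input                          tuples (…, $order_id): (…, o1), (…, o2)
  crList $order_id → $oidL       tuples (…, $order_id, $oidL): (…, o1, [o1]), (…, o2, [o2])
  crEl order, $oidL → $oidE      tuples (…, $oidL, $oidE): (…, [o1], e1), (…, [o2], e2)
  listify $oidE → $orders        $orders = [e1, e2]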
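A minimal sketch of the three constructor operators as generators over variable-binding dictionaries follows. The semantics are inferred from the example values on the slide (in particular, that `listify` collapses all input tuples into one list), so treat the signatures and the tuple-based element representation as assumptions.

```python
def crList(tuples, in_var, out_var):
    """Wrap the value of in_var into a singleton list."""
    for t in tuples:
        yield {**t, out_var: [t[in_var]]}

def crEl(tuples, label, in_var, out_var):
    """Create an element (label, children) from the list bound to in_var."""
    for t in tuples:
        yield {**t, out_var: (label, t[in_var])}

def listify(tuples, in_var, out_var):
    """Collapse all input tuples into one tuple holding the list of in_var values."""
    yield {out_var: [t[in_var] for t in tuples]}

orders_in = [{"$order_id": "o1"}, {"$order_id": "o2"}]
result = list(
    listify(
        crEl(crList(orders_in, "$order_id", "$oidL"), "order", "$oidL", "$oidE"),
        "$oidE", "$orders"))
```

The result is a single tuple whose `$orders` value is the list of the two constructed order elements.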
Plan Decomposition • Within Rewriting Optimizer • Rules replacing “leaf” trees • May move commutable parts • Catch: No projection limitation
Replacing Nested Plans with GroupBy/Outerjoin Combinations apply $part, p $R apply $part, p $R p3 p3 nestedSrc $part groupBy S(p1) $part p2 nestedSrc $part for $part p1 p1 p2
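The intuition behind this rewriting can be shown on the customers and orders example: a nested plan re-runs the inner plan for each outer tuple, while an outer join followed by a group-by produces the same nested result in one pass over the joined tuples. This is a simplified illustration of the equivalence, not the optimizer's actual rewrite rule; the function names are hypothetical.

```python
CUSTOMERS = [{"id": 56, "name": "John"}, {"id": 58, "name": "George"}]
ORDERS = [{"id": 1034, "cid": 56}, {"id": 1567, "cid": 56}]

def nested_plan(customers, orders):
    """apply: evaluate the inner plan once per outer (customer) tuple."""
    return [{**c, "orders": [o["id"] for o in orders if o["cid"] == c["id"]]}
            for c in customers]

def groupby_outerjoin_plan(customers, orders):
    """Left outer join customers with orders, then group by customer."""
    joined = []
    for c in customers:
        matches = [o for o in orders if o["cid"] == c["id"]]
        if matches:
            joined.extend((c, o) for o in matches)
        else:
            joined.append((c, None))      # outer join keeps customers without orders
    groups = {}
    for c, o in joined:
        group = groups.setdefault(c["id"], {**c, "orders": []})
        if o is not None:
            group["orders"].append(o["id"])
    return list(groups.values())
```

The outer join matters: with an inner join, George, who has no orders, would disappear from the result instead of appearing with an empty list.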
Overview • The Virtual XML View Approach towards Data Integration • Query Processing in XML Mediators • Issues Overview • An Algebra-Based Architecture • Navigation-driven Evaluation
Building Navigation-Driven Evaluation on the Algebra Client Source access Source access Source Source
Think of Each Operator as a Lazy Mediator $db1 $cust $cust_id ct c1 i1 ct c2 i2 $db1 $cust ct c1 ct c2 root tuple $db1 customer_table customer name John id 56 customer name George id 58 c1 $cust $cust_id i1 tuple getD $cust, id $cust_id c2 $db1 $cust i2 $cust_id
Navigation-Driven Evaluation of Operators • Augmented with • nextTuple(p) • p.attr Input: client navigations result Lazy Operator Output: source navigations s1 sn ... Result of Operator below Result of Operator below
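As a rough sketch of an operator that behaves like a lazy mediator, the selection below answers `next_tuple(p)` and attribute requests by issuing only the navigations it needs on the operator beneath it. The class names, the integer pointers, and the `fetches` counter are illustrative assumptions; `next_tuple` and `attr` mirror the `nextTuple(p)` and `p.attr` commands named on the slide.

```python
class ListSource:
    """Bottom operator: tuples are reached one at a time through pointers."""
    def __init__(self, rows):
        self.rows = rows
        self.fetches = 0              # number of tuple navigations served

    def _at(self, i):
        self.fetches += 1
        return i if i < len(self.rows) else None

    def first_tuple(self):
        return self._at(0)

    def next_tuple(self, p):
        return self._at(p + 1)

    def attr(self, p, name):
        return self.rows[p][name]


class LazySelect:
    """Selection that pulls from its child only when the client navigates."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate    # called as predicate(child, pointer)

    def _skip(self, p):
        # Advance over child tuples that fail the predicate.
        while p is not None and not self.predicate(self.child, p):
            p = self.child.next_tuple(p)
        return p

    def first_tuple(self):
        return self._skip(self.child.first_tuple())

    def next_tuple(self, p):
        return self._skip(self.child.next_tuple(p))

    def attr(self, p, name):
        return self.child.attr(p, name)


ORDERS = [{"id": 1034, "cid": 56, "item": "chips"},
          {"id": 1567, "cid": 56, "item": "salsa"}]
```

In this sketch the selection keeps no state of its own: the pointer handed to the client carries everything needed to continue from that position.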
Use of Semantic Id’s in Navigation-Driven Evaluation <f’1, f’2, …, f’n> Operator State V1: V2: … Vn: Other: … Proceed down/right f’1 f’2 … f’n r/d(<f1, f2, …, fn>) Operator State V1: V2: … Vn: Other: … f1 f2 … fn
Fragments Reduce the “Set State” – “Produce State” Overhead root customer Hole 3 name, “John” order Hole 2 oid, 123 lineitem lineitem lineitem Hole 1
Fragments Reduce the “Set State” – “Produce State” Overhead root customer Hole 3 name, “John” order Hole 5 order ordnum=16 oid, 123 lineitem lineitem lineitem Hole 1 Hole 4 lineitem lineitem
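One way to picture fragments with holes: the server ships a partial tree in which unexplored parts are replaced by hole markers, and the client asks for a hole to be filled only when it navigates there. The sketch below cuts fragments by depth only, which is a simplification (the holes on the slide also stand for remaining siblings), and `Hole`, `fragment`, and `fill` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    path: tuple   # position in the source tree of the node left unexpanded

TREE = ("root", [
    ("customer", [
        ("name", [("John", [])]),
        ("order", [("oid", [("123", [])]), ("lineitem", []), ("lineitem", [])]),
    ]),
])

def node_at(tree, path):
    for i in path:
        tree = tree[1][i]
    return tree

def fragment(tree, path, depth):
    """Export the node at `path` with `depth` levels of children; deeper parts become holes."""
    label, children = node_at(tree, path)
    if not children:
        return (label, [])
    if depth == 0:
        return (label, Hole(path))
    return (label, [fragment(tree, path + (i,), depth - 1)
                    for i in range(len(children))])

def fill(tree, hole, depth=1):
    """Fetch the children a hole stands for, as a new fragment (depth must be >= 1)."""
    return fragment(tree, hole.path, depth)[1]

first = fragment(TREE, (), 1)
children_of_customer = fill(TREE, first[1][0][1])
```

A larger `depth` means fewer round trips but more data shipped and held in memory per request.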
Controlling the Size and Shape of Fragments Client listify Client-Server Interaction Controller listify Source access Source access Source Source
Fragment size determines the memory footprint, which in turn affects performance
Fragmentation Strategies • Fixed Fragment Size • Ideal for depth-first, left-to-right navigation • Adaptive Fragment Size • Assign larger pieces to those who use them
Response Performance for Breadth-First and Depth-First Traversal [Figure: two plots, one for depth-first traversal and one for breadth-first traversal]
References • Bertram Ludäscher, Yannis Papakonstantinou, Pavel Velikhov. Navigation-Driven Evaluation of Virtual Mediated Views. EDBT 2000 • Yannis Papakonstantinou, Vasilis Vassalos. Architecture and Implementation of an XQuery-based Information Integration Platform. IEEE Data Eng. Bull. 25(1), 2002 • Yannis Papakonstantinou, Vinayak R. Borkar, Maxim Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis Vassalos, Pavel Velikhov. XML queries and algebra in the Enosys integration platform. Data Knowl. Eng. 44(3), 2003