460 likes | 587 Views
Processing XML View Queries Including User-defined Foreign Functions on Relational Databases. Yoshiharu Ishikawa Jun Kawada Hiroyuki Kitagawa University of Tsukuba {ishikawa,kitagawa}@is.tsukuba.ac.jp. Presentation Overview. Background XML Views Support for foreign functions
E N D
Processing XML View Queries Including User-defined Foreign Functions on Relational Databases Yoshiharu Ishikawa Jun Kawada Hiroyuki Kitagawa University of Tsukuba {ishikawa,kitagawa}@is.tsukuba.ac.jp WISE2002
Presentation Overview • Background • XML Views • Support for foreign functions • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
XML Views • XML: content-description language on the Web • XML Views over RDBs • Constructing virtual XML views over RDBs • Data items are stored in RDBs • Selecting and transforming data items into appropriate XML formats • XML views are constructed using middleware technologies • Effective use of the data management and query processing facilities of the underlying RDBMSs • XPERANTO (IBM) [3,7,8], SilkRoute (AT&T) [4,5]
Virtual XML Views and Middleware • Middleware provides virtual XML view facility • A user query is specified by an XML query language (e.g., XQuery) toward an XML view • Middleware creates a query plan and issues an SQL query to RDBMS • Middleware transforms an SQL query result into the final XML format • adds XML tags • may perform remaining query tasks User Query Query Result Virtual XML View Middleware SQL Query SQL Query Result RDBMS
Example of XML View <cities> <city id="C0100"> <cname>A</cname> <population>16</population> <location> <coord><x>100</x><y>400</y></coord> <coord><x>100</x><y>200</y></coord> ... </location> <facilities> <facility id="I0015"><fname>E Mall</fname></facility> <facility id="I0016"><fname>F Park</fname></facility> ... </facilities> </city> ... </cities> Fig. 1
Presentation Overview • Background • XML Views • Support for foreign functions • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
User Query with Foreign Function • User query may contain foreign functions • Example in XQuery <result> { for $city in view("cities")/cities/city where isWider($city/location, 10000, "km") and $city/population >= 10 return <city-info> $city/cname $city/facility </city-info> } </result> Fig. 3
Usually coded with a general-purpose programming language (e.g., Java) Receive in-memory representation of target XML document fragments (e.g., DOM) Middleware should evaluate foreign functions: since conventional RDBMSs do not support such facilities User-defined Foreign Functions Middleware Foreign Function <location> <coord><x>100</x> <y>400</y></coord> <coord><x>100</x> <y>200</y></coord> ... </location> XML Fragment
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
Our Approach • Processing XML view queries including foreign functions • By the cooperation of a conventional RDBMS and a middleware system • Extension of the XPERANTO framework • Proposal of two query processing methods • Two-step processing method • One-step processing method
Two-Step Processing Method User Query SQL Query SQL Query Result Query Result Middleware Query Execution Query Planning Foreign Function Evaluation Result XML Generation 1st query - to evaluate foreign functions 2nd query - to generate the result SQL Query SQL Query Result RDBMS
One-Step Processing Method User Query Query Result Middleware Query Execution Query Planning Foreign Function Evaluation Result XML Generation combined query - to evaluate foreign functions - to generate the result SQL Query SQL Query Result RDBMS
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
A default XML view is automatically created from the underlying relational tables An XML view is defined over the default XML view using XQuery XML Views in XPERANTO virtual XML View virtual XML view View Definition XQuery View Definition XQuery Default XML View <db>...</db> automatic derivation Relational Tables
A default XML view is automatically created from the underlying relational tables Each <row> element corresponds to a relational tuple Default XML View <db> <city> <row> <cid>C0100</cid> <cname>A</cname> <population>16</population> </row> ... </city> <location> <row> <cid>C0100</cid> <x>100</x> <y>400</y> </row> ... </location> ... corresponds to a city tuple Fig. 6
An XML view is defined over the default XML view using XQuery This view definition creates the XML view shown in Fig. 1 XML View Definition create view cities as { <cities> for $city in view("default")/city/row return <city id=$city/cid> <cname>$city/cname</cname> <population>$city/population</population> <locations> for $location in view("default")location/row where $city/cid = $location/$cid return <coord> <x>$location/x</x><y>$location/y</y> </coord> </locations> ... </city> </cities> } Fig. 7
Query Processing in XPERANTO User Query XQGM Graph • Transformation • Computation Pushdown • Tagger Pull-Up View Composition Composed XQGM Graph XQGM Graph Generation View Definition XQGM Graph SQL Query XML Tag Operators User Query virtual XML View View Definition XQuery Default XML View <db>...</db>
Each node corresponds to an extended relational operator (shown in Table 1) Contains a correlated join operator Node 11 is an abbreviated representation of node 12 Node 12 contains tag operators shown in Table 2 View Definition XQGM Graph Fig. 8
User Query XQGM Graph Fig. 9
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
Some Problems and Our Extension • The original XPERANTO approach [7] has some problems • It shows only simple transformation examples • It only considers • Computation pushdown processing for top-level relations • Simple XQuery query: selection query with one where clause • Simple where clause with one condition • Simple return clause • It does not consider foreign functions • Our extension to the XPERANTO approach • Devised user query graph generation (Fig. 9) • Incorporation of the intersection operator for multiple conditions in a where clause • Query translation that considers our extension
Query Translation • Our extension to the XPERANTO approach • Inclusion of foreign functions in where clause • Handling output specifications in return clause • Treatment of multiple conditions in where clause • Treatment of computation pushdown to subrelations • Query translation consists of the following steps • Decorrelation • View composition • Computation pushdown • Tagger pull-up
Decorrelation of View Definition XQGM Graph • A correlated join operator has high execution cost • Decorrelation step eliminates correlated join operators • View definition XQGM graph (Fig. 8) is translated as Fig. 10 Fig. 10
Decorrelation of User Query XQGM Graph • User query XQGM graph (Fig. 9) is also translated as Fig. 11 Fig. 11
View Composition • Compose a view definition XQGM graph and a user query XQGM graph, then apply function composition rules in Table 3 • This step is almost same as the original XPERANTO approach • Composition of Fig. 10 and 11 yields Fig. 12 Fig. 12
Computation Pushdown • Pushdown XQGM operators towards the leaves of the graph as much as possible • For the efficient evaluation using the query processing power of RDBMS • However, we cannot pushdown foreign function evaluation • Foreign function evaluation is performed in the middleware • Evaluation in the middleware requires XML fragments • Therefore, pushdown computation except for foreign function evaluations
Tagger Pull-Up • Replace XML functions with tag operators • Pull-up tag operators upward as much as possible • Two SQL queries are generated • SQL-1 from where clause • SQL-2 from return clause Fig. 13
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
Two-Step Processing Method (1) User Query Result XML Middleware Query Execution Tagger Fragment Tagger-1, 2 Query Planning Foreign Function Evaluator Tuple-1, 2 Keys SQL-1, 2 SQL Query Control SQL-1, Tagger-1 is used to retrieve tuples to evaluate foreign functions (Tuple-1) The qualified key value set Keys is combined with SQL-2 to select the result tuples RDBMS
Two-Step Processing Method (2) • Two approaches for the generation of SQL-2 • Two-Step Processing Method (where) • The qualified key value set Keys (obtained by foreign function evaluation) is embedded into the whereclause of SQL-2 (e.g., "where fid in Keys") • Two-Step Processing Method (tmp) • First, one-column temporary table is created from the key values in Keys • Then a join operation with the temporary table is incorporated in SQL-2
One-Step Processing Method UserQuery Result XML Middleware Query Execution Tagger Fragment Tagger-1, 2 Query Planning Foreign Function Evaluator Keys Tuple SQL-1, 2 SQL Query Control SQL-1 and SQL-2 are integrated in one SQL query (SQL-3) The middleware selects tuples of the final result using the qualified key set Keys RDBMS
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
Outline of Experiments (1) • PostgreSQL on Linux PC • Relational Tables • No. of city tuples: N = 1000 • No. of location tuples: 10N and 100N • No. of facility tuples: 10N and 100N • Four Types of Queries • Q1: For each city whose area is larger than X, show its name and facilities • Q2: For each city whose area is larger than X, show its name, location information, and facilities • Q3: For each city whose area is larger than X and whose population is larger than Y, show its name and facilities (Q1 + additional selection condition) • Q4: Q3 + additional selection condition
Outline of Experiments (2) • Selectivity factors • Area condition: Sa = 0.1, 0.3, 0.5, 0.7, and 0.9 • Population condition: Sp = 0.1 and 0.3 • Processing costs • Foreign function evaluation and XML generation are relatively small and almost equally included in both methods • Cost of two-step processing method: processing cost of SQL-1 and SQL-2 • Cost of one-step processing method: processing cost of SQL-3 (SQL-1 + SQL-2)
Q3 with Sp = 0.3 (no. of facility tuples = 10N) • Q3: For each city whose area is larger than X and whose population is larger than Y, show its name and facilities • No. of location tuples = 10N or 100N • Three methods have similar costs Fig. 20
Q3 with Sp = 0.3 (no. of facility tuples = 100N) • Two-step methods are better if selectivity of foreign function is low • filtering is well-performed • Two-step method (where) is worse than (tmp) • embedding of key values is not efficient Fig. 21
Q3 with Sp = 0.1 (no. of facility tuples = 10N) • Selectivity of the population attribute is small ⇒ pushdown to RDBMS is can reduce the no. of tuples • One-step method is better • two-step methods have overheads Fig. 22
Q4 with Sp = 0.3 (no. of facility tuples = 10N) • Q4: For each city whose area is larger than X and whose population is larger than Y, show its name, location information, and facilities • If no. of location tuples is large (100N) and if the selectivity of foreign function is small, two-step method (tmp) is better Fig. 23
Summary of Experiments • Two-step processing method (where) is worse than two-step processing method (tmp) in most situations • key value embedding is not a good idea • The cost of one-step processing does not depend on the selectivity of foreign function • If a query only contains a foreign function condition (Q1 and Q2), two-step processing method (tmp) is generally efficient when the selectivity of foreign function is small • If a query contains additional conditions (Q3 and Q4), the efficiency is depend on the selectivity factors • If the processing cost of RDBMS is small, one-step processing method is efficient
Guideline of Usage • Do not use two-step method (where) • If the processing cost in the RDBMS is quite small, use one-step method • If the query only contains foreign functions, use two-step method (tmp) • If the query contains additional filtering conditions • we have to select an appropriate one from one-step method and two-step method (tmp) • the selection is depend on the selectivity factors
Presentation Overview • Background • Overview of Our Approach • The XPERANTO Approach • Extension to the XPERANTO Approach • Query Processing Architecture • Experimental Evaluation • Conclusions and Future Work
Conclusions • Processing methods for XML view over relational databases especially when queries include foreign function calls • Cooperation approach of middleware and RDBMS • Extension of XPERANTO framework • Proposal of two query processing methods • Two-step processing method (where/tmp) • One-step processing method • Performance evaluation based on experiments
Future Work • Broadening the supportable XML queries • Query optimization • reduction redundancy • Development of other query processing approaches • bitmap indexes • with clause in SQL:1999 • Selection of an appropriate query processing method • development of selection heuristics
Q1 (no. of facility tuples = 100N) • Q1: No filtering condition on population attribute Fig. 26