300 likes | 425 Views
Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach. Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI). IDEAS’2005. Order in XML.
E N D
Order-sensitive XML Query Processing over Relational Sources: An Algebraic Approach Authors: Ling Wang, Song Wang, Brian Murphy and Elke A. Rundensteiner Institute: Database Systems Research Group, Worcester Polytechnic Institute (WPI) IDEAS’2005
Order in XML • Order is important to XML • Document order • XML view can be ordered (OrderBy) … • User query can be order-sensitive (OrderBy, position(), range()…) <RECORDLIST> <PLAY> <BAND> Misfits </BAND> <SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG> <SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY> </RECORDLIST> 1. Sort PLAY by its band’s name 2. Find third PLAY 3. Extract its first and second SONG <RESULT> FOR $play in document(“record.xml")/PLAY OrderBy $play/band RETURN <SONG> $play[3]/SONG[rang 1 to 2]/text() </SONG> </RESULT> <RESULT> <SONG> SXE Revenge Shutdown </SONG> </RESULT>
Why XML-to-SQL? • XML is stored in relational database to … • provide reliable persistent storage • exploit mature technologies • XML-to-SQL Systems • SilkRoute (AT&T), XPERANTO (IBM), RAINBOW (WPI), Rolex (BellLab), Agora, MARS • Oracle XML DB, Microsoft SQL Server 2000 SQLXML, IBM DB2 XML Extender
XML Views • XML Views • support XML view mechanism for XML data publishing • support queries (updates) over XML views • XML publishing scenario • Relational model is not order-sensitive • Order in XML views over RDB has no meaning • XML storage scenario • Order is essential !!! • Order-preserving loading • XML document Relational database • Implicit order in XML document explicit order code in RDB • Order-restoring in extraction views • Explicit order code in RDB implicit order in XML viewthrough view query User query XML View XML Order encoding RDB
Order-specific loading • Order-specific loading: • Loading strategies: Inline, edge, … • Order encoding methods: Global, local, dewey …
Example XML schema XML document <xs:schema xmlns:xsd="http://www.w3.org/XMLSchema"> <xs:element name="RECORDLIST"> <xs:complexType> <xs:element name="PLAY" minoccurs="1" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="BAND" type="xs:string"/> <xs:element name="SONG" type="xs:string" minoccurs="1"/> </xs:sequence> </xs:complexType> </xs:element> </xs:complexType> </xs:element> </xs:schema> <RECORDLIST> <PLAY> <BAND> Misfits </BAND> <SONG> She </SONG> </PLAY> <PLAY> <BAND> Back Street Boy </BAND> <SONG> Bullet </SONG> <SONG> We Are 138 </SONG> </PLAY> <PLAY> <BAND> Project X </BAND> <SONG> SXE Revenge </SONG> <SONG> Shutdown </SONG> </PLAY> </RECORDLIST>
Example Relational Database Inline loading + local order encoding View query <RECORDLIST> FOR $play IN document("dxv.xml")/PLAY/ROW ORDER BY $play/POSITION/text() RETURN <PLAY> <BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY> </RECORDLIST> RECORDLIST PLAY SONG
Motivation • Many loading + Encoding combinations are possible … • {inline, edge, …} * {local, global, dewey…} • Hybrid of multiple loading and encoding may occur: • Loading: • Schema is available --- inline • Schema is not available --- edge • Order-encoding • Heavy update workload --- dewey • Query workload --- global • Multiple XML documents are loaded into RDB • Other loading and encoding methods may emerge in future • Conclude: Need general approach for XQuery-to-SQL translation
XSOT • XML-to-SQL Order-sensitive Translation (XSOT): • Step1: Encode XML document with explicit order code (order-exposing) • Step 2: Load XML to relational database (order-preserving) • Step 2: Extract XML view from relational database (order-restoring) • Step 3: Query via XML view with order predicates (order-sensitive)
XSOT Framework DB2 SQL Server Oracle Sybase Web/Intranet Application User Order-Sensitive User Query View Query XML Result XML Data Order Code Comparison Function XAT Generator XML Schema View XAT User XAT XQuery Parser Legend View Composer Default XML View Order Encoding XQuery Default XML Schema Default XML View Sub- System Composed XAT XML Generator XAT Optimizer Loading XQuery XML Source Wrapper Process Optimized XAT XAT Ordered Tuple Streams Schema generation SQL Generator Data Loading XQuery Engine Data Extracting Data SQL Mapping Manager Query flow RDBMS Data flow
Running Example View query Relational Database Inline loading + local order encoding <RECORDLIST> FOR $play IN document("dxv.xml")/PLAY/ROW ORDER BY $play/POSITION/text() RETURN <PLAY> <BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY> </RECORDLIST> RECORDLIST PLAY SONG Find second song of each play <RESULT> FOR $record in document(“record.xml") RETURN <SONG> $record/PLAY/SONG[2]/text() </SONG> </RESULT> <RESULT> <SONG> We are 138 Shutdown </SONG> </RESULT>
Order-sensitive XML Algebra Tree • XSOT methodology: • An algebraic approach • XML Algebra Tree (XAT) • XAT operators • Select, CartesianProduct, ThetaJoin, LeftOuterJoin, Distinct, GroupBy, OrderBy • Source, Navigate, Combine, Tagger • XAT Order Extension • Position() • Range() • Composition of the view and user XAT
Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record 19 Combine$dataPlayTag 18 Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag 17 GroupBy$play 16 Tagger<SONG>$sData</SONG>$dataSongTag Combine $dataSongTag 14 15 Navigate$song, SONG_PCDATA/text()$sData 13 GroupBy$play 12 OrderBy $sPos Navigate$song, POSITION/text()$sPos 11 10 ThetaJoin $pIID=$sPID 9 Navigate$song, PID/text()$sPID Navigate$play, IID/text()$pIID 5 8 Navigate$S, SONG/ROW$song OrderBy $pPos 4 7 Source “dxv.xml” $S Navigate$play, POSITION/text()$pPos 6 3 Navigate $P, PLAY/ROW$play 2 Source “dxv.xml” $P 1 View Query XAT <RECORDLIST> FOR $play IN document("dxv.xml")/PLAY/ROW ORDER BY $play/POSITION/text() RETURN <PLAY> <BAND>$play/BAND_PCDATA</BAND> FOR $song IN document("dxv.xml")/SONG/ROW [PID/text() = $play/IID/text()] ORDER BY $song/POSITION/text() RETURN <SONG> $song/SONG_PCDATA/text() </SONG> </PLAY> </RECORDLIST>
Tagger <RESULT> $uDataSongTag</RESULT>$result 28 Combine$uDataSongTag 27 Tagger <SONG> $uSongData</SONG>$uDataSongTag 26 Navigate$uSong, text()$uSongData 25 Select$uNumPos=2 24 GroupBy$record, $uPlay 22 Navigate$uPlay, SONG$uSong POS$uSong$uNumPos 23 21 Navigate$uRecord, PLAY$uPlay 20 Source “record.xml” $P User Query XAT <RESULT> FOR $record in document(“record.xml") RETURN <SONG> $record/PLAY/SONG[2]/text() </SONG> </RESULT>
Composed XAT Tagger <RECORDLIST> $dataPlayTag</ RECORDLIST >$record 19 Combine$dataPlayTag Tagger <RESULT> $uDataSongTag</RESULT>$result 18 Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag 28 Combine$uDataSongTag 17 27 GroupBy$play 16 Tagger <SONG> $uSongData</SONG>$uDataSongTag Tagger<SONG>$sData</SONG>$dataSongTag Combine $dataSongTag 26 14 15 Navigate$uSong, text()$uSongData Navigate$song, SONG_PCDATA/text()$sData 25 13 Select$uNumPos=2 GroupBy$play 12 24 GroupBy$record, $uPlay OrderBy $sPos Navigate$song, POSITION/text()$sPos 22 11 10 Navigate$uPlay, SONG$uSong POS$uSong$uNumPos ThetaJoin $pIID=$sPID 23 21 9 Navigate$uRecord, PLAY$uPlay Navigate$song, PID/text()$sPID Navigate$play, IID/text()$pIID 5 20 8 Source “record.xml” $P Navigate$S, SONG/ROW$song OrderBy $pPos 4 7 User XAT $P=$record Source “dxv.xml” $S Navigate$play, POSITION/text()$pPos 6 3 Navigate $P, PLAY/ROW$play View XAT 2 Source “dxv.xml” $P 1
XAT Optimization – Order Explicit • Why? • Order in user XAT depends on the implicit order in the view • It blocks further optimization: Computation push down
XAT Optimization – Order Explicit Select$uNumPos=2 Pick second song GroupBy$record, $uPlay For each PLAY POS$uSong$uNumPos Sort SONGs User XAT Depend on View XAT construct PLAY Tagger <PLAY> $dataSongTag</PLAY>$dataPlayTag GroupBy$play construct SONG Tagger<SONG>$sData</SONG>$dataSongTag Combine $dataSongTag Cannot push down! Cannot translated into SQL!
XAT Optimization – Order Explicit • Goal: Convert user query order • FROM: implicit order in the XML view • TO: Explicit order-code column in relational encoding User Portion XAT User Portion XAT GroupBy$play GroupBy$record, $uPlay 22 22 POS$sPos$uNumPos POS$uSong$uNumPos 23 23 View Portion XAT View Portion XAT GroupBy$play GroupBy$play 12 12 Navigate$song, POSITION/text()$sPos OrderBy $sPos Navigate$song, POSITION/text()$sPos OrderBy $sPos 11 10 10 11 POS$uSong= POS$sPos
SQL-oriented XAT optimization • Goal: • Optimize XAT for efficient order-sensitive SQL generation • Rules: • Computation push-down • Push as much as possible to RDB • Order pull-up • Sort as late as possible • Avoid re-sorting !!! • Order-step rewrite • Match RDB order template
Tagger <RESULT> $uDataSongTag</RESULT>$result 28 Combine$uDataSongTag 27 Tagger <SONG> $sData</SONG>$uDataSongTag 26 Order pull up OrderBy $sPos, $pPos 4 Select$uNumPos=2 OrderStep rewrite 24 GroupBy$pPos OrderStep[$pPos], [$pPos, $sPos] $uNumPos 22 POS$sPos$uNumPos GroupBy$play 23 12 OrderBy $sPos ThetaJoin $pIID=$sPID 9 11 Navigate$play, IID/text()$pIID Navigate$song, POSITION/text()$sPos 5 10 Navigate$play, POSITION/text()$pPos Navigate$song, SONG_PCDATA/text()$sData Computation push down 13 3 Navigate $P, PLAY/ROW$play Navigate$song, PID/text()$sPID 2 8 Source “dxv.xml” $P Navigate$S, SONG/ROW$song 7 1 Source “dxv.xml” $S 6 Optimized XAT
Optimized XAT Tagger <RESULT> $uDataSongTag</RESULT>$result 28 Combine$uDataSongTag 27 Tagger <SONG> $sData</SONG>$uDataSongTag 26 OrderBy $sPos, $pPos 4 Select$uNumPos=2 24 OrderStep[$pPos], [$pPos, $sPos] $uNumPos 29 ThetaJoin $pIID=$sPID 9 Navigate$play, IID/text()$pIID Navigate$song, POSITION/text()$sPos 5 10 Navigate$play, POSITION/text()$pPos Navigate$song, SONG_PCDATA/text()$sData 13 3 Navigate $P, PLAY/ROW$play Navigate$song, PID/text()$sPID 2 8 Source “dxv.xml” $P Navigate$S, SONG/ROW$song 7 1 Source “dxv.xml” $S 6
Order Template • SQL-99 standard • Oracle, DB2 … Order Template TEMPLATE: SELECT row_number() over (<PARTITION>?<ORDERBY>) $pos_func_binding FROM <TABLE>+ PARTITION: partition by <ELEMENT> ORDERBY: order by <TONUMBER>|<ELEMENT> TONUMBER: to_number(<ELEMENT>) ELEMENT: element name TABLE: table name | TEMPLATE
Order-sensitive SQL generation • About push-down strategies • In general ---- push as much computation as possible into relational engine. • In order scenario --- tradeoff • Deep push: • Push OrderStep into Relational Engine • Relational engine has to support order template (SQL99) Tagger <RESULT> $uDataSongTag</RESULT>$result 28 Combine$uDataSongTag 27 Tagger <SONG> $sData</SONG>$uDataSongTag 26 SQL Q5$sData 32 Q5 = SELECT Q2.sData FROM (SELECT Q1.pPos, Q1.sPos, Q1.sData, row_number() OVER (PARTITION BY Q1.pPos ORDER BY Q1.sPos) uNumPos FROM (SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData FROM PLAY P, SONG S WHERE P.IID = S.PID ) Q1 ) Q2 WHERE Q2.uNumPos = 2 ORDER BY Q2.pPos, Q2.sPos
Order-sensitive SQL generation • Shallow push (otherwise) • leave OrderStep outside RDB • No requirement for Relational engine for supporting order template (SQL99) Tagger <RESULT> $uDataSongTag</RESULT>$result 28 Combine$uDataSongTag 27 Tagger <SONG> $sData</SONG>$uDataSongTag 26 OrderBy $sPos, $pPos 4 Select$uNumPos=2 24 OrderStep[$pPos], [$pPos, $sPos] $uNumPos 29 SQL Q1$sData SELECT P.POSITION AS pPos, S.POSITION AS sPos, S.SONG_PCDATA AS sData FROM PLAY P, SONG S WHERE P.IID = S.PID
Deep Push vs. Shallow Push • Low selectivitysimilar • High selectivity • Shallow push is better • Repeated sorting in deep push is expensive!
Experimental Study SQL Execution time --- Global vs. Local order encoding
Discussion: Further SQL optimization • General SQL optimization can be applied… • Cost-based SQL translation (SilkRoute) • Any other SQL optimization… • When order encoding is assumed… • SQL statements can be simplified by avoiding re-ordering • When relational database schema is aware … • Schema specific SQL optimization [KKN2002]
Related Work • XQuery-to-SQL translation systems: XPERANTO, SilkRoute, … • [TVB2002] I. Tatarinov, S. D. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In SIGMOD, 2002. • Three order encoding methods are utilized • Algorithms of translating ordered XPath expressions into SQL But … • [KKN2002] R. Krishnamurthy, R. Kaushik, and J. F. Naughton. Optimizing Fixed-Schema XML to SQL Query Translation. In VLDB, 2002.
Conclusion • Propose a general framework for order-sensitive XQuery-to-SQL translation (XSOT) • Propose order-sensitive XML algebra Tree (XAT) • SQL-oriented order-sensitive XAT optimization • Efficient order SQL statements generation and optimization techniques • Implementation using Rainbow query engine • Experiments to verify the generality and SQL performance
Rainbow XML Management System • Rainbow website: http://davis.wpi.edu/dsrg/rainbow/index.html • Software downloadhttp://davis.wpi.edu/dsrg/rainbow/RainbowCore/release.htm • Thank you!