200 likes | 346 Views
Introduction. XML: an emerging standard for exchanging data on the WWW. Relational database: most wildly used DBMS. Goal: how to map the relational data in the XML documents. Introduction (cont’d). “Efficiently Publishing Relational Data as XML Documents”
E N D
Introduction • XML: an emerging standard for exchanging data on the WWW. • Relational database: most wildly used DBMS. • Goal: how to map the relational data in the XML documents
Introduction (cont’d) • “Efficiently Publishing Relational Data as XML Documents” • Lei Jiang: a language for conversion, implementations • Yan Zhang: implementations • Yong Zhuge: performance comparison
Language • SQL with minor scalar and aggregate function extensions for XML construction • Advantage: use existing APIs and processing infrastructure of RDBMS • Other language proposals • Example
<customer id=“C1”> <name>John Doe </name> <accounts> <account id=“A1”>1894654</account> <account id=“A2”>3849342</account> <porders> <porder id=“PO1” acct=“A1”> //first purchase order <date>1 Jan 2000</date> <items> <item id=“I1”>Shoes</item> <item id=“I2”>Bungee Ropes</item> </items> <payments> <payment id=“P1”>due January 15 </payment> <payment id=“P2”>due January 20 </payment> <payment id=“P3”>due February 15 </payment> </payments> </porder> <porder id= “PO2” acct= “A2”> //second purchase order … </porder> </porders> </customer>
Customer(id integer, name varchar(20) Account(id varchar(20), cusId integer, acctnum integer) PurchOrder(id integer, cusid integer, acctId varchar(20) date varchar(10) Item(id integer, poId inteter, desc varchar(10) Payment(id integer, poId integer, desc varchar(10)
Select cust.name, CUST(cust.id, cust.name, • (Select XMLAGG(ACCT(acct.id, acct.acctnum) • From Account acct • Where acct.custId = cust.id), • (Select XMLAGG(PORDER(porder.id, porder.acct, porder.date, • (Select XMLAGG(ITEM(item.id, item.desc)) • From Item item • Where item.poId=porder.id), • (Select XMLAGG(PAYMENT(pay.id,pay.desc)) • From Payment pay • Where pay.poId = porder.id))) • From PurchOrder porder • Where porder.custId=cust.id)) • From Customer cust
Define XML Constructor CUST (custId: integer, custName: varchar(20) acctList: xml, porderList:xml) AS{ <customer id=$custId> <name>$custName</name> <accounts>$acctList</accounts> <porders>$porderList</porders> </customer> }
Implementation • Add tags and structure to the relational tables • Early Tagging, Early Structuring • Late Tagging, Late Structuring • Early Tagging, Late Structuring • Outside Engine, Inside Engine
Early Tagging, Early Structuring • Outside engine: Stored Procedure Approach Simplest technique, commonly used Drawback: overhead of issuing many queries • Inside engine: Correlated CLOB, De-Correlated CLOB Approach
Late Tagging, Late Structuring • Content creation • Relational data is produced • Tagging and structuring • Relational data is structured and tagged to produce XML document
Content Creation • Redundant Relation Approach • Join every table together • Simple • Redundancy • Unsorted Outer Union Approach • Compute each path using join • One tuple per data item in the leaf level • Sub-expressions are shared to reduce redundancy
Content Creation (cont’)(Unsorted Outer Union Approach) Outer Union (CustId,CustInfo,POId, POInfo,ItemId,ItemInfo) (CustId,CustInfo,POId,POInfo,PaymentId,PaymentInfo) Right Outer Join Left Outer Join Item Payment (CustId,CustInfo,AcctId, AcctInfo) (CustId,CustInfo,POId, POInfo) Right Outer Join Left Outer Join PurchaseOrder Account Customer
Structuring & Tagging(Hash-based Tagger) • Two things need to do • Group all siblings in the desired XML document under the same parent • In order to recognize siblings, we need to look for the same parent • Using main-memory hash table to do this(given the parent’s type and id information) • Extract the information from each tuple and tag it to produce the XML result • This will be done after all the input tuples have been hashed • The output process is straightforward
Late Tagging, Early Structuring • Why? • Late tagging and Late structuring need complex memory management • We can use “structured content” and “constant space tagger” to eliminate this problem • Structured content creation(Sorted Outer Union) • The key is to order the relational content the same way that it needs to appear in the result XML document • Two important factors need to be satisfied • Parent information occurs before, or with, child information • Information about a particular node and its descendants is not mixed in with information about non-descendant nodes.
Late Tagging, Early Structuring(cont’) • Performing a single final relational sort of the unstructured relational content is sufficient • Null value will be sorted first • Parents always are sorted before the children • Parent’s id occurs before child’s id, which ensure the children of a parent node are grouped together • Tagging Sorted Data • Easy • Tuples have been in order • Add tags and write out
Performance Comparison of Alternatives for publishing XML • The Parameters in our experiment • 1) query fan out • 2) query depth • 3) Number of roots. • 4) Number of leaf tuples • ( Only balanced queries are considered in our experiment. )
Performance Comparison of Alternatives for publishing XML Parameter Settings for Experiment
Summary and Conclusion • This paper introduced, implemented and tested a mechanism for converting relational data • to XML Document. Different approaches are tested, include Stored Proc, CLOB-Corr, CLOB- • DeCorr, Unsorted OU(In/Out), Sorted OU(In/Out). It points to the following conclusions, • Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables. • When processing can be done in main memory, a stable approach that is always among the very best (both inside and outside the engine), is the Unsorted Outer Union approach. • When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice (both inside and outside the engine). This is because the relational sort operator scales well.