Cost Framework for a Heterogeneous Distributed Semi-structured Environment
Tianxiao Liu (1)(2), Tuyet-Tram Dang-Ngoc (1), Dominique Laurent (1)
(1) ETIS Laboratory, University of Cergy-Pontoise, Cergy-Pontoise, France
(2) Xcalia S.A., Paris, France
June 18th, 2007
DBMAN 2007
Outline
• Motivation
• Cost models for heterogeneous data sources
• Contributions
  • Generic language for cost communication
  • Dynamic cost estimation framework
• Conclusion
Motivation
• Cost-based query optimization
  • Various execution plans exist for the same query
  • Each plan has different costs (execution time, price, communication, etc.)
  • A cost model is used to estimate the cost of candidate plans
  • Cost formulas: source-oriented or operation-oriented
  • Statistics of data sources
• Problems in a mediation context
  • Data source autonomy: cost models are not available
  • Integration of various cost models at the mediator level
  • Cost communication between components of the system
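To make the idea concrete, here is a minimal Python sketch of cost-based plan selection. The plan names, the cost figures, and the `CandidatePlan` and `cheapest_plan` helpers are hypothetical illustrations and are not part of XLive. The sketch shows one thing only: given estimated costs per objective, the optimizer keeps the candidate with the lowest cost for the chosen objective.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidatePlan:
    """Hypothetical candidate execution plan with estimated costs per objective."""
    name: str
    costs: Dict[str, float]  # e.g. {"time": 45.0, "price": 2.5}


def cheapest_plan(plans: List[CandidatePlan], objective: str = "time") -> CandidatePlan:
    """Return the plan with the lowest estimated cost for the given objective."""
    if not plans:
        raise ValueError("no candidate plans to choose from")
    return min(plans, key=lambda plan: plan.costs[objective])


# Made-up estimates for two equivalent plans of the same query.
plans = [
    CandidatePlan("join at mediator", {"time": 120.0, "price": 1.0, "communication": 500.0}),
    CandidatePlan("join pushed to source", {"time": 45.0, "price": 2.5, "communication": 80.0}),
]

print(cheapest_plan(plans, "time").name)   # join pushed to source
print(cheapest_plan(plans, "price").name)  # join at mediator
```

The winning plan changes with the objective. For that reason the cost model has to be able to express several kinds of cost, not just execution time.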
Cost models for heterogeneous data sources
[Figure: classification of cost models into cost models based on operation implementation, generic cost models, and specific methods. The figure covers relational, object-oriented, and semi-structured data sources, ranging from known sources to heterogeneous autonomous sources. Arrows between approaches are labeled "adapted", "refined", "extended", and "applied".]
Approaches shown in the figure:
• Calibration [DKS92] [GST96]
• Sampling [ZL98]
• Cost model by history [ACP96]
• Operation-based [GP89] [ML86] [SA82] [CD92] [BMG93] [DOA+94] [AAN01] [MW99]
• Adaptive [Zhu95]
• Flora [Flo96] [Gru96]
• Wrapper [HKWY97] [ROH99]
• Hybrid cost model [NGT98]
• Access path [GGT96]
• XQuery self-learning [ZHJGML05]
Background: XLive mediation system and its XQuery evaluation process
[Figure: the mediator receives an XQuery query and returns an XML result. The components shown are:
• canonization, producing a canonized XQuery;
• modeling as a Tree Graph View (TGV);
• annotation, producing an annotated TGV;
• cost-based optimization, using equivalence rules, a search strategy, and cost information;
• transformation into XAlgebra, with mediator operators and wrapper operators;
• evaluation.
Mediator and wrapper information repositories hold cost information. Wrappers connect the mediator to a relational data source, an XML data source, and web services.]
Background: Tree Graph View (TGV)
[Figure: an example of an XQuery query and its TGV representation]
Generic cost model in a mediation context
• Design a generic cost model…
  • Source type: relational, semi-structured, web service…
  • Specific methods: calibration, history…
  • APIs implemented by the system
• Principle: as accurate as possible
• …using cost formulas
  • Equation systems
  • Statistics, also expressed as equations
  • Constant values
• Existing generic cost model (Disco)
  • Object-oriented environment
  • Predefined variables in the language
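Here is a minimal Python sketch of how a cost expressed as an equation system might be resolved. It assumes each variable is defined either by a constant (a statistic or a fixed parameter) or by a formula over other variables. The variable names (`cardinality`, `pages`, `scan_time`, and so on), the `solve` helper, and the simple scan formula are illustrative assumptions, not the system's actual cost model.

```python
from typing import Callable, Dict, Set, Union

# Hypothetical representation: a variable is defined by a constant or by a
# formula that receives a lookup function for the other variables.
Lookup = Callable[[str], float]
Definition = Union[float, Callable[[Lookup], float]]


def solve(system: Dict[str, Definition], target: str) -> float:
    """Evaluate `target`, recursively resolving the equations it depends on."""
    cache: Dict[str, float] = {}
    in_progress: Set[str] = set()

    def value(name: str) -> float:
        if name in cache:
            return cache[name]
        if name in in_progress:
            raise ValueError(f"cyclic definition involving {name!r}")
        if name not in system:
            raise KeyError(f"no equation, statistic, or constant for {name!r}")
        in_progress.add(name)
        definition = system[name]
        result = definition(value) if callable(definition) else float(definition)
        in_progress.remove(name)
        cache[name] = result
        return result

    return value(target)


# Illustrative system for a sequential scan (all names and numbers are made up).
scan_model: Dict[str, Definition] = {
    "cardinality": 10_000,         # statistic: number of tuples
    "tuple_size": 200,             # statistic: bytes per tuple
    "page_size": 4096,             # constant: bytes per page
    "io_time_per_page": 0.01,      # constant: seconds per page read
    "cpu_time_per_tuple": 0.0001,  # constant: seconds per tuple
    # A statistic expressed as an equation.
    "pages": lambda v: v("cardinality") * v("tuple_size") / v("page_size"),
    # The cost formula itself.
    "scan_time": lambda v: (v("pages") * v("io_time_per_page")
                            + v("cardinality") * v("cpu_time_per_tuple")),
}

print(round(solve(scan_model, "scan_time"), 4))  # 5.8828
```

Because formulas only refer to names, a source can send its own variables without relying on a predefined vocabulary.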
Our proposal: Generic Language for Cost Communication (GLCC)
• A language based on XML
  • Cost formulas and equation systems are expressed in MathML
• A generic language
  • No predefined variables
  • Expresses different costs for various optimization objectives (time, price…)
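The slide does not give the exact GLCC schema. The following Python sketch therefore assumes two things:
• a hypothetical wrapper element (`costInformation`, `formula name="..."`);
• standard MathML content markup (`apply`, `plus`, `times`, `ci`, `cn`) inside it.

It shows how a mediator could read a cost formula shipped as XML and evaluate it against values it knows, without any predefined variable names. `load_formulas` and `evaluate` are illustrative helpers.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical GLCC-style document: the element names around <math> are assumptions.
GLCC_DOC = f"""
<costInformation source="example-wrapper">
  <formula name="scanTime">
    <math xmlns="{MATHML_NS}">
      <apply><plus/>
        <apply><times/><ci>pages</ci><ci>ioTime</ci></apply>
        <cn>0.5</cn>
      </apply>
    </math>
  </formula>
</costInformation>
""".strip()


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.split("}", 1)[-1]


def evaluate(node: ET.Element, env: Dict[str, float]) -> float:
    """Evaluate a small subset of MathML content markup."""
    tag = local_name(node.tag)
    if tag == "math":
        return evaluate(node[0], env)
    if tag == "cn":
        return float(node.text.strip())
    if tag == "ci":
        # Variable names come from the document, not from a fixed list.
        return env[node.text.strip()]
    if tag == "apply":
        op = local_name(node[0].tag)
        args = [evaluate(child, env) for child in node[1:]]
        if op == "plus":
            return sum(args)
        if op == "times":
            return math.prod(args)
        if op == "minus":
            return args[0] - args[1] if len(args) == 2 else -args[0]
        if op == "divide":
            return args[0] / args[1]
        raise ValueError(f"unsupported MathML operator: {op}")
    raise ValueError(f"unsupported MathML element: {tag}")


def load_formulas(xml_text: str) -> Dict[str, ET.Element]:
    """Map each formula name to its MathML <math> element."""
    root = ET.fromstring(xml_text)
    return {
        formula.get("name"): formula.find(f"{{{MATHML_NS}}}math")
        for formula in root.iter("formula")
    }


formulas = load_formulas(GLCC_DOC)
print(evaluate(formulas["scanTime"], {"pages": 8, "ioTime": 0.25}))  # 2.5
```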
Dynamic cost estimation framework
• Cooperation and communication between the different components of XLive
• Execution results (response times) are used to improve the accuracy of cost models
• Cost communication is performed in GLCC
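The slide does not say how execution results are folded back into the cost models. One simple possibility is a per-source correction factor, smoothed exponentially from the ratio of observed to estimated response time. Here is a Python sketch under that assumption. `FeedbackCorrector` and its `alpha` smoothing weight are hypothetical, and this is a swapped-in technique rather than the framework's documented method.

```python
from typing import Dict


class FeedbackCorrector:
    """Hypothetical per-source correction of cost estimates from observed response times."""

    def __init__(self, alpha: float = 0.5) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha                   # weight given to the newest observation
        self.factors: Dict[str, float] = {}  # source name -> correction factor

    def corrected(self, source: str, estimate: float) -> float:
        """Scale a raw cost-model estimate by the source's learned factor (1.0 if unknown)."""
        return estimate * self.factors.get(source, 1.0)

    def record(self, source: str, estimate: float, observed: float) -> None:
        """Blend the observed/estimated ratio into the source's factor."""
        if estimate <= 0:
            raise ValueError("estimate must be positive")
        ratio = observed / estimate
        previous = self.factors.get(source, 1.0)
        self.factors[source] = (1 - self.alpha) * previous + self.alpha * ratio


corrector = FeedbackCorrector(alpha=0.5)
corrector.record("xml-source", estimate=10.0, observed=20.0)  # source ran twice as slow
print(corrector.corrected("xml-source", 10.0))  # 15.0
corrector.record("xml-source", estimate=10.0, observed=20.0)
print(corrector.corrected("xml-source", 10.0))  # 17.5
```

The choice of `alpha` trades responsiveness against stability:
• a small value reacts slowly but smooths out noisy measurements;
• a value close to 1 trusts the latest execution almost entirely.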
Overall cost estimation on the mediator: TGV cost annotation
• Each operation, or group of operations, in a TGV is annotated with cost information
[Figure: an annotated TGV]
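Here is a minimal Python sketch of the annotation step. It assumes operations are identified by string ids and that each annotation attaches one estimated cost to a group of one or more operations. `AnnotatedTGV`, `CostAnnotation`, and the operation names are hypothetical; the real TGV structure is richer than a set of ids.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class CostAnnotation:
    """Cost information attached to one or a group of TGV operations (hypothetical)."""
    operations: FrozenSet[str]
    cost: float
    origin: str  # e.g. "wrapper" or "mediator"


class AnnotatedTGV:
    def __init__(self, operations: Iterable[str]) -> None:
        self.operations: Set[str] = set(operations)
        self._by_operation: Dict[str, CostAnnotation] = {}

    def annotate(self, operations: Iterable[str], cost: float, origin: str) -> CostAnnotation:
        """Attach one cost to a group of operations; each operation is annotated at most once."""
        group = frozenset(operations)
        unknown = group.difference(self.operations)
        if unknown:
            raise ValueError(f"not in this TGV: {sorted(unknown)}")
        already = group.intersection(self._by_operation)
        if already:
            raise ValueError(f"already annotated: {sorted(already)}")
        annotation = CostAnnotation(group, cost, origin)
        for op in group:
            self._by_operation[op] = annotation
        return annotation

    def annotation_for(self, operation: str) -> Optional[CostAnnotation]:
        return self._by_operation.get(operation)

    def unannotated(self) -> Set[str]:
        """Operations that still need a cost, e.g. from the mediator's own model."""
        return self.operations.difference(self._by_operation)


tgv = AnnotatedTGV(["scan_books", "select_price", "scan_authors", "join"])
tgv.annotate(["scan_books", "select_price"], cost=12.0, origin="wrapper")
tgv.annotate(["join"], cost=3.0, origin="mediator")
print(tgv.annotation_for("select_price").cost)  # 12.0
print(tgv.unannotated())                        # {'scan_authors'}
```

Annotating a whole group lets a wrapper report a single cost for work it performs together. For example, it might run a scan and a selection pushed into the source as one unit.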
Overall cost estimation on the mediator: Cost Annotation Tree (CAT)
• A breadth-first traversal of the CAT associates an execution cost with each node
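The traversal can be sketched in Python with a queue. The sketch rests on three assumptions:
• each CAT node either carries a cost taken from an annotation or falls back to a default estimator supplied by the mediator;
• node names are unique;
• for the overall figure, node costs simply add up.

`CATNode`, `associate_costs`, and the fallback are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CATNode:
    """Hypothetical Cost Annotation Tree node."""
    name: str
    annotated_cost: Optional[float] = None  # cost carried over from a TGV annotation, if any
    children: List["CATNode"] = field(default_factory=list)


def associate_costs(root: CATNode, default_cost: Callable[[CATNode], float]) -> Dict[str, float]:
    """Visit the CAT breadth-first and record an execution cost for every node."""
    costs: Dict[str, float] = {}  # insertion order = breadth-first visiting order
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.annotated_cost is not None:
            costs[node.name] = node.annotated_cost
        else:
            costs[node.name] = default_cost(node)  # fall back to the mediator's own estimate
        queue.extend(node.children)
    return costs


cat = CATNode("construct result", children=[
    CATNode("join", children=[
        CATNode("wrapper: books", annotated_cost=12.0),
        CATNode("wrapper: authors", annotated_cost=7.0),
    ]),
])
costs = associate_costs(cat, default_cost=lambda node: 1.0)
print(list(costs))          # ['construct result', 'join', 'wrapper: books', 'wrapper: authors']
print(sum(costs.values()))  # 21.0 (assumes costs are additive)
```

A plain sum ignores parallel execution across wrappers. A real overall estimate may combine child costs differently.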
Conclusion and future work
• Contributions
  • First cost-based query optimization framework for an XML-based mediation system
  • Generic language
  • Suitable for various search strategies
• Future work
  • Cost model validation: accuracy and performance
  • Calibrating the cost of native XML data sources
  • Search strategy
Thanks for your attention! Questions?