1 / 21

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra. Xuepeng Yin and Torben B. Pedersen. Department of Computer Science Aalborg University. Problem. OLAP-systems are good for complex analysis queries Easy-to-use Fast Business, science ...

wanda
Download Presentation

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University

  2. Problem • OLAP-systems are good for complex analysis queries • Easy-to-use • Fast • Business, science ... • Problems with physical integration in existing OLAP systems • Integrating new data requires (partial) cube rebuild => too slow • Problems arise with dynamic data • Stock quotes, competitors prices, disease info... • Data will often be available in Extended Markup Language (XML) format • Weather data, map info, price lists, ……

  3. OLAP/XML query Logical federation OLAP query XML query <?xml version=”1.0” ?> <?xml version=”1.0” ?> <?xml version=”1.0” ?> <?xml version=”1.0” ?> OLAP XML Solution • Allows the use of external XML data as virtual dimensions • Decoration (extra info) • Type information. • Selection • Condition on XML data • Grouping • Categories by XML data • Goal: flexible access to XML data from OLAP systems

  4. Overview • Contributions • Architecture of the federation • Linking OLAP and XML • The federation query semantics • The logical algebra • The physical algebra • Conversion from logical to physical plans • Plan execution • Query optimization • The query optimizer • Execution of an optimized plan • Performance • Conclusion

  5. Contributions of This Paper • Previous OLAP-XML federation efforts • A logical algebra • A partial, straight-forward implementation • Problems with previous work • The logical algebra does not accurately reflect query execution tasks • Query optimization is based on an abstract level • Implementation is very limited • Novelties of this paper • A physical algebra and simplified query semantics • Practical query optimization techniques • A full-function, robust query engine • Experiments with the query engine

  6. Architecture of the federation • OLAP and XML components • Auxiliary components • Query engine • Query analyzer • Query optimizer • Query evaluator

  7. <Nations> <Nation> <NationName>Denmark< / NationName > <Population>5.3</ Population> </ Nation> </ Nations> Time Orders Suppliers EC Nlink Year Customer Man. Region Quarter Brand Nation Month Order Part Supplier Quantity Linking OLAP and XML • Links • Relation between a set of dimension values and a set of XML nodes • Level expressions • <level>/<link>/<XPath expression> specifies a concrete link usage • Nation/Nlink/Population links nations to populations Nlink={(DK, n1), (CN, n2), (UK, n3)}

  8. The Federation Query Semantics • The logical algebra • Decoration, • Federation selection, • Federation Generalized projection, • The federation query language: SQLXM SELECT SUM(Quantities), Brand(Part), Nation/Nlink/Population FROM TC WHERE Nation/Nlink/Population<30 GROUP BY Brand(Part), Nation/Nlink/Population

  9. The Physical Algebra • Includes data retrieval and manipulation operators • A physical plan models real execution tasks • i.e., when, where and how data is processed • Nine physical operators • Querying the OLAP component • Cube selection and generalized projection • Data transfer between components • Fact-, dimension- and XML- transfer operators • Temporary data manipulations • Decoration, federation selection and generalized projection • Inlining XML data • Inlining

  10. Querying the OLAP Component • Cube selection • Has no references to XML data • Performs selection over the OLAP cube • Intuitively, a SQL SELECT statement • Cube generalized projection • Has no references to XML data • Rolls up dimensions and aggregate specified measures at specified levels • Intuitively, a SQL SELECT statement with a GROUP BY clause

  11. Data Transfer Between Components • Fact-transfer • Transfers the OLAP fact data to the temporary component • The temporary facts then can be decorated • Intuitively, a SQL SELECT INTO statement • Dimension-transfer • Transfers dimension data to the temporary component • Used when higher level dimension data is required in the temporary component • XML-transfer • Transfers XML data to the temporary component • Uses XPath expressions to identify XML nodes with decoration values

  12. Temporary Data Manipulations • Decoration • Decorates the cube by adding a new dimension • Intuitively, adds a table with dimension and decoration XML data • SELECT * FROM t(supplier, nation) t1, t(nation, population) t2 WHERE t1.nation =t2.nation • Federation selection • Performs selection over the cube in the temporary component • Intuitively, a SQL selection over the temporary tables • SELECT t1.* FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier =t2.supplier and population<30 • Federation generalized projection • Rolls up and aggregates the cube in the temporary component • Intuitively, a SQL selection with a GROUP BY clause • SELECT SUM(Quantity), t2.population FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier= t2.supplier GROUP BY t2.population

  13. Nation=‘DK’ OR Nation=‘UK’ Inlining XML Data • Denoted as • Comparing federated data in the temporary component is expensive • Inlining refers to integrating XML data into the OLAP selections • A resulting predicate • Only references dimension levels and constants • Can be evaluated in the OLAP component Nation/Nlink/Population<30 +

  14. From Logical to Physical Plans

  15. Supplier Population Supplier Population Quantity ExtPrice Supplier Part Order Day Supplier Nation S1 5.3 S1 5.3 17 17954 S1 P3 11 2/12/96 Nation Population S1 DK S2 5.3 S2 5.3 28 29983 S2 P4 42 30/3/94 DK 5.3 S2 DK S3 1264.5 S3 1264.5 2 2388 S3 P3 4 8/12/96 Quantity ExtPrice Population Part Order Day CN 1264.5 S3 CN S4 19.1 S4 19.1 26 26374 S4 P2 20 10/11/93 17 17954 5.3 P3 11 2/12/96 UK 19.1 S4 UK 28 29983 5.3 P4 42 30/3/94 26 26374 19.1 P2 20 10/11/93 Plan Execution

  16. The Query Optimizer • Based on the Volcano optimizer • Four phases optimization at one stage • Logical equivalent plan enumeration • One-to-one logical to physical conversion • Estimating cost of physical plans: • Cost-based plan space pruning

  17. An Optimized Query Plan

  18. Quantity ExtPrice Supplier Part Order Day 17 17954 S1 P3 11 2/12/96 28 29983 S2 P4 42 30/3/94 26 26374 S4 P2 20 10/11/93 Nation Population Quantity Quantity Nation Nation Brand Brand DK 5.3 17 17 DK DK B3 B3 CN 1264.5 28 28 DK DK B4 B4 UK 19.1 26 26 UK UK B2 B2 Execution of the Optimized Plan

  19. Performance • One experiment compared: • a. Our federated solution • b. Physical integration • c. Federating cached XML data • Data • 100M fact data based on TPC-H benchmark • 11MB and 2KB XML data • Queries • Result: • Comparable to b for small amounts of data • Use c for large amounts of data

  20. Related Work • Generic data integration • Relational, XML, semi-structured, OO,… + combinations • Do not consider OLAP DB properties such as automatic aggregation, dimension hierarchies and correct aggregation • OLAP-object federations • Current solution offers much more general use of external data • Current solution not restricted to rigid object schemas • Current solution allows irregular data • Previous OLAP-XML federation efforts • A logical algebra • A partial, straight-forward implementation

  21. Conclusion • OLAP handles schema changes and dynamic data poorly • Solutions • Logical federation of OLAP and XML • A physical algebra models actual execution tasks • Optimized query evaluation • Experiments suggest feasibility • Future work • More optimization techniques • Advanced evaluation techniques • Co-operative development with OLAP query tool vendor

More Related