210 likes | 364 Views
Evaluating XML-Extended OLAP Queries Based on a Physical Algebra. Xuepeng Yin and Torben B. Pedersen. Department of Computer Science Aalborg University. Problem. OLAP-systems are good for complex analysis queries Easy-to-use Fast Business, science ...
E N D
Evaluating XML-Extended OLAP Queries Based on a Physical Algebra Xuepeng Yin and Torben B. Pedersen Department of Computer Science Aalborg University
Problem • OLAP-systems are good for complex analysis queries • Easy-to-use • Fast • Business, science ... • Problems with physical integration in existing OLAP systems • Integrating new data requires (partial) cube rebuild => too slow • Problems arise with dynamic data • Stock quotes, competitors prices, disease info... • Data will often be available in Extended Markup Language (XML) format • Weather data, map info, price lists, ……
OLAP/XML query Logical federation OLAP query XML query <?xml version=”1.0” ?> <?xml version=”1.0” ?> <?xml version=”1.0” ?> <?xml version=”1.0” ?> OLAP XML Solution • Allows the use of external XML data as virtual dimensions • Decoration (extra info) • Type information. • Selection • Condition on XML data • Grouping • Categories by XML data • Goal: flexible access to XML data from OLAP systems
Overview • Contributions • Architecture of the federation • Linking OLAP and XML • The federation query semantics • The logical algebra • The physical algebra • Conversion from logical to physical plans • Plan execution • Query optimization • The query optimizer • Execution of an optimized plan • Performance • Conclusion
Contributions of This Paper • Previous OLAP-XML federation efforts • A logical algebra • A partial, straight-forward implementation • Problems with previous work • The logical algebra does not accurately reflect query execution tasks • Query optimization is based on an abstract level • Implementation is very limited • Novelties of this paper • A physical algebra and simplified query semantics • Practical query optimization techniques • A full-function, robust query engine • Experiments with the query engine
Architecture of the federation • OLAP and XML components • Auxiliary components • Query engine • Query analyzer • Query optimizer • Query evaluator
<Nations> <Nation> <NationName>Denmark< / NationName > <Population>5.3</ Population> </ Nation> </ Nations> Time Orders Suppliers EC Nlink Year Customer Man. Region Quarter Brand Nation Month Order Part Supplier Quantity Linking OLAP and XML • Links • Relation between a set of dimension values and a set of XML nodes • Level expressions • <level>/<link>/<XPath expression> specifies a concrete link usage • Nation/Nlink/Population links nations to populations Nlink={(DK, n1), (CN, n2), (UK, n3)}
The Federation Query Semantics • The logical algebra • Decoration, • Federation selection, • Federation Generalized projection, • The federation query language: SQLXM SELECT SUM(Quantities), Brand(Part), Nation/Nlink/Population FROM TC WHERE Nation/Nlink/Population<30 GROUP BY Brand(Part), Nation/Nlink/Population
The Physical Algebra • Includes data retrieval and manipulation operators • A physical plan models real execution tasks • i.e., when, where and how data is processed • Nine physical operators • Querying the OLAP component • Cube selection and generalized projection • Data transfer between components • Fact-, dimension- and XML- transfer operators • Temporary data manipulations • Decoration, federation selection and generalized projection • Inlining XML data • Inlining
Querying the OLAP Component • Cube selection • Has no references to XML data • Performs selection over the OLAP cube • Intuitively, a SQL SELECT statement • Cube generalized projection • Has no references to XML data • Rolls up dimensions and aggregate specified measures at specified levels • Intuitively, a SQL SELECT statement with a GROUP BY clause
Data Transfer Between Components • Fact-transfer • Transfers the OLAP fact data to the temporary component • The temporary facts then can be decorated • Intuitively, a SQL SELECT INTO statement • Dimension-transfer • Transfers dimension data to the temporary component • Used when higher level dimension data is required in the temporary component • XML-transfer • Transfers XML data to the temporary component • Uses XPath expressions to identify XML nodes with decoration values
Temporary Data Manipulations • Decoration • Decorates the cube by adding a new dimension • Intuitively, adds a table with dimension and decoration XML data • SELECT * FROM t(supplier, nation) t1, t(nation, population) t2 WHERE t1.nation =t2.nation • Federation selection • Performs selection over the cube in the temporary component • Intuitively, a SQL selection over the temporary tables • SELECT t1.* FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier =t2.supplier and population<30 • Federation generalized projection • Rolls up and aggregates the cube in the temporary component • Intuitively, a SQL selection with a GROUP BY clause • SELECT SUM(Quantity), t2.population FROM tfact t1, t(supplier, population) t2 WHERE t1.supplier= t2.supplier GROUP BY t2.population
Nation=‘DK’ OR Nation=‘UK’ Inlining XML Data • Denoted as • Comparing federated data in the temporary component is expensive • Inlining refers to integrating XML data into the OLAP selections • A resulting predicate • Only references dimension levels and constants • Can be evaluated in the OLAP component Nation/Nlink/Population<30 +
Supplier Population Supplier Population Quantity ExtPrice Supplier Part Order Day Supplier Nation S1 5.3 S1 5.3 17 17954 S1 P3 11 2/12/96 Nation Population S1 DK S2 5.3 S2 5.3 28 29983 S2 P4 42 30/3/94 DK 5.3 S2 DK S3 1264.5 S3 1264.5 2 2388 S3 P3 4 8/12/96 Quantity ExtPrice Population Part Order Day CN 1264.5 S3 CN S4 19.1 S4 19.1 26 26374 S4 P2 20 10/11/93 17 17954 5.3 P3 11 2/12/96 UK 19.1 S4 UK 28 29983 5.3 P4 42 30/3/94 26 26374 19.1 P2 20 10/11/93 Plan Execution
The Query Optimizer • Based on the Volcano optimizer • Four phases optimization at one stage • Logical equivalent plan enumeration • One-to-one logical to physical conversion • Estimating cost of physical plans: • Cost-based plan space pruning
Quantity ExtPrice Supplier Part Order Day 17 17954 S1 P3 11 2/12/96 28 29983 S2 P4 42 30/3/94 26 26374 S4 P2 20 10/11/93 Nation Population Quantity Quantity Nation Nation Brand Brand DK 5.3 17 17 DK DK B3 B3 CN 1264.5 28 28 DK DK B4 B4 UK 19.1 26 26 UK UK B2 B2 Execution of the Optimized Plan
Performance • One experiment compared: • a. Our federated solution • b. Physical integration • c. Federating cached XML data • Data • 100M fact data based on TPC-H benchmark • 11MB and 2KB XML data • Queries • Result: • Comparable to b for small amounts of data • Use c for large amounts of data
Related Work • Generic data integration • Relational, XML, semi-structured, OO,… + combinations • Do not consider OLAP DB properties such as automatic aggregation, dimension hierarchies and correct aggregation • OLAP-object federations • Current solution offers much more general use of external data • Current solution not restricted to rigid object schemas • Current solution allows irregular data • Previous OLAP-XML federation efforts • A logical algebra • A partial, straight-forward implementation
Conclusion • OLAP handles schema changes and dynamic data poorly • Solutions • Logical federation of OLAP and XML • A physical algebra models actual execution tasks • Optimized query evaluation • Experiments suggest feasibility • Future work • More optimization techniques • Advanced evaluation techniques • Co-operative development with OLAP query tool vendor