230 likes | 332 Views
MISTRAL Performance of TPC-D Benchmark and Datawarehouses. Prof. R. Bayer, Ph.D. Dr. Volker Markl Dept. of Computer Science, Technical University Munich and Bavarian Research Center for Knowledgebased Systems (FORWISS). Test Bed for Performance Measurements. Hardware Compaq Proliant 5000
E N D
MISTRALPerformance of TPC-D Benchmark and Datawarehouses Prof. R. Bayer, Ph.D. Dr. Volker Markl Dept. of Computer Science, Technical University Munich and Bavarian Research Center for Knowledgebased Systems (FORWISS)
Test Bed for Performance Measurements • Hardware • Compaq Proliant 5000 • 4 Pentium II 200 MHz • 512 MB RAM • hard disk: 7 * 4 GB = 28 GB • Operating System • Windows NT 4.0 • RDBMS • Oracle 8 • 8kB pages • Access Methods • Tetris Algorithm for UB-Trees • Oracle IOT (clustering B*-Tree) • Oracle FTS (full table scan)
Shipping Priority Query (Q3) SELECT L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITYFROM CUSTOMER, ORDER, LINEITEMWHERE C_MKTSEGMENT = 'FOOD' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < DATE 1.5.98 AND L_SHIPDATE > DATE 1.6.98GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITYORDER BY REVENUE DESC, O_ORDERDATE
Forecasting Revenue Change Query (Q6) SUM(L_EXTENDEDPRICE*L_DISCOUNT) AS REVENUEFROM LINEITEMWHERE L_SHIPDATE >= [date] AND L_SHIPDATE <= [date] + INVERVAL 1 YEAR ANDL_DISCOUNT BETWEEN [discount] -0.01 AND [discount] + 0.01 AND L_QUANTITY < [quantity]
Performance Measurements GFK • DBMS • TransBase (covering, clustering compound B*-Trees) • UB/API on top of TransBase (UB-Tree, two ESQL Statements are optimized and processed per UB-Tree page access) • TransBase Hypercube (UB-Tree inside the DBMS Kernel) • Database • real world data warehouse from GFK • 3D Snowflake Schema • Time (3 years = 18 MP) • Segment (10500 outlets) • Product (~ 500000 items in 604 product groups) • 42 Mio fact tuples (~ 4 GB fact table size) • Computer • Sun ULTRA 1 Workstation (64 MB Main Memory)
Indexes • MHC to encode hierarchies: • TIME_CS (5 bits) • SEGMENT_CS (24 bits) • PRODUCT_CS (29 bits) • Compound on (PRODUCT_CS, TIME_CS, SEGMENT_CS) or(TIME_CS, SEGMENT_CS, PRODUCT_CS) • UB-Tree (UB/API) on{TIME_CS, PRODUCT_CS, SEGMENT_CS}
GFK Datawarehouse Reports selectivity << 1%
Clustering of UB-Trees Ø = 0.85 s / dc clustering factor
Summary UB-Tree • Excellent performance on large real DBs, > factor 10 • Very low storage requirement • 1st answer extremely fast, interactive use!! • Response time proportional to size of answer • Wide applicability: all DBs are multidimensional!! • Easy integration into DBMS, simple DDL extension • Very useful as middleware • Patent applications