1 / 23

MISTRAL Performance of TPC-D Benchmark and Datawarehouses

This study evaluates the performance of TPC-D benchmark and datawarehouses using Tetris algorithm and UB-Tree clustering methods. Tests were conducted on Compaq Proliant 5000 with Oracle RDBMS.

kayd
Download Presentation

MISTRAL Performance of TPC-D Benchmark and Datawarehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MISTRALPerformance of TPC-D Benchmark and Datawarehouses Prof. R. Bayer, Ph.D. Dr. Volker Markl Dept. of Computer Science, Technical University Munich and Bavarian Research Center for Knowledgebased Systems (FORWISS)

  2. Test Bed for Performance Measurements • Hardware • Compaq Proliant 5000 • 4 Pentium II 200 MHz • 512 MB RAM • hard disk: 7 * 4 GB = 28 GB • Operating System • Windows NT 4.0 • RDBMS • Oracle 8 • 8kB pages • Access Methods • Tetris Algorithm for UB-Trees • Oracle IOT (clustering B*-Tree) • Oracle FTS (full table scan)

  3. TPC-D Schema

  4. Shipping Priority Query (Q3) SELECT L_ORDERKEY, SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS REVENUE, O_ORDERDATE, O_SHIPPRIORITYFROM CUSTOMER, ORDER, LINEITEMWHERE C_MKTSEGMENT = 'FOOD' AND C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE < DATE 1.5.98 AND L_SHIPDATE > DATE 1.6.98GROUP BY L_ORDERKEY, O_ORDERDATE, O_SHIPPRIORITYORDER BY REVENUE DESC, O_ORDERDATE

  5. Tetris algorithm Q3

  6. Response times 50% LINEITEM (Q3)

  7. Temporary Storage 50% LINEITEM (Q3)

  8. Sorting 50% of LINEITEM

  9. Forecasting Revenue Change Query (Q6) SUM(L_EXTENDEDPRICE*L_DISCOUNT) AS REVENUEFROM LINEITEMWHERE L_SHIPDATE >= [date] AND L_SHIPDATE <= [date] + INVERVAL 1 YEAR ANDL_DISCOUNT BETWEEN [discount] -0.01 AND [discount] + 0.01 AND L_QUANTITY < [quantity]

  10. Forecasting Revenue Change Query (Q6)

  11. Performance of Q6

  12. Retrieving 3,3% of LINEITEM

  13. GFKSnowflake Schema

  14. TETRIS & MHC

  15. Performance Measurements GFK • DBMS • TransBase (covering, clustering compound B*-Trees) • UB/API on top of TransBase (UB-Tree, two ESQL Statements are optimized and processed per UB-Tree page access) • TransBase Hypercube (UB-Tree inside the DBMS Kernel) • Database • real world data warehouse from GFK • 3D Snowflake Schema • Time (3 years = 18 MP) • Segment (10500 outlets) • Product (~ 500000 items in 604 product groups) • 42 Mio fact tuples (~ 4 GB fact table size) • Computer • Sun ULTRA 1 Workstation (64 MB Main Memory)

  16. Indexes • MHC to encode hierarchies: • TIME_CS (5 bits) • SEGMENT_CS (24 bits) • PRODUCT_CS (29 bits) • Compound on (PRODUCT_CS, TIME_CS, SEGMENT_CS) or(TIME_CS, SEGMENT_CS, PRODUCT_CS) • UB-Tree (UB/API) on{TIME_CS, PRODUCT_CS, SEGMENT_CS}

  17. GFK Datawarehouse Reports  selectivity << 1%

  18. Compound: fixed 2MP, varying PG

  19. UB-Tree: fixed 2MP, varying PG

  20. Response Time & Result Set Size

  21. Clustering of UB-Trees Ø = 0.85 s / dc clustering factor

  22. Clustering depending on Result Set Size

  23. Summary UB-Tree • Excellent performance on large real DBs, > factor 10 • Very low storage requirement • 1st answer extremely fast, interactive use!! • Response time proportional to size of answer • Wide applicability: all DBs are multidimensional!! • Easy integration into DBMS, simple DDL extension • Very useful as middleware • Patent applications

More Related