130 likes | 247 Views
Interactive Dynamic Aggregate Queries. Kenneth A. Ross, Junyan Ding Columbia University. Research Experience. Complex query processing Data Warehousing Main memory databases. Students: Kazi Zaman, Junyan Ding. Outline. Past work on Datacubes Scenario Overview
E N D
Interactive Dynamic Aggregate Queries Kenneth A. Ross, Junyan Ding Columbia University DGRC Evaluation Board Meeting
Research Experience • Complex query processing • Data Warehousing • Main memory databases Students: Kazi Zaman, Junyan Ding DGRC Evaluation Board Meeting
Outline • Past work on Datacubes • Scenario Overview • Performance Goals • Technical Details • Conclusion DGRC Evaluation Board Meeting
Datacubes State Year Grade Sales State Year Grade Sales CA 1997 Regular 90 CA 1997 Regular 90 CA 1997 ALL 90 NY 1997 Premium 70 ALL 1997 Regular 90 CA 1998 Premium 65 CA ALL Regular 90 CUBE BY (sum Sales) ALL 1997 Regular 90 NY 1998 Premium 95 ALL 1997 ALL 160 ALL ALL Regular 90 CA ALL ALL 155 ALL ALL ALL 320 Additional records Large increase in total Size, especially with many dimensions ……. DGRC Evaluation Board Meeting
Level-2 Store Level-1 Store Finest granularity cuboid Query q records in linked lists Selected coarse records in hash table Slot directory Two-Level Framework DGRC Evaluation Board Meeting
Datacube Implementation Issues • Fast query response (2-4 ms/query), but • Rigid: • Need to know dimensions and aggregate functions in advance for precomputation. • Not appropriate when hundreds (PUMS) or thousands (NHANES) of dimensions are available. DGRC Evaluation Board Meeting
Scenario Outline User Web Data Request Dynamic Query Mediator ... Graphical User Interface UnifiedResults TraditionalDBMS Dynamic Query Engine Data Files e.g., PUMS DGRC Evaluation Board Meeting
Engine Decoupled from Interface • Can use a variety of interfaces • Multiple connections to one server • Can “do one thing well” • Client/Server parallelism • Abstract interaction via API DGRC Evaluation Board Meeting
Engine Performance Goals • Interactivedata exploration • Millions of records • Thousands of columns (but look at ten or so at a time) • Aggregates and statistical measures • Fine adjustments at 30 answers/second. DGRC Evaluation Board Meeting
Technical Details • Main Memory Implementation • Multidimensional tree structures • Cache consciousness • Branch Misprediction • SIMD • Asynchronous work DGRC Evaluation Board Meeting
Initial Results • Preliminary implementation • tens of answers/second on PUMS 1% data (>2 million records) • “Vanilla” user interface under construction DGRC Evaluation Board Meeting
Conclusions and Plan • First step towards dynamic aggregate queries • More general dependence on parameters • More ambitious user interfaces DGRC Evaluation Board Meeting
Plan • Integrate with user interface to generate dynamic queries. • Self-tuning capability. • Multiple data sets. • Work with agencies to generate value • For intra-agency analysis • For enhanced data dissemination DGRC Evaluation Board Meeting