1 / 21

Aggregation in Main Memory

Aggregation in Main Memory. Kenneth A. Ross Columbia University. Research Experience. Complex query processing Data Warehousing Main memory databases. Students: Kazi Zaman, Junyan Ding. Main- Memory DBMS. Query. Mediator. User. Unified Results. Traditional DBMS. Scenario A.

Download Presentation

Aggregation in Main Memory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Aggregation in Main Memory Kenneth A. Ross Columbia University DGRC FedStats Visit

  2. Research Experience • Complex query processing • Data Warehousing • Main memory databases Students: Kazi Zaman, Junyan Ding DGRC FedStats Visit

  3. Main-MemoryDBMS Query Mediator User ... UnifiedResults TraditionalDBMS Scenario A DGRC FedStats Visit

  4. Scenario B Web Data Request User Mediator ... UnifiedResults Sequence Of Interactive Queries TraditionalDBMS Main Memory DB DGRC FedStats Visit

  5. Scenario C User Web Data Request Dynamic Query Mediator ... Graphical User Interface UnifiedResults TraditionalDBMS Main Memory DB DGRC FedStats Visit

  6. Outline • Introduction to Datacubes • Frameworks for querying cubes • The Main Memory based framework • Experimental Results • Conclusions and Plan DGRC FedStats Visit

  7. The CUBE BY Operator State Year Grade Sales State Year Grade Sales CA 1997 Regular 90 CA 1997 Regular 90 CA 1997 ALL 90 NY 1997 Premium 70 ALL 1997 Regular 90 CA 1998 Premium 65 CA ALL Regular 90 CUBE BY (sum Sales) ALL 1997 Regular 90 NY 1998 Premium 95 ALL 1997 ALL 160 ALL ALL Regular 90 CA ALL ALL 155 ALL ALL ALL 320 Additional records Large increase in total Size, especially with many dimensions ……. DGRC FedStats Visit

  8. Lattice Representation Grade State Year Year, Grade State, Year State, Grade State, Year, Grade DGRC FedStats Visit

  9. Modeling Queries Slice Queries ask for a single aggregate record SELECT State, year, sum(sales) FROM BLS-12345 GROUP BY State, year HAVING State = “NY” AND year = “1998” DGRC FedStats Visit

  10. Existing Frameworks Choose subset of cube to materialize based on workload. State Year Grade Materialize on disk State, Year State,Grade Year,Grade Appropriate record recovered or computed for incoming slice query State, Year, Grade Drawbacks: Ignores Clustering of Relation on disk. Smallest unit of materialization is too big. DGRC FedStats Visit

  11. Our approach The full cube is often larger than available memory, but ... State Year Grade The finest granularity aggregate may fit. State, Year State,Grade Year,Grade Any record can be computed without having to go to disk. State, Year, Grade How should the finest granularity be organized ? DGRC FedStats Visit

  12. Level-2 Store Level-1 Store Finest granularity cuboid Query q records in linked lists Selected coarse records in hash table Slot directory Framework DGRC FedStats Visit

  13. The Level-1 Store Records are <Key,Value> pairs stored in a hash table. Records can contain ALL’s Given query Q, form composite key and check level-1 store (constant time). If not found, use level-2 store Key Value a1 55 b2 34 c2 12 … ... DGRC FedStats Visit

  14. The Level-2 Store Level-2 Store Finest granularity cuboid Slot directory is organized as a multidimensional array: level2[sz1][sz2][sz3][sz4] Each slot points to a linked list of elements. records in linked lists Records placed according to set of mapping functions H Slot directory DGRC FedStats Visit

  15. Using the Level-2 store Query Q without ALL’s c2 d5 a3 b4 Slot 4 Slot 3 Slot 7 Slot1 Access list denoted by level2[4][3][7][1] ; aggregate those matching (a3,b4,c2,d5). DGRC FedStats Visit

  16. Using the Level-2 store Query Q with ALL’s c2 a3 ALL ALL Slot 4 List of Slots Slot 7 List of Slots Access lists matching level2[4][*][7][*] ; aggregate those matching (a3,*,c2,*). DGRC FedStats Visit

  17. Demo • Shows multidimensional dataset (subset of columns of 5% Census sample for NY in 1990). • User asks queries: fast answers. • Future: User Interface asks manyqueries, with display changing interactively. • demo DGRC FedStats Visit

  18. Experimental Results Scanning all records takes 194 ms. DGRC FedStats Visit

  19. Importance of Work • Aggregation is fundamental to analysis. • Make analysis interactive, even for many dimensions. • Make a variety of aggregate granularities available, where possible. DGRC FedStats Visit

  20. Contributions • A Main Memory based framework for answering datacube queries efficiently. • Query Performance in the 2-4 ms range which is more efficient than going to disk. DGRC FedStats Visit

  21. Plan • Integrate with user interface to generate dynamic queries. • Self-tuning capability. • Multiple data sets. • Work with agencies to generate value • For intra-agency analysis • For enhanced data dissemination DGRC FedStats Visit

More Related