210 likes | 333 Views
Aggregation in Main Memory. Kenneth A. Ross Columbia University. Research Experience. Complex query processing Data Warehousing Main memory databases. Students: Kazi Zaman, Junyan Ding. Main- Memory DBMS. Query. Mediator. User. Unified Results. Traditional DBMS. Scenario A.
E N D
Aggregation in Main Memory Kenneth A. Ross Columbia University DGRC FedStats Visit
Research Experience • Complex query processing • Data Warehousing • Main memory databases Students: Kazi Zaman, Junyan Ding DGRC FedStats Visit
Main-MemoryDBMS Query Mediator User ... UnifiedResults TraditionalDBMS Scenario A DGRC FedStats Visit
Scenario B Web Data Request User Mediator ... UnifiedResults Sequence Of Interactive Queries TraditionalDBMS Main Memory DB DGRC FedStats Visit
Scenario C User Web Data Request Dynamic Query Mediator ... Graphical User Interface UnifiedResults TraditionalDBMS Main Memory DB DGRC FedStats Visit
Outline • Introduction to Datacubes • Frameworks for querying cubes • The Main Memory based framework • Experimental Results • Conclusions and Plan DGRC FedStats Visit
The CUBE BY Operator State Year Grade Sales State Year Grade Sales CA 1997 Regular 90 CA 1997 Regular 90 CA 1997 ALL 90 NY 1997 Premium 70 ALL 1997 Regular 90 CA 1998 Premium 65 CA ALL Regular 90 CUBE BY (sum Sales) ALL 1997 Regular 90 NY 1998 Premium 95 ALL 1997 ALL 160 ALL ALL Regular 90 CA ALL ALL 155 ALL ALL ALL 320 Additional records Large increase in total Size, especially with many dimensions ……. DGRC FedStats Visit
Lattice Representation Grade State Year Year, Grade State, Year State, Grade State, Year, Grade DGRC FedStats Visit
Modeling Queries Slice Queries ask for a single aggregate record SELECT State, year, sum(sales) FROM BLS-12345 GROUP BY State, year HAVING State = “NY” AND year = “1998” DGRC FedStats Visit
Existing Frameworks Choose subset of cube to materialize based on workload. State Year Grade Materialize on disk State, Year State,Grade Year,Grade Appropriate record recovered or computed for incoming slice query State, Year, Grade Drawbacks: Ignores Clustering of Relation on disk. Smallest unit of materialization is too big. DGRC FedStats Visit
Our approach The full cube is often larger than available memory, but ... State Year Grade The finest granularity aggregate may fit. State, Year State,Grade Year,Grade Any record can be computed without having to go to disk. State, Year, Grade How should the finest granularity be organized ? DGRC FedStats Visit
Level-2 Store Level-1 Store Finest granularity cuboid Query q records in linked lists Selected coarse records in hash table Slot directory Framework DGRC FedStats Visit
The Level-1 Store Records are <Key,Value> pairs stored in a hash table. Records can contain ALL’s Given query Q, form composite key and check level-1 store (constant time). If not found, use level-2 store Key Value a1 55 b2 34 c2 12 … ... DGRC FedStats Visit
The Level-2 Store Level-2 Store Finest granularity cuboid Slot directory is organized as a multidimensional array: level2[sz1][sz2][sz3][sz4] Each slot points to a linked list of elements. records in linked lists Records placed according to set of mapping functions H Slot directory DGRC FedStats Visit
Using the Level-2 store Query Q without ALL’s c2 d5 a3 b4 Slot 4 Slot 3 Slot 7 Slot1 Access list denoted by level2[4][3][7][1] ; aggregate those matching (a3,b4,c2,d5). DGRC FedStats Visit
Using the Level-2 store Query Q with ALL’s c2 a3 ALL ALL Slot 4 List of Slots Slot 7 List of Slots Access lists matching level2[4][*][7][*] ; aggregate those matching (a3,*,c2,*). DGRC FedStats Visit
Demo • Shows multidimensional dataset (subset of columns of 5% Census sample for NY in 1990). • User asks queries: fast answers. • Future: User Interface asks manyqueries, with display changing interactively. • demo DGRC FedStats Visit
Experimental Results Scanning all records takes 194 ms. DGRC FedStats Visit
Importance of Work • Aggregation is fundamental to analysis. • Make analysis interactive, even for many dimensions. • Make a variety of aggregate granularities available, where possible. DGRC FedStats Visit
Contributions • A Main Memory based framework for answering datacube queries efficiently. • Query Performance in the 2-4 ms range which is more efficient than going to disk. DGRC FedStats Visit
Plan • Integrate with user interface to generate dynamic queries. • Self-tuning capability. • Multiple data sets. • Work with agencies to generate value • For intra-agency analysis • For enhanced data dissemination DGRC FedStats Visit