Indexing OLAP Data • Sunita Sarawagi • Monowar Hossain, York University
Agenda • Requirements on Indexing methods • Existing indexing methods • Optimization of R-Tree for OLAP data • R-Tree VS Bit-mapped Indices • Conclusion
Requirements on Indexing methods • Symmetric partial match queries • Continuous, e.g. “time between Jan and July 94” • Discontinuous, e.g. “first month of each year” • Indexing at multiple levels of aggregation • Pre-computed group-bys • Indexing summary data • Handling multiple traversal orders • Efficient batch update • Handling sparse data efficiently
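To make the two kinds of partial match query concrete, here is a minimal sketch. The fact rows, the `DIMS` layout, and the `partial_match` helper are all hypothetical, and the function is a plain scan rather than an index: it only illustrates what an index would be asked to answer, with any dimension left unrestricted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fact rows: (product, store, year, month, sales).
Row = Tuple[str, str, int, int, int]
ROWS: List[Row] = [
    ("p1", "s1", 1994, 1, 10),
    ("p1", "s2", 1994, 3, 20),
    ("p2", "s1", 1994, 9, 5),
    ("p2", "s2", 1995, 1, 7),
    ("p1", "s1", 1995, 6, 4),
]

# Position of each dimension inside a row (an assumed layout).
DIMS: Dict[str, int] = {"product": 0, "store": 1, "year": 2, "month": 3}


def partial_match(rows: List[Row],
                  predicates: Dict[str, Callable[[object], bool]]) -> List[Row]:
    """Return rows that satisfy every predicate.

    Dimensions not named in `predicates` are unrestricted, which is what
    makes the query a *partial* match; any subset of dimensions may be used.
    """
    return [
        row for row in rows
        if all(pred(row[DIMS[dim]]) for dim, pred in predicates.items())
    ]


# Continuous restriction: months January through July of 1994.
continuous = partial_match(
    ROWS, {"year": lambda y: y == 1994, "month": lambda m: 1 <= m <= 7}
)

# Discontinuous restriction: the first month of each year.
discontinuous = partial_match(ROWS, {"month": lambda m: m == 1})
```

The continuous query selects one contiguous range on the time dimension, while the discontinuous one selects scattered positions, which is harder for an index that relies on physical ordering.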
Existing methods • Multidimensional array-based methods • Work efficiently when data is dense • Essbase’s scheme • E.g. four-dimensional cube: product and store (sparse), time and scenarios (dense) • B-tree on Product and Store • Two-dimensional array on time and scenarios • Evaluation of Essbase’s scheme • May cause multiple searches • E.g. searching store = “something” on the product-store index • Performance depends on the ability to find enough dense dimensions • Efficient batch update
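The sparse/dense split described on this slide can be sketched roughly as follows. This is not Essbase's actual implementation: a Python `dict` stands in for the B-tree on (product, store), and the class name, the `TIMES` and `SCENARIOS` lists, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed members of the two dense dimensions.
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]

Block = List[List[Optional[float]]]


class SparseDenseCube:
    """Index on the sparse dimensions pointing to dense 2-D blocks."""

    def __init__(self) -> None:
        # A dict stands in for the B-tree keyed on (product, store).
        self.blocks: Dict[Tuple[str, str], Block] = {}

    def put(self, product: str, store: str, time: str,
            scenario: str, value: float) -> None:
        # Allocate a full time x scenario array only for combinations
        # of the sparse dimensions that actually occur.
        block = self.blocks.setdefault(
            (product, store),
            [[None] * len(SCENARIOS) for _ in TIMES],
        )
        block[TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def get(self, product: str, store: str, time: str,
            scenario: str) -> Optional[float]:
        block = self.blocks.get((product, store))
        if block is None:
            return None
        return block[TIMES.index(time)][SCENARIOS.index(scenario)]

    def keys_for_store(self, store: str) -> List[Tuple[str, str]]:
        # Store is not the leading key component, so every key is examined
        # here; on a real B-tree this is the "multiple searches" case.
        return sorted(k for k in self.blocks if k[1] == store)


cube = SparseDenseCube()
cube.put("tv", "toronto", "Q1", "actual", 120.0)
cube.put("tv", "toronto", "Q1", "budget", 100.0)
cube.put("radio", "toronto", "Q3", "actual", 40.0)
cube.put("tv", "ottawa", "Q2", "actual", 75.0)
```

A lookup that fixes both product and store is a single probe followed by array indexing, whereas `keys_for_store` has to consider every product, which mirrors the asymmetry noted in the evaluation bullets.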
Existing methods… Cont… • Bit-mapped indices • Pros: • For low-cardinality data, bitmaps are both space- and retrieval-efficient • Support bitwise operations • Access to data is clustered • All dimensions are handled symmetrically • Cons: • Range queries are expensive • Increased space overhead of storing the bitmaps, especially for high-cardinality data • Expensive batch update, as all bit-mapped indices have to be modified even for a single row insertion
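The pros and cons above can be seen in a small sketch of a simple bit-mapped index, with one bitmap per distinct value and Python integers used as bit vectors. The `BitmapIndex` class, the `row_ids` helper, and the sample columns are hypothetical; real systems use packed and often compressed bitmaps.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


class BitmapIndex:
    """One bitmap per distinct column value; bit i is set if row i has it."""

    def __init__(self, values: Sequence[Hashable]) -> None:
        self.n_rows = len(values)
        self.bitmaps: Dict[Hashable, int] = defaultdict(int)
        for row_id, value in enumerate(values):
            self.bitmaps[value] |= 1 << row_id

    def eq(self, value: Hashable) -> int:
        # Equality needs a single bitmap.
        return self.bitmaps.get(value, 0)

    def value_range(self, lo, hi) -> int:
        # A range must OR together one bitmap per value inside the range,
        # which is why range queries appear under the cons.
        result = 0
        for value, bitmap in self.bitmaps.items():
            if lo <= value <= hi:
                result |= bitmap
        return result

    def size_in_bits(self) -> int:
        # Uncompressed space grows with rows x cardinality.
        return self.n_rows * len(self.bitmaps)


def row_ids(bitmap: int) -> List[int]:
    """Expand a bitmap into the sorted list of row ids it selects."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


region = BitmapIndex(["east", "west", "east", "north", "west"])
month = BitmapIndex([1, 2, 3, 1, 2])

# Conjunction of two restrictions is a single bitwise AND.
hits = row_ids(region.eq("east") & month.value_range(1, 2))
```

Because each dimension has its own independent set of bitmaps, any combination of dimensions can be restricted with the same AND operation, which is the symmetry the slide refers to.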
Existing methods... Cont… • Bit-mapped indices variants • Compression • Hybrid • Dynamic Bit-maps
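As one illustration of the compression variant, a sparse bitmap can be stored as runs instead of individual bits. The slide does not say which compression scheme is meant; run-length encoding is assumed here purely as an example, and `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(bits: str) -> List[Tuple[str, int]]:
    """Collapse a string of '0'/'1' characters into (bit, run_length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


def rle_decode(runs: List[Tuple[str, int]]) -> str:
    """Rebuild the original bit string from its runs."""
    return "".join(bit * length for bit, length in runs)


# A bitmap for a high-cardinality column is mostly zeros,
# so it collapses into a handful of runs.
bitmap = "000000" + "1" + "0000000" + "11"
runs = rle_encode(bitmap)
```

The trade-off is that bitwise operations must either work directly on the runs or decode first, so compression saves space at some cost in processing.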
Existing methods... Cont… • Hierarchical Indices • Example: Product - Store • Index Product first, and also store summaries at the Product level • For each Product value, create an index on Store and store summaries at the Product-Store level • Pros: • Allows faster access to higher-level data • Dimensions are symmetrically handled • Cons: • Increased index storage overhead • The average retrieval efficiency can suffer because of the large indexing structure
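The Product - Store example can be sketched as a two-level structure in which each level keeps its own summary. Nested dictionaries stand in for the actual index structures, a running sum is assumed as the summary, and the class and method names are hypothetical.

```python
from typing import Dict


class HierarchicalIndex:
    """Product-level index whose entries hold a per-product store index."""

    def __init__(self) -> None:
        # product -> {"total": product-level summary,
        #             "stores": {store: product-store-level summary}}
        self.products: Dict[str, dict] = {}

    def insert(self, product: str, store: str, sales: float) -> None:
        node = self.products.setdefault(product, {"total": 0.0, "stores": {}})
        node["total"] += sales  # maintain the higher-level summary
        node["stores"][store] = node["stores"].get(store, 0.0) + sales

    def product_total(self, product: str) -> float:
        # Answered at the first level without touching store entries.
        return self.products[product]["total"]

    def product_store_total(self, product: str, store: str) -> float:
        return self.products[product]["stores"].get(store, 0.0)

    def entry_count(self) -> int:
        # Rough measure of structure size: one entry per product
        # plus one per (product, store) pair.
        return len(self.products) + sum(
            len(node["stores"]) for node in self.products.values()
        )


index = HierarchicalIndex()
index.insert("tv", "toronto", 120.0)
index.insert("tv", "ottawa", 75.0)
index.insert("radio", "toronto", 40.0)
index.insert("tv", "toronto", 30.0)
```

Queries at the Product level stop at the first level, which is the faster access to higher-level data listed under the pros, while `entry_count` hints at why the structure grows large.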
Existing methods… Cont… • Multidimensional indices • Use of the indexing methods designed for spatial data • E.g. R-Tree, Grid Files, etc.
Optimized R-Tree for OLAP data • Rectangular dense regions (only the boundaries of regions that contain more than a threshold number of points) • Each contains a pointer to a variable-length array of TIDs or the tuples themselves • Points in sparse regions • Finding dense regions • Ask an expert? • Use a clustering algorithm (similar algorithms: image analysis) • Needs evaluation!
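A rough sketch of how data might be split into the two kinds of entries follows. The slide leaves the method for finding dense regions open, so a fixed-grid density count is assumed here as a simple stand-in for a clustering algorithm; `split_dense_sparse`, the cell size, and the threshold are all hypothetical, and the R-tree that would sit on top of these entries is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Bounds = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def split_dense_sparse(
    points: List[Point], cell: int, threshold: int
) -> Tuple[List[Tuple[Bounds, List[int]]], List[Tuple[Point, int]]]:
    """Partition points into dense rectangles and individual sparse points.

    A grid cell holding more than `threshold` points becomes a dense region:
    only its boundaries are kept, together with an array of tuple ids (TIDs).
    Points in all other cells are kept individually.
    """
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for tid, (x, y) in enumerate(points):
        cells[(x // cell, y // cell)].append(tid)

    dense: List[Tuple[Bounds, List[int]]] = []
    sparse: List[Tuple[Point, int]] = []
    for (cx, cy), tids in cells.items():
        if len(tids) > threshold:
            bounds = (cx * cell, cy * cell,
                      (cx + 1) * cell - 1, (cy + 1) * cell - 1)
            dense.append((bounds, tids))
        else:
            sparse.extend((points[t], t) for t in tids)
    return dense, sparse


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 7), (9, 2)]
dense, sparse = split_dense_sparse(points, cell=2, threshold=3)
```

Inside a dense rectangle no per-point coordinates need to be indexed, since a position within the bounds can be mapped to an offset in the array, while sparse points are indexed one by one.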
R-Tree VS Bit-mapped indices • R-Tree Pros: • Allows range queries • Smaller space overhead • More efficient updates • Bit-mapped Pros: • Fast bitwise operations • Efficient for low cardinality, few restricted dimensions, and sparse data
Conclusion • High-level overview • Recommended readings • MOLAP VS OLAP • R-Tree and variants • R-Tree alternatives • Computation of multidimensional aggregates • And more…