
Indexing OLAP Data Sunita Sarawagi



Presentation Transcript


  1. Indexing OLAP Data, Sunita Sarawagi. Monowar Hossain, York University

  2. Agenda
     • Requirements on indexing methods
     • Existing indexing methods
     • Optimization of R-Tree for OLAP data
     • R-Tree vs. bit-mapped indices
     • Conclusion

  3. Requirements on indexing methods
     • Symmetric partial match queries
       • Continuous, e.g. “time between Jan and July 94”
       • Discontinuous, e.g. “first month of each year”
     • Indexing at multiple levels of aggregation
       • Pre-computed group-bys
       • Indexing summary data
     • Handling multiple traversal orders
     • Efficient batch update
     • Handling sparse data efficiently

  4. Existing methods
     • Multidimensional array-based methods
       • Work efficiently when the data is dense
     • Essbase’s schema
       • E.g. a four-dimensional cube: product and store (sparse), time and scenarios (dense)
       • B-tree on product and store
       • Two-dimensional array on time and scenarios
     • Evaluation of Essbase’s schema
       • May cause multiple searches, e.g. searching store = “something” on the product-store index
       • Performance depends on the ability to find enough dense dimensions
       • Efficient batch update
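
The sparse/dense split on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Essbase's actual storage format: a sorted key list stands in for the B-tree, and the class name, the dimension values, and the zero default for empty cells are all assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical dense dimensions: every (time, scenario) cell gets a slot.
TIMES = ["Q1", "Q2", "Q3", "Q4"]
SCENARIOS = ["actual", "budget"]


class SparseDenseCube:
    """Sorted index over sparse (product, store) keys; each key owns a dense block."""

    def __init__(self):
        self._keys = []    # sorted (product, store) pairs, standing in for the B-tree
        self._blocks = []  # parallel list of 2-D arrays indexed [time][scenario]

    def _find(self, key):
        """Return (position, found) for key in the sorted key list."""
        pos = bisect_left(self._keys, key)
        return pos, pos < len(self._keys) and self._keys[pos] == key

    def put(self, product, store, time, scenario, value):
        key = (product, store)
        pos, found = self._find(key)
        if not found:
            # Allocate a full dense block for the new sparse combination.
            self._keys.insert(pos, key)
            self._blocks.insert(pos, [[0.0] * len(SCENARIOS) for _ in TIMES])
        self._blocks[pos][TIMES.index(time)][SCENARIOS.index(scenario)] = value

    def get(self, product, store, time, scenario):
        pos, found = self._find((product, store))
        if not found:
            return None
        return self._blocks[pos][TIMES.index(time)][SCENARIOS.index(scenario)]

    def by_store(self, store):
        # Store is the second key component, so the sort order on
        # (product, store) does not narrow the search: every key is examined.
        return [k for k in self._keys if k[1] == store]


cube = SparseDenseCube()
cube.put("soap", "s1", "Q1", "actual", 10.0)
cube.put("soap", "s2", "Q2", "budget", 5.0)
cube.put("tea", "s1", "Q4", "actual", 7.0)
print(cube.get("soap", "s1", "Q1", "actual"))
print(cube.by_store("s1"))
```

The `by_store` method shows one way to read the "multiple searches" bullet: a predicate on store alone cannot use the leading product component of the key, so the lookup degenerates into a scan over the sparse keys.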

  5. Existing methods (cont.)
     • Bit-mapped indices
       • Pros:
         • For low-cardinality data, bit-maps are both space and retrieval efficient
         • Support bitwise operations
         • Access to data is clustered
         • All dimensions are handled symmetrically
       • Cons:
         • Inefficient for range queries
         • Increased space overhead for storing the bit-maps, especially for high-cardinality data
         • Expensive batch update, since all bit-mapped indices have to be modified even for a single row insertion
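
A minimal sketch of a bit-mapped index, assuming one uncompressed bit-map per distinct column value and using Python integers as bit vectors. The class name, the helper `row_ids`, and the sample columns are hypothetical and only serve the illustration.

```python
from collections import defaultdict


class BitmapIndex:
    """One bit-map (a Python int) per distinct value of a column."""

    def __init__(self, column):
        self.n_rows = len(column)
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(column):
            self.bitmaps[value] |= 1 << row_id  # set the bit for this row

    def eq(self, value):
        """Bit-map of rows where the column equals value."""
        return self.bitmaps.get(value, 0)

    def between(self, low, high):
        """Bit-map of rows with low <= value <= high.

        A range predicate has to OR one bit-map per value in the range,
        so its cost grows with the number of distinct values covered.
        """
        result = 0
        for value, bits in self.bitmaps.items():
            if low <= value <= high:
                result |= bits
        return result


def row_ids(bitmap):
    """Positions of the set bits, i.e. the matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact-table columns; position i in each list is the same row.
region = ["east", "west", "east", "north", "west", "east"]
month = [1, 2, 1, 3, 1, 2]

region_idx = BitmapIndex(region)
month_idx = BitmapIndex(month)

east_in_jan = region_idx.eq("east") & month_idx.eq(1)            # AND of two dimensions
east_or_north = region_idx.eq("east") | region_idx.eq("north")   # OR within one dimension
jan_to_feb = month_idx.between(1, 2)                             # range query

print(row_ids(east_in_jan))
print(row_ids(east_or_north))
print(row_ids(jan_to_feb))
```

Each dimension gets its own index and predicates are combined with `&` and `|`, which is why no dimension is favoured over another. Inserting a single row, on the other hand, touches the index of every dimension.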

  6. Existing methods (cont.)
     • Bit-mapped index variants
       • Compression
       • Hybrid
       • Dynamic bit-maps

  7. Existing methods (cont.)
     • Hierarchical indices
       • Example: Product - Store
         • Index product first, and also store summaries at the product level
         • For each product value, create an index on store and store summaries at the product-store level
       • Pros:
         • Allow faster access to higher-level data
         • Dimensions are handled symmetrically
       • Cons:
         • High index storage overhead
         • Average retrieval efficiency can suffer because of the large indexing structure
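
The Product - Store example can be sketched as a two-level structure that keeps a summary at each level. This is a simplified illustration with nested dictionaries; the class name, the choice of a sum as the summary, and the sample data are assumptions, not part of the slides.

```python
class HierarchicalIndex:
    """Two-level index: product -> (product summary, store -> summary)."""

    def __init__(self):
        self.total = 0.0    # summary for the whole cube
        self.products = {}  # product -> {"sum": float, "stores": {store: float}}

    def insert(self, product, store, amount):
        # Every insertion updates the summary at each level it passes through.
        self.total += amount
        node = self.products.setdefault(product, {"sum": 0.0, "stores": {}})
        node["sum"] += amount
        node["stores"][store] = node["stores"].get(store, 0.0) + amount

    def by_product(self, product):
        """Product-level summary, answered without visiting any store entry."""
        node = self.products.get(product)
        return node["sum"] if node else 0.0

    def by_product_store(self, product, store):
        """Product-store summary, one step further down the hierarchy."""
        node = self.products.get(product)
        return node["stores"].get(store, 0.0) if node else 0.0


index = HierarchicalIndex()
index.insert("soap", "s1", 10.0)
index.insert("soap", "s2", 5.0)
index.insert("tea", "s1", 7.0)
print(index.by_product("soap"))
print(index.by_product_store("soap", "s2"))
print(index.total)
```

Queries at a higher level of aggregation stop at the first level, which is the faster access the slide mentions. The price is that every level stores its own summaries, so the structure grows with the number of levels and distinct values.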

  8. Existing methods (cont.)
     • Multidimensional indices
       • Use one of the indexing methods designed for spatial data
       • E.g. R-Tree, grid files, etc.

  9. Optimized R-Tree for OLAP data
     • Rectangular dense regions (only the boundaries of regions that contain more than a threshold number of points)
       • Each contains a pointer to a variable-length array of TIDs or of the tuples themselves
     • Points in sparse regions
     • Finding dense regions
       • Ask an expert?
       • Use a clustering algorithm (similar algorithms: image analysis)
       • Needs evaluation!
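
One simple way to separate dense regions from sparse points is to count points per grid cell. The sketch below uses that grid heuristic as a stand-in; it is not the clustering algorithm the slide alludes to. The function name, the two-dimensional integer coordinates, and the `cell_size` and `threshold` parameters are all assumptions for illustration.

```python
from collections import defaultdict


def split_dense_sparse(points, cell_size, threshold):
    """Return (dense_regions, sparse_points).

    dense_regions maps a rectangle ((x_lo, y_lo), (x_hi, y_hi)) to the list of
    tuple IDs (TIDs) inside it; sparse_points maps a TID to its point.
    """
    # Bucket each point into a square grid cell of side cell_size.
    cells = defaultdict(list)
    for tid, (x, y) in enumerate(points):
        cells[(x // cell_size, y // cell_size)].append(tid)

    dense, sparse = {}, {}
    for (cx, cy), tids in cells.items():
        if len(tids) > threshold:
            # Dense cell: keep only its boundaries plus the array of TIDs.
            lo = (cx * cell_size, cy * cell_size)
            hi = ((cx + 1) * cell_size - 1, (cy + 1) * cell_size - 1)
            dense[(lo, hi)] = tids
        else:
            # Sparse cell: its points would be indexed individually.
            for tid in tids:
                sparse[tid] = points[tid]
    return dense, sparse


# Hypothetical points in a two-dimensional (product, store) space.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (9, 2)]
dense, sparse = split_dense_sparse(points, cell_size=2, threshold=3)
print(dense)
print(sparse)
```

In an index built this way, each dense rectangle would become a single entry pointing at its TID array, while the sparse points would be inserted one by one. A real implementation would also merge adjacent dense cells into larger rectangles, which this sketch does not attempt.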

  10. R-Tree vs. bit-mapped indices
     • R-Tree pros:
       • Allows range queries
       • Smaller space overhead
       • More efficient update
     • Bit-mapped pros:
       • Faster bitwise operations
       • Efficient for low cardinality, few restricted dimensions, and sparse data

  11. Conclusion
     • High-level overview
     • Recommended readings
       • MOLAP vs. OLAP
       • R-Tree and variants
       • R-Tree alternatives
       • Computation of multidimensional aggregates
       • And more…
