240 likes | 264 Views
Improved Query Performance With Variant Indexes. Patrick O’Neil, Dallan Quass. Presented by Bo Han. Motivation. 1. Speed up the queries on data warehoues data warehoues are large read-mostly always perform queries of aggregate, filter, and group the data. Motivation.
E N D
Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han
Motivation 1. Speed up the queries on data warehoues • data warehoues are large • read-mostly • always perform queries of aggregate, filter, and group the data
Motivation 2. The first rigorous examination of variant indexes in the literature • Advantages over traditional Value-List indexes for certain classes of queries • More than one type of index available on a column
Motivation 3. Introducing a new indexing approach to support OLAP-type queries • Datacube • Multi-dimensional query • Depends on summary tables
Value-List Index (B+ tree) Leaf Node RID Problem: A key values will have large number of associated RID’s!
Bitmap Indexes Bitmap for Brand Dell 1001000 HP 0100101 Sony 0010000 IBM 0000010 Bitmap for Size 12 0000011 13 0100000 14 1001100 15 0010000 • A Bitmap for a value: an array of bits. The ith bit is set to 1 • if the ith record has the value • A Bitmap index: consists of one bitmap for each value that attribute can take • A Bitmap is an alternate method of representing RID-lists in a Value-List index (low-cardinality)
Bitmap Indexes 1. More space efficient than RID lists in a Value-List index • No compression • |RID|=32bits, #row=n, #distinct value=m • If m<32 m*n<32*n • Compression: Run-length encoding • 2. More CPU efficient for may functions • Boolean operations • ex1: Select Brand From Product • Where Brand=‘HP’ and Size=13 (AND) • ex2: Select Pid From Product • Where Size>12 and Size<15 (OR)
Bitmap Indexes • 2. More CPU efficient for may functions • Count • Select count(*) From Product • Where Brand=‘Dell’ and Size>14 • Each individual bitmap is small and frequently used ones can be cached in memory • 4. Available in most major commercial DBMS
Projection Index A projection index for column duplicates all column values for lookup by ordinal number . • Easy to locate • N=1000*p+s (p: page#, s: slot#) • Few disk I/O projection index for col2
Bit-Sliced Index A set of bitmap slices which are orthogonal to the data held in a projection index. (i.e. a bitwise vertical partition) B5 bit-slice B4 B3 B2 B1 B0 Col2: 20 52 20 62 10 34 1 49 Bnn: bitmap representing set of non null values in the indexed column
Comparison of Indexes (evaluating Single-Column Sum Aggregates) Select SUM(doloar_sales) From Sales Where condition Analyze the disk page I/O cost Plan 1 : Direct access to the rows to calculate the Sum 100million rows, Len(row)=200B, |page|=4K 20rows/page, |Foundset|=2million rows Plan 2 : Calculating Sum through a Projection Index Len(doloar_sales)=4B, 1000rows/page 100,000 pages
Comparison of Indexes (evaluating Single-Column Sum Aggregates) Plan 3 : Calculating the Sum through a Value-List(Bitmap) Index if (COUNT (Bf AND Bnn) = = 0) Return null; SUM = 0.0; for each non-null value v in the index for C { Designate the set of rows with value v as Bv SUM += v * COUNT(Bf AND Bv); } Return SUM; Bf: 100,000,000bits=12,500,000B3125 pages Bv: 100,000,000RIDs of 4 bytes each100,000 pages Total: 103,125pages
Comparison of Indexes (evaluating Single-Column Sum Aggregates) Plan 4 : Calculating the SUM through a Bit-Sliced Index if (COUNT (Bf AND Bnn) = = 0) Return null; SUM = 0.0; for i = 0 to N SUM += * COUNT(Bi AND Bf); Return SUM; Bf: 100,000,000bits=12,500,000B3125 pages 2 million rows: 21Bitmaps Total:22*3125=68,750 pages
Comparison of Indexes (evaluating Single-Column Sum Aggregates)
Evaluating OLAP-style Queries • OLAP approach creates precalculates results of some Grouped Queries and stores them in summary tables. • The expected set of queries is known in advance? • Size of data in summary tables grows as the product of the number of values in the independent dimensions (space requirement?) How to speed up Join and Group By ? Join Indexes and Bitmap-Join-Indexes
Join Indexes A join index: an index on one table that involves a column value from different table through a commonly encountered join.
Bitmap Join Index A Bitmap Join Index spans multiple tables and improves query performance between the joined tables.
Bitmap Join Index Select Sum(Dollar_sales) From Sales S Natural Join Product P Natural Join Customer C Where P.Brand=‘Dell’ AND C.State=‘PA’ = and
Calculating Groupset Aggregates • Select Sum(F.A) From S,D1,D2,D3 Where condition Group by D1.d1, D2.d2, D3.d3 • Using Value-List index to determine Groupset (F.di=Di.di, without join!) • Using Projection index on F.A to get SUM(F.A)
Improved Grouping Efficiency Problem: Groupsets and rows are randomly placed on disk. Segmentation: Partition rows in F into Segments. Query evaluation: one segment at a time. Clustering: Cluster the fact table F D1 =d1-1 111111111111111111111110000000000000000000… =d1-2 000000000000000000000001111111111111111000… …… D2 =d2-1 111111000000000000000001111110000000000000… =d2-2 000000111111100000000000000001111111100000… …… D3 =d3-1 110000110000000000000001111110000000000000… =d3-2 001100001100000000000000000001111111100000… …… =d3-n3 000011000001100000000000000001111111100000… (d1-1, d2-1, d3-1) 11000000000000000000000000000000000000… (d1-1, d2-1, d3-2) 00110000000000000000000000000000000000… Groupset Indexes: Keyvalues are a concatenation of the dimensional primary-key values
Conclusion • Analyze Value-List index, Bitmap index, Projection index and Bit-Sliced index • Combine Bitmap indexing and physical row clustering to evaluate OLAP queries involving aggregation and grouping
Reference • Improved Query Performance With Variant Indexes – Patrick O’Neil and Dallan Quass, Proc. ACM SIGMOD Conf. 1997, Pages 38-49. • Bitmap Index Design and Evaluation – C.Y. Chan and Y.E. Ioannidis 1998. 6 • Database System Implementation – Hector Garcia M., Jeffrey D.U. and Jennifer W., Prentice Hall, 2000 • Encoded Bitmap Indexing for Data Warehouses – M.C. Wu and A.P. Buchmann 1998. 2 • An Efficient Bitmap Encoding Scheme for Selection Queries – C.Y. Chan and Y.E. Ioannidis 1998. 6 • Multidimensional Indexing and Query Coordination for Tertiary Storage Management – A. Shoshani and L.M. Bernardo, etc. 1999. 10 • Multi-Table Joins Through Bitmapped Join Indices – P. O’Neil and G. Graefe 1995. 9