700 likes | 893 Views
2. IQ Indexes. Indexes. Typical RDBMS uses B-tree index usually a separate structure in addition to data consists of data pages with values and pointers Sybase (and others) has special type which contains pointers and data pages B-tree indexes are expensive
E N D
2 IQ Indexes
Indexes • Typical RDBMS uses B-tree index • usually a separate structure in addition to data • consists of data pages with values and pointers • Sybase (and others) has special type which contains pointers and data pages • B-tree indexes are expensive • consume space and time (to build) • require maintenance after data refresh • not useful with low cardinality data columns
4 2 6 1 3 5 7 1 2 3 4 5 6 7 8 B-Tree – an example Let’s assume a table with key values from 1 to 8 – and build the B-Tree The decision tree is now very simple. The question asked is “Is the required value less than or equal to the value in the tree?”If it is take the left fork else go right Index Levels - Keys Leaf Level (Data)
IQ-M Indexes • Indexes are the data (a data access method) • no separate data store • A column is likely to have multiple indexes • always has at least one index • Index selection for a column is based upon • Number of discrete values (cardinality) • Usage in queries • To some extent, the column’s data type
Index Types – In total now 10! • Default Index (aka Fast Projection or FP) • Raw data compressed on disk • comes in 3 flavours • Low Fast (LF) • Bit map index • High Non Group (HNG) • Bit-wise index • High Group (HG) • G-Array (relative of a B-tree) • Comes in 3 flavours • Compare Index (CMP) • Column Compare Index • Word Index (WD) • Sort of “Free Text” Index
Default Index - FP • Created automatically by Create Table • Used for • Ad-Hoc joins • String searches • Certain calculations • Projection of data • This index cannot be created or dropped
IQ Unique Constraint - 1 Create Table department(dept_idchar(4)Not Null IQ Unique(200),emp_lnamevarchar(25)Not Null IQ Unique(75000), • Improves load processing • IQ will construct 1-byte FP index for dept_id • Won’t try either a 1-byte or 2-byte FP on emp_lname • Too many values for either
IQ Unique Constraint - 2 • In terms of query performance the 1-byte and 2-byte FP indexes can speed up the server • The first method is that searching an FP (of any type) can be done in parallel, which may be faster than a so-called fast index (LF and HG) • The second speed up is that for a LIKE operator the 1-byte and 2-byte FP can be faster than an HNG index • Based on the width of the search predicate • And the width of the FP index (1 or 2 bytes)
FP Index Types • Depending on cardinality and the use of IQ UNIQUE, IQ-M will initially construct FP index in one of 3 ways • If > 65536 distinct values in column • Flat FP index • If < 256 distinct values • 1-byte FP • Between 256 and 65536 distinct values • 2-byte FP
More than 65536 values in a column The raw data is compressed (on disk) Flat FP Index Color Red Blue Green Red
Less than 256 values in a column A one byte lookup table is built Data Values Color 1 1 Red Red 2 2 Blue Blue 3 Green Green 1 3 3 Red 1 3 2 1-byte FP Index Data Lookup Table
Between 256 and 65536 values in a column 4 pages A two byte lookup table is built Data Values 1 1 Red 2 2 1 1 Blue 3 3 Green 2 2 1 1 3 3 Red 3 3 1 1 3 3 2 2 2-byte FP Index Data Lookup Table Color Red Blue Green
FP Indexes • During load, IQ-M will try to build Flat FP first, unless otherwise specified • If you specify a low cardinality 1-byte FP at the start (by using IQ UNIQUE) then this will • *NEW in 12.4.3* Resorts to a 2-byte FP after 256 values or n pages see two slides on…(not as we used to think directly to a flat FP) • If you specify a 2-byte FP at the start (by using IQ UNIQUE) then this will • Resort to Flat FP after approx. 16500 values or n pages see two slides on…
FP Index Growth • Although an FP can, as the cardinality grows, change (1-byte FP>2-byteFP) or (2-byte FP>Flat FP) it can never revert • There is a high cost in conversion – either way • The only backwards conversion is to drop the column (not the index), recreate and reload (Expensive)
1-Byte FP to Flat FP • FP_Number_Lookup_Pages Def 1 • This option controls the number of pages (in 12.4) in the 1-byte lookup table • When the lookup table grows beyond this the index will flip to a 2-byte FP • Do not make this more than 4 pages – or you will use a vast amount of memory for each and every FP – but you may prove me wrong! • This was we use PageList for the storage of 1-byte in 12.4.
Limit for 2-byte FP - 1 • FP_Lookup_Size Def 32767 • This option controls the number of discrete values that a 2-byte FP can contain, as a maximum Note this is in Kbytes • This is the new 12.4.2 and beyond option for constraining the size of a 1- or 2- byte FP • By default a 2-byte FP to Flat FP flip will occur when the lookup table grows beyond 32 MB Note – If you have a pre-12.4 1 byte FP it is constrained by FP_Number_Lookup_Pages
Limit for 2-byte FP - 2 • FP_Lookup_Size Def 32767 • For a bigint of 8 bytes, 65536 entries take only around 1+ MB. • For a max varchar of 255 bytes, 65536 entries take no more than 18 MB. • It is critical to keep the entire ByteStore (lookup table) in memory for performance reason.
IQ UNIQUE – A catch - 1 • If you have specified a very incorrect value for the IQ UNIQUE and the real number of discrete values is high there can be a problem
IQ UNIQUE – A catch - 2 • Say you have specified IQ UNIQUE 100, and the column has 1.2 million distinct values • The FP will start as a 1-byte, failover to a 2-byte then failover to a flat FP • Now because you “thought” the column only had a cardinality of 100 you did not put a fast index on it • Then the optimizer has no way to determine the exact cardinality of the column. It knows that the IQ UNIQUE value is wrong (because there is a flat FP index), so it doubles the FP-lookup-size value and assumes this is the cardinality. • If you had specified the IQ UNIQUE as greater than FP-lookup-size then the optimizer has to assume the IQ UNIQUE value is the cardinality
Use of 1 and 2-byte FP Indexes - 1 • Even with the “catch” noted above the lookup FP provides exact counts for distinct values of the optimizer • This may mean you do not need an LF or HG • Projection of the column can be significantly faster, because of fewer disk reads • But these are very small advantages compared to …
Use of 1 and 2-byte FP Indexes - 2 • Access Paths • In 12.4.2 and beyond we can perform more operations on the lookup table of the FP – instead of the other indexes. • LIKE (simple and Complex) • Simple Predicate • YEAR(column_name) > ‘1995’ • SQRT(column_name) < 100 • DATEPART(hh,column_name) between 10 and 12 • Also the resulting “scan” is performed in parallel (if required)
Low Fast Index • Traditional Bit Map for Low Cardinality • Less than 10,000 unique values in a column • Can be unique • Required for performance involving • Joins • Group by • MIN, MAX, SUM, AVG functions • Where clause predicates • Equality / Inequality, Ranges, IN lists
Bitmap Indexes • What are bitmaps? • Bitmaps are representations for each value in a field • True = 1 • False = 0 • Bit position corresponds to a fixed row ID • For each discrete value there is one bitmap – the length of which is number of rows in the table
Digression on Bitmaps - 1 • If there are 7,000,000 rows in the table each bitmap will be 7,000,000 bits long • This could be 1,000,000 bytes (almost a megabyte) • If there are 1000 possible values in the column this could mean 1 Gbyte (approx.) for the column • Is this correct ?
Digressions on Bitmaps - 2 • Yes, sort of… • In IQ-M there are 4 ways of holding a bitmap page • The conditions for the 4 types of page are • All Zero Bitmap • Few 1s • 20-80% 1s • Almost all 1s • All 1 bitmap
Bitmap “Types” • An all Zero bitmap page is not stored • just an entry in the block map • An all 1 bitmap page is also not stored • a similar entry in the block map • For the 20-80% 1s there is a real bitmap • called a bit vector • For the nearly all 1s or nearly all zero pages the data in Run Length Encoded
Run Length Encoding • Used when there is a very sparse set in bits set (or not set) • Very efficient on storage 1-50,90,102-135,1090-4573,7833, 9011-11430,...
... Bitmap Indexes • Each unique value has it’s own bitmap • Designed for incremental additions of rows • Query: • select count(*) from customers where state =‘AL’
High Group Index - 1 • High Cardinality data columns • More than 1000 unique values • Can enforce uniqueness • Special internal structure for unique HG indexes • Automatically created by Create Table for columns with UNIQUE or PRIMARYKEYconstraint (regardless of cardinality)
a b c 4 High Group Index - 2 B-Tree Index 1,2 3,5,6 New Blocks can be added into the Linked List
a B(ptr) C = 4 High Group Index - 3 Much faster for load Much faster for Skewed data give bitmaps directly to optimizer B-Tree Index 1,2 When a page is completely filled With one value the array is converted to a bitmap 1011010101001 1010001001001 0010100101010
High Group Index - 4 • Required for performance on High Cardinality columns used for: • Joins • Select Distinct, Count Distinct • Group By • Takes up the most space in the database • Requires the longest time to load/delete • Cannot be used with certain data types
Specialised High Group - 1 • When a column is created (or altered) to have the following constraints • UNIQUE • Primary Key • Then the column has a unique HG index created automatically • This is a HG without a G-Array
Specialised High Group - 2 • Provided the combination of two or more columns is unique, then you can generate a unique multi-column HG index on product of the two columns • You may still generate other indexes on the base columns
High Non Group Index (HNG) • Bit-Wise Index • data stored as binary • vertically partitioned • patented by Sybase • cannot be unique • cannot be used with certain data types • Used for • range searches for all cardinality columns • aggregation (sum and average functions)
HNG - High Card Bit-Wise Index • Data with large number of values stored in binary form • Data sliced vertically - each bit position can be manipulated separately • Many bit positions are either all on or all off so no storage space is required with compression
HNG Index Processing For the query: select sum(sales) from customers ASIQ performs the sum as follows : #1bits on*1 + #2bits on*2 + #4bits on*4 +#8bits on*8 (6*1) + (4*2) +(4*4)+ (4*8) = 62
HNG Indexes with Other Indexes • Any Cardinality columns also need an HNG Index • Columns used with aggregates (sum, avg) • Range searches • Root String searches • example: where cust_name like “Stan%” • all other string searches will use the FP index
Compare (CMP) Index • The CMP index is used for “comparing” 2 columns in the same table • It is really just 3 bitmaps • A “less than” bitmap • An “Equal to” bitmap • A “Greater than” bitmap • Performance of t1.col1 > t1.col2 is substantially improved • Load times are only marginally affected (<1%)
Word (WD) Index • This is a specialised index that indexes each and every “word” in a column • Used for char() varchar() and long varchar() • Slightly faster to load than an equivalent column HG index • Accessed by the “contains” verb where t1.col1 contains (‘Richard’, ‘Soundy’)
Word Index Use • The delimiters and the length of the entries can be set for the WD index during index creation time. • By default the delimiters are all the ASCII characters not defined as number or alphabetic • The limit is the max size of each entry CREATE WD INDEX earnings_wd ON earnings_report_table(varchar) DELIMITED BY ‘ :;.’ LIMIT 25
Indexing Strategy • Some columns may now need 2 or 3 indexes • Possible Combinations of indexes: • FP + LF • FP + LF + HNG • FP + HG • FP + HG + HNG • Unlikely Combination • FP + LF + HG
Assigning Indexes • Every column has the Default Index (FP) • Create Table command builds this index • cannot be dropped • No additional indexes needed with • columns used only for Projection (Select list) • columns with ‘bit’ datatype
High Group – 1 Regardless of cardinality put an HG on: • Any Primary Key (if not defined in create or alter table – where it is created automatically) • Any UNIQUE column (if not defined in create or alter table – where it is created automatically) • Any column used in a join
High Group – 2 • Why do the above? • It is better and generally faster – especially on Low Cardinality Columns • This is a change from previous thoughts
High Group – Warning • A High Group Index is the most complex index • It takes the longest to load (and delete) • The load time is directly proportional to the length or width of the data component • Smaller (in size not cardinality) data is better • Cardinality is also an issue the higher the number of “new” values in an incremental load the slower the load can get • Can the data be broken into more than one column?
Low Fast or High Group • If a column is not • A Primary Key • UNIQUE • used in a join (ad-hoc or Join Index) • But is still used for more than just a projection column it will require another one or two indexes)
Low Fast or High Group • Put an HG or an LF (depending on Cardinality) on the following column types • used in SELECT statement with functions: • min, max, count, count(distinct) • used in WHERE clause predicates: • equality / inequality • IN list • EXISTS • used in GROUP BY • used in ORDER BY
Additional Indexes Misconception • The 1000 cardinality break point is not a fixed limit • Absolute limit is cardinality of 9,999 • There are no hard and fast rules • A 5,000 card LF may work faster in one case • In an other an 800 card HG may be faster • Remember a LF is much faster to load data into than an HG
Regardless of Cardinality • Build an HNG index on all columns used • in BETWEEN or RANGE comparisons • with AVG() or SUM() functions • root string searches • Examples: • where sold_date between ‘1/5/99’ and ‘2/5/99’ • where revenue > 1000 • Select sum(revenue) from ... • where customer_name like ‘Syb%’ • But remember a 1 or 2 Byte FP may do the above faster!