230 likes | 335 Views
15.6 Index Based Algorithms. By: Tomas Tupy (123). Outline. Terminology Clustered Indexes Example Non-Clustered Indexes Index Based Selection Joining Using an Index Join Using a Sorted Index. What is an Index?.
E N D
15.6 Index Based Algorithms By: Tomas Tupy (123)
Outline • Terminology • Clustered Indexes • Example • Non-Clustered Indexes • Index Based Selection • Joining Using an Index • Join Using a Sorted Index
What is an Index? • A data structure which improves the speed of data retrieval ops on a relation, at the cost of slower writes and the use of more storage space. • Enables sub-linear time lookup. • Data is stored in arbitrary order, while logical ordering is achieved by using the index.
Terminology Recap • B(R)– Number of blocks needed to hold R • T(R) – Number of tuples in R • V(R,a) – Number of distinct values of the column for a in R. • Clustered Relation – Tuples are packed into as few blocks as possible. • Clustered Indexes – Indexes on attribute(s) such that all tuples with a fixed value for the search key appear on a few blocks as possible.
Clustering Indexes • A relation is clustered if its tuples are packed into relatively few blocks. • Clustering indexes are indexes on an attribute or attributes such that all the tuples with a fixed value for the search key of this index appear in as little blocks as possible. • Tuples are stored to match the index order. • A relation that isn’t clustered cannot have a clustering index.
Clustering Indexes • Let R(a,b) be a relation sorted on attribute a. • Let the index on a be a clustering index. • Let a1be a specific value for a. • A clustering index has all tuples with a fixed value packed into minumum # of blocks. a1 a1 a1 a1a1a1 a1 a1a1a1 a1 All the a1 tuples
Pros/Cons • Pros • Faster reads for particular selections • Cons • Writing to a table with a clustered index can be slower since there might be a need to rearrange data. • Only one clustered index possible.
Clustered Index Example • Problem: We want to quickly retrieve all orders for a particular customer. • How do we do this?
Clustered Index Example • Solution: Create a clustered index on the “CustomerID” column of the Order table. • Now the tuples with the same CustomerID will be physically stored closed to one another on the disk.
Non Clustered Indexes • There can be many per table • Quicker for insert and update operations. • The physical order of tuples is not the same as index order.
Index Based Algorithms • Especially useful for the selection operator. • Join and other binary operators also benefit.
Index-Based Selection • No index • Without an index on relation R, we have to read all the tuples in order to implement selection oC(R), and see which tuples match our condition C. • What is the cost in terms of disk I/O’s to implement oC(R)? (For both clustered and non-clustered relations)
Index-Based Selection • No index • Answer: • B(R) if our relation is clustered • Upto T(R) if relations in not-clustered.
Index-Based Selection • Let us consider an index on attribute a where our condition C is a = v. • oa=v(R) • In this case we just search the index for value v and we get pointers to exactly the tuples we need.
Index-Based Selection • Let’s say for our selection oa=v(R), our index is clustering. • What is the cost in the # of disk I/O’s to retrieve the set oa=v(R)?
Index-Based Selection • Answer • the average is: B(R) / V(R,a) • A few more I/Os: • Index might not be in main memory • Tuples which a = v might not be block aligned. • Even if clustered, might not be packed as tight as possible. (Extra space for insertion)
Index-Based Selection • Non-clustering index for our selection oa=v(R) • What is the cost in the # of disk I/O’s to retrieve the set oa=v(R)?
Index-Based Selection • Answer • Worst case is: T(R) / V(R,a) • This can happen if tuples live in different blocks.
Joining by Using an Index(Algorithm 1) • Consider natural join: R(X,Y) |><| S(Y,Z) • Suppose S has and index on attribute Y. • Start by examining each block of R, and within each block consider each tuple t, where tY is a component of t corresponding to the attribute Y. • Now we use the index to find tuples of S that have tY in their Y component. • These tuples will create the join.
Joining by Using an Index(Algorithm 1) Analysis • Consider R(X,Y) |><| S(Y,Z) • If R is clustered, then we have to read B(R) blocks to get all tuples of R. If R is not clustered then up to T(R) disk I/O’s are required. • For each tuple t of R, we must read an average of T(S) / V(S,Y) tuples of S. • Total: B(R)T(S) / V(S,Y) for clustered index, and T(R)T(S) / V(S,Y) for non-clustered index.
Join Using a Sorted Index • Consider R(X,Y) |><| S(Y,Z) • Data structures such as B-Trees provide the best sorted indexes. • In the best case, if we have sorting indexes on Y for both R and S then we perform only the last step of the simple sort-based join. (Sometimes called zig-zag join)
Join Using a Sorted Index(Zig-zag join) • Consider R(X,Y) |><| S(Y,Z) where we have indexes on Y for both R and S. • Tuples from R with a Y value that does not appear in S never need to be retrieved, and vice-versa… Index on Y in R Index on Y in S
Thank You! • Questions?