1 / 23

15.6 Index Based Algorithms

15.6 Index Based Algorithms. By: Tomas Tupy (123). Outline. Terminology Clustered Indexes Example Non-Clustered Indexes Index Based Selection Joining Using an Index Join Using a Sorted Index. What is an Index?.

maya
Download Presentation

15.6 Index Based Algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 15.6 Index Based Algorithms By: Tomas Tupy (123)

  2. Outline • Terminology • Clustered Indexes • Example • Non-Clustered Indexes • Index Based Selection • Joining Using an Index • Join Using a Sorted Index

  3. What is an Index? • A data structure which improves the speed of data retrieval ops on a relation, at the cost of slower writes and the use of more storage space. • Enables sub-linear time lookup. • Data is stored in arbitrary order, while logical ordering is achieved by using the index.

  4. Terminology Recap • B(R)– Number of blocks needed to hold R • T(R) – Number of tuples in R • V(R,a) – Number of distinct values of the column for a in R. • Clustered Relation – Tuples are packed into as few blocks as possible. • Clustered Indexes – Indexes on attribute(s) such that all tuples with a fixed value for the search key appear on a few blocks as possible.

  5. Clustering Indexes • A relation is clustered if its tuples are packed into relatively few blocks. • Clustering indexes are indexes on an attribute or attributes such that all the tuples with a fixed value for the search key of this index appear in as little blocks as possible. • Tuples are stored to match the index order. • A relation that isn’t clustered cannot have a clustering index.

  6. Clustering Indexes • Let R(a,b) be a relation sorted on attribute a. • Let the index on a be a clustering index. • Let a1be a specific value for a. • A clustering index has all tuples with a fixed value packed into minumum # of blocks. a1 a1 a1 a1a1a1 a1 a1a1a1 a1 All the a1 tuples

  7. Pros/Cons • Pros • Faster reads for particular selections • Cons • Writing to a table with a clustered index can be slower since there might be a need to rearrange data. • Only one clustered index possible.

  8. Clustered Index Example • Problem: We want to quickly retrieve all orders for a particular customer. • How do we do this?

  9. Clustered Index Example • Solution: Create a clustered index on the “CustomerID” column of the Order table. • Now the tuples with the same CustomerID will be physically stored closed to one another on the disk.

  10. Non Clustered Indexes • There can be many per table • Quicker for insert and update operations. • The physical order of tuples is not the same as index order.

  11. Index Based Algorithms • Especially useful for the selection operator. • Join and other binary operators also benefit.

  12. Index-Based Selection • No index • Without an index on relation R, we have to read all the tuples in order to implement selection oC(R), and see which tuples match our condition C. • What is the cost in terms of disk I/O’s to implement oC(R)? (For both clustered and non-clustered relations)

  13. Index-Based Selection • No index • Answer: • B(R) if our relation is clustered • Upto T(R) if relations in not-clustered.

  14. Index-Based Selection • Let us consider an index on attribute a where our condition C is a = v. • oa=v(R) • In this case we just search the index for value v and we get pointers to exactly the tuples we need.

  15. Index-Based Selection • Let’s say for our selection oa=v(R), our index is clustering. • What is the cost in the # of disk I/O’s to retrieve the set oa=v(R)?

  16. Index-Based Selection • Answer • the average is: B(R) / V(R,a) • A few more I/Os: • Index might not be in main memory • Tuples which a = v might not be block aligned. • Even if clustered, might not be packed as tight as possible. (Extra space for insertion)

  17. Index-Based Selection • Non-clustering index for our selection oa=v(R) • What is the cost in the # of disk I/O’s to retrieve the set oa=v(R)?

  18. Index-Based Selection • Answer • Worst case is: T(R) / V(R,a) • This can happen if tuples live in different blocks.

  19. Joining by Using an Index(Algorithm 1) • Consider natural join: R(X,Y) |><| S(Y,Z) • Suppose S has and index on attribute Y. • Start by examining each block of R, and within each block consider each tuple t, where tY is a component of t corresponding to the attribute Y. • Now we use the index to find tuples of S that have tY in their Y component. • These tuples will create the join.

  20. Joining by Using an Index(Algorithm 1) Analysis • Consider R(X,Y) |><| S(Y,Z) • If R is clustered, then we have to read B(R) blocks to get all tuples of R. If R is not clustered then up to T(R) disk I/O’s are required. • For each tuple t of R, we must read an average of T(S) / V(S,Y) tuples of S. • Total: B(R)T(S) / V(S,Y) for clustered index, and T(R)T(S) / V(S,Y) for non-clustered index.

  21. Join Using a Sorted Index • Consider R(X,Y) |><| S(Y,Z) • Data structures such as B-Trees provide the best sorted indexes. • In the best case, if we have sorting indexes on Y for both R and S then we perform only the last step of the simple sort-based join. (Sometimes called zig-zag join)

  22. Join Using a Sorted Index(Zig-zag join) • Consider R(X,Y) |><| S(Y,Z) where we have indexes on Y for both R and S. • Tuples from R with a Y value that does not appear in S never need to be retrieved, and vice-versa… Index on Y in R Index on Y in S

  23. Thank You! • Questions?

More Related