1 / 16

ICS 421 Spring 2010 Indexing (1)

ICS 421 Spring 2010 Indexing (1). Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa. How to speed u p queries?. SELECT * FROM Sailors WHERE age>40. Array of Sailor Tuples /Records. File of Record for Sailors. Binary Search Trees.

evadne
Download Presentation

ICS 421 Spring 2010 Indexing (1)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ICS 421 Spring 2010Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa Lipyeow Lim -- University of Hawaii at Manoa

  2. How to speed up queries? SELECT* FROMSailors WHEREage>40 Array of Sailor Tuples/Records File of Record for Sailors Lipyeow Lim -- University of Hawaii at Manoa

  3. Binary Search Trees • Given search value • if value < node.value, then follow left pointer • Else follow right pointer • How do generalize each index node to an index page ? • How do we generalize this to search pages of records ? 26 20 34 30 41 28 21 18 20 21 21 26 27 28 30 31 34 37 41 45 Lipyeow Lim -- University of Hawaii at Manoa

  4. Indexes • What do we store in the index nodes ? Let k be the key value for an index entry: • Data record with key value k • <k, rid of data record with key value k> • <k, list of rids of data records with key value k> • What kind of queries does the index support? • Range • Point (or equality) Lipyeow Lim -- University of Hawaii at Manoa

  5. Overflow page index entry Indexed Sequential Access Method (ISAM) P K P K P P K m 0 1 2 1 m 2 • Static (m+1)-way Search Tree Non-leaf Pages Leaf Pages Primary pages Lipyeow Lim -- University of Hawaii at Manoa

  6. ISAM: Example Insert new record with age 98 40 • Store data record at the leaf pages • Do we still need the file of record ? 20 33 51 63 age 40 98 63 10 20 51 33 46 55 97 15 27 37 rating sname sid Lipyeow Lim -- University of Hawaii at Manoa

  7. ISAM Facts • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. • Index entries: <search key value, page id>; they `direct’ search for data entries, which are in leaf pages. • Search: Start at root; use key comparisons to go to leaf. Cost=O(log F N) ; F = # entries/index pg, N = # leaf pgs • Insert: Find leaf data entry belongs to, and put it there. If full, allocate and put in overflow page • Delete: Find and remove from leaf; if empty overflow page, de-allocate. • Static tree structure: inserts/deletes affect only leaf pages. Lipyeow Lim -- University of Hawaii at Manoa

  8. B+ Tree Index B+ Tree Index • Insert/delete at log F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages) • Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. • Supports equality and range-searches efficiently. Index Entries Data Entries/Leaf Pages (“Sequence Set”) Lipyeow Lim -- University of Hawaii at Manoa

  9. B+ Tree: Search Example 13 17 24 30 • Leaf entries store <key,rid> pairs • What is the order ? • Search for: age=5, age=15, age>=24 14 16 19 20 22 24 27 29 2 3 5 7 33 34 38 39 Lipyeow Lim -- University of Hawaii at Manoa

  10. Inserting a new data entry • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. Lipyeow Lim -- University of Hawaii at Manoa

  11. Example: Insert 8* 17 17 pushed up into parent node 13 17 24 30 14 16 19 20 22 24 27 29 5 2 2 5 24 7 3 30 3 13 5 8 7 33 34 38 39 5 copied up to parent node 5 Lipyeow Lim -- University of Hawaii at Manoa

  12. Deleting a data entry • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. Lipyeow Lim -- University of Hawaii at Manoa

  13. Miscellaneous • How do we handle data with duplicates ? • Overflow buckets • Make rid part of the key • Each data entry stores <key, list of rids> • Clustered vsUnclustered indexes Index entries UNCLUSTERED CLUSTERED direct search for data entries Data entries Data entries (Index File) (Data file) Data Records Data Records Lipyeow Lim -- University of Hawaii at Manoa

  14. 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading a B+ Tree • If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. • Bulk Loadingcan be done much more efficiently. • Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page. Root Sorted pages of data entries; not yet in B+ tree Lipyeow Lim -- University of Hawaii at Manoa

  15. Bulk Loading (cont.) Root 10 20 • Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.) • Much faster than repeated inserts, especially when one considers locking! Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 Lipyeow Lim -- University of Hawaii at Manoa 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*

  16. Creating Indexes • Most DBMS (eg. DB2) supports only B+ tree indexes: CREATE INDEX myIdxONmytable(col1, col3) CREATE UNIQUE INDEX myUniqIdxONmytable(col2, col5) CREATE INDEX myIdxONmytable(col1, col3) CLUSTER • If a primary key is specified in the CREATE TABLE statement, an (unclustered) index is automatically created for the PK. • To create a clustered PK index: • Create table without PK constraint • Create index on PK with cluster option • Alter table to add PK constraint • To get rid of unused indexes: DROP INDEX myIdx; Lipyeow Lim -- University of Hawaii at Manoa

More Related