1 / 34

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing. Indexing Basic Concepts Ordered Indices B+-Tree Index Files Hashing Static Dynamic Hashing. search key pointer. Basic Concepts. Value Search Key - set of attributes used to look up records in a file. record. ?. value. Index Evaluation Metrics.

osanna
Download Presentation

Chapter 12: Indexing and Hashing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 12: Indexing and Hashing • Indexing • Basic Concepts • Ordered Indices • B+-Tree Index Files • Hashing • Static • Dynamic Hashing

  2. search key pointer Basic Concepts Value • Search Key - set of attributes used to look up records in a file. record ? value

  3. Index Evaluation Metrics • Access types supported efficiently. E.g., • Point query: find “Tom” • Range query: find students whose age is between 20-40 • Access time • Update time • Space overhead

  4. Ordered Indices • In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

  5. same order 50 90 10 130 170 210 110 70 150 30 190 230 70 90 30 10 50 20 80 60 40 100 Search key • Primary index • Also called clustering index • The search key of a primary index is usually but not necessarily the primary key.

  6. different order 100 80 20 90 30 60 50 70 40 10 10 50 20 60 30 70 40 ... Search key Secondary index: • non-clustering index.

  7. 10 70 50 30 110 90 80 60 40 20 120 100 90 10 70 50 30 80 60 40 20 100 Sequential File Dense Index Dense Index: contains index records for every search-key values.

  8. 10 130 90 50 210 170 150 110 70 30 230 190 90 10 70 50 30 80 60 40 20 100 Sequential File Sparse Index • Sparse Index: contains index records for only some search-key values. • Applicable when records are sequentially ordered on search-key

  9. 100 30 20 80 90 10 40 60 50 70 90 30 ... 20 80 100 does not make sense! Secondary indexes Sequence field • Sparse index

  10. 90 170 10 210 170 130 330 10 490 50 570 30 70 110 410 230 90 250 150 190 90 10 70 30 50 60 100 40 80 20 Multilevel Index Sequential File Sparse 2nd level

  11. 90 30 20 80 100 50 70 40 10 60 10 10 50 50 60 20 70 30 90 40 ... ... sparse high level Multilevel Index Secondary indexes Sequence field • Lowest level is dense • Other levels are sparse

  12. Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive

  13. Outline • Conventional indexes • B+-Tree  NEXT

  14. NEXT: Another type of index • Give up on sequentiality of index • Try to get “balance”

  15. B+Tree Example n=4 Root 100 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35

  16. Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 Key is moved (not copied) from lower level non-leaf node to upper level non-leaf node

  17. Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85 Key is copied (not moved) from leaf node to non-leaf node

  18. n=4 Leaf: Non-leaf: 30 35 30 35 30 30

  19. Size of nodes: n pointers n-1 keys

  20. Don’t want nodes to be too empty • Use at least Root : 2 pointers Non-leaf: n/2 pointers Leaf : (n-1)/2 keys

  21. n=4 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null

  22. B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records • except for “sequence pointer”

  23. (3) Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n n-1 n/2 n/2- 1 Leaf (non-root) n n-1 (n-1)/2 (n-1)/2 Root n n-1 2 1

  24. Insert into B+tree (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root

  25. 32 n=4 100 (a) Insert key = 32 30 3 5 11 30 31

  26. 7 3 5 7 n=4 100 (b) Insert key = 7 30 3 5 11 30 31

  27. 160 180 160 179 n=4 100 (c) Insert key = 160 120 150 180 180 200 150 156 179

  28. 30 new root 40 40 45 n=4 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40

  29. Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf

  30. 40 n=5 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

  31. 35 35 n=5 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

  32. new root 40 25 30 (d) Non-leaf coalese • Delete 37 n=5 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

  33. B+tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!

  34. Index Definition in SQL • Create an index create index <index-name> on <relation-name> (<attribute-list>) E.g.: create index gindex on country(gdp); • To drop an index drop index <index-name> E.g.: drop index gindex;

More Related