Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing • Indexing • Basic Concepts • Ordered Indices • B+-Tree Index Files • Hashing • Static • Dynamic Hashing

search key pointer Basic Concepts Value • Search Key - set of attributes used to look up records in a file. record ? value

Index Evaluation Metrics • Access types supported efficiently. E.g., • Point query: find “Tom” • Range query: find students whose age is between 20-40 • Access time • Update time • Space overhead

Ordered Indices • In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.

same order 50 90 10 130 170 210 110 70 150 30 190 230 70 90 30 10 50 20 80 60 40 100 Search key • Primary index • Also called clustering index • The search key of a primary index is usually but not necessarily the primary key.

different order 100 80 20 90 30 60 50 70 40 10 10 50 20 60 30 70 40 ... Search key Secondary index: • non-clustering index.

10 70 50 30 110 90 80 60 40 20 120 100 90 10 70 50 30 80 60 40 20 100 Sequential File Dense Index Dense Index: contains index records for every search-key values.

10 130 90 50 210 170 150 110 70 30 230 190 90 10 70 50 30 80 60 40 20 100 Sequential File Sparse Index • Sparse Index: contains index records for only some search-key values. • Applicable when records are sequentially ordered on search-key

100 30 20 80 90 10 40 60 50 70 90 30 ... 20 80 100 does not make sense! Secondary indexes Sequence field • Sparse index

90 170 10 210 170 130 330 10 490 50 570 30 70 110 410 230 90 250 150 190 90 10 70 30 50 60 100 40 80 20 Multilevel Index Sequential File Sparse 2nd level

90 30 20 80 100 50 70 40 10 60 10 10 50 50 60 20 70 30 90 40 ... ... sparse high level Multilevel Index Secondary indexes Sequence field • Lowest level is dense • Other levels are sparse

Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive

Outline • Conventional indexes • B+-Tree  NEXT

NEXT: Another type of index • Give up on sequentiality of index • Try to get “balance”

B+Tree Example n=4 Root 100 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35

Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95 Key is moved (not copied) from lower level non-leaf node to upper level non-leaf node

Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85 Key is copied (not moved) from leaf node to non-leaf node

n=4 Leaf: Non-leaf: 30 35 30 35 30 30

Size of nodes: n pointers n-1 keys

Don’t want nodes to be too empty • Use at least Root : 2 pointers Non-leaf: n/2 pointers Leaf : (n-1)/2 keys

n=4 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null

B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records • except for “sequence pointer”

(3) Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n n-1 n/2 n/2- 1 Leaf (non-root) n n-1 (n-1)/2 (n-1)/2 Root n n-1 2 1

Insert into B+tree (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root

32 n=4 100 (a) Insert key = 32 30 3 5 11 30 31

7 3 5 7 n=4 100 (b) Insert key = 7 30 3 5 11 30 31

160 180 160 179 n=4 100 (c) Insert key = 160 120 150 180 180 200 150 156 179

30 new root 40 40 45 n=4 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40

Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf

40 n=5 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

35 35 n=5 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

new root 40 25 30 (d) Non-leaf coalese • Delete 37 n=5 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

B+tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!

Index Definition in SQL • Create an index create index <index-name> on <relation-name> (<attribute-list>) E.g.: create index gindex on country(gdp); • To drop an index drop index <index-name> E.g.: drop index gindex;

Chapter 12: Indexing and Hashing