Data Structures, Lecture 5: B-Trees. Haim Kaplan and Uri Zwick, November 2012
A 4-node • Keys: 10, 25, 42 (3 keys, 4-way branch) • Subtrees: key < 10 | 10 < key < 25 | 25 < key < 42 | 42 < key
An r-node • Keys: $k_0, k_1, k_2, \ldots, k_{r-3}, k_{r-2}$ ($r-1$ keys) • Children: $c_0, c_1, c_2, \ldots, c_{r-2}, c_{r-1}$ ($r$-way branch)
B-Trees (with minimum degree d) • Each node holds between d−1 and 2d−1 keys • Each non-leaf node has between d and 2d children • The root is special: it has between 1 and 2d−1 keys, and between 2 and 2d children (if not a leaf) • All leaves are at the same depth
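A 2-4 tree: a B-Tree with minimum degree d=2 • Root: [13] • Internal nodes: [4 6 10] [15 28] • Leaves: [1 3] [5] [7] [11] | [14] [16 17] [30 40 50] (| separates leaves of different parents)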
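The definition can be checked mechanically on the 2-4 tree example. The following is a minimal sketch, assuming a node is written as a `(keys, children)` pair and that the tree shape is as read off the slide's diagram; `check_btree` and `leaf` are illustrative names, not part of the lecture.

```python
def check_btree(node, d, is_root=True, lo=None, hi=None):
    """Return the height below `node`; raise AssertionError if a rule is broken."""
    keys, children = node
    min_keys = 1 if is_root else d - 1
    assert min_keys <= len(keys) <= 2 * d - 1, "wrong number of keys"
    assert keys == sorted(keys), "keys not sorted"
    # Every key must lie strictly between the separators inherited from above.
    assert all((lo is None or lo < k) and (hi is None or k < hi) for k in keys)
    if not children:
        return 0
    # r-1 keys go with r children, which also enforces the d..2d children rule.
    assert len(children) == len(keys) + 1, "wrong number of children"
    bounds = [lo] + keys + [hi]
    depths = {check_btree(c, d, False, bounds[i], bounds[i + 1])
              for i, c in enumerate(children)}
    assert len(depths) == 1, "leaves at different depths"
    return depths.pop() + 1


def leaf(*keys):
    """Hypothetical helper: build a leaf node."""
    return (list(keys), [])


# The 2-4 tree of the slide (assumed shape), minimum degree d = 2.
tree = ([13], [
    ([4, 6, 10], [leaf(1, 3), leaf(5), leaf(7), leaf(11)]),
    ([15, 28], [leaf(14), leaf(16, 17), leaf(30, 40, 50)]),
])
height = check_btree(tree, d=2)
```

Here `height` counts edges from the root to a leaf, so the example tree has height 2.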
Node structure • r: the degree • key[0], …, key[r−2]: the keys • item[0], …, item[r−2]: the associated items • child[0], …, child[r−1]: the children • leaf: is the node a leaf? • Possibly a different representation for leaves
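One possible in-memory rendering of these fields is sketched below. The class name `BTreeNode` and the use of Python lists are assumptions made for illustration; the slide does not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BTreeNode:
    """One B-tree node, mirroring the fields listed on the slide."""
    keys: List[Any] = field(default_factory=list)              # key[0..r-2], sorted
    items: List[Any] = field(default_factory=list)             # item[0..r-2]
    children: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]
    leaf: bool = True                                          # is the node a leaf?

    @property
    def degree(self) -> int:
        """r: an r-node holds r-1 keys."""
        return len(self.keys) + 1


# The 4-node from the earlier slide, stored as a leaf with placeholder items.
four_node = BTreeNode(keys=[10, 25, 42], items=["a", "b", "c"])
```

Storing the degree as a derived property rather than a separate field avoids keeping two values in sync; a disk-based layout would more likely use fixed-size arrays sized for 2d−1 keys.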
The height of B-Trees • At depth 1 we have at least 2 nodes • At depth 2 we have at least $2d$ nodes • At depth 3 we have at least $2d^2$ nodes • … • At depth h we have at least $2d^{h-1}$ nodes
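Summing these per-level counts gives the usual height bound. This is a sketch of the standard argument, assuming the root is at depth 0 and holds at least one key, every other node holds at least $d-1$ keys, the leaves are at depth $h$, and $n$ is the total number of keys:

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2(d-1)\,\frac{d^{h}-1}{d-1}
  \;=\; 2d^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_d \frac{n+1}{2}
```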
Search • Look for k in node x • If it is not there, look for k in the appropriate subtree of node x • Number of nodes accessed: $O(\log_d n)$ • Number of operations: $O(d \log_d n)$ • Number of operations with binary search: $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$
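The search with binary search inside each node can be sketched as follows. `Node` and `search` are illustrative names, and the example tree is the 2-4 tree shown earlier, with its shape assumed from the diagram.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None."""
    while True:
        i = bisect_left(x.keys, k)         # binary search: O(log d) comparisons
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:                 # reached a leaf without finding k
            return None
        x = x.children[i]                  # one more node accessed


root = Node([13], [
    Node([4, 6, 10], [Node([1, 3]), Node([5]), Node([7]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
hit = search(root, 16)
miss = search(root, 8)
```

Replacing `bisect_left` with a linear scan gives the $O(d \log_d n)$ variant; the number of nodes visited is the same in both cases.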
B-Trees vs binary search trees • Wider and shallower • Access fewer nodes during search • But may take more operations
The hardware structure • CPU, Cache, RAM, Disk • Each memory level is much larger but much slower • Information is moved in blocks
A simplified I/O model • CPU, RAM, Disk • Each block is of size m • Count both operations and I/O operations
Data structures in the I/O model • Each node (struct) is allocated contiguously • It is harder to control which disk blocks contain different nodes • Linked lists and search trees behave poorly in the I/O model: each pointer followed may cause a disk access • Pick d such that a node fits in a block • B-trees reduce the worst-case number of I/Os
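Search in the I/O model • Look for k in node x • If it is not there, look for k in the appropriate subtree of node x • Number of I/Os = number of nodes accessed: $O(\log_d n)$ • Number of operations: $O(d \log_d n)$ • Number of operations with binary search: $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$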
Red-Black Trees vs. B-Trees • $n = 2^{30} \approx 10^9$ • 30 ≤ height of Red-Black Tree ≤ 60 • Up to 60 pages read from disk • Height of B-Tree with d=1000 is only 3 • Each B-Tree node resides in a block/page • Only 3 (or 4) pages read from disk • Disk access: 1 millisecond ($10^{-3}$ sec) • Memory access: 100 nanoseconds ($10^{-7}$ sec)
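The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes the standard bound $h \le \log_d \frac{n+1}{2}$ on the number of edges from the root to a leaf, and counts one page read per node on the search path; the variable names are illustrative.

```python
import math

n = 2 ** 30                  # number of keys, about 10**9
d = 1000                     # minimum degree of the B-tree
disk_access_sec = 1e-3       # 1 millisecond per disk access, as on the slide

# Red-black tree: height between log2(n) and 2*log2(n).
rb_low = int(math.log2(n))
rb_high = 2 * rb_low

# B-tree: at most floor(log_d((n+1)/2)) edges, so one more level of nodes.
btree_edges = math.floor(math.log((n + 1) / 2, d))
btree_levels = btree_edges + 1

rb_worst_ms = rb_high * disk_access_sec * 1000        # worst-case search, in ms
btree_worst_ms = btree_levels * disk_access_sec * 1000
print(rb_low, rb_high, btree_levels, rb_worst_ms, btree_worst_ms)
```

Under these assumptions a worst-case red-black search costs about 60 ms of disk time against about 3 ms for the B-tree.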
B-Trees – What are they good for? • Large-degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block. • Smaller-degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties. • B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.
Rotate/Steal right, Rotate/Steal left (inverse operations; diagram labels: A, B) • Number of operations: O(d) • Number of I/Os: O(1)
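Split, Join (inverse operations; diagram labels: A, B, C) • Split: a full node with 2d−1 keys becomes two nodes with d−1 keys each, and the middle key moves up to the parent • Join: two nodes with d−1 keys each are merged around the key that separates them in the parent • Number of operations: O(d) • Number of I/Os: O(1)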
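Split and Join can be sketched on list-based nodes as below. The names `Node`, `split_child` and `join_children` are assumptions for illustration, and keys and children are kept in plain Python lists.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its middle key."""
    y = x.children[i]
    assert len(y.keys) == 2 * d - 1
    z = Node(y.keys[d:], y.children[d:])                  # right half: d-1 keys
    middle = y.keys[d - 1]
    y.keys, y.children = y.keys[:d - 1], y.children[:d]   # left half: d-1 keys
    x.keys.insert(i, middle)                              # middle key moves up
    x.children.insert(i + 1, z)


def join_children(x, i):
    """Inverse of split_child: merge children i and i+1 around x.keys[i]."""
    y = x.children[i]
    z = x.children.pop(i + 1)
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children


# d = 2: the full leaf [1 2 3] is split, then joined back.
x = Node([13], [Node([1, 2, 3]), Node([15, 28])])
split_child(x, 0, d=2)
after_split = (list(x.keys), [list(c.keys) for c in x.children])
join_children(x, 0)
after_join = (list(x.keys), [list(c.keys) for c in x.children])
```

Each operation copies O(d) keys and pointers and touches only the parent and two children, which matches the O(d) operations and O(1) I/Os stated on the slide.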
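Insert: Insert(T,2) • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1 3] [6] [11] | [14] [16 17] [30 40 50] (| separates leaves of different parents)
Insert: Insert(T,2), key added • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1 2 3] [6] [11] | [14] [16 17] [30 40 50]
Insert: Insert(T,4) • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1 2 3] [6] [11] | [14] [16 17] [30 40 50]
Insert: Insert(T,4), key added • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1 2 3 4] [6] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,4), the leaf [1 2 3 4] is overflowing • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1 2 3 4] [6] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,4), key 2 moves up • Root: [13] • Internal nodes: [5 10] [15 28] • Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,4), done • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50]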
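Splitting an overflowing node (diagram labels: A, B, C) • An overflowing node holds 2d keys • It is split into a node with d keys and a node with d−1 keys, and the remaining key moves up to the parent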
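Another insert: Insert(T,7) • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6] [11] | [14] [16 17] [30 40 50] (| separates leaves of different parents)
Another insert: Insert(T,7), key added • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6 7] [11] | [14] [16 17] [30 40 50]
and another insert: Insert(T,8) • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6 7] [11] | [14] [16 17] [30 40 50]
and another insert: Insert(T,8), key added • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6 7 8] [11] | [14] [16 17] [30 40 50]
and the last for today: Insert(T,9), key added, the leaf is overflowing • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6 7 8 9] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,9), key 7 moves up • Root: [13] • Internal nodes: [2 5 10] [15 28] • Leaves: [1] [3 4] [6] [8 9] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,9), the parent is now overflowing • Root: [13] • Internal nodes: [2 5 7 10] [15 28] • Leaves: [1] [3 4] [6] [8 9] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,9), key 5 moves up • Root: [13] • Internal nodes: [2] [7 10] [15 28] • Leaves: [1] [3 4] | [6] [8 9] [11] | [14] [16 17] [30 40 50]
Split: Insert(T,9), done • Root: [5 13] • Internal nodes: [2] [7 10] [15 28] • Leaves: [1] [3 4] | [6] [8 9] [11] | [14] [16 17] [30 40 50]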
Insert – Bottom up • Find the insertion point by a downward search • Insert the key in the appropriate place • If the current node is overflowing, split it • If its parent is now overflowing, split it, etc. • Disadvantages: • Need both a downward scan and an upward scan • Need to keep parents on a stack • Nodes are temporarily overflowing
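The bottom-up procedure can be sketched as below, replaying the insertions of 2, 4, 7, 8 and 9 from the preceding slides. `Node` and `insert_bottom_up` are illustrative names, keys are assumed distinct, and the starting tree is the one read off the first insert slide.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def insert_bottom_up(root, k, d):
    """Insert k (assumed not present) and return the possibly new root."""
    path, x = [], root
    while x.children:                      # downward scan, parents kept on a stack
        i = bisect_left(x.keys, k)
        path.append((x, i))
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)
    while len(x.keys) > 2 * d - 1:         # upward scan: x temporarily holds 2d keys
        right = Node(x.keys[d:], x.children[d:])              # d keys
        middle = x.keys[d - 1]
        x.keys, x.children = x.keys[:d - 1], x.children[:d]   # d-1 keys
        if path:
            parent, i = path.pop()
            parent.keys.insert(i, middle)
            parent.children.insert(i + 1, right)
            x = parent                     # the parent may now be overflowing
        else:
            root = Node([middle], [x, right])                 # tree grows by a level
            x = root
    return root


root = Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
for key in (2, 4, 7, 8, 9):
    root = insert_bottom_up(root, key, d=2)
```

The explicit `path` list is the stack of parents that the slide lists as a disadvantage of this approach.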
Insert – Top down • While conducting the search, split full children on the search path before descending to them! • When the appropriate leaf is reached, it is not full, so the new key may be added!
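Split-Root(T) (diagram label: C) • The full root is split into two nodes with d−1 keys each • A new root T.root is created, holding the middle key
Split-Child(x,i) (diagram labels: A, B, C) • The full child x.child[i] is split into two nodes with d−1 keys each • Its middle key moves up into x and becomes key[i]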
Insert – Top down • While conducting the search, split full children on the search path before descending to them! • Number of I/Os: $O(\log_d n)$ • Number of operations: $O(d \log_d n)$
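A single-pass, top-down insertion can be sketched as follows. The names `Node`, `split_child`, `insert_top_down` and `inorder` are illustrative, keys are assumed distinct, and Split-Root is rendered here as creating an empty new root and splitting its only child.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def split_child(x, i, d):
    """Split the full child x.children[i]; its middle key moves up into x."""
    y = x.children[i]
    z = Node(y.keys[d:], y.children[d:])
    middle = y.keys[d - 1]
    y.keys, y.children = y.keys[:d - 1], y.children[:d]
    x.keys.insert(i, middle)
    x.children.insert(i + 1, z)


def insert_top_down(root, k, d):
    """Insert k (assumed not present) and return the possibly new root."""
    if len(root.keys) == 2 * d - 1:        # Split-Root
        root = Node([], [root])
        split_child(root, 0, d)
    x = root
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * d - 1:
            split_child(x, i, d)           # split before descending
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)   # the leaf reached is never full
    return root


def inorder(x):
    """All keys of the subtree rooted at x, in sorted order."""
    if not x.children:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out


root = Node([])
for key in (13, 5, 10, 15, 28, 1, 3, 30, 40, 50):
    root = insert_top_down(root, key, d=2)
```

No stack of parents is needed and no node ever holds more than 2d−1 keys, at the price of occasionally splitting a node that would not have overflowed.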
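Deletions from B-Trees: delete(T,26) • Root: [7 15] • Internal nodes: [3] [10 13] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24 26] [30 40 50] (| separates leaves of different parents)
Delete: delete(T,26), key removed • Root: [7 15] • Internal nodes: [3] [10 13] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]
Delete: delete(T,13) • Root: [7 15] • Internal nodes: [3] [10 13] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]
Delete (Replace with predecessor): delete(T,13), 13 is overwritten by its predecessor 12 • Root: [7 15] • Internal nodes: [3] [10 12] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11 12] [14] | [20] [24] [30 40 50]
Delete: delete(T,13), 12 removed from its leaf • Root: [7 15] • Internal nodes: [3] [10 12] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [24] [30 40 50]
Delete: delete(T,24) • Root: [7 15] • Internal nodes: [3] [10 12] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [24] [30 40 50]
Delete: delete(T,24), key removed, the leaf is now empty • Root: [7 15] • Internal nodes: [3] [10 12] [22 28] • Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [ ] [30 40 50]
Delete (steal from sibling): delete(T,24) • Root: [7 15] • Internal nodes: [3] [10 12] [22 30] • Leaves: [1 2] [4 6] | [8 9] [11] [14] | [20] [28] [40 50]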
Rotate/Steal right, Rotate/Steal left (inverse operations; diagram labels: A, B)
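The steal used in delete(T,24) can be sketched as a rotation through the parent. The function names and the choice of which direction is called "left" are assumptions: here `rotate_left` moves a key from the right sibling towards the left one, and `rotate_right` undoes it.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def rotate_left(x, i):
    """Move one key from x.children[i+1] through x.keys[i] into x.children[i]."""
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys[i])            # separator comes down on the left
    x.keys[i] = right.keys.pop(0)          # smallest key of the right node goes up
    if right.children:                     # internal nodes also hand over a subtree
        left.children.append(right.children.pop(0))


def rotate_right(x, i):
    """Move one key from x.children[i] through x.keys[i] into x.children[i+1]."""
    left, right = x.children[i], x.children[i + 1]
    right.keys.insert(0, x.keys[i])        # separator comes down on the right
    x.keys[i] = left.keys.pop()            # largest key of the left node goes up
    if left.children:
        right.children.insert(0, left.children.pop())


# The subtree under [22 28] right after 24 was removed: the middle leaf is empty.
parent = Node([22, 28], [Node([20]), Node([]), Node([30, 40, 50])])
rotate_left(parent, 1)                     # steal from the right sibling
stolen = (list(parent.keys), [list(c.keys) for c in parent.children])
rotate_right(parent, 1)                    # the inverse restores the previous state
restored = (list(parent.keys), [list(c.keys) for c in parent.children])
```

Only the parent and the two siblings are touched, so a steal costs O(1) I/Os; shifting keys inside a list accounts for the O(d) operations.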