B trees

B trees
• Nodes may have more than 2 children
• Each internal node other than the root has between k and 2k children and between k-1 and 2k-1 keys
• A leaf has between k-1 and 2k-1 keys
• The root has at least 2 children
• All leaves are at the same distance from the root
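
The shape rules above can be checked mechanically. This is a minimal sketch under a few assumptions. The `Node` class and `check_btree` are hypothetical names. Keys are ints, and a node with no children is a leaf. Only the shape rules are checked, not the ordering of keys between subtrees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # empty list means leaf


def check_btree(root: Node, k: int) -> bool:
    """Check the B-tree shape rules for a given k >= 2."""
    leaf_depths: set[int] = set()

    def visit(node: Node, depth: int) -> bool:
        is_root = depth == 0
        nkeys = len(node.keys)
        if node.keys != sorted(node.keys):
            return False
        if not node.children:                     # leaf: k-1 .. 2k-1 keys
            leaf_depths.add(depth)
            return nkeys <= 2 * k - 1 and (is_root or nkeys >= k - 1)
        if len(node.children) != nkeys + 1:       # j keys separate j+1 subtrees
            return False
        low = 2 if is_root else k                 # the root needs only 2 children
        if not low <= len(node.children) <= 2 * k:
            return False
        return all(visit(c, depth + 1) for c in node.children)

    # All leaves must sit at the same distance from the root.
    return visit(root, 0) and len(leaf_depths) == 1


good = Node([10], [Node([1, 3]), Node([15, 30])])
uneven = Node([10], [Node([1, 3]), Node([20], [Node([15]), Node([30])])])
print(check_btree(good, 2), check_btree(uneven, 2))   # True False
```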

2-4 tree and General k
• k = 2: each node has 2, 3, or 4 children
• Which is better: k = 2 or k ≫ 2?
• Depth? Large k is better (the tree is shallower)
• But what about the degree (the work done in each node)? Small k is better
• Overall:

A 4-node with keys 10, 30, 35 and four subtrees: key < 10 | 10 ≤ key < 30 | 30 ≤ key < 35 | 35 ≤ key

B vs. B+
• In a B tree, items are stored in every node
• In a B+ tree, items are stored at the leaves; internal nodes hold keys that direct the search
• The leaves may also be kept in a linked list to allow fast sequential access
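
The leaf list pays off in range queries. Once a search has reached the leaf where a range starts, the rest of the range is read by following links, with no further root-to-leaf searches. The sketch below uses hypothetical names (`Leaf`, `next_leaf`, `range_scan`). The root-to-leaf search is not shown, so the example picks the starting leaf by hand.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Leaf:
    items: list[int]                    # sorted items stored in this leaf
    next_leaf: Optional["Leaf"] = None  # link to the leaf on its right


def range_scan(start: Leaf, lo: int, hi: int) -> Iterator[int]:
    """Yield every item x with lo <= x <= hi, walking the leaf list from `start`.

    `start` is assumed to be the leaf a normal root-to-leaf search for lo reaches.
    """
    leaf: Optional[Leaf] = start
    while leaf is not None:
        for x in leaf.items:
            if x > hi:                  # items are sorted across leaves: done
                return
            if x >= lo:
                yield x
        leaf = leaf.next_leaf


# Hypothetical leaves, linked left to right.
leaves = [Leaf(xs) for xs in ([1, 3], [5], [7], [9], [10], [16, 17], [30, 40, 50])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
print(list(range_scan(leaves[1], 4, 16)))   # [5, 7, 9, 10, 16]
```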

A 2-4+ tree (figure):
root [10]
internal nodes [4 7 9] [15 30]
leaves [1 3] [5] [7] [9] | [10] [16 17] [30 40 50]

The height
• The root has at least 2 children
• At level 2 we have at least $2k$ nodes (counting the root as level 0)
• At level 3 we have at least $2k^2$ nodes
• At level h we have at least $2k^{h-1}$ nodes
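
The level count turns into a height bound once we note that every leaf holds at least one item (k ≥ 2), so there are at most n leaves. The derivation below assumes the root is at level 0 and all leaves are at level h:

```latex
n \;\ge\; \#\text{leaves} \;\ge\; 2k^{h-1}
\quad\Longrightarrow\quad
h \;\le\; 1 + \log_k \frac{n}{2} \;=\; O(\log_k n)
```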

Red-Black Trees
• $n = 2^{30} \approx 10^9$
• $30 \le$ height $\le 60$
• When the red-black tree resides on disk, a search may make up to 60 disk accesses
• A disk access takes about 5 milliseconds ($5 \cdot 10^{-3}$ sec)
• A memory access takes about 100 nanoseconds ($10^{-7}$ sec)
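
A back-of-the-envelope calculation with the slide's own estimates, for a worst-case search of 60 accesses:

```latex
60 \times 5\cdot 10^{-3}\,\text{s} = 0.3\,\text{s}
\qquad\text{vs.}\qquad
60 \times 10^{-7}\,\text{s} = 6\cdot 10^{-6}\,\text{s}
```

So the disk-resident search is roughly $5 \cdot 10^4$ times slower.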

B-trees
• B-trees are used when the tree resides in secondary storage.
• k is chosen so that a node fills one disk block.
• Each access brings in a whole node with many keys, so the tree is shallower and a search does less I/O.
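
This is a rough sketch of picking k from the block size and seeing the effect on height. The 4096-byte block and the 8-byte keys and pointers are assumptions, and node headers are ignored. The height bound is the node count from the height slide: the largest h with $2k^{h-1} \le n$. `choose_k` and `max_height` are hypothetical helpers.

```python
def choose_k(block_bytes: int, key_bytes: int, ptr_bytes: int) -> int:
    """Largest k such that a full node (2k-1 keys, 2k child pointers) fits in one block.

    (2k-1)*key + 2k*ptr <= block  <=>  k <= (block + key) / (2 * (key + ptr))
    Node headers and other metadata are ignored in this sketch.
    """
    k = (block_bytes + key_bytes) // (2 * (key_bytes + ptr_bytes))
    if k < 2:
        raise ValueError("block too small for a B-tree node")
    return k


def max_height(n: int, k: int) -> int:
    """Largest h (root at level 0) with 2 * k**(h-1) <= n.

    Level h holds at least 2k^(h-1) nodes; if it is the leaf level, each leaf
    stores at least one item, so that count cannot exceed n.
    """
    if n < 2:
        return 0
    h = 1
    while 2 * k ** h <= n:   # would one more level still fit?
        h += 1
    return h


k = choose_k(4096, 8, 8)
print(k, max_height(2**30, k), max_height(2**30, 2))   # 128 5 30
```

Under these assumptions, a B-tree over $2^{30}$ items has height at most 5, so a search reads at most 6 nodes. The same bound with k = 2 allows height 30.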

B-Trees
• Large-degree B-trees are used to represent very large dictionaries that reside on disk.
• Smaller-degree B-trees are used for internal-memory dictionaries to reduce cache-miss penalties.

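Node's structure
$(j)\; a_0\, p_1\, a_1\, p_2\, a_2 \dots p_j\, a_j$, where $j$ is the number of keys and $k-1 \le j \le 2k-1$ (so the node has between $k$ and $2k$ children)
• $a_i$ is a pointer to a subtree.
• $p_i$ is a key.
• Each node can be searched linearly; the total time is then $\approx k \cdot h \approx k \log_k n$.
• Alternatively, keep a small red-black tree or a sorted array in each node, so a search takes $\approx \log_2 k \cdot h \approx \log_2 k \cdot \log_k n = \log_2 n$.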
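
This sketches the "sorted array in each node" option for a B tree with items in every node. The binary search inside each node uses Python's `bisect` module. The `Node` class, its field names and the example tree are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]                                       # p_1 < p_2 < ... < p_j
    children: list["Node"] = field(default_factory=list)  # a_0 .. a_j; empty for a leaf


def search(root: Optional[Node], x: int) -> bool:
    """B-tree search (items in every node) using binary search inside each node."""
    node = root
    while node is not None:
        i = bisect_left(node.keys, x)          # O(log k) comparisons instead of O(k)
        if i < len(node.keys) and node.keys[i] == x:
            return True
        if not node.children:                  # reached a leaf without finding x
            return False
        node = node.children[i]                # subtree a_i lies between p_i and p_{i+1}
    return False


# Hypothetical example tree for k = 2 (keys chosen for illustration).
root = Node([10, 30, 35], [Node([1, 5]), Node([12, 20]), Node([31]), Node([40, 50])])
print([search(root, x) for x in (5, 30, 33, 50)])   # [True, True, False, True]
```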

Insert
Starting 2-4+ tree (items in the leaves):
root [14]
internal nodes [5 9] [15 30]
leaves [1 3] [5] [9] | [14] [16 17] [30 40 50]

Insert(2,T). 2 is added to the leaf [1 3], which becomes [1 2 3].

Insert(4,T). 4 is added to the leaf [1 2 3], which becomes [1 2 3 4] and has too many keys.
Split: the leaf is replaced by [1 2] and [3 4], and key 3 is inserted in the parent, which becomes [3 5 9]:
root [14]
internal nodes [3 5 9] [15 30]
leaves [1 2] [3 4] [5] [9] | [14] [16 17] [30 40 50]

Insert(6,T). The leaf [5] becomes [5 6].

Insert(7,T). The leaf [5 6] becomes [5 6 7].

Insert(8,T). The leaf [5 6 7] becomes [5 6 7 8] and has too many keys.
Split: the leaf is replaced by [5 6] and [7 8], and key 7 is inserted in the parent, which becomes [3 5 7 9] and now has too many keys.
Split: the parent is replaced by [3] and [7 9], and key 5 is inserted in the root, which becomes [5 14]:
root [5 14]
internal nodes [3] [7 9] [15 30]
leaves [1 2] [3 4] | [5 6] [7 8] [9] | [14] [16 17] [30 40 50]

Insert -- definition
Add the new key in its position, say in a node v.
(*) If v has at most 3 keys, stop. If v has 4 keys, split v into a 2-node u, a 3-node w, and a key k (if v is a leaf, split it into two leaves with two keys each, and let k be a copy of the smallest key of the right leaf).
If v was the root, create a new root r with key k and children u and w, and stop.
Otherwise, replace v by u and w as children of p(v), insert k into p(v), and repeat (*) for v := p(v).
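
This is a minimal Python sketch of the insertion procedure for general k. It assumes the B+ layout of the example: items live only in leaves, internal keys only route the search, and a key equal to a routing key goes right. With k = 2, a leaf that reaches 4 items splits into two leaves of 2. An internal node that reaches 4 keys splits into a 2-node and a 3-node. `Node`, `BPlusTree` and `leaf_keys` are hypothetical names. The starting tree is the one from the insertion example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int] = field(default_factory=list)         # items (leaf) or routing keys
    children: list["Node"] = field(default_factory=list)  # empty for a leaf

    @property
    def is_leaf(self) -> bool:
        return not self.children


class BPlusTree:
    """Insertion into a B+ tree with parameter k (k = 2 gives a 2-4+ tree)."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.root = Node()

    def insert(self, x: int) -> None:
        # Walk down to the leaf, remembering (parent, child index) pairs.
        path: list[tuple[Node, int]] = []
        v = self.root
        while not v.is_leaf:
            i = bisect_right(v.keys, x)     # key < p_1 -> a_0, p_i <= key < p_{i+1} -> a_i
            path.append((v, i))
            v = v.children[i]
        v.keys.insert(bisect_right(v.keys, x), x)

        # (*) While v holds 2k keys, split it and push a key into the parent.
        while len(v.keys) == 2 * self.k:
            u, key, w = self._split(v)
            if not path:                    # v was the root: create a new root
                self.root = Node([key], [u, w])
                return
            p, i = path.pop()
            p.keys.insert(i, key)
            p.children[i:i + 1] = [u, w]    # replace v by u and w
            v = p

    def _split(self, v: Node) -> tuple[Node, int, Node]:
        k = self.k
        if v.is_leaf:
            # Two leaves with k items each; the smallest item of w is copied up.
            u, w = Node(v.keys[:k]), Node(v.keys[k:])
            return u, w.keys[0], w
        # Internal node: k-1 keys stay left, p_k goes up, k keys go right.
        u = Node(v.keys[:k - 1], v.children[:k])
        w = Node(v.keys[k:], v.children[k:])
        return u, v.keys[k - 1], w


def leaf_keys(node: Node) -> list[list[int]]:
    """Leaf contents from left to right."""
    if node.is_leaf:
        return [node.keys]
    return [ks for c in node.children for ks in leaf_keys(c)]


t = BPlusTree(k=2)
t.root = Node([14], [
    Node([5, 9], [Node([1, 3]), Node([5]), Node([9])]),
    Node([15, 30], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
for x in (2, 4, 6, 7, 8):
    t.insert(x)
print(t.root.keys, [c.keys for c in t.root.children])
# [5, 14] [[3], [7, 9], [15, 30]]
```

The run ends with root [5 14] and internal nodes [3] [7 9] [15 30], the same tree the example reaches after Insert(8,T).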

Split
$(2k)\; a_0\, p_1\, a_1\, p_2\, a_2 \dots p_{2k}\, a_{2k}$
is split into
$(k-1)\; a_0\, p_1\, a_1\, p_2\, a_2 \dots p_{k-1}\, a_{k-1}$ and $(k)\; a_k\, p_{k+1}\, a_{k+1} \dots p_{2k}\, a_{2k}$
• $p_k$ is inserted in the parent.
• Takes O(k) time.
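
The split rule maps directly onto list slicing. In this sketch, `split` is a hypothetical helper. Children are plain labels instead of real subtrees, and the example uses k = 3.

```python
from typing import Any


def split(keys: list[int], children: list[Any], k: int):
    """Split an overfull node (2k keys p_1..p_2k, 2k+1 children a_0..a_2k).

    Returns (left, p_k, right): left holds k-1 keys and k children,
    right holds k keys and k+1 children, and p_k is the key that goes up.
    Slicing copies O(k) entries, matching the O(k) cost of a split.
    """
    if len(keys) != 2 * k or len(children) != 2 * k + 1:
        raise ValueError("split expects exactly 2k keys and 2k+1 children")
    left = (keys[:k - 1], children[:k])     # (k-1) a_0 p_1 ... p_{k-1} a_{k-1}
    middle = keys[k - 1]                    # p_k, inserted in the parent
    right = (keys[k:], children[k:])        # (k) a_k p_{k+1} ... p_2k a_2k
    return left, middle, right


# Hypothetical node with k = 3; children are labels instead of real subtrees.
left, middle, right = split([10, 20, 30, 40, 50, 60],
                            ["a0", "a1", "a2", "a3", "a4", "a5", "a6"], 3)
print(left, middle, right)
# ([10, 20], ['a0', 'a1', 'a2']) 30 ([40, 50, 60], ['a3', 'a4', 'a5', 'a6'])
```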

Insert (summary)
• O(log n) time, plus at most $O(\log_k n)$ splits, each taking O(k) time
• One can show that the amortized number of splits is O(1) per insert

Delete
Starting tree (the result of the insertions):
root [5 14]
internal nodes [3] [7 9] [15 30]
leaves [1 2] [3 4] | [5 6] [7 8] [9] | [14] [16 17] [30 40 50]

delete(14,T). 14 is the only key in its leaf, so the leaf is removed; its parent loses a child and becomes [30], with leaves [16 17] [30 40 50].

delete(17,T). The leaf [16 17] becomes [16].

delete(16,T). 16 is the only key in its leaf, so the leaf is removed, leaving its parent [30] with one child, [30 40 50].
Borrow: the sibling [7 9] has degree 3, so its last child [9] moves over, and key 9 moves up into the root in place of 14:
root [5 9]
internal nodes [3] [7] [30]
leaves [1 2] [3 4] | [5 6] [7 8] | [9] [30 40 50]

delete(9,T). The leaf [9] is removed, leaving its parent [30] with one child again.
Fusion: the sibling [7] has degree 2, so the two nodes are fused into the degree-3 node [7 30], and the root loses key 9:
root [5]
internal nodes [3] [7 30]
leaves [1 2] [3 4] | [5 6] [7 8] [30 40 50]

Delete -- definition
Remove the key. If it was the only key in its leaf, remove the leaf and let v be the parent that loses a child; otherwise return.
(*) If v still has at least 2 children, stop. If v has one child and v is the root, discard v (its child becomes the new root). Otherwise (v is not the root), if v has a sibling w of degree 3 or 4, borrow a child from w to v and terminate. Otherwise, fuse v with its sibling into a degree-3 node and repeat (*) with the parent of v.
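
This sketches the repair step (*) for general k, for an internal node v that has dropped to k-1 children (one child when k = 2). It uses the standard B-tree rotation, in which the separating key in the parent moves down into v (or into the fused node) and a key of the sibling moves up. The routing keys it produces can therefore differ from the ones drawn in the slides, while still directing every search to the right leaf. `Node`, `fix_underflow` and `leaf` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)  # empty for a leaf


def fix_underflow(parent: Node, i: int, k: int) -> str:
    """Repair parent.children[i], an internal node left with k-1 children.

    Borrows a child from an adjacent sibling that has more than k children;
    otherwise fuses with that sibling. Returns "borrow" or "fusion".
    """
    v = parent.children[i]
    if i > 0:                                     # use the left sibling
        w, s = parent.children[i - 1], i - 1      # s: index of the separating key
        if len(w.children) > k:
            v.children.insert(0, w.children.pop())
            v.keys.insert(0, parent.keys[s])      # separator moves down into v
            parent.keys[s] = w.keys.pop()         # w's last key moves up
            return "borrow"
        w.keys += [parent.keys.pop(s)] + v.keys   # fuse v into w
        w.children += v.children
        parent.children.pop(i)
        return "fusion"
    w, s = parent.children[1], 0                  # no left sibling: use the right one
    if len(w.children) > k:
        v.children.append(w.children.pop(0))
        v.keys.append(parent.keys[s])
        parent.keys[s] = w.keys.pop(0)
        return "borrow"
    v.keys += [parent.keys.pop(s)] + w.keys       # fuse w into v
    v.children += w.children
    parent.children.pop(1)
    return "fusion"


def leaf(*items: int) -> Node:
    return Node(list(items))


# k = 2. The root's third child has just lost leaf [16] and has one child left.
root = Node([5, 14], [
    Node([3], [leaf(1, 2), leaf(3, 4)]),
    Node([7, 9], [leaf(5, 6), leaf(7, 8), leaf(9)]),
    Node([], [leaf(30, 40, 50)]),
])
print(fix_underflow(root, 2, 2), root.keys)   # borrow [5, 9]

# Remove leaf [9] and the key after it; the third child is down to one child again.
v = root.children[2]
v.children.pop(0)
v.keys.pop(0)
print(fix_underflow(root, 2, 2), root.keys)   # fusion [5]
```

After a fusion, the caller repeats the check at the parent, and a root left with a single child is discarded, as in step (*).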
