310 likes | 476 Views
CS 255: Database System Principles slides: B-trees. By:- Arunesh Joshi Id:-006538558. Agenda. The features and different functionalities of B- Tree in terms of index structure The Structure of B-Trees Applications of B-Trees Lookup in B-Trees Range Queries Insertion into B-Trees
E N D
CS 255: Database System Principlesslides: B-trees By:- Arunesh Joshi Id:-006538558
Agenda • The features and different functionalities of B- Tree in terms of index structure • The Structure of B-Trees • Applications of B-Trees • Lookup in B-Trees • Range Queries • Insertion into B-Trees • Deletion from a B-Tree • Efficiency of B-Trees
B-Trees B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.
functionalities of B- Tree • B-Trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed. • B-Trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.
Structure of B-Trees • There are three layers in binary trees- the root, an intermediate layer and leaves • In a B-Tree each block have space for nsearch-key values and n+1 pointers • [next slide explains the structure of a B-Tree]
B-Tree Example n=3 Root 100 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35
Sample non-leaf 57 81 95 to keys to keys to keys to keys to keys < 57 57 k<81 81k<95 95
Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85
In textbook’s notation n=3 Leaf: Non-leaf: 30 35 30 35 30 30
Size of nodes: n+1 pointers n keys (fixed)
Don’t want nodes to be too empty • Use at least • Non-leaf: (n+1)/2 pointers • Leaf: (n+1)/2 pointers to data
n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null
B-tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer”
Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2- 1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 1 1
Applications of B-trees 1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key. 2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file. 3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.
Lookup in B-Trees • Suppose we want to find a record with search key 40. • We will start at the root , the root is 13, so the record will go the right of the tree. • Then keep searching with the same concept.
13 7 23 31 43 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 Looking for block “40”<not present>
Range Queries • B-trees are used for queries in which a range of values are asked for. Like, SELECT * FROM R WHERE R. k >= 10 AND R. k <= 25;
Insert into B-tree • (a) simple case • space available in leaf • (b) leaf overflow • (c) non-leaf overflow • (d) new root
32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31
7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31
160 180 160 179 n=3 100 (c) Insert key = 160 120 150 180 180 200 150 156 179
30 new root 40 40 45 n=3 (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40
Deletion from B-tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf Notes 4
40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50
35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50
new root 40 25 30 • (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45
B-tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!
Why we take 3 as the number of levels of a B-tree? Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that - 411 + 8(n + 1) 5 4096. That value is n = 340. 340 key-pointer pairs could fit in one block for our example data. Suppose that the average block has an occupancy midway between the minimum and maximum. i.e.. a typical block has 255 pointers. With a root 255 childrenand 255*255= 65023 leaves. We shall have among those leaves cube of 253. or about 16.6 million pointers to records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.