Lecture 11 : B-Tree Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University Lecture notes : courtesy of David Matuszek
Binary Search Tree (BST) : Problem • Consider disk access for a BST • Disk access is much slower than memory access • Disk access cost • Seek time >> rotational delay > transfer time • Reducing the number of seeks significantly improves overall performance • If we store a BST on disk in the naive way, • each visit to a child node involves one disk access, • so a search costs about as many disk accesses as the tree is high. That is inefficient. • We want to reduce the height of the tree by using a multiway search tree
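To make the height argument concrete, here is a minimal sketch that counts the fewest levels a tree needs to hold a given number of keys when every node has up to m children (m = 2 is the binary case). The function name `min_levels`, the key count of one million, and the fan-out of 128 are illustrative assumptions, not values from the lecture. If each node visit costs one disk access, the number of levels is roughly the number of accesses per search.

```python
def min_levels(n_keys: int, m: int) -> int:
    """Fewest levels an m-way search tree needs to hold n_keys keys.

    Each node holds at most m - 1 keys and has at most m children,
    so a completely full level i (root = level 0) has m**i nodes.
    """
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # the root level has a single node
    while capacity < n_keys:
        capacity += nodes_on_level * (m - 1)
        nodes_on_level *= m
        levels += 1
    return levels


if __name__ == "__main__":
    n = 1_000_000                      # hypothetical number of keys
    print(min_levels(n, 2))            # binary tree: 20 levels
    print(min_levels(n, 128))          # 128-way tree: 3 levels
```

Under these assumptions a search touches about 20 nodes in the best possible binary tree but only 3 in the 128-way tree, which is the reduction in disk accesses the slide is after.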
m-way search tree • A non-empty node has M subtrees (2 <= M <= m) • and therefore M-1 keys (elements) • The values in a node are stored in ascending order, V1 < V2 < ... < Vk (k = M-1) • Subtrees are placed between adjacent values, with one additional subtree at each end. • We can thus associate with each value a `left' and a `right' subtree • The right subtree of V(i) is the same as the left subtree of V(i+1). • All the values in V1's left subtree are less than V1; all the values in Vk's right subtree are greater than Vk; and all the values in the subtree between V(i) and V(i+1) are greater than V(i) and less than V(i+1).
B-Tree • A B-tree of order m has the following properties • It is an m-way search tree • Keys in a node are in increasing order. • The root node (if not a leaf node) has at least two children • All internal nodes other than the root have at least ⌈m/2⌉ children, and therefore at least ⌈m/2⌉ - 1 keys (for odd m this is the same as ⌊m/2⌋ keys) • All external nodes are at the same level • Widely used in database systems
B-Tree • A variation on binary search trees that allows quick searching in files on disk • Instead of storing one key and having two children, B-tree nodes have (up to) n keys and n+1 children, where n can be large • This shortens the tree (in terms of height) and requires far fewer disk accesses than a binary search tree would • The algorithms are more complex and require more computation per node, but computation is much cheaper than disk access
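Disk Access • Platter • Track • Sector (typical size : 512 B) • Block : the read/write unit, several consecutive sectors • Store related data in one block • Locality? Data that is used together should be stored together • B-trees exploit (spatial) locality: the keys of a node are read with a single block access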
B-Tree • B-tree nodes have a variable number of keys and children, subject to some constraints. • In many respects, they work just like binary search trees, but are considerably "fatter."
B-Tree • Every node x has the following fields: • x.n, the number of keys currently in node x. For example, |40|50|.n in the above example B-tree is 2. |70|80|90|.n is 3. • The x.n keys themselves, stored in nondecreasing order: x.key[1] <= x.key[2] <= ... <= x.key[x.n]. For example, the keys in |70|80|90| are ordered. • x.leaf, a boolean value that is True if x is a leaf and False if x is an internal node. • If x is an internal node, it contains x.n+1 pointers x.c[1], x.c[2], ... , x.c[x.n], x.c[x.n+1] to its children. • Leaf nodes have no children, so their x.c[i] fields are undefined.
B-Tree • The keys x.key[i] separate the ranges of keys stored in each subtree: if k[i] is any key stored in the subtree with root x.c[i], then k[1] <= x.key[1] <= k[2] <= x.key[2] <= ... <= x.key[x.n] <= k[x.n+1]. • Every leaf has the same depth, which is the tree's height h.
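The node fields and the ordering rules above can be sketched in Python as follows. This is only an illustration: the class name `BTreeNode`, the helper `check_node`, and the sample tree (a root |60| above the nodes |40|50| and |70|80|90|) are assumptions made for the example, and Python lists are 0-based where the slides index from 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)               # x.key[1..x.n]
    children: List["BTreeNode"] = field(default_factory=list)   # x.c[1..x.n+1]

    @property
    def n(self) -> int:
        """x.n: number of keys currently stored in the node."""
        return len(self.keys)

    @property
    def leaf(self) -> bool:
        """x.leaf: True when the node has no children."""
        return not self.children


def check_node(x: BTreeNode, lo: Optional[int] = None,
               hi: Optional[int] = None) -> int:
    """Assert the ordering and equal-depth rules; return the subtree height."""
    # Keys inside a node are in nondecreasing order.
    assert all(a <= b for a, b in zip(x.keys, x.keys[1:]))
    # Every key lies within the range allowed by the ancestors' keys.
    assert all((lo is None or lo <= k) and (hi is None or k <= hi)
               for k in x.keys)
    if x.leaf:
        return 0
    # An internal node with x.n keys has exactly x.n + 1 children.
    assert len(x.children) == x.n + 1
    bounds = [lo] + x.keys + [hi]
    heights = {check_node(c, bounds[i], bounds[i + 1])
               for i, c in enumerate(x.children)}
    assert len(heights) == 1          # every leaf has the same depth
    return heights.pop() + 1


if __name__ == "__main__":
    # Hypothetical two-level tree built around the nodes named on the slide.
    root = BTreeNode([60], [BTreeNode([40, 50]), BTreeNode([70, 80, 90])])
    print(root.children[0].n, root.children[1].n)   # 2 3
    print(check_node(root))                         # height 1
```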
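B-Tree Search • Performed just like a binary search tree search, except that each node offers a multiway choice: if the key is not among the keys of node x, descend into the one child x.c[i] whose key range contains it.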
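A minimal sketch of that search, assuming an in-memory node with a sorted `keys` list and a `children` list that is empty for leaves. The names `Node` and `btree_search` are made up for this example, and the standard-library `bisect_left` stands in for the search within a node. In an on-disk tree each descent to a child would be one block read.

```python
from bisect import bisect_left


class Node:
    """Hypothetical in-memory B-tree node (0-based lists)."""

    def __init__(self, keys, children=None):
        self.keys = keys                    # sorted keys of this node
        self.children = children or []      # empty list means leaf


def btree_search(x, k):
    """Return (node, index) such that node.keys[index] == k, or None."""
    while True:
        # Position of the first key >= k inside the current node.
        i = bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:                  # reached a leaf without a match
            return None
        x = x.children[i]                   # the only subtree that can hold k


if __name__ == "__main__":
    root = Node([60], [Node([40, 50]), Node([70, 80, 90])])
    node, idx = btree_search(root, 80)
    print(node.keys, idx)                   # [70, 80, 90] 1
    print(btree_search(root, 65))           # None
```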
Insert value X into a B-tree 1. Using the SEARCH procedure for M-way trees (described above), find the leaf node to which X should be added 2. Add X to this node in the appropriate place among the values already there 3. If there are M-1 or fewer values in the node after adding X, then we are finished 4. If there are M values after adding X, we say the node has overflowed
When overflowed during insertion • Split the M values of the overflowed node into three parts (M is assumed to be odd here): Left: the first (M-1)/2 values; Middle: the middle value (position 1+((M-1)/2)); Right: the last (M-1)/2 values • Notice that Left and Right have just enough values to be made into individual nodes. That's what we do: they become the left and right children of Middle, which we add in the appropriate place in this node's parent. • What if there is no room in the parent? If it overflows we do the same thing again: split it into Left-Middle-Right, make Left and Right into new nodes and add Middle (with Left and Right as its children) to the node above. • We continue doing this until no overflow occurs, or until the root itself overflows. If the root overflows, we split it, as usual, and create a new root node with Middle as its only value and Left and Right as its children (as usual).
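The insert-then-split procedure can be sketched as below. It is one possible reading of the steps, not the lecture's own code: `Node`, `insert`, and `_insert` are hypothetical names, the order is passed as a parameter `m` and assumed odd as in the split rule, and the tree lives in memory rather than on disk. A split is reported to the caller as a `(middle, right)` pair, with the overflowed node itself shrinking to become Left.

```python
from bisect import bisect_left


class Node:
    """Hypothetical in-memory B-tree node (0-based lists)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty list means leaf


def _insert(x, value, m):
    """Insert into the subtree rooted at x.

    Returns None if x did not overflow, else (middle, right) where x
    itself has been shrunk to the Left part.
    """
    i = bisect_left(x.keys, value)
    if not x.children:                      # leaf: add the value in place
        x.keys.insert(i, value)
    else:                                   # internal: descend first
        split = _insert(x.children[i], value, m)
        if split is None:
            return None
        middle, right = split               # child overflowed: absorb Middle
        x.keys.insert(i, middle)
        x.children.insert(i + 1, right)
    if len(x.keys) < m:                     # M-1 or fewer values: finished
        return None
    mid = (m - 1) // 2                      # 0-based position of Middle
    middle = x.keys[mid]
    right = Node(x.keys[mid + 1:], x.children[mid + 1:])
    x.keys = x.keys[:mid]                   # x becomes Left
    x.children = x.children[:mid + 1]
    return middle, right


def insert(root, value, m):
    """Insert value and return the (possibly new) root."""
    split = _insert(root, value, m)
    if split is None:
        return root
    middle, right = split                   # root overflowed: grow upward
    return Node([middle], [root, right])


if __name__ == "__main__":
    root = Node()
    for v in (17, 6, 21, 67):               # order m = 3 is an assumption
        root = insert(root, v, 3)
    print(root.keys, [c.keys for c in root.children])
    # [17] [[6], [21, 67]]
```

Note that the tree only ever gains height when the root splits, which is why all leaves stay at the same depth.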
Example : Insert 17, 6, 21, 67 [Figure: B-tree after the insertions; node labels 17, 6, 67, 21]
B-Tree Deletion • Not covered here.
B-Tree Summary • B-Tree • Perfectly balanced • Every leaf node is at the same depth • Every node except the root node is at least half full • Rebalancing is not so frequent • Fewer disk accesses when the tree is stored on disk • Make the size of one node one or more disk blocks to improve the efficiency of disk accesses. • B-Tree height : O(log N) for N keys • search/insert/delete : O(log N) in the worst case
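The O(log N) figure follows from the minimum-occupancy rule. As a short derivation under stated assumptions: let d be the minimum number of children of a non-root internal node (so each non-root node holds at least d - 1 keys), let the root hold at least one key, and let h be the depth of the leaves with the root at depth 0. The symbols d, h, and N are introduced here for the derivation only.

```latex
% Depth i >= 1 contains at least 2 d^{i-1} nodes, each with at least d-1 keys.
\begin{align*}
N &\ge 1 + (d-1)\sum_{i=1}^{h} 2\,d^{\,i-1}
   = 1 + 2\,(d-1)\,\frac{d^{h}-1}{d-1}
   = 2\,d^{h} - 1 \\
\Longrightarrow\quad h &\le \log_{d}\frac{N+1}{2}
\end{align*}
```

So the height, and with it the number of nodes visited by a search, grows with the logarithm of N to base d, which is why a large node size keeps the tree very shallow.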