Learn about multiple-way search trees and B-trees optimized for external storage devices. Understand the structure, algorithms, and efficiency of these trees for handling large data sets. Discover how B-trees balance and merge nodes for optimal search, insert, and delete operations.
7.1 External Search
• The algorithms we have seen so far work well when all data are stored in primary storage (RAM), where access is fast.
• Big data sets are frequently stored on secondary storage devices (hard disks), where access is about 100 to 1000 times slower.
• Access is always to a complete block (page) of data (e.g. 4096 bytes), which is loaded into RAM.
• For efficiency: keep the number of page accesses low!
For external search, use a variant of search trees in which 1 node = 1 page: multiple way search trees!
Definition (Multiple way search trees)
An empty tree is a multiple way search tree with the empty key set {}. Let T_0, ..., T_n be multiple way search trees with keys taken from a common key set S, and let k_1, ..., k_n be a sequence of keys with k_1 < ... < k_n. Then the sequence
T_0 k_1 T_1 k_2 T_2 k_3 ... k_n T_n
is a multiple way search tree if and only if:
• for all keys x in T_0: x < k_1
• for i = 1, ..., n-1 and all keys x in T_i: k_i < x < k_{i+1}
• for all keys x in T_n: k_n < x
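The definition can be illustrated with a small Python sketch. The `Node` class, `inorder` and `is_search_tree` are hypothetical names introduced here, not part of the slides. The sketch relies on the fact that a tree satisfies the three ordering conditions exactly when reading it in the order T_0 k_1 T_1 ... k_n T_n yields a strictly increasing key sequence.

```python
class Node:
    """Hypothetical node: keys k_1 < ... < k_n and subtrees T_0 .. T_n."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means all subtrees are empty


def inorder(t):
    """Yield the keys of t in the order T_0 k_1 T_1 ... k_n T_n."""
    if t is None:
        return
    kids = t.children or [None] * (len(t.keys) + 1)
    for i, k in enumerate(t.keys):
        yield from inorder(kids[i])
        yield k
    yield from inorder(kids[-1])


def is_search_tree(t):
    """True if the in-order key sequence is strictly increasing."""
    keys = list(inorder(t))
    return all(a < b for a, b in zip(keys, keys[1:]))


tree = Node([10, 20], [Node([1, 5]), Node([12]), Node([25, 30])])
print(list(inorder(tree)))   # [1, 5, 10, 12, 20, 25, 30]
print(is_search_tree(tree))  # True
```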
B-Tree Definition
A B-tree of order m is a multiple way search tree with the following characteristics:
• 1 ≤ #(keys in the root) ≤ 2m, and m ≤ #(keys in the node) ≤ 2m for all other nodes.
• All paths from the root to a leaf are equally long.
• Each internal node (not a leaf) which has s keys has exactly s+1 children.
• 2-3 trees are the particular case m = 1.
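As a rough sketch, the structural conditions of this definition can be checked mechanically. The `Node` class and the function names below are assumptions made for illustration. The check covers only key counts, child counts and leaf depth; it does not verify the ordering of keys.

```python
class Node:
    """Hypothetical node: a list of keys and a list of children (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_btree(root, m):
    """Check the key-count, child-count and equal-path-length conditions."""
    def node_ok(node, is_root):
        s = len(node.keys)
        low = 1 if is_root else m            # the root may hold as few as 1 key
        if not (low <= s <= 2 * m):
            return False
        if node.children and len(node.children) != s + 1:
            return False                      # internal node: s keys, s+1 children
        return all(node_ok(c, False) for c in node.children)

    return node_ok(root, True) and len(leaf_depths(root)) == 1


# A 2-3 tree is a B-tree of order m = 1.
two_three = Node([10], [Node([5]), Node([15, 20])])
print(is_btree(two_three, 1))  # True
print(is_btree(two_three, 2))  # False: the leaf [5] has fewer than m = 2 keys
```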
Assessment of B-trees
The minimal possible number of nodes in a B-tree of order m and height h:
• The root of the minimal tree has only one key and two children; all other nodes have m keys.
• Number of nodes in each of the two subtrees of the root: 1 + (m+1) + (m+1)^2 + ... + (m+1)^(h-1) = ((m+1)^h - 1) / m.
Altogether, the number of keys n in a B-tree of height h satisfies n ≥ 1 + 2 · m · ((m+1)^h - 1) / m = 2 (m+1)^h - 1. Thus the following holds for each B-tree of height h with n keys: h ≤ log_{m+1} ((n+1)/2).
Example
The following holds for each B-tree of height h with n keys: h ≤ log_{m+1} ((n+1)/2).
Example: for
• page size: 1 KByte and
• each entry plus pointer: 8 bytes,
if we choose m = 63, then for an amount of data of n = 1 000 000 we have h ≤ log_64 500 000.5 < 4 and with that h_max = 3.
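The bound can be reproduced with a few lines of Python. The function names `min_keys` and `max_height` are chosen here for illustration; the sketch uses integer arithmetic only, to avoid rounding issues with floating-point logarithms, and assumes n ≥ 1.

```python
def min_keys(m, h):
    """Minimal number of keys in a B-tree of order m and height h: 2(m+1)^h - 1."""
    return 2 * (m + 1) ** h - 1


def max_height(m, n):
    """Largest h with min_keys(m, h) <= n, i.e. h <= log_{m+1}((n+1)/2)."""
    h = 0
    while min_keys(m, h + 1) <= n:
        h += 1
    return h


# Values from the example: order m = 63, n = 1 000 000 keys.
print(2 * 63 * 8)                  # 1008 bytes for 2m entries, fits into a 1 KByte page
print(max_height(63, 1_000_000))   # 3
```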
Algorithms for searching keys in a B-tree
Algorithm search(r, x)
// search for key x in the tree having root node r
// global variable p = pointer to the last node visited
in r, search for the first key y >= x, or until there are no more keys
if y == x {stop search, p = r, found}
else if r is a leaf {stop search, p = r, not found}
else if not past the last key, search(pointer to the child before y, x)
else search(last pointer, x)
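A minimal Python sketch of this search, under a few assumptions: the `Node` class is hypothetical, the recursion is written as a loop, and instead of the global variable p the function returns the last node visited. `bisect_left` finds the position of the first key y >= x inside a node.

```python
from bisect import bisect_left


class Node:
    """Hypothetical node: sorted keys and a list of children (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(r, x):
    """Return (p, i, found): last node visited, position in p.keys, success flag."""
    while True:
        i = bisect_left(r.keys, x)            # first key y >= x, or len(keys) if none
        if i < len(r.keys) and r.keys[i] == x:
            return r, i, True                 # y == x: found
        if not r.children:
            return r, i, False                # r is a leaf: not found
        r = r.children[i]                     # child before y, or the last child


tree = Node([10, 20], [Node([1, 5]), Node([12]), Node([25, 30])])
p, i, found = search(tree, 12)
print(p.keys, i, found)   # [12] 0 True
```

Each loop iteration visits one node, so in the external setting it corresponds to one page access.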
Algorithms for inserting and deleting keys in a B-tree
Algorithm insert(r, x)
// insert key x into the tree having root r
search for x in the tree having root r;
if x was not found {
  let p be the leaf where the search stopped;
  insert x into p at the right position;
  if p now has 2m+1 keys {overflow(p)}
}
Algorithm Split (1)
Algorithm overflow(p) = split(p)
Algorithm split(p), first case: p has a parent q.
Divide the overflowed node. The middle key goes to the parent.
Remark: the splitting may propagate up to the root, in which case the height of the tree is incremented by one.
Algorithm Split (2)
Algorithm split(p), second case: p is the root.
Divide the overflowed node. Open a new level above it containing a new root with the middle key (the new root has one key).
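Insertion with both split cases might be sketched as follows. This is an illustrative sketch, not the slides' own implementation: the `Node` class is hypothetical, and instead of parent pointers the recursion returns the middle key and the new right node to the caller, which plays the role of the parent q.

```python
from bisect import bisect_left


class Node:
    """Hypothetical node: sorted keys and a list of children (empty for a leaf)."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def _insert(node, x, m):
    """Insert x below node; return (middle_key, right_node) if node was split."""
    i = bisect_left(node.keys, x)
    if i < len(node.keys) and node.keys[i] == x:
        return None                            # x already present: nothing to do
    if not node.children:
        node.keys.insert(i, x)                 # leaf: insert at the right position
    else:
        promoted = _insert(node.children[i], x, m)
        if promoted is None:
            return None
        key, right = promoted                  # child was split: take its middle key
        node.keys.insert(i, key)
        node.children.insert(i + 1, right)
    if len(node.keys) <= 2 * m:
        return None
    # Overflow (2m+1 keys): divide the node, the middle key goes up.
    middle = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return middle, right


def insert(root, x, m):
    """Insert x and return the (possibly new) root."""
    promoted = _insert(root, x, m)
    if promoted is None:
        return root
    key, right = promoted                      # second case: the root itself was split
    return Node([key], [root, right])


root = Node()
for k in range(1, 8):
    root = insert(root, k, 1)                  # order m = 1, i.e. a 2-3 tree
print(root.keys, [c.keys for c in root.children])   # [4] [[2], [6]]
```

Returning the split result to the caller avoids storing parent pointers in the nodes; an implementation with explicit parent pointers, as the pseudocode suggests, would work equally well.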
Algorithm delete(r, x)
// delete key x from the tree having root r
search for x in the tree with root r;
if x was found {
  if x is in an internal node {
    exchange x with the next bigger key x' in the tree
    // if x is in an internal node then there must be
    // at least one bigger key in the tree; this key is in a leaf!
  }
  let p be the leaf containing x;
  erase x from p;
  if p is not the root r {
    if p has m-1 keys {underflow(p)}
  }
}
Algorithm underflow(p)
if p has a neighboring node p' with s > m keys {
  balance(p, p')
} else {
  // because p cannot be the root, p must have a neighbor with m keys
  let p' be the neighbor with m keys;
  merge(p, p')
}
Algorithm balance(p, p')
// balance node p with its neighbor p' (s > m, r = (m+s)/2 - m)
Algorithm merge(p, p')
// merge node p with its neighbor p'; q is the common parent of p and p'
perform the following operation: move the key separating p and p' from q, together with all keys of p', into p, and remove p'
afterwards:
if (q <> root) and (q has m-1 keys) {underflow(q)}
else if (q = root) and (q is empty) {free q; let root point to p}
Recursion
If performing underflow requires a merge, we might have to perform underflow again one level up. This process might be repeated up to the root.
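The deletion steps above can be sketched in Python under some simplifying assumptions. The `Node` class and helper names are hypothetical. The balance step here moves a single key through the parent, which is enough to restore the minimum of m keys, rather than evening out both nodes as the formula for r suggests. Underflow is detected by the caller after the recursive call returns, which replaces explicit parent pointers.

```python
from bisect import bisect_left


class Node:
    """Hypothetical node: sorted keys and a list of children (empty for a leaf)."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def _fix_underflow(q, i, m):
    """Child p = q.children[i] has m-1 keys: balance with a neighbor or merge."""
    p = q.children[i]
    left = q.children[i - 1] if i > 0 else None
    right = q.children[i + 1] if i + 1 < len(q.children) else None
    if left is not None and len(left.keys) > m:       # balance with left neighbor
        p.keys.insert(0, q.keys[i - 1])
        q.keys[i - 1] = left.keys.pop()
        if left.children:
            p.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > m:   # balance with right neighbor
        p.keys.append(q.keys[i])
        q.keys[i] = right.keys.pop(0)
        if right.children:
            p.children.append(right.children.pop(0))
    else:                                             # merge with a neighbor of m keys
        j = i - 1 if left is not None else i          # merge children j and j+1
        a, b = q.children[j], q.children[j + 1]
        a.keys += [q.keys.pop(j)] + b.keys            # separator key comes down from q
        a.children += b.children
        del q.children[j + 1]


def _delete(node, x, m):
    i = bisect_left(node.keys, x)
    found = i < len(node.keys) and node.keys[i] == x
    if not node.children:
        if found:
            node.keys.pop(i)                          # erase x from the leaf
        return
    if found:
        succ = node.children[i + 1]                   # next bigger key x' is in a leaf
        while succ.children:
            succ = succ.children[0]
        node.keys[i] = x = succ.keys[0]               # overwrite x by x', then delete x'
        i += 1
    _delete(node.children[i], x, m)
    if len(node.children[i].keys) < m:                # underflow one level below
        _fix_underflow(node, i, m)


def delete(root, x, m):
    """Delete x and return the (possibly new) root."""
    _delete(root, x, m)
    if not root.keys and root.children:               # empty root after a merge
        return root.children[0]
    return root


root = Node([4], [Node([2], [Node([1]), Node([3])]),
                  Node([6], [Node([5]), Node([7])])])
root = delete(root, 1, 1)     # two merges, the tree loses one level
print(root.keys, [c.keys for c in root.children])   # [4, 6] [[2, 3], [5], [7]]
```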
Cost
Let m be the order of the B-tree and n the number of keys.
Cost of search, insert and delete: O(h) = O(log_{m+1} ((n+1)/2)) = O(log_{m+1}(n)).
Remark: B-trees can also be used as an internal storage structure, especially B-trees of order 1 (then there are only one or two keys in each node, so no elaborate search inside the nodes is needed). Cost of search, insert, delete: O(log n).
Remark: use of storage
Storage utilization is at least 50%. Reason: the condition k/2 ≤ #(keys in the node) ≤ k for all nodes ≠ root (k = 2m).