B-Trees (Arboles B)
7.1 External Search
• The algorithms seen so far work well when all data is held in primary storage (RAM), which is fast to access.
• Large data sets are frequently stored on secondary storage devices (hard disks), which are slower to access (about 100-1000 times slower).
• Secondary storage is always accessed one complete block (page) of data at a time (4096 bytes); the block is loaded into RAM.
• For efficiency, keep the number of page accesses low.
For external search we use a variant of search trees in which 1 node = 1 page: multiway (multiple way) search trees.
Definition (multiway search trees)
An empty tree is a multiway search tree with the empty key set {}.
Let $T_0, \dots, T_n$ be multiway search trees with keys taken from a common key set $S$, and let $k_1, \dots, k_n$ be a sequence of keys with $k_1 < \dots < k_n$. Then the sequence
$T_0\ k_1\ T_1\ k_2\ T_2\ k_3 \dots k_n\ T_n$
is a multiway search tree if and only if:
• for all keys $x$ in $T_0$: $x < k_1$
• for $i = 1, \dots, n-1$ and all keys $x$ in $T_i$: $k_i < x < k_{i+1}$
• for all keys $x$ in $T_n$: $k_n < x$
B-Tree Definition
A B-tree of order m is a multiway search tree with the following properties:
• $1 \le \#(\text{keys in the root}) \le 2m$, and $m \le \#(\text{keys in the node}) \le 2m$ for all other nodes.
• All paths from the root to a leaf have the same length.
• Each internal node (not a leaf) with s keys has exactly s+1 children.
• 2-3 trees are the special case m = 1.
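The conditions of the definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical in-memory `Node` class with a sorted key list and a child list (empty for a leaf); the names `Node` and `is_btree` are illustrative and not part of the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def is_btree(root: Node, m: int) -> bool:
    """Check the B-tree conditions of order m for the tree rooted at root."""
    leaf_depths = set()

    def visit(node, depth, lo, hi, is_root):
        n = len(node.keys)
        # Key-count bounds: 1..2m in the root, m..2m in every other node.
        if not ((1 if is_root else m) <= n <= 2 * m):
            return False
        # Keys must be strictly increasing and lie strictly between lo and hi
        # (None stands for "no bound on this side").
        bounds = [lo] + node.keys + [hi]
        for a, b in zip(bounds, bounds[1:]):
            if a is not None and b is not None and a >= b:
                return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        # An internal node with s keys has exactly s+1 children.
        if len(node.children) != n + 1:
            return False
        return all(
            visit(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    # All root-to-leaf paths must have the same length.
    return visit(root, 0, None, None, True) and len(leaf_depths) == 1
```

For example, `is_btree(Node([10], [Node([3]), Node([12, 15])]), 1)` accepts a small 2-3 tree, while the same tree is rejected for m = 2 because one leaf holds fewer than m keys.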
Assessment of B-trees
The minimal possible number of nodes in a B-tree of order m and height h:
• The root of the minimal tree has only one key and two children; all other nodes have m keys.
• Number of nodes in each of the two subtrees of the root: $1 + (m+1) + (m+1)^2 + \dots + (m+1)^{h-1} = \frac{(m+1)^h - 1}{m}$.
Each of these nodes holds m keys, so the two subtrees contribute $2\left((m+1)^h - 1\right)$ keys, plus one key in the root. Altogether, the number of keys n in a B-tree of height h satisfies $n \ge 2(m+1)^h - 1$. Thus the following holds for each B-tree of height h with n keys: $h \le \log_{m+1}\left(\frac{n+1}{2}\right)$.
Example
The following holds for each B-tree of height h with n keys: $h \le \log_{m+1}\left(\frac{n+1}{2}\right)$.
Example: for
• page size: 1 KByte and
• each entry plus pointer: 8 bytes,
if we choose m = 63, then for a data set of n = 1 000 000 keys we have $h \le \log_{64} 500\,000.5 < 4$, and therefore $h_{max} = 3$.
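The arithmetic of this example can be reproduced with a few lines of code. This is a small sketch that uses the bound $n \ge 2(m+1)^h - 1$ directly with integer arithmetic, which avoids floating-point logarithms; the function names `min_keys` and `max_height` are illustrative. As in the formula, a tree consisting of a single node has height h = 0.

```python
def min_keys(m: int, h: int) -> int:
    """Fewest keys in a B-tree of order m and height h: 2(m+1)^h - 1."""
    return 2 * (m + 1) ** h - 1


def max_height(m: int, n: int) -> int:
    """Largest h with min_keys(m, h) <= n, for n >= 1 keys."""
    h = 0
    while min_keys(m, h + 1) <= n:
        h += 1
    return h


# The example from the text: order m = 63, n = 1 000 000 keys.
print(max_height(63, 1_000_000))  # prints 3
```

A tree of height 4 and order 63 would need at least `min_keys(63, 4)` = 33 554 431 keys, which is why one million keys cannot reach that height.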
Algorithms for searching keys in a B-tree
Algorithm search(r, x)
// search for key x in the tree with root node r
// global variable p = pointer to the last node visited
in r, search for the first key y >= x, or until there are no more keys
if y == x {stop search, p = r, found}
else if r is a leaf {stop search, p = r, not found}
else if not past the last key {search(child pointer just before y, x)}
else {search(last child pointer, x)}
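The search procedure above can be sketched in Python as follows. This sketch makes a few assumptions that are not in the slides: nodes are held in memory in a hypothetical `Node` class, the last visited node p is returned instead of being stored in a global variable, the recursion is written as a loop, and the scan inside a node uses binary search (`bisect_left`) rather than a linear scan.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def search(r: Node, x: int) -> Tuple[Node, bool]:
    """Return (p, found), where p is the last node visited."""
    while True:
        # Index of the first key y >= x, or len(r.keys) if there is none.
        i = bisect_left(r.keys, x)
        if i < len(r.keys) and r.keys[i] == x:
            return r, True
        if not r.children:
            return r, False
        # Child just before y, or the last child if we ran past the last key.
        r = r.children[i]
```

Each loop iteration touches one node, so in the external setting it corresponds to one page access.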
Algorithms for inserting and deleting keys in a B-tree
Algorithm insert(r, x)
// insert key x into the tree with root r
search for x in the tree with root r;
if x was not found {
  let p be the leaf where the search stopped;
  insert x at the right position in p;
  if p now has 2m+1 keys {overflow(p)}
}
Algorithm Split (1)
Algorithm overflow(p) = split(p)
Algorithm split(p), first case: p has a parent q.
Divide the overflowed node; the middle key moves up into the parent q.
Remark: splitting may propagate up to the root, in which case the height of the tree increases by one.
Algorithm Split (2)
Algorithm split(p), second case: p is the root.
Divide the overflowed node and open a new level above it containing a new root that holds the middle key (the new root has one key).
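Insertion with both split cases can be sketched as follows. This is an illustrative in-memory version under stated assumptions: the `Node` class is hypothetical, the path from the root to the leaf is recorded in a list instead of using parent pointers, and the function returns the root because a root split creates a new one.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def insert(root: Node, x: int, m: int) -> Node:
    """Insert x into the B-tree of order m; return the (possibly new) root."""
    path = []  # (node, index of the child taken), from the root downwards
    p = root
    while True:
        i = bisect_left(p.keys, x)
        if i < len(p.keys) and p.keys[i] == x:
            return root  # x is already present: nothing to do
        if not p.children:
            break  # p is the leaf where the search stopped
        path.append((p, i))
        p = p.children[i]
    p.keys.insert(i, x)
    while len(p.keys) > 2 * m:  # overflow: p holds 2m+1 keys
        middle = p.keys[m]
        # The right half becomes a new node; p keeps the left half.
        right = Node(p.keys[m + 1:], p.children[m + 1:])
        p.keys, p.children = p.keys[:m], p.children[:m + 1]
        if path:
            # First case: p has a parent q, the middle key moves up into q.
            q, i = path.pop()
            q.keys.insert(i, middle)
            q.children.insert(i + 1, right)
            p = q  # q may now overflow in turn
        else:
            # Second case: p is the root, open a new level above it.
            return Node([middle], [p, right])
    return root
```

Inserting the keys 1 to 7 in order into an empty tree of order m = 1 (starting from `Node([])`) splits the root twice and ends with root key 4, inner nodes 2 and 6, and leaves 1, 3, 5, 7.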
Algorithm delete(r, x)
// delete key x from the tree with root r
search for x in the tree with root r;
if x was found {
  if x is in an internal node {
    exchange x with the next bigger key x' in the tree
    // if x is in an internal node, there must be at least one bigger key in the tree,
    // and this key is in a leaf
  }
  let p be the leaf containing x;
  erase x from p;
  if p is not the root r {
    if p has m-1 keys {underflow(p)}
  }
}
Algorithm underflow(p)
if p has a neighboring node p' with s > m keys {balance(p, p')}
else {
  // because p cannot be the root, p must have a neighbor with m keys
  let p' be the neighbor with m keys;
  merge(p, p')
}
Algorithm balance(p, p')
// balance node p with its neighbor p' (p' has s > m keys, r = (m+s)/2 - m)
Algorithm merge(p, p')
// merge node p with its neighbor p'; q is their parent
combine the keys of p and p', together with the separating key from q, into a single node p;
afterwards:
if (q <> root) and (q has m-1 keys) {underflow(q)}
else if (q = root) and (q is empty) {free q; let root point to p}
Recursion
If underflow has to perform a merge, we may have to perform underflow again one level up. This process may repeat all the way up to the root.
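Deletion with underflow, balance and merge, including the propagation towards the root, can be sketched as below. Several details are assumptions of this sketch rather than statements from the slides: the `Node` class is hypothetical, the path to the leaf is recorded in a list instead of using parent pointers, the left neighbor is preferred when both neighbors qualify, and balance redistributes the keys of the two siblings evenly through the parent.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def delete(root: Node, x: int, m: int) -> Node:
    """Delete x from the B-tree of order m; return the (possibly new) root."""
    path = []  # (node, index of the child taken), from the root downwards
    p = root
    while True:
        i = bisect_left(p.keys, x)
        if i < len(p.keys) and p.keys[i] == x:
            break
        if not p.children:
            return root  # x not found
        path.append((p, i))
        p = p.children[i]
    if p.children:
        # x is in an internal node: overwrite it with the next bigger key x',
        # the first key of the leftmost leaf to the right of x.
        holder = p
        path.append((p, i + 1))
        p = p.children[i + 1]
        while p.children:
            path.append((p, 0))
            p = p.children[0]
        holder.keys[i] = p.keys[0]
        i = 0
    del p.keys[i]
    while path and len(p.keys) < m:  # underflow(p); p is not the root
        q, i = path.pop()  # q is the parent, p is q.children[i]
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(q.children)]
        rich = [j for j in neighbors if len(q.children[j].keys) > m]
        j = rich[0] if rich else neighbors[0]
        left = min(i, j)  # q.keys[left] separates the two siblings
        a, b = q.children[left], q.children[left + 1]
        keys = a.keys + [q.keys[left]] + b.keys
        kids = a.children + b.children
        if rich:
            # balance(p, p'): redistribute evenly through the parent (assumed).
            mid = len(keys) // 2
            a.keys, q.keys[left], b.keys = keys[:mid], keys[mid], keys[mid + 1:]
            a.children, b.children = kids[:mid + 1], kids[mid + 1:]
            break
        # merge(p, p'): pull the separator down and drop the right sibling.
        a.keys, a.children = keys, kids
        del q.keys[left]
        del q.children[left + 1]
        p = q  # the parent may now underflow one level up
    if not root.keys and root.children:
        return root.children[0]  # empty root: the tree loses one level
    return root
```

In a tree of order m = 1 with root key 4, inner nodes 2 and 6, and leaves 1, 3, 5, 7, deleting key 1 triggers two merges in a row and shrinks the tree to a root with keys 4 and 6 above the leaves (2, 3), (5) and (7).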
Cost
Let m be the order of the B-tree and n the number of keys. Cost of search, insert and delete: $O(h) = O\left(\log_{m+1}\left(\frac{n+1}{2}\right)\right) = O(\log_{m+1} n)$.
Remark: B-trees can also be used as an internal (in-memory) storage structure, especially B-trees of order 1 (then there are only one or two keys in each node, so no elaborate search inside the nodes is needed). Cost of search, insert and delete: O(log n).
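Remark: use of storage memory
Storage utilization is at least 50% in every node other than the root. Reason: the condition $\frac{1}{2} k \le \#(\text{keys in the node}) \le k$ holds for all nodes other than the root ($k = 2m$).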