Preliminaries
• Multiway trees allow nodes to have more than two children. A multiway tree of order k has nodes with at most k children.
• 2-3-4 Trees
  • For all non-leaf nodes:
    • A node with one data item has two pointers
    • A node with two data items has three pointers
    • A node with three data items has four pointers
  • Children of pointer p have keys less than data item p.
  • Children of the last pointer contain keys greater than the last data item.
• B-Trees (the "B" has been read as balanced, Boeing, broad, bushy, or Bayer, for Rudolf Bayer; the origin is not settled)
  • Each node contains links to as many children as can fit in a disk block.
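Node Structures
• 2-3-4 tree
    typedef struct Nodelink {
        int numElems;               /* number of data items in use (1 to 3) */
        Item *items[3];             /* data items, in key order */
        struct Nodelink *links[4];  /* child pointers; unused in a leaf */
    } Node;
• B-Tree
    /* K is a compile-time constant: the number of items that fit in one disk block */
    typedef struct Nodelink {
        Item items[K];
        struct Nodelink *nodes[K + 1];  /* child pointers */
    } Node;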
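To show how the pointer rule from the Preliminaries slide drives a lookup, here is a minimal search sketch over the 2-3-4 node structure. The slides do not define `Item`, so the integer `key` field is an assumption, and `search` is a hypothetical helper name rather than part of the original material.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: Item carries an integer key; the slides do not define Item. */
typedef struct { int key; } Item;

typedef struct Nodelink {
    int numElems;               /* number of data items in use (1 to 3) */
    Item *items[3];             /* data items, in key order */
    struct Nodelink *links[4];  /* child pointers; all NULL in a leaf */
} Node;

/* Return the item with the given key, or NULL if it is not in the tree. */
Item *search(const Node *node, int key) {
    while (node != NULL) {
        assert(node->numElems >= 1 && node->numElems <= 3);
        int i = 0;
        /* Skip the items whose key is smaller than the search key. */
        while (i < node->numElems && key > node->items[i]->key)
            i++;
        if (i < node->numElems && key == node->items[i]->key)
            return node->items[i];
        /* links[i] holds keys below items[i]; the last link holds the rest. */
        node = node->links[i];
    }
    return NULL;
}
```

Because a leaf has only NULL links, the loop ends naturally when the key is absent.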
2-3-4 Insertion Algorithm
• Insert(node)
    If node is full Then Call SplitNode and continue from the parent (or the new root)
    If key is found in node Then Return "DuplicatesNotAllowed"
    If this is a leaf node Then Insert the data item and Return
    Call Insert(appropriateChildPointer)
• SplitNode
    Allocate a newNode and move the right data item (with its two child pointers) to it
    If parent exists Then
        Insert the middle item into the parent node, with the pointer after it pointing to newNode
    Else
        Allocate a new root containing the middle item of node
        root's firstChildPointer points to node
        root's secondChildPointer points to newNode
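The insertion pseudocode can be sketched in C as follows. This is an illustration under stated assumptions, not a definitive implementation: nodes hold plain `int` keys instead of `Item` pointers, the recursion is written as a loop, a duplicate is reported through a `bool` flag rather than a "DuplicatesNotAllowed" return value, and the helper names `newNode`, `isLeaf`, and `insertAt` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    int numElems;           /* keys in use: 0 to 3 */
    int keys[3];            /* simplification: int keys instead of Item pointers */
    struct Node *links[4];  /* child pointers; all NULL in a leaf */
} Node;

static Node *newNode(void) {
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    return n;
}

static bool isLeaf(const Node *n) { return n->links[0] == NULL; }

/* Put key at position pos of a non-full node, with right as the child after it. */
static void insertAt(Node *n, int pos, int key, Node *right) {
    for (int i = n->numElems; i > pos; i--) {
        n->keys[i] = n->keys[i - 1];
        n->links[i + 1] = n->links[i];
    }
    n->keys[pos] = key;
    n->links[pos + 1] = right;
    n->numElems++;
}

/* Split a full node; parent is NULL for the root and is never full.
   Returns the node that now sits above the two halves. */
static Node *splitNode(Node *node, Node *parent) {
    Node *right = newNode();          /* receives the right key and its children */
    right->keys[0] = node->keys[2];
    right->links[0] = node->links[2];
    right->links[1] = node->links[3];
    right->numElems = 1;
    int middle = node->keys[1];
    node->links[2] = node->links[3] = NULL;
    node->numElems = 1;               /* node keeps only the left key */
    int pos = 0;
    if (parent == NULL) {
        parent = newNode();           /* new root: first child is the old node */
        parent->links[0] = node;
    } else {
        while (pos < parent->numElems && parent->keys[pos] < middle)
            pos++;
    }
    insertAt(parent, pos, middle, right);
    return parent;
}

/* Returns the (possibly new) root; *inserted is false for a duplicate key. */
Node *insert(Node *root, int key, bool *inserted) {
    if (root == NULL) {
        root = newNode();
        insertAt(root, 0, key, NULL);
        *inserted = true;
        return root;
    }
    Node *parent = NULL, *node = root;
    for (;;) {
        if (node->numElems == 3) {    /* split full nodes on the way down */
            Node *up = splitNode(node, parent);
            if (parent == NULL)
                root = up;
            node = up;                /* resume from the node above the halves */
        }
        int i = 0;
        while (i < node->numElems && key > node->keys[i])
            i++;
        if (i < node->numElems && key == node->keys[i]) {
            *inserted = false;
            return root;
        }
        if (isLeaf(node)) {
            insertAt(node, i, key, NULL);
            *inserted = true;
            return root;
        }
        parent = node;
        node = node->links[i];
    }
}
```

Splitting every full node on the way down means a parent always has room for a promoted middle key, so no split ever has to travel back up the tree.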
2-3-4 Deletion Algorithm
• Find the item to delete. If it is not in a leaf node, replace its data with its successor, and then remove the successor.
• Cases to consider when deleting an item from a 2-3-4 leaf node:
  1. If other keys remain in the leaf node after the item is removed, remove the item and stop.
  2. Otherwise, demote the parent item to replace the item being deleted in its node. If there is a sibling with more than one entry, promote a sibling item to replace the parent item in the parent node.
  3. If the sibling nodes have only one entry, merge a sibling into the current node and remove the sibling node. Remove the appropriate entry in the parent node, possibly creating a hole. Recursively work up the tree, applying steps 1, 2, and 3 as needed.
• If the root node becomes empty, simply remove it from the tree; its remaining child becomes the new root.
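Visual Illustration of the 2-3-4 Delete
Case 1 (delete 22 from a leaf that keeps other keys): [11, 22, 33] → [11, 33]
Case 2 (delete 12; 11 is demoted from the parent and 09 is promoted from the sibling): parent [11, 22, 33] with children [08, 09] and [12] → parent [09, 22, 33] with children [08] and [11] (other children not shown)
Case 3 (delete 12; 11 is demoted and merged with the sibling): parent [11] with children [08] and [12] → [08, 11]
The algorithm recursively works its way up the tree.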
Characteristics of External Storage
• Access is at least three orders of magnitude slower than main memory.
• The extra overhead of searching through multiway tree nodes is more than compensated for, because less tree depth means fewer disk accesses.
• It is desirable to design the record sizes with disk block sizes in mind. Each disk read/write will be in multiples of the block size.
B-Tree Insertion Algorithm
• Differences from the 2-3-4 algorithm
  • Node splitting is from the bottom up rather than the top down.
    • Advantage: The tree is kept more full.
    • Disadvantage: A trip down the tree could be followed by a trip back up if multiple splits are necessary.
  • Half of the items go to the new node, half remain in the old node.
  • The middle key is promoted to the next level up.
• Contraction occurs when a node and a sibling together hold less than a full block of data items. Note: Standard B-tree implementations require nodes to be at least half full.
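The half-and-half split with a promoted middle key can be sketched for a single leaf as follows. This is a minimal illustration, assuming integer keys and a tiny hypothetical capacity `MAX_KEYS`; a real node would be sized to a disk block, and `LeafNode` and `splitLeaf` are names invented for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4  /* hypothetical capacity; a real node is sized to a disk block */

typedef struct {
    int numKeys;
    int keys[MAX_KEYS];  /* kept in ascending order */
} LeafNode;

/* Split a full leaf that must also absorb newKey. The lower half stays in
   *node, the upper half moves to *sibling, and the middle key is returned
   so the caller can promote it to the next level up. */
int splitLeaf(LeafNode *node, LeafNode *sibling, int newKey) {
    assert(node->numKeys == MAX_KEYS);
    int all[MAX_KEYS + 1];
    int i = 0, j = 0;
    /* Merge newKey into the existing sorted keys. */
    while (i < MAX_KEYS && node->keys[i] < newKey)
        all[j++] = node->keys[i++];
    all[j++] = newKey;
    while (i < MAX_KEYS)
        all[j++] = node->keys[i++];

    int mid = (MAX_KEYS + 1) / 2;  /* index of the key to promote */
    node->numKeys = mid;
    memcpy(node->keys, all, (size_t)mid * sizeof(int));
    sibling->numKeys = MAX_KEYS - mid;
    memcpy(sibling->keys, all + mid + 1, (size_t)sibling->numKeys * sizeof(int));
    return all[mid];
}
```

If the parent is itself full when it receives the promoted key, the same split repeats one level up, which is the trip back up the tree noted as the disadvantage.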
External Storage Optimizations
• It is more efficient to keep the index and data separate.
• Separate indices allow for multi-keyed files.
• Refinements exist to guarantee that no node is less than 2/3 full (as in the B*-tree). Nodes are balanced over three siblings.
• Some implementations only have data pointers at the last level (as in the B+-tree).
• A linked list of free disk blocks is often used to reclaim storage space after deletions.
• Efficiency: Assume a block contains 8096 bytes, each key is 24 bytes, the blocks are half full, and the pointers require 4 bytes. How many levels deep is the tree?
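The efficiency question can be worked through with a short calculation. The slide does not say how many records the file holds, so the figure of 1,000,000 used in the note below is purely an assumption, and `entriesPerBlock` and `levelsNeeded` are hypothetical helper names. The sketch also treats each key as paired with exactly one pointer, which slightly understates the true fan-out.

```c
#include <assert.h>

/* Number of key/pointer pairs a block holds at the given fill percentage. */
long long entriesPerBlock(long long blockBytes, long long keyBytes,
                          long long ptrBytes, long long fillPercent) {
    assert(keyBytes + ptrBytes > 0);
    long long full = blockBytes / (keyBytes + ptrBytes);
    return full * fillPercent / 100;
}

/* Smallest number of levels h such that fanout^h >= records. */
int levelsNeeded(long long fanout, long long records) {
    assert(fanout > 1);
    int levels = 1;
    long long reach = fanout;  /* records reachable with the current depth */
    while (reach < records) {
        reach *= fanout;
        levels++;
    }
    return levels;
}
```

With the slide's numbers, a full block holds 8096 / 28 = 289 entries, so a half-full block holds 144. Under the assumed 1,000,000 records, two levels reach 144 × 144 = 20,736 records and three levels reach 2,985,984, so the tree would be three levels deep.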
Other External Storage Algorithms
• Create a binary tree in memory for the index
• Sort external data with a type of merge sort
  • On each pass
    • Read a large block from each piece of the file
    • Perform the merge
    • Write the result back to a second file
  • Keep reading blocks from each half until they run out.
  • There will be log_k N merges, where k is the number of data elements that can fit in the memory blocks.
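The merge step at the heart of each pass can be sketched as below. In-memory arrays stand in for the blocks that would be read from the two pieces of the file, which is a simplification made for this example; `mergeRuns` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Merge sorted runs a and b into out, which must have room for na + nb ints. */
void mergeRuns(const int *a, size_t na, const int *b, size_t nb, int *out) {
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;
    /* Repeatedly copy the smaller front element of the two runs. */
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    /* One run is exhausted; copy whatever remains of the other. */
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}
```

In an external sort, `out` would be flushed to the second file whenever it fills, and `a` or `b` would be refilled from disk whenever it is exhausted.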