Balanced Search Trees. 15-211 Fundamental Data Structures and Algorithms. Margaret Reid-Miller 3 February 2005. Plan. Today 2-3-4 trees Red-Black trees Reading: For today: Chapters 13.3-4 Reminder: HW1 due tonight!!! HW2 will be available soon.
Balanced Search Trees 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 3 February 2005
Plan • Today • 2-3-4 trees • Red-Black trees • Reading: • For today: Chapters 13.3-4 • Reminder: HW1 due tonight!!! HW2 will be available soon
AVL-Trees • What is the key restriction on a binary search tree that keeps an AVL tree balanced? [Figure: two example binary search trees, one labeled OK and one labeled not OK]
AVL-Trees • Height balanced: • For each node, the heights of the left and right subtrees differ by at most 1 (a representation invariant). • What is the mechanism to rebalance an AVL tree that an insert has put out of balance?
The single rotation • Rotate at the deepest out-of-balance node. “Pulls” the child up one level. [Figure: subtrees X, Y, Z before and after the rotation]
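The single rotation can be sketched in Java as follows. This is a minimal illustration, not the course's code: the class name, the `Node` fields, and the method names `rotateWithLeftChild` and `height` are assumptions made for this example, and height bookkeeping inside the nodes is omitted.

```java
class SingleRotationSketch {
    /** Hypothetical bare-bones BST node; a real AVL node would also store its height. */
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Single (right) rotation at the out-of-balance node: its left child is
     * pulled up one level and becomes the new root of this subtree.
     */
    static Node rotateWithLeftChild(Node unbalanced) {
        Node child = unbalanced.left;
        unbalanced.left = child.right;  // the middle subtree (Y) changes parent
        child.right = unbalanced;       // the old subtree root moves down
        return child;
    }

    /** Height in edges; an empty tree has height -1. */
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

Applied to the left-leaning chain 3, 2, 1, the rotation returns a subtree rooted at 2 with children 1 and 3. The mirror-image rotation with the right child is symmetric.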
The double rotation • First rotate around the child node, then around the parent node. [Figure: subtrees X, Y1, Y2, Z before and after the first rotation]
Double rotation cont’d • Result is to “pull” the grandchild node up two levels. [Figure: subtrees X, Y1, Y2, Z before and after]
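A double rotation is just two single rotations in sequence. The sketch below shows the left-right case under assumed names (`DoubleRotationSketch`, `rotateLeft`, `rotateRight`, `doubleRotateLeftRight` are all hypothetical); it ignores the height fields a real AVL node would update.

```java
class DoubleRotationSketch {
    /** Hypothetical bare-bones BST node. */
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Single rotation that pulls the left child up one level. */
    static Node rotateRight(Node p) {
        Node c = p.left;
        p.left = c.right;
        c.right = p;
        return c;
    }

    /** Single rotation that pulls the right child up one level. */
    static Node rotateLeft(Node p) {
        Node c = p.right;
        p.right = c.left;
        c.left = p;
        return c;
    }

    /**
     * Left-right case: the grandchild sits in the right subtree of the left child.
     * Rotating around the child and then around the parent pulls the
     * grandchild up two levels.
     */
    static Node doubleRotateLeftRight(Node parent) {
        parent.left = rotateLeft(parent.left);  // first rotation: around the child
        return rotateRight(parent);             // second rotation: around the parent
    }
}
```

For the zig-zag 3, 1, 2 (2 is the right child of 1, which is the left child of 3), the result is rooted at 2 with children 1 and 3. The right-left case mirrors this.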
AVL Tree Summary • Each node maintains a lazy deletion flag and the height of its subtree. • The height of an AVL tree is at most 45% greater than the minimum. • Requires at most one single or double rotation to regain balance after an insert. • Thus, guarantees O(log N) time for search and insert.
Balanced 2-3-4 Trees • Maintain height balance in all subtrees. Depth property. • But allow nodes in the tree to expand to accommodate inserts. • In particular, nodes can have 2, 3 or 4 children. Node-size property. • E.g., a 4-node has 3 keys that split the key range into 4 intervals.
2-3-4 tree search • Search is similar to a binary search. • E.g., search for B [Figure: example 2-3-4 tree; keys shown: G M Q A C H R S W]
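To make "similar to a binary search" concrete, here is a minimal sketch of search in a 2-3-4 tree. The node layout (a sorted `keys` array plus a `children` array that is `null` at a leaf) and all names are assumptions for illustration; the example uses `int` keys rather than the letters in the slide.

```java
class TwoThreeFourSearchSketch {
    /** Hypothetical 2-3-4 node: 1 to 3 sorted keys, and keys.length + 1 links unless it is a leaf. */
    static class Node {
        final int[] keys;
        final Node[] children;  // null for a leaf
        Node(int[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    /** Walks down one root-to-leaf path, picking the interval that could hold the key. */
    static boolean contains(Node node, int key) {
        while (node != null) {
            int i = 0;
            while (i < node.keys.length && key > node.keys[i]) {
                i++;  // skip keys smaller than the target
            }
            if (i < node.keys.length && node.keys[i] == key) {
                return true;
            }
            // Link i covers the interval between keys[i - 1] and keys[i].
            node = (node.children == null) ? null : node.children[i];
        }
        return false;
    }
}
```

The only change from binary tree search is the inner loop that chooses among up to four links instead of two.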
2-3-4 Tree Insert • To insert, first search for a leaf node in which to put the key. • E.g., insert U [Figure: example 2-3-4 tree before and after inserting U]
2-3-4 Tree Insert • May need to split a node. • E.g., insert T [Figure: example 2-3-4 tree before and after inserting T, which splits a 4-node]
2-3-4 Tree Insert

/* Returns either an empty node or a new root. */
public Node BUinsert(int key) {
    if (isEmptyNode())
        return new Node(key);

    /* Search for the leaf to put the key into. */
    Node child = findChild(key);        // down which link?
    Node upNode = child.BUinsert(key);

    /* upNode is empty, the key at a leaf node, or
     * the result of a 4-node split that needs to be
     * propagated up. */
    if (upNode.isEmptyNode())
        return upNode;
    else
        return addToNode(upNode);       // split?
}
Cascading splits • When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node. • This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root. • When would this happen? • Is there a way to avoid these cascading splits?
Bottom-up 2-3-4 trees • This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls. • Work occurs before and after the recursive calls.
Preemptive Split • Every time we find a 4-node while traveling down a search path, we split the 4-node. • Note: Two 2-nodes have the same number of children as one 4-node. • Changes are local to the split node (no cascading). • Guaranteed to find a 2-node or 3-node at the leaf. • Splitting a root node creates a new root.
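The preemptive-split idea can be sketched as an iterative insert. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the list-based `Node`, the `splits` counter, and the helpers `splitChild`, `childIndex`, `uniformDepth`, and `inOrder` are all hypothetical names, and keys are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.List;

class TopDownTwoThreeFourSketch {
    /** Hypothetical 2-3-4 node: up to 3 sorted keys; children is empty at a leaf. */
    static class Node {
        List<Integer> keys = new ArrayList<>();
        List<Node> children = new ArrayList<>();
        boolean isLeaf() { return children.isEmpty(); }
        boolean isFourNode() { return keys.size() == 3; }
    }

    Node root = new Node();
    int splits = 0;  // counts 4-node splits, for experimentation

    /** Splits the 4-node at parent.children[i]; the parent is known not to be a 4-node. */
    private void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node right = new Node();
        right.keys.add(full.keys.remove(2));  // largest key goes to the new right 2-node
        int middle = full.keys.remove(1);     // middle key moves up to the parent
        if (!full.isLeaf()) {                 // hand the two rightmost links to the new node
            right.children.add(full.children.remove(2));
            right.children.add(full.children.remove(2));
        }
        parent.keys.add(i, middle);
        parent.children.add(i + 1, right);
        splits++;
    }

    private static int childIndex(Node node, int key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        return i;
    }

    void insert(int key) {
        if (root.isFourNode()) {              // splitting the root creates a new root
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node node = root;
        while (!node.isLeaf()) {
            int i = childIndex(node, key);
            if (node.children.get(i).isFourNode()) {
                splitChild(node, i);          // local change: node itself has room
                if (key > node.keys.get(i)) i++;
            }
            node = node.children.get(i);
        }
        node.keys.add(childIndex(node, key), key);  // the leaf is a 2-node or 3-node here
    }

    /** Returns the common leaf depth, or -1 if the leaves are at different depths. */
    static int uniformDepth(Node n) {
        if (n.isLeaf()) return 0;
        int d = uniformDepth(n.children.get(0));
        for (Node c : n.children) if (uniformDepth(c) != d) return -1;
        return d < 0 ? -1 : d + 1;
    }

    static void inOrder(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) inOrder(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) inOrder(n.children.get(n.keys.size()), out);
    }
}
```

Because every 4-node on the path is split before the descent continues, the parent always has room for the key that moves up, so no work remains to be done on the way back up.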
2-3-4 Tree Height • What is the height of the tree? At most log2 N + 1. • Why? The tree is tallest when every node is a 2-node. Since every leaf has the same depth, the tree is then a complete binary tree, whose height is at most log2 N + 1.
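As a worked check of this bound, assume height counts levels, so a single-node tree has height 1 (this convention is an assumption chosen to match the "+ 1" in the slide). If all $h$ levels consist of 2-nodes, the tree is a complete binary tree holding $N = 2^h - 1$ keys, which gives the tallest case:

\[
N = 2^h - 1 \;\Longrightarrow\; h = \log_2 (N + 1) \le \log_2 N + 1 \qquad (N \ge 1).
\]

At the other extreme, if every node is a 4-node, each of the $(4^h - 1)/3$ nodes holds 3 keys, so

\[
N = 4^h - 1 \;\Longrightarrow\; h = \log_4 (N + 1) = \tfrac{1}{2} \log_2 (N + 1),
\]

so the height of a 2-3-4 tree always lies between roughly $\tfrac{1}{2}\log_2 N$ and $\log_2 N + 1$.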
Number of splits • How many splits does an insertion require? At most log2 N + 1 splits. • Seems to require fewer than one split on average when the tree is built from a random permutation. Trees tend to have few 4-nodes.
Top-down 2-3-4 trees • The second method is called top-down, as splits occur on the way down the tree. • All the work occurs before the recursive calls and no work occurs after the recursive calls. • This is called tail recursion, which is much more efficient. • Can AVL trees be made tail recursive?
2-3-4 trees • Advantages: • Guaranteed O(log N) time for search and insert. • Issues: • Awkward to maintain three types of nodes. • Need to modify the standard search on binary trees. • Splits need to move links between nodes. • Code has many cases to handle.
Red-Black trees • A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes. [Figure: example 2-3-4 nodes (keys B, D, F, G, H, I) and their red-black equivalents, with two alternatives shown joined by OR]
Red-black tree properties A Red-Black tree is a binary search tree where • Every node is colored either red or black. • Note: Every 2-3-4 node corresponds to one black node. • The root node is black. • Red nodes always have black parents (and black children). • Every path from the root to a leaf has the same number of black nodes.
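The color properties above can be checked mechanically. The following is a minimal sketch with assumed names (`RedBlackCheckSketch`, `blackHeight`, `isRedBlack`); it counts black nodes on every path down to an empty link and checks only the color rules, not the binary search tree ordering.

```java
class RedBlackCheckSketch {
    /** Hypothetical red-black node with a boolean color flag. */
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Returns the number of black nodes on every path from n down to an
     * empty link, or -1 if a red node has a red parent or two paths disagree.
     */
    static int blackHeight(Node n, boolean parentRed) {
        if (n == null) return 0;
        if (n.red && parentRed) return -1;  // red nodes must have black parents
        int l = blackHeight(n.left, n.red);
        int r = blackHeight(n.right, n.red);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    /** Checks the color properties only (BST ordering is not verified here). */
    static boolean isRedBlack(Node root) {
        return root == null || (!root.red && blackHeight(root, false) >= 0);
    }
}
```

A black node with two red children passes the check; in 2-3-4 terms that shape is a single 4-node.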
Red-black tree height • What is the height of a red-black tree? • It is at most 2 log2 N + 2, since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log2 N + 1. [Figure: example red-black tree with keys 3, 5, 6, 7, 9]
Red-black Tree Search • Search is the same as for binary search trees. • Color is irrelevant. • Search guaranteed to take O(log N) time. • Search typically occurs more frequently than insert.
Red-black Tree Insert • Simple 4-node test (2 red children?) • Few splits as most 4-nodes tend to be near the leaves. • Some 4-node splits require only changing the color of three nodes. • Rotations needed only when a 4-node has a 3-node parent.
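The "2 red children?" test and the color-only split might look like the sketch below. All names are assumptions for illustration, and the sketch covers only the easy case: when the node's parent is red (a 4-node under a 3-node in some orientations), a rotation is also needed, which is not shown here.

```java
class ColorFlipSketch {
    /** Hypothetical red-black node with a boolean color flag. */
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    /** A 4-node is a black node whose two children are both red. */
    static boolean isFourNode(Node n) {
        return n != null && !n.red
            && n.left != null && n.left.red
            && n.right != null && n.right.red;
    }

    /**
     * Splits a 4-node by recoloring three nodes: the two children become
     * black (two 2-nodes) and the middle key turns red, which "moves it up"
     * into the parent's 2-3-4 node. The root stays black.
     */
    static void split(Node n, boolean isRoot) {
        n.left.red = false;
        n.right.red = false;
        n.red = !isRoot;
    }
}
```

No links change in this case, which is why the split is so cheap compared with the explicit node splitting in a 2-3-4 tree.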
Red-black Tree Summary • Advantages: • Guaranteed O(log N) time for search and insert. • Little overhead for balancing. • Trees are nearly optimal. • Top-down implementation can be made tail-recursive, so very efficient.
B-trees • A generalization of 2-3-4 trees. • Used for very large dictionaries where the data are maintained on disks. • Since disk lookups are very SLOW, want to read as few disk pages as possible. Want really shallow depth trees!
B-trees Key Idea • Make the nodes in the trees have a huge number of links, k-way. • Typically choose k so that a node fills a disk page. • As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links. • When a node overflows, split the node.
B-trees • Takes O(log_{k/2} N) probes for search and insert. • Typically about 2-3 probes (disk accesses). • E.g., for N < 125 million and k = 1000, the height of the tree is less than 3. • As all searches go through the root node, usually keep the root node in memory. • Many variants. • Common in many large database systems.
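The arithmetic behind the example can be checked with a small sketch. It assumes the worst case in which every node has only k/2 links and computes the smallest number of levels h such that (k/2)^h reaches N; the class and method names are hypothetical, and integer arithmetic is used to avoid floating-point rounding in the logarithm.

```java
class BTreeProbeSketch {
    /**
     * Smallest h with (k/2)^h >= n, i.e. a worst-case bound on the number
     * of levels when every node has only k/2 links. Assumes k >= 4 and
     * n small enough that the running product does not overflow a long.
     */
    static int levels(long n, int k) {
        if (k < 4) throw new IllegalArgumentException("k must be at least 4");
        long fanout = k / 2;
        long reach = 1;  // number of items reachable with h levels of links
        int h = 0;
        while (reach < n) {
            reach *= fanout;
            h++;
        }
        return h;
    }
}
```

With k = 1000 the worst-case fanout is 500, and 500^3 = 125,000,000, so `levels(125_000_000L, 1000)` is 3 and anything up to 250,000 items needs at most 2 levels.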
Conclusion • AVL trees have the disadvantage that insert is not tail recursive. • 2-3-4 trees are not practical, but are a good way to think about other approaches. • Red-black trees are very efficient and have guaranteed O(log N) insert and search. • B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.