Learn about the efficient algorithms for binary search trees, including binary tree properties, searching, inserting, finding minimum/maximum, deletion, and balancing techniques.
CSE 326: Data Structures, Lecture #8: Binary Search Trees. Alon Halevy, Spring Quarter 2001
Binary Trees • Many algorithms are efficient and easy to program for the special case of binary trees • A binary tree is • a root • a left subtree (maybe empty) • a right subtree (maybe empty) (figure: example binary tree with nodes A through J)
Binary Search Tree Dictionary Data Structure • search tree property: • all keys in left subtree smaller than root’s key • all keys in right subtree larger than root’s key • result: easy to find any given key • inserts/deletes by changing links (figure: example BST rooted at 8, with keys 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14)
Example and Counter-Example (figures: one tree that is a BINARY SEARCH TREE, and one that is NOT A BINARY SEARCH TREE because some keys sit in the wrong subtree)
In Order Listing • visit left subtree • visit node • visit right subtree (figure: BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30) In order listing: 2 5 7 9 10 15 17 20 30
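The three-step recursion above can be sketched directly in C++ (a minimal sketch; the node type and field names `key`, `left`, `right` are assumptions chosen to match the code on the later slides):

```cpp
#include <cassert>
#include <sstream>

// Minimal node type for illustration; the field names (key, left, right)
// match the code on the later slides.
struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// In-order traversal: left subtree, then the node, then the right subtree.
// On a binary search tree this visits the keys in ascending order.
void inOrder(const Node *root, std::ostream &out) {
    if (root == nullptr) return;
    inOrder(root->left, out);
    out << root->key << ' ';
    inOrder(root->right, out);
}
```

Because of the search tree property, this single traversal prints the whole dictionary in sorted order in O(n) time.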
Finding a Node

Node * find(Comparable x, Node * root) {
  if (root == NULL) return root;
  else if (x < root->key) return find(x, root->left);
  else if (x > root->key) return find(x, root->right);
  else return root;
}

(figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30) runtime: O(depth of the tree)
Insert Concept: proceed down the tree as in Find; if the new key is not found, insert a new node at the last spot traversed

void insert(Comparable x, Node * root) {
  assert( root != NULL );
  if (x < root->key){
    if (root->left == NULL) root->left = new Node(x);
    else insert( x, root->left );
  }
  else if (x > root->key){
    if (root->right == NULL) root->right = new Node(x);
    else insert( x, root->right );
  }
}
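The slide's version asserts a non-empty tree. A common variant (a sketch, not the lecture's exact code) takes the root pointer by reference so the empty-tree case falls out naturally:

```cpp
#include <cassert>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Variant of the insert above: taking the root pointer by reference
// lets the same code handle an empty tree, so no assert is needed.
// Duplicate keys are ignored, as in the slide's version.
void insert(int x, Node *&root) {
    if (root == nullptr)    root = new Node(x);
    else if (x < root->key) insert(x, root->left);
    else if (x > root->key) insert(x, root->right);
}
```

Starting from `Node *root = nullptr;`, repeated calls to `insert(k, root)` build the tree with no special cases.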
BuildTree for BSTs Suppose a1, a2, …, an are inserted into an initially empty BST: • a1, a2, …, an are in increasing order • a1, a2, …, an are in decreasing order • a1 is the median of all, a2 is the median of elements less than a1, a3 is the median of elements greater than a1, etc. • data is randomly ordered
Examples of Building from Scratch • 1, 2, 3, 4, 5, 6, 7, 8, 9 • 5, 3, 7, 2, 4, 6, 8, 1, 9
Analysis of BuildTree • Worst case is O(n²): 1 + 2 + 3 + … + n = O(n²) • Average case assuming all orderings equally likely is O(n log n) • not averaging over all binary trees, rather averaging over all input sequences (inserts) • equivalently: average depth of a node is O(log n) • proof: see Introduction to Algorithms, Cormen, Leiserson & Rivest
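The worst and best cases above are easy to observe directly (a self-contained sketch; `height` here counts nodes on the longest root-to-leaf path, so an empty tree has height 0):

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

void insert(int x, Node *&root) {
    if (root == nullptr)    root = new Node(x);
    else if (x < root->key) insert(x, root->left);
    else if (x > root->key) insert(x, root->right);
}

// Number of nodes on the longest root-to-leaf path (0 for an empty tree).
int height(const Node *root) {
    if (root == nullptr) return 0;
    return 1 + std::max(height(root->left), height(root->right));
}
```

Inserting 1..7 in increasing order produces a chain of height 7 (the O(n²) case), while the median-first order 4, 2, 6, 1, 3, 5, 7 produces a perfectly balanced tree of height 3.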
Bonus: FindMin/FindMax • Find minimum: follow left children as far as possible • Find maximum: follow right children as far as possible (figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30)
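Both operations are one-liners in spirit (a sketch; the node type is the same assumed one as above):

```cpp
#include <cassert>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// FindMin: keep following left children; the leftmost node holds the
// smallest key. FindMax is the mirror image. Both run in O(depth).
Node *findMin(Node *root) {
    if (root == nullptr) return nullptr;
    while (root->left != nullptr) root = root->left;
    return root;
}

Node *findMax(Node *root) {
    if (root == nullptr) return nullptr;
    while (root->right != nullptr) root = root->right;
    return root;
}
```

On the example tree rooted at 10, FindMin reaches 2 and FindMax reaches 30.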
Deletion (figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30) Why might deletion be harder than insertion?
Deletion - Leaf Case Delete(17) (figure: the example tree before and after the leaf 17 is simply unlinked)
Deletion - One Child Case Delete(15) (figure: 15 has a single child, 20, which is linked into 15’s place)
Deletion - Two Child Case Delete(5) (figure: 5 has two children) replace the node with a value guaranteed to be between the left and right subtrees: the successor. Could we have used the predecessor instead?
Finding the Successor Find the next larger node in this node’s subtree. • not the next larger in the entire tree

Node * succ(Node * root) {
  if (root->right == NULL) return NULL;
  else return min(root->right);
}

(figure: example BST) How many children can the successor of a node have?
Predecessor Find the next smaller node in this node’s subtree.

Node * pred(Node * root) {
  if (root->left == NULL) return NULL;
  else return max(root->left);
}

(figure: example BST)
Deletion - Two Child Case Delete(5) (figure: 5’s key is replaced by its successor’s key, 7) it is always easy to delete the successor: it always has either 0 or 1 children!
Delete Code

void remove(Comparable x, Node *& p) {   // "delete" is a reserved word in C++
  Node * q;
  if (p != NULL) {
    if (p->key < x) remove(x, p->right);
    else if (p->key > x) remove(x, p->left);
    else { /* p->key == x */
      if (p->left == NULL) p = p->right;
      else if (p->right == NULL) p = p->left;
      else {
        q = succ(p);
        p->key = q->key;
        remove(q->key, p->right);
      }
    }
  }
}
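The three cases can be exercised on the example tree (a self-contained sketch of the same logic; the function is named `remove` because `delete` cannot name a C++ function, and the removed node is leaked here for brevity):

```cpp
#include <cassert>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

void insert(int x, Node *&root) {
    if (root == nullptr)    root = new Node(x);
    else if (x < root->key) insert(x, root->left);
    else if (x > root->key) insert(x, root->right);
}

Node *findMin(Node *root) {            // successor = min of the right subtree
    while (root->left != nullptr) root = root->left;
    return root;
}

// Same three cases as the slide's code: leaf / one child / two children.
void remove(int x, Node *&p) {
    if (p == nullptr) return;
    if (x > p->key)      remove(x, p->right);
    else if (x < p->key) remove(x, p->left);
    else if (p->left == nullptr)  p = p->right;   // leaf or one (right) child
    else if (p->right == nullptr) p = p->left;    // one (left) child
    else {                                        // two children
        Node *q = findMin(p->right);              // the successor
        p->key = q->key;
        remove(q->key, p->right);                 // successor has <= 1 child
    }
}
```

Deleting 5 from the example tree copies its successor's key (7) into the node and then deletes the easy-to-remove successor from the right subtree.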
Lazy Deletion • Instead of physically deleting nodes, just mark them as deleted • advantages: simpler; physical deletions can be done in batches; some inserts just flip the deleted flag back • disadvantages: extra memory for the deleted flag; many lazy deletions slow down finds; some operations may have to be modified (e.g., min and max) (figure: example BST)
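The idea can be sketched with one extra boolean per node (names `deleted`, `locate`, `contains`, `lazyRemove` are assumptions for illustration):

```cpp
#include <cassert>

// Each node carries a deleted flag; "removal" just sets it.
struct LNode {
    int key;
    bool deleted;
    LNode *left, *right;
    LNode(int k) : key(k), deleted(false), left(nullptr), right(nullptr) {}
};

LNode *locate(int x, LNode *root) {          // ordinary BST search
    if (root == nullptr || x == root->key) return root;
    return locate(x, x < root->key ? root->left : root->right);
}

bool contains(int x, LNode *root) {          // a lazily deleted node doesn't count
    LNode *n = locate(x, root);
    return n != nullptr && !n->deleted;
}

void lazyRemove(int x, LNode *root) {        // just flip the flag
    LNode *n = locate(x, root);
    if (n != nullptr) n->deleted = true;
}

void insert(int x, LNode *&root) {
    if (root == nullptr)    root = new LNode(x);
    else if (x < root->key) insert(x, root->left);
    else if (x > root->key) insert(x, root->right);
    else                    root->deleted = false;   // resurrect a marked node
}
```

Note that min/max must skip marked nodes, which is one of the operations the slide says may have to be modified.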
Lazy Deletion Exercise: Delete(17), Delete(15), Delete(5), Find(9), Find(16), Insert(5), Find(17) (figure: example BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30)
Dictionary Implementations BSTs look good for shallow trees, i.e. when the depth D is small (O(log n)); otherwise they are as bad as a linked list!
Beauty is Only Θ(log n) Deep • Binary Search Trees are fast if they’re shallow: • e.g.: perfectly complete • e.g.: perfectly complete except the “fringe” (leaves) • any other good cases? Problems occur when one branch is much longer than the other! What matters here?
Balance • balance = height(left subtree) - height(right subtree) • zero everywhere ⇒ perfectly balanced • small everywhere ⇒ balanced enough Balance between -1 and 1 everywhere ⇒ maximum height of 1.44 log n
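The balance factor defined above is straightforward to compute (a sketch; here the usual AVL convention is assumed, where an empty subtree has height -1 and a leaf has height 0):

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Height convention: empty tree is -1, a single leaf is 0.
int height(const Node *n) {
    if (n == nullptr) return -1;
    return 1 + std::max(height(n->left), height(n->right));
}

// Balance factor: height(left subtree) - height(right subtree).
// Zero means perfectly balanced here; AVL allows -1, 0, or 1.
int balance(const Node *n) {
    return n == nullptr ? 0 : height(n->left) - height(n->right);
}
```

A real AVL tree stores heights in the nodes instead of recomputing them, so the balance check at each node is O(1).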
AVL Tree Dictionary Data Structure • binary tree property • search tree property • balance property: balance of every node is -1 ≤ b ≤ 1 • result: depth is Θ(log n) (figure: example AVL tree rooted at 8)
An AVL Tree (figure: an AVL tree on the keys 2, 5, 9, 10, 12, 15, 17, 20, 30; each node stores its data, its height, and pointers to its children)
Not AVL Trees (figures: two trees that violate the balance property; the out-of-balance nodes have balance factors 0 - 2 = -2 and (-1) - 1 = -2)
Staying Balanced Good case: inserting small, tall and middle. Insert(middle), Insert(small), Insert(tall) (figure: result is M with children S and T, heights 1, 0, 0)
Bad Case #1 Insert(small), Insert(middle), Insert(tall) (figure: a chain S → M → T with heights 2, 1, 0)
Single Rotation (figure: the chain S → M → T is rotated so that M becomes the root with children S and T) Basic operation used in AVL trees: a right child can legally have its parent as its left child.
General Case: Insert Unbalances (figure: node a of height h+1 has left child b of height h and subtrees X, Y, Z; an insert into X raises its height from h-1 to h, so b grows to h+1 and a to h+2)
General Single Rotation (figure: b is rotated above a; the subtree keeps its pre-insert height h+1) • Height of the left subtree same as it was before the insert! • Height of all ancestors unchanged • We can stop here!
Bad Case #2 Insert(small), Insert(tall), Insert(middle) (figure: S with right child T, and M as T’s left child; heights 2, 1, 0) Will a single rotation fix this?
Double Rotation (figure: first M is rotated above T, turning the zig-zag into the chain S → M → T; then M is rotated above S, leaving M with children S and T)
General Double Rotation (figure: node a with left child b, whose right child c has subtrees X and Y; after the double rotation c is the root with children b and a, and subtrees W, X, Y, Z reattached) • Initially: an insert into either X or Y unbalances the tree (root height goes to h+2) • “Zig zag” to pull up c: restores the root height to h+1 and the left subtree height to h
Insert Algorithm • Find spot for value • Hang new node • Search back up looking for imbalance • If there is an imbalance: case #1: Perform single rotation and exit case #2: Perform double rotation and exit
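The four steps above can be sketched recursively (a sketch assuming heights are stored in the nodes; an empty subtree counts as height -1, and the rotation names follow the convention of promoting the named child):

```cpp
#include <algorithm>
#include <cassert>

// AVL node with a stored height (a leaf has height 0).
struct ANode {
    int key, height;
    ANode *left, *right;
    ANode(int k) : key(k), height(0), left(nullptr), right(nullptr) {}
};

int h(const ANode *n) { return n ? n->height : -1; }         // empty = -1
void fix(ANode *n) { n->height = std::max(h(n->left), h(n->right)) + 1; }

// Promote the left child (handles a left-left imbalance, case #1).
void rotateWithLeft(ANode *&root) {
    ANode *l = root->left;
    root->left = l->right;
    l->right = root;
    fix(root); fix(l);
    root = l;
}

// Promote the right child (the mirror case).
void rotateWithRight(ANode *&root) {
    ANode *r = root->right;
    root->right = r->left;
    r->left = root;
    fix(root); fix(r);
    root = r;
}

// Find the spot, hang the node, then check balance on the way back up.
void avlInsert(int x, ANode *&root) {
    if (root == nullptr) { root = new ANode(x); return; }
    if (x < root->key) {
        avlInsert(x, root->left);
        if (h(root->left) - h(root->right) == 2) {
            if (x < root->left->key) rotateWithLeft(root);              // case #1
            else { rotateWithRight(root->left); rotateWithLeft(root); } // case #2
        }
    } else if (x > root->key) {
        avlInsert(x, root->right);
        if (h(root->right) - h(root->left) == 2) {
            if (x > root->right->key) rotateWithRight(root);            // case #1
            else { rotateWithLeft(root->right); rotateWithRight(root); }// case #2
        }
    }
    fix(root);
}
```

Inserting 1..7 in sorted order, the worst case for a plain BST, now yields a perfectly balanced tree of height 2 rooted at 4.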
Easy Insert Insert(3) (figure: 3 is inserted as a child of 2 in the example AVL tree; the tree stays balanced)
Hard Insert (Bad Case #1) Insert(33) (figure: the example AVL tree; inserting 33 below 30 unbalances the tree at 15)
Single Rotation (figure: rotating 20 above 15 rebalances the tree; 10’s right child becomes 20, with children 15 (children 12, 17) and 30 (child 33))
Hard Insert (Bad Case #2) Insert(18) (figure: the example AVL tree; inserting 18 below 17 unbalances the tree at 15)
Single Rotation (oops!) (figure: rotating 20 above 15 does not help; 17 and its new child 18 move under 15, and the tree is still unbalanced)
Double Rotation (Step #1) (figure: rotate 17 above 20, so 15’s right child is 17 with right child 20, and 18 becomes 20’s left child) Look familiar?
Double Rotation (Step #2) (figure: now rotate 17 above 15; 10’s right child becomes 17, with children 15 (child 12) and 20 (children 18, 30), and the tree is balanced again)
Recursive 1. Search downward for spot 2. Insert node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate b. If imbalance #2, double rotate Iterative 1. Search downward for spot, stacking parent nodes 2. Insert node 3. Unwind stack, correcting heights a. If imbalance #1, single rotate and exit b. If imbalance #2, double rotate and exit AVL Algorithm Revisited
Single Rotation Code (figure: root and temp before and after the rotation, with subtrees X, Y, Z)

void RotateRight(Node *& root) {   // promotes the right child over its parent
  Node * temp = root->right;
  root->right = temp->left;
  temp->left = root;
  root->height = max(height(root->right), height(root->left)) + 1;
  temp->height = max(height(temp->right), temp->left->height) + 1;
  root = temp;
}

(here height(n) is assumed to return -1 for a NULL node; accessing the height field directly, as the original slide did, would dereference a NULL child)
Double Rotation Code (figure: the double rotation on nodes a, b, c with subtrees W, X, Y, Z)

void DoubleRotateRight(Node *& root) {
  RotateLeft(root->right);   // first rotation
  RotateRight(root);         // second rotation
}
Double Rotation Completed (figure: after the first and second rotations, c is the root with children b and a, and subtrees W, X, Y, Z reattached in order)