CS 332: Algorithms

CS 332: Algorithms
Augmenting Data Structures
David Luebke, 8/9/2014

Administrivia
• Midterm is postponed until Thursday, Oct 26
• Reminder: homework 3 due today
  • In the CS front office
  • Due at 5 PM (but don’t risk being there at 4:59!)
• Check your e-mail for some clarifications & hints

Review: Hash Tables
• More formally:
  • Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    • Insert(T, x)
    • Delete(T, x)
    • Search(T, x)
  • Don’t care about sorting the records
• Hash tables support all the above in O(1) expected time

Review: Direct Addressing
• Suppose:
  • The range of keys is 0..m-1
  • Keys are distinct
• The idea:
  • Use key itself as the address into the table
  • Set up an array T[0..m-1] in which
    • T[i] = x if x ∈ T and key[x] = i
    • T[i] = NULL otherwise
  • This is called a direct-address table
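The direct-address idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming records are (key, data) pairs with distinct integer keys in 0..m-1; the class and method names are hypothetical, not from the lecture.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in 0..m-1."""

    def __init__(self, m):
        # slots[i] holds the record whose key is i, or None if absent.
        self.slots = [None] * m

    def insert(self, key, data):
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        return self.slots[key]


table = DirectAddressTable(8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
```

Every operation is a single array access, which is why the scheme only pays off when the key range m is small enough to allocate.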

Review: Hash Functions
• Next problem: collision
[Figure: h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, into slots 0..m-1 of table T; k2 and k5 collide because h(k2) = h(k5)]

Review: Resolving Collisions
• How can we solve the problem of collisions?
• Open addressing
  • To insert: if slot is full, try another slot, and another, until an open slot is found (probing)
  • To search, follow same sequence of probes as would be used when inserting the element
• Chaining
  • Keep linked list of elements in slots
  • Upon collision, just add new element to list
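A rough sketch of open addressing, for illustration only. The slide does not say which probe sequence to use, so linear probing and the hash h(k) = k mod m are assumptions here, as are all the names; deletion is left out because it needs extra care under open addressing.

```python
class OpenAddressTable:
    """Open addressing with linear probing (insert and search only)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key):
        # Probe sequence: h(k), h(k)+1, h(k)+2, ... (mod m).
        start = key % self.m
        for step in range(self.m):
            yield (start + step) % self.m

    def insert(self, key):
        for slot in self._probe(key):
            if self.slots[slot] is None or self.slots[slot] == key:
                self.slots[slot] = key
                return slot
        raise OverflowError("table is full")

    def search(self, key):
        for slot in self._probe(key):
            if self.slots[slot] is None:
                return None  # an open slot ends an unsuccessful search
            if self.slots[slot] == key:
                return slot
        return None


oa = OpenAddressTable(7)
slots_used = [oa.insert(k) for k in (10, 17, 24)]  # all three hash to slot 3
```

Search follows the same probe sequence as insert, so it can stop at the first open slot it meets.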

Review: Chaining
• Chaining puts elements that hash to the same slot in a linked list:
[Figure: table T with one linked list per slot; the actual keys K = {k1, ..., k8}, a subset of the universe of keys U, are stored in the lists, with colliding keys sharing a list]
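A small sketch of chaining, under stated assumptions: Python lists stand in for the linked lists, the hash is h(k) = k mod m, and the class and method names are invented for illustration.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value)."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        return self.slots[key % self.m]  # division-method hash, an assumption

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        chain[:] = [(k, v) for k, v in chain if k != key]


ht = ChainedHashTable(5)
for key, value in [(1, "a"), (6, "b"), (11, "c"), (2, "d")]:
    ht.insert(key, value)
ht.delete(6)
```

Keys 1, 6 and 11 all land in slot 1, so a search there walks a chain whose length tracks the load of that slot.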

Review: Analysis Of Hash Tables
• Simple uniform hashing: each key in table is equally likely to be hashed to any slot
• Load factor α = n/m = average # keys per slot (n keys stored in m slots)
• Average cost of unsuccessful search = O(1 + α)
• Successful search: O(1 + α/2) = O(1 + α)
• If n is proportional to m, α = O(1)
• So the cost of searching = O(1) if we size our table appropriately

Review: Choosing A Hash Function
• Choosing the hash function well is crucial
  • Bad hash function puts all elements in same slot
  • A good hash function:
    • Should distribute keys uniformly into slots
    • Should not depend on patterns in the data
• We discussed three methods:
  • Division method
  • Multiplication method
  • Universal hashing

Review: The Division Method
• h(k) = k mod m
  • In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
• Elements with adjacent keys hashed to different slots: good
• If keys bear relation to m: bad
• Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
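To see why keys that bear a relation to m are a problem, here is a small experiment. The key set (multiples of 8) and the two table sizes are assumptions picked only to make the effect visible.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


keys = [8 * i for i in range(100)]  # keys that share a pattern (multiples of 8)

slots_pow2 = {division_hash(k, 16) for k in keys}   # m is a power of 2
slots_prime = {division_hash(k, 23) for k in keys}  # m is prime
```

With m = 16 the hundred keys occupy only two slots, while with the prime m = 23 they spread over every slot.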

Review: The Multiplication Method
• For a constant A, 0 < A < 1:
• $h(k) = \lfloor m\,(kA - \lfloor kA \rfloor) \rfloor$, where $kA - \lfloor kA \rfloor$ is the fractional part of kA
• Upshot:
  • Choose $m = 2^p$
  • Choose A not too close to 0 or 1
  • Knuth: good choice for $A = (\sqrt{5} - 1)/2$
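A floating-point sketch of the multiplication method, using Knuth's constant. The function name is hypothetical, and real implementations usually work in fixed-point integer arithmetic rather than floats; this version only illustrates the formula.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant, about 0.618


def multiplication_hash(k, m):
    """Multiplication method: floor(m * fractional part of k*A)."""
    frac = (k * A) % 1.0  # fractional part of k*A
    return int(m * frac)


slot = multiplication_hash(123456, 2 ** 14)
```

Here k = 123456 and m = 2^14 give slot 67, since kA is roughly 76300.0041 and 16384 × 0.0041 is a little over 67.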

Review: Universal Hashing
• When attempting to foil a malicious adversary, randomize the algorithm
• Universal hashing: pick a hash function randomly when the algorithm begins (not upon every insert!)
  • Guarantees good performance on average, no matter what keys adversary chooses
  • Need a family of hash functions to choose from

Review: Universal Hashing
• Let $\mathcal{H}$ be a (finite) collection of hash functions
  • …that map a given universe U of keys…
  • …into the range {0, 1, …, m - 1}.
• $\mathcal{H}$ is universal if:
  • for each pair of distinct keys $x, y \in U$, the number of hash functions $h \in \mathcal{H}$ for which h(x) = h(y) is $|\mathcal{H}|/m$
• In other words:
  • With a random hash function from $\mathcal{H}$, the chance of a collision between x and y ($x \ne y$) is exactly 1/m

Review: A Universal Hash Function
• Choose table size m to be prime
• Decompose key x into r+1 bytes, so that $x = \{x_0, x_1, \ldots, x_r\}$
  • Only requirement is that max value of byte < m
• Let $a = \{a_0, a_1, \ldots, a_r\}$ denote a sequence of r+1 elements chosen randomly from {0, 1, …, m - 1}
• Define corresponding hash function $h_a$: $h_a(x) = \left(\sum_{i=0}^{r} a_i x_i\right) \bmod m$
• With this definition, $\mathcal{H}$ has $m^{r+1}$ members
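A sketch of drawing one function from this family, assuming the standard definition in which the hash is the sum of a_i times x_i, taken mod m. The choice m = 257 (a prime larger than any byte value), the shift-based byte decomposition, and the function names are assumptions for illustration.

```python
import random


def make_universal_hash(m, r, rng=random):
    """Pick h_a at random: h_a(x) = sum(a_i * x_i) mod m, for prime m."""
    a = [rng.randrange(m) for _ in range(r + 1)]

    def h(x):
        # Decompose x into r+1 bytes, least significant first.
        digits = [(x >> (8 * i)) & 0xFF for i in range(r + 1)]
        return sum(ai * xi for ai, xi in zip(a, digits)) % m

    return h


# One function is chosen once, up front, and then used for every key.
h = make_universal_hash(257, 3, random.Random(1))
```

The random vector a is fixed when the table is created, which matches the point that the function is picked when the algorithm begins and not on every insert.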

Augmenting Data Structures
• This course is supposed to be about design and analysis of algorithms
• So far, we’ve only looked at one design technique: divide and conquer
• Next up: augmenting data structures
  • Or, “One good thief is worth ten good scholars”

Dynamic Order Statistics
• We’ve seen algorithms for finding the ith element of an unordered set in O(n) time
• Next, a structure to support finding the ith element of a dynamic set in O(lg n) time
  • What operations do dynamic sets usually support?
  • What structure works well for these?
  • How could we use this structure for order statistics?
  • How might we augment it to support efficient extraction of order statistics?

Order Statistic Trees
• OS Trees augment red-black trees:
  • Associate a size field with each node in the tree
  • x->size records the size of subtree rooted at x, including x itself:

              M:8
            /     \
         C:5       P:2
        /   \         \
     A:1     F:3       Q:1
            /   \
         D:1     H:1

Selection On OS Trees
[Figure: the OS tree, each node labeled key:size: M:8, C:5, P:2, Q:1, A:1, F:3, D:1, H:1]
• How can we use this property to select the ith element of the set?

OS-Select
OS-Select(x, i)
{
    r = x->left->size + 1;
    if (i == r)
        return x;
    else if (i < r)
        return OS-Select(x->left, i);
    else
        return OS-Select(x->right, i-r);
}

OS-Select Example
[Figure: the OS tree, each node labeled key:size: M:8, C:5, P:2, Q:1, A:1, F:3, D:1, H:1]
• Example: show OS-Select(root, 5):
  • At M: i = 5, r = 6
  • At C: i = 5, r = 2
  • At F: i = 3, r = 2
  • At H: i = 1, r = 1
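The trace can be reproduced with a short Python sketch. It is an illustration under assumptions: the tree is built by hand from the key:size labels rather than by a red-black insert, missing children are None, and the helper size() is an added convenience that treats a missing child as size 0.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # size = number of nodes in the subtree rooted here, including itself
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node is not None else 0


def os_select(x, i):
    """Return the node holding the i-th smallest key (1-based) under x."""
    r = size(x.left) + 1  # rank of x within its own subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


# The example tree: M:8; C:5, P:2; A:1, F:3, Q:1; D:1, H:1.
root = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), Node("H"))),
            Node("P", None, Node("Q")))
```

Calling os_select(root, 5) visits M, C, F and H in turn and returns H, the fifth key in sorted order.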

OS-Select: A Subtlety
• What happens at the leaves?
• How can we deal elegantly with this?
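One common way to handle the leaves, sketched here as an assumption and not necessarily the answer intended in lecture, is a shared sentinel node of size 0 that stands in for every missing child. With it, the expression x->left->size is always defined. The names OSNode, NIL and make_node are hypothetical.

```python
class OSNode:
    def __init__(self, key=None, size=0):
        self.key = key
        self.size = size
        self.left = self.right = None


# Assumed sentinel: one shared NIL node of size 0 replaces every missing child.
NIL = OSNode()
NIL.left = NIL.right = NIL


def make_node(key, left=NIL, right=NIL):
    node = OSNode(key, 1 + left.size + right.size)
    node.left, node.right = left, right
    return node


def os_select(x, i):
    r = x.left.size + 1  # safe at leaves: NIL.size is 0
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)


leaf_parent = make_node("F", make_node("D"), make_node("H"))
```

The selection code stays identical to the pseudocode; only the representation of an empty subtree changes.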

OS-Select
• What will be the running time?

Determining The Rank Of An Element
[Figure: the OS tree, each node labeled key:size: M:8, C:5, P:2, Q:1, A:1, F:3, D:1, H:1, with a different node highlighted for each question]
• What is the rank of this element?
• Of this one? Why?
• Of the root? What’s the pattern here?
• What about the rank of this element?
• This one? What’s the pattern here?

OS-Rank
OS-Rank(T, x)
{
    r = x->left->size + 1;
    y = x;
    while (y != T->root) {
        if (y == y->p->right)
            r = r + y->p->left->size + 1;
        y = y->p;
    }
    return r;
}
• What will be the running time?
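A runnable sketch of OS-Rank on the same example tree. The assumptions: each node carries a parent pointer p, the tree is built by hand, and the helper size() treats a missing child as size 0; none of these names come from the lecture.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        self.size = 1
        for child in (left, right):
            if child is not None:
                child.p = self
                self.size += child.size


def size(node):
    return node.size if node is not None else 0


def os_rank(root, x):
    """Return the 1-based rank of node x in the tree rooted at root."""
    r = size(x.left) + 1
    y = x
    while y is not root:
        if y is y.p.right:
            # The parent and its whole left subtree precede y in sorted order.
            r += size(y.p.left) + 1
        y = y.p
    return r


h = Node("H")
q = Node("Q")
root = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), h)),
            Node("P", None, q))
```

For H the walk adds 2 at F and 2 at C to the initial 1, giving rank 5; the root has rank 6 and Q has rank 8.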

OS-Trees: Maintaining Sizes
• So we’ve shown that with subtree sizes, order statistic operations can be done in O(lg n) time
• Next step: maintain sizes during Insert() and Delete() operations
  • How would we adjust the size fields during insertion on a plain binary search tree?
  • A: increment sizes of nodes traversed during search
  • Why won’t this work on red-black trees?
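The plain binary search tree case can be sketched as follows. This assumes distinct keys and does no rebalancing, so it covers only the first half of the question; the names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1


def bst_insert(root, key):
    """Insert key (assumed not already present) and return the tree's root."""
    new = Node(key)
    if root is None:
        return new
    x = root
    while True:
        x.size += 1  # the new node will land somewhere below x
        if key < x.key:
            if x.left is None:
                x.left = new
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = new
                return root
            x = x.right


root = None
for key in "MCPAFQDH":
    root = bst_insert(root, key)
```

Inserting the keys in this order rebuilds the example tree with sizes M:8, C:5, F:3 and P:2, each maintained by a single increment per node on the search path.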

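Maintaining Size Through Rotation
• Salient point: rotation invalidates only x and y
• Can recalculate their sizes in constant time
  • Why?
[Figure: before rightRotate(y), y (size 19) has left child x (size 11) over subtrees of sizes 6 and 4, and a right subtree of size 7; afterwards x (size 19) has a left subtree of size 6 and right child y (size 12) over subtrees of sizes 4 and 7; leftRotate(x) reverses the change]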
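A sketch of a right rotation that repairs the two affected size fields, using the sizes from the figure. It is simplified on purpose: there are no parent pointers or colors, the three untouched subtrees are stubbed as single nodes carrying the sizes 6, 4 and 7, and all names are hypothetical.

```python
class Node:
    def __init__(self, key, size=1, left=None, right=None):
        self.key, self.size, self.left, self.right = key, size, left, right


def size(node):
    return node.size if node is not None else 0


def right_rotate(y):
    """Rotate right around y; return x, the new root of this subtree."""
    x = y.left
    y.left, x.right = x.right, y
    # Only x and y changed children, so only their sizes need repair.
    x.size = y.size  # x now roots the same set of nodes y used to root
    y.size = 1 + size(y.left) + size(y.right)  # recompute from new children
    return x


# Subtrees of 6, 4 and 7 nodes are stubbed as single nodes with those sizes.
x = Node("x", 11, Node("a", 6), Node("b", 4))
y = Node("y", 19, x, Node("c", 7))
top = right_rotate(y)
```

Both updates read only fields of x, y and their immediate children, which is why the repair takes constant time.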

Augmenting Data Structures: Methodology
• Choose underlying data structure
  • E.g., red-black trees
• Determine additional information to maintain
  • E.g., subtree sizes
• Verify that information can be maintained for operations that modify the structure
  • E.g., Insert(), Delete() (don’t forget rotations!)
• Develop new operations
  • E.g., OS-Rank(), OS-Select()

The End
• Up next:
  • Interval trees
  • Review for midterm
