
CS 332: Algorithms



Presentation Transcript


  1. CS 332: Algorithms Augmenting Data Structures David Luebke 8/9/2014

  2. Administrivia • Midterm is postponed until Thursday, Oct 26 • Reminder: homework 3 due today • In the CS front office • Due at 5 PM (but don’t risk being there at 4:59!) • Check your e-mail for some clarifications & hints

  3. Review: Hash Tables • More formally: • Given a table T and a record x, with key (= symbol) and satellite data, we need to support: • Insert(T, x) • Delete(T, x) • Search(T, x) • Don’t care about sorting the records • Hash tables support all the above in O(1) expected time

  4. Review: Direct Addressing • Suppose: • The range of keys is 0..m-1 • Keys are distinct • The idea: • Use the key itself as the address into the table • Set up an array T[0..m-1] in which • T[i] = x if x ∈ T and key[x] = i • T[i] = NULL otherwise • This is called a direct-address table
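The direct-address idea above can be sketched in a few lines of C. This is a minimal illustration (names, table size, and the `used` flag standing in for NULL are ours, not from the slides): all three operations are a single array access.

```c
#include <stddef.h>

#define M 10                            /* keys range over 0..M-1 */

typedef struct {
    int data[M];                        /* satellite data */
    int used[M];                        /* 1 if slot i holds a record; 0 plays the role of T[i] = NULL */
} DirectTable;

void da_insert(DirectTable *t, int key, int value) {
    t->data[key] = value;
    t->used[key] = 1;
}

void da_delete(DirectTable *t, int key) {
    t->used[key] = 0;
}

/* Returns 1 and writes the value to *out if key is present, else 0. */
int da_search(DirectTable *t, int key, int *out) {
    if (!t->used[key]) return 0;        /* the T[i] = NULL case */
    *out = t->data[key];
    return 1;
}
```

Each operation is O(1) worst case; the cost is that the table must be as large as the key range, which is exactly what hashing relaxes.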

  5. Review: Hash Functions • Next problem: collision • [diagram: a hash function h maps the universe U of keys (actual keys K = {k1, …, k5}) into a table T with slots 0 to m-1; h(k2) = h(k5), a collision]

  6. Review: Resolving Collisions • How can we solve the problem of collisions? • Open addressing • To insert: if slot is full, try another slot, and another, until an open slot is found (probing) • To search, follow same sequence of probes as would be used when inserting the element • Chaining • Keep linked list of elements in slots • Upon collision, just add new element to list
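The chaining strategy above can be sketched directly. This is an illustrative sketch, not from the slides: int keys, a fixed slot count, and insertion at the head of each list (which is what makes insert O(1)).

```c
#include <stdlib.h>

#define NSLOTS 8

typedef struct Node { int key; struct Node *next; } Node;
typedef struct { Node *slot[NSLOTS]; } Table;

static unsigned hash(int key) { return (unsigned)key % NSLOTS; }

/* Insert at the head of the slot's list: O(1), collisions just extend the list. */
void chain_insert(Table *t, int key) {
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->next = t->slot[hash(key)];
    t->slot[hash(key)] = n;
}

/* Search walks only the one list the key could be in: O(1 + alpha) expected. */
Node *chain_search(Table *t, int key) {
    for (Node *n = t->slot[hash(key)]; n != NULL; n = n->next)
        if (n->key == key) return n;
    return NULL;
}
```

With NSLOTS = 8, keys 5 and 13 collide (both hash to slot 5) and simply share a list, matching the picture on the next slide.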

  7. Review: Chaining • Chaining puts elements that hash to the same slot in a linked list: • [diagram: table T of slots, each holding a linked list; e.g. chains k1 → k4, k5 → k2 → k7, k3 alone, and k8 → k6]

  8. Review: Analysis Of Hash Tables • Simple uniform hashing: each key in table is equally likely to be hashed to any slot • Load factor α = n/m = average # keys per slot • Average cost of unsuccessful search = O(1 + α) • Successful search: O(1 + α/2) = O(1 + α) • If n is proportional to m, α = O(1) • So the cost of searching = O(1) if we size our table appropriately

  9. Review: Choosing A Hash Function • Choosing the hash function well is crucial • Bad hash function puts all elements in same slot • A good hash function: • Should distribute keys uniformly into slots • Should not depend on patterns in the data • We discussed three methods: • Division method • Multiplication method • Universal hashing

  10. Review: The Division Method • h(k) = k mod m • In words: hash k into a table with m slots using the slot given by the remainder of k divided by m • Elements with adjacent keys hashed to different slots: good • If keys bear relation to m: bad • Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
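The division method is a one-liner; the work is in choosing m. A quick sketch (701 is our arbitrary example of a prime not near a power of 2 or 10, in the spirit of the slide's advice):

```c
/* Division-method hash: h(k) = k mod m. The caller supplies m,
 * which should be a prime not too close to a power of 2 (or 10). */
unsigned div_hash(unsigned k, unsigned m) {
    return k % m;
}
```

Note how adjacent keys land in adjacent slots (good), while keys that are multiples of m all collide in slot 0 (the "keys bear relation to m" hazard).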

  11. Review: The Multiplication Method • For a constant A, 0 < A < 1: • h(k) = ⌊m (kA - ⌊kA⌋)⌋, where kA - ⌊kA⌋ is the fractional part of kA • Upshot: • Choose m = 2^p • Choose A not too close to 0 or 1 • Knuth: good choice for A = (√5 - 1)/2
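The formula above transcribes directly to C. A sketch under our own choices: Knuth's A is written out as a literal to keep the example dependency-free, and floating point is used for clarity (production implementations typically use fixed-point arithmetic instead).

```c
/* Multiplication-method hash: h(k) = floor(m * (kA mod 1)) with m = 2^p. */
unsigned mult_hash(unsigned k, unsigned p) {
    const double A = 0.6180339887498949;        /* (sqrt(5) - 1) / 2, per Knuth */
    double ka = k * A;
    double frac = ka - (unsigned long long)ka;  /* fractional part of kA */
    return (unsigned)((double)(1u << p) * frac);
}
```

For example, with k = 123456 and p = 14 (so m = 16384), kA ≈ 76300.0041, the fractional part is ≈ 0.0041, and h(k) = 67.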

  12. Review: Universal Hashing • When attempting to foil a malicious adversary, randomize the algorithm • Universal hashing: pick a hash function randomly when the algorithm begins (not upon every insert!) • Guarantees good performance on average, no matter what keys the adversary chooses • Need a family of hash functions to choose from

  13. Review: Universal Hashing • Let H be a (finite) collection of hash functions • …that map a given universe U of keys… • …into the range {0, 1, …, m - 1} • H is universal if: • for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ H for which h(x) = h(y) is |H|/m • In other words: • With a random hash function from H, the chance of a collision between x and y (x ≠ y) is exactly 1/m

  14. Review: A Universal Hash Function • Choose table size m to be prime • Decompose key x into r+1 bytes, so that x = {x0, x1, …, xr} • Only requirement is that max value of byte < m • Let a = {a0, a1, …, ar} denote a sequence of r+1 elements chosen randomly from {0, 1, …, m - 1} • Define the corresponding hash function ha: ha(x) = (a0x0 + a1x1 + … + arxr) mod m • With this definition, H has m^(r+1) members
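A sketch of this family for 32-bit keys split into r+1 = 4 bytes. The coefficients a[i] would be drawn randomly once at startup; here the caller supplies them, and m must be a prime larger than 255 so every byte value fits in {0, …, m-1}. All names are ours.

```c
#define R1 4                            /* r + 1 bytes per 32-bit key */

unsigned universal_hash(unsigned key, const unsigned a[R1], unsigned m) {
    unsigned long sum = 0;
    for (int i = 0; i < R1; i++) {
        unsigned byte = (key >> (8 * i)) & 0xFF;    /* x_i */
        sum += (unsigned long)a[i] * byte;          /* a_i * x_i */
    }
    return (unsigned)(sum % m);
}
```

Fixing a different coefficient vector a picks a different member of H, which is how the random choice of hash function is realized.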

  15. Augmenting Data Structures • This course is supposed to be about design and analysis of algorithms • So far, we’ve only looked at one design technique (What is it?)

  16. Augmenting Data Structures • This course is supposed to be about design and analysis of algorithms • So far, we’ve only looked at one design technique: divide and conquer • Next up: augmenting data structures • Or, “One good thief is worth ten good scholars”

  17. Dynamic Order Statistics • We’ve seen algorithms for finding the ith element of an unordered set in O(n) time • Next, a structure to support finding the ith element of a dynamic set in O(lg n) time • What operations do dynamic sets usually support? • What structure works well for these? • How could we use this structure for order statistics? • How might we augment it to support efficient extraction of order statistics?

  18. Order Statistic Trees • OS Trees augment red-black trees: • Associate a size field with each node in the tree • x->size records the size of subtree rooted at x, including x itself:

                M(8)
               /    \
            C(5)    P(2)
            /  \       \
         A(1)  F(3)    Q(1)
               /  \
            D(1)  H(1)

  19. Selection On OS Trees • [same OS tree: M(8) at the root; C(5), P(2); A(1), F(3), Q(1); D(1), H(1)] • How can we use this property to select the ith element of the set?

  20. OS-Select

  OS-Select(x, i) {
      r = x->left->size + 1;
      if (i == r) return x;
      else if (i < r) return OS-Select(x->left, i);
      else return OS-Select(x->right, i - r);
  }
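OS-Select above translates almost verbatim into C. A runnable sketch on a hand-built copy of the running-example tree (in-order key sequence A C D F H M P Q); the nil sentinel with size 0 standing in for empty children is our choice, and is one tidy answer to the "what happens at the leaves?" question coming up.

```c
typedef struct OSNode {
    char key;
    int size;                           /* nodes in subtree rooted here, incl. this one */
    struct OSNode *left, *right;
} OSNode;

static OSNode nil = { 0, 0, &nil, &nil };   /* sentinel: size 0, so leaves need no special case */

OSNode *os_select(OSNode *x, int i) {
    int r = x->left->size + 1;          /* rank of x within its own subtree */
    if (i == r) return x;
    else if (i < r) return os_select(x->left, i);
    else return os_select(x->right, i - r);
}

/* The example tree: M(8) at the root, children C(5) and P(2), and so on. */
static OSNode A = {'A', 1, &nil, &nil};
static OSNode D = {'D', 1, &nil, &nil};
static OSNode H = {'H', 1, &nil, &nil};
static OSNode Q = {'Q', 1, &nil, &nil};
static OSNode F = {'F', 3, &D, &H};
static OSNode C = {'C', 5, &A, &F};
static OSNode P = {'P', 2, &nil, &Q};
static OSNode M = {'M', 8, &C, &P};
```

Calling os_select(&M, 5) walks M → C → F → H and returns H, exactly the trace on the next few slides.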

  21. OS-Select Example • Example: show OS-Select(root, 5) on the OS tree from slide 18; the call starts at the root M

  22. OS-Select Example • At M: i = 5, r = left size + 1 = 6; i < r, so recurse left into C with i = 5

  23. OS-Select Example • At C: i = 5, r = 2; i > r, so recurse right into F with i = 5 - 2 = 3

  24. OS-Select Example • At F: i = 3, r = 2; i > r, so recurse right into H with i = 3 - 2 = 1

  25. OS-Select Example • At H: i = 1, r = 1; i == r, so return H, the 5th smallest element

  26. OS-Select: A Subtlety

  OS-Select(x, i) {
      r = x->left->size + 1;
      if (i == r) return x;
      else if (i < r) return OS-Select(x->left, i);
      else return OS-Select(x->right, i - r);
  }

  • What happens at the leaves? • How can we deal elegantly with this?

  27. OS-Select

  OS-Select(x, i) {
      r = x->left->size + 1;
      if (i == r) return x;
      else if (i < r) return OS-Select(x->left, i);
      else return OS-Select(x->right, i - r);
  }

  • What will be the running time?

  28. Determining The Rank Of An Element • [OS tree from slide 18, with one node highlighted] • What is the rank of this element?

  29. Determining The Rank Of An Element • [OS tree from slide 18, another node highlighted] • Of this one? Why?

  30. Determining The Rank Of An Element • [OS tree from slide 18] • Of the root? What’s the pattern here?

  31. Determining The Rank Of An Element • [OS tree from slide 18, another node highlighted] • What about the rank of this element?

  32. Determining The Rank Of An Element • [OS tree from slide 18, another node highlighted] • This one? What’s the pattern here?

  33. OS-Rank

  OS-Rank(T, x) {
      r = x->left->size + 1;
      y = x;
      while (y != T->root) {
          if (y == y->p->right)
              r = r + y->p->left->size + 1;
          y = y->p;
      }
      return r;
  }

  • What will be the running time?
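OS-Rank can be exercised with a minimal sketch. The node layout, names, and three-node demo tree below are ours, not from the slides; p is the parent pointer, and a nil sentinel again stands in for empty children.

```c
#include <stddef.h>

typedef struct RNode {
    int size;
    struct RNode *left, *right, *p;     /* p = parent pointer */
} RNode;

typedef struct { RNode *root; } RTree;

static RNode rnil = { 0, &rnil, &rnil, &rnil };

int os_rank(RTree *T, RNode *x) {
    int r = x->left->size + 1;          /* rank of x within its own subtree */
    RNode *y = x;
    while (y != T->root) {
        if (y == y->p->right)           /* everything in y->p's left subtree, */
            r = r + y->p->left->size + 1;   /* plus y->p itself, precedes x */
        y = y->p;
    }
    return r;
}

/* Demo: root b with left child a and right child c (in-order: a b c).
 * Returns the rank of the node at 0-based in-order position `which`. */
int rank_demo(int which) {
    RNode a = { 1, &rnil, &rnil, NULL };
    RNode c = { 1, &rnil, &rnil, NULL };
    RNode b = { 3, &a, &c, &rnil };
    RTree T = { &b };
    a.p = &b; c.p = &b;
    RNode *nodes[3] = { &a, &b, &c };
    return os_rank(&T, nodes[which]);
}
```

The loop climbs one parent pointer per iteration, so the running time is the height of the tree: O(lg n) on a red-black tree.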

  34. OS-Trees: Maintaining Sizes • So we’ve shown that with subtree sizes, order statistic operations can be done in O(lg n) time • Next step: maintain sizes during Insert() and Delete() operations • How would we adjust the size fields during insertion on a plain binary search tree?

  35. OS-Trees: Maintaining Sizes • So we’ve shown that with subtree sizes, order statistic operations can be done in O(lg n) time • Next step: maintain sizes during Insert() and Delete() operations • How would we adjust the size fields during insertion on a plain binary search tree? • A: increment sizes of nodes traversed during search

  36. OS-Trees: Maintaining Sizes • So we’ve shown that with subtree sizes, order statistic operations can be done in O(lg n) time • Next step: maintain sizes during Insert() and Delete() operations • How would we adjust the size fields during insertion on a plain binary search tree? • A: increment sizes of nodes traversed during search • Why won’t this work on red-black trees?
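The answer for a plain binary search tree can be sketched concretely: every node on the search path gains exactly one descendant, so we bump its size on the way down. (This is a sketch of the slide's answer for an unbalanced BST; the struct name and recursive style are ours.)

```c
#include <stdlib.h>

typedef struct B { int key, size; struct B *left, *right; } B;

/* Insert key, incrementing the size of every node on the search path. */
B *bst_insert(B *root, int key) {
    if (root == NULL) {
        B *n = malloc(sizeof *n);
        n->key = key;
        n->size = 1;
        n->left = n->right = NULL;
        return n;
    }
    root->size++;                       /* this node is on the search path */
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}
```

On a red-black tree this alone is not enough, because the rebalancing rotations after insertion restructure subtrees, which is what the next slide addresses.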

  37. Maintaining Size Through Rotation • Salient point: rotation invalidates only x and y • Can recalculate their sizes in constant time • Why? • [diagram: rightRotate(y) turns y(19), with left child x(11) (subtrees of size 6 and 4) and a right subtree of size 7, into x(19) with a left subtree of size 6 and right child y(12) (subtrees of size 4 and 7); leftRotate(x) reverses it]
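The constant-time fix-up can be shown in code: after the pointer swaps, the new subtree root inherits the old root's size, and the demoted node is recomputed from its (new) children, which answers the slide's "Why?". A size-only sketch using the slide's example numbers (keys omitted; names are ours):

```c
typedef struct SN { int size; struct SN *left, *right; } SN;

static SN snil = { 0, &snil, &snil };   /* sentinel: size 0 */

/* Stand-in subtrees of sizes 6, 4, 7, as in the slide's diagram. */
static SN a = { 6, &snil, &snil };
static SN b = { 4, &snil, &snil };
static SN c = { 7, &snil, &snil };
static SN y = { 12, &b, &c };           /* 4 + 7 + 1 */
static SN x = { 19, &a, &y };           /* 6 + 12 + 1 */

/* Rotate left around n; returns the new subtree root (n's old right child). */
SN *left_rotate(SN *n) {
    SN *r = n->right;
    n->right = r->left;                 /* r's left subtree moves under n */
    r->left = n;
    r->size = n->size;                  /* r now roots n's old subtree: same total */
    n->size = n->left->size + n->right->size + 1;   /* recompute from new children */
    return r;
}
```

Only the two sizes are touched, so each rotation adds O(1) work and size maintenance does not change the O(lg n) cost of insertion or deletion.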

  38. Augmenting Data Structures: Methodology • Choose underlying data structure • E.g., red-black trees • Determine additional information to maintain • E.g., subtree sizes • Verify that information can be maintained for operations that modify the structure • E.g., Insert(), Delete() (don’t forget rotations!) • Develop new operations • E.g., OS-Rank(), OS-Select()

  39. The End • Up next: • Interval trees • Review for midterm
