
Today’s Material



  1. Today’s Material • The dynamic equivalence problem • a.k.a. Disjoint Sets/Union-Find ADT • Covered in Chapter 8 of the textbook

  2. Motivation • Consider the relation “=” between integers • For any integer A, A = A (reflexive) • For integers A and B, A = B means that B = A (symmetric) • For integers A, B, and C, A = B and B = C means that A = C (transitive) • Consider cities connected by two-way roads • A is trivially connected to itself • A is connected to B means B is connected to A • If A is connected to B and B is connected to C, then A is connected to C

  3. Equivalence Relationships • An equivalence relation R obeys three properties: • reflexive: for any x, xRx is true • symmetric: for any x and y, xRy implies yRx • transitive: for any x, y, and z, xRy and yRz implies xRz • Preceding relations are all examples of equivalence relations • What are not equivalence relations?

  4. Equivalence Relationships • An equivalence relation R obeys three properties: • reflexive: for any x, xRx is true • symmetric: for any x and y, xRy implies yRx • transitive: for any x, y, and z, xRy and yRz implies xRz • What about “<” on integers? • Reflexivity and symmetry are violated (1 < 1 is false; 1 < 2 does not imply 2 < 1) • What about “≤” on integers? • Symmetry is violated (1 ≤ 2 does not imply 2 ≤ 1)
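One way to test the answers on this slide is to check the three properties by brute force over a small range of integers. The following is a minimal sketch: the helper names (`is_reflexive`, `is_symmetric`, `is_transitive`) and the range bounds are illustrative and do not come from the slides. Passing on a finite range does not prove a property in general, but a single failing pair is enough to disprove it.

```c
#include <assert.h>
#include <stdbool.h>

typedef bool (*relation_fn)(int, int);

bool less_than(int x, int y)     { return x < y; }
bool less_or_equal(int x, int y) { return x <= y; }
bool equal_to(int x, int y)      { return x == y; }

/* xRx must hold for every x in lo..hi */
bool is_reflexive(relation_fn r, int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        if (!r(x, x)) return false;
    return true;
}

/* xRy must imply yRx for every pair in lo..hi */
bool is_symmetric(relation_fn r, int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            if (r(x, y) && !r(y, x)) return false;
    return true;
}

/* xRy and yRz must imply xRz for every triple in lo..hi */
bool is_transitive(relation_fn r, int lo, int hi) {
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (r(x, y) && r(y, z) && !r(x, z)) return false;
    return true;
}
```

On the range -3..3, “<” fails the reflexive and symmetric checks, “≤” fails only the symmetric check, and “=” passes all three.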

  5. Equivalence Classes and Disjoint Sets • Any equivalence relation R divides all the elements into disjoint sets of “equivalent” items • Let ~ be an equivalence relation. If A~B, then A and B are in the same equivalence class. • Examples: • On a computer chip, if ~ denotes “electrically connected,” then sets of connected components form equivalence classes • On a map, cities that have two-way roads between them form equivalence classes • What are the equivalence classes for the relation “Modulo N” applied to all integers?

  6. Equivalence Classes and Disjoint Sets • Let ~ be an equivalence relation. If A~B, then A and B are in the same equivalence class. • Examples: • The relation “Modulo N” divides all integers into N equivalence classes (for the remainders 0, 1, …, N-1) • Under Mod 5: • 0 ~ 5 ~ 10 ~ 15 … • 1 ~ 6 ~ 11 ~ 16 … • 2 ~ 7 ~ 12 ~ … • 3 ~ 8 ~ 13 ~ … • 4 ~ 9 ~ 14 ~ … • (5 equivalence classes denoting remainders 0 through 4 when divided by 5)
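The “Modulo N” classes can be computed directly. The sketch below uses two illustrative helpers, `mod_class` and `same_class`, which are not part of the slides. The only subtlety is that C's `%` operator can return a negative remainder for a negative operand, so the result is shifted back into 0..n-1.

```c
#include <assert.h>

/* Returns the remainder class of x modulo n, always in 0..n-1 */
int mod_class(int x, int n) {
    assert(n > 0);
    int r = x % n;              /* may be negative when x is negative */
    return r < 0 ? r + n : r;
}

/* Nonzero if x ~ y under the relation "modulo n" */
int same_class(int x, int y, int n) {
    return mod_class(x, n) == mod_class(y, n);
}
```

For example, under mod 5 both 13 and 18 map to class 3, and -1 lands in class 4 alongside 4, 9, and 14.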

  7. Union and Find: Problem Definition • Given a set of elements and some equivalence relation ~ between them, we want to figure out the equivalence classes • Given an element, we want to find the equivalence class it belongs to • E.g. Under mod 5, 13 belongs to the equivalence class of 3 • E.g. For the map example, want to find the equivalence class of Eskisehir (all the cities it is connected to) • Given a new element, we want to add it to an equivalence class (union) • E.g. Under mod 5, since 18 ~ 13, perform a union of 18 with the equivalence class of 13 • E.g. For the map example, Ankara is connected to Eskisehir, so add Ankara to equivalence class of Eskisehir

  8. Disjoint Set ADT • Stores N unique elements • Two operations: • Find: Given an element, return the name of its equivalence class • Union: Given the names of two equivalence classes, merge them into one class (which may have a new name or one of the two old names) • ADT divides elements into E equivalence classes, 1 ≤ E ≤ N • Names of classes are arbitrary • E.g. 1 through N, as long as Find returns the same name for 2 elements in the same equivalence class

  9. Disjoint Set ADT Properties • Disjoint set equivalence property: every element of a DS ADT belongs to exactly one set (its equivalence class) • Dynamic equivalence property: the set of an element can change after execution of a union • Example: • Initial classes = {1,4,8}, {2,3}, {6}, {7}, {5,9,10} (the name of each class is underlined on the slide) • Find(4) returns 8, the name of the class {1,4,8} • Union(6, 2) merges {6} and {2,3} into {2,3,6}, leaving {1,4,8}, {7}, {5,9,10}, {2,3,6}

  10. Disjoint Set ADT: Formal Definition • Given a set U = {a1, a2, … , an} • Maintain a partition of U, a set of subsets (or equivalence classes) of U denoted by {S1, S2, …, Sk} such that: • each pair of subsets Si and Sj are disjoint • together, the subsets cover U • each subset has a unique name • Union(a, b) creates a new subset which is the union of a’s subset and b’s subset • Find(a) returns the unique name for a’s subset

  11. Implementation Ideas and Tradeoffs • How about an array implementation? • N element array A: A[i] holds the class name for element i • E.g. Assume 8 ~ 4 ~ 3 • pick 3 as class name and set A[8] = A[4] = A[3] = 3 • Sets: {0}, {1, 2, 5, 9}, {3, 4, 8}, {6, 7}

          i     0  1  2  3  4  5  6  7  8  9
          A[i]  0  1  1  3  3  1  6  6  3  1

      • Running time for Find(i)? O(1) (just return A[i]) • Running time for Union(i, j)? O(N) (every entry holding the old class name must be rewritten)
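The array idea on this slide can be sketched as follows. The function names (`qf_init`, `qf_find`, `qf_union`) are illustrative only. Find is a single array read, while Union has to scan the whole array to relabel one class.

```c
#include <assert.h>

#define N 10
int A[N];   /* A[i] = class name of element i */

/* Start with every element in a class of its own */
void qf_init(void) {
    for (int i = 0; i < N; i++) A[i] = i;
}

/* O(1): the class name is stored directly */
int qf_find(int i) {
    return A[i];
}

/* O(N): relabel every member of i's class with the name of j's class */
void qf_union(int i, int j) {
    int from = A[i], to = A[j];
    if (from == to) return;            /* already in the same class */
    for (int k = 0; k < N; k++)
        if (A[k] == from) A[k] = to;
}
```

Calling `qf_union(8, 4)` and then `qf_union(4, 3)` after `qf_init()` reproduces the slide's example: elements 3, 4, and 8 all end up with class name 3.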

  12. Implementation Ideas and Tradeoffs • How about linked lists? • One linked list for each equivalence class • Class name = head of list • E.g. the sets {0}, {1, 2, 5, 9}, {3, 4, 8}, {6, 7} are stored as the lists 0; 1 → 2 → 5 → 9; 3 → 4 → 8; 6 → 7 • Running time for Union(i, j)? • E.g. Union(1, 3) • O(1): simply append one list to the end of the other (given a pointer to the tail) • Running time for Find(i)? • O(N): must scan all lists in the worst case
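A minimal sketch of the linked-list idea, using arrays in place of pointer-based nodes to keep it short. All names (`next_elem`, `tail_of`, `is_head`, `ll_*`) are assumptions made for illustration. Keeping a tail index per head is what makes the append O(1).

```c
#include <assert.h>

#define N 10
int next_elem[N];  /* next element in the same list, or -1 at the end */
int tail_of[N];    /* meaningful only for heads: last element of that list */
int is_head[N];    /* 1 if the element is the head (class name) of a list */

void ll_init(void) {
    for (int i = 0; i < N; i++) {
        next_elem[i] = -1;
        tail_of[i] = i;
        is_head[i] = 1;
    }
}

/* O(1): append the list headed by b to the list headed by a */
void ll_union(int a, int b) {
    assert(is_head[a] && is_head[b] && a != b);
    next_elem[tail_of[a]] = b;
    tail_of[a] = tail_of[b];
    is_head[b] = 0;
}

/* O(N) worst case: walk the lists until x is found, return its head */
int ll_find(int x) {
    for (int h = 0; h < N; h++) {
        if (!is_head[h]) continue;
        for (int e = h; e != -1; e = next_elem[e])
            if (e == x) return h;
    }
    return -1;   /* not reached for 0 <= x < N */
}
```

After building the lists 1 → 2 → 5 → 9 and 3 → 4 → 8, `ll_union(1, 3)` yields one list headed by 1, so `ll_find(8)` returns 1.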

  13. Implementation Ideas and Tradeoffs • Tradeoff between Union and Find: can we do both in O(1) time? • Consider N-1 Unions (the maximum possible) and M Finds • O(N^2 + M) for the array implementation • O(N + MN) for the linked list implementation • Can we do this in O(M + N) time?

  14. Towards a new Data Structure • Intuition: Finding the representative member (= class name) for an element is like the opposite of searching for a key in a given set • So, instead of trees with pointers from each node to its children, let’s use trees with a pointer from each node to its parent • Such trees are known as Up-Trees

  15. Up-Tree Data Structure • Each equivalence class (or disjoint set) is an up-tree with its root as its representative member • All members of a given set are nodes in that set’s up-tree • Up-trees are not necessarily binary • [Figure: three up-trees for the sets {c, f}, {h}, and {a, d, g, b, e}; each root points to NULL]

  16. Implementing Up-Trees • Forest of up-trees can easily be stored in an array (call it “up”) • up[X] = parent of X, or -1 if X is a root • [Figure: four up-trees for the sets {g}, {h, i}, {c, f}, {a, b, d, e}; each root points to NULL] • Array up:

          index  0(a)  1(b)  2(c)  3(d)  4(e)  5(f)  6(g)  7(h)  8(i)
          up       -1     0    -1     0     1     2    -1    -1     7

  17. Example Find • Find(x): Just follow parent pointers to the root • Find(e) = a (path e → b → a) • Find(f) = c • Find(g) = g • [Figure: the forest {g}, {h, i}, {c, f}, {a, b, d, e} with the path of Find(e) highlighted] • Array up = {-1, 0, -1, 0, 1, 2, -1, -1, 7} for indices 0(a) through 8(i)

  18. Implementing Find(x) • Running time? O(MaxHeight) • E.g. Find(4) follows 4(e) → 1(b) → 0(a) and returns 0, given up = {-1, 0, -1, 0, 1, 2, -1, -1, 7} for the sets {g}, {h, i}, {c, f}, {a, b, d, e}

          #define N 9
          int up[N];

          /* Returns the set id (root) of x */
          int Find(int x) {
              while (up[x] >= 0) {   /* x is not a root yet */
                  x = up[x];         /* move up to the parent */
              }
              return x;
          }

  19. Recursive Find(x) • Same result as the iterative version, e.g. Find(4) returns 0 for up = {-1, 0, -1, 0, 1, 2, -1, -1, 7}

          #define N 9
          int up[N];

          /* Returns the set id (root) of x */
          int Find(int x) {
              if (up[x] < 0) return x;   /* x is a root */
              return Find(up[x]);        /* otherwise ask the parent */
          }
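Both versions of Find can be tried on the example forest. In the sketch below the array contents are reconstructed from the slides' example (indices 0..8 stand for a..i), and the names `find_iterative`, `find_recursive`, and `depth` are illustrative.

```c
#include <assert.h>

#define N 9
/* Parent links for a..i; -1 marks a root (a, c, g, h) */
int up[N] = { -1, 0, -1, 0, 1, 2, -1, -1, 7 };

/* Follow parent links in a loop */
int find_iterative(int x) {
    while (up[x] >= 0) x = up[x];
    return x;
}

/* Follow parent links by recursion */
int find_recursive(int x) {
    if (up[x] < 0) return x;
    return find_recursive(up[x]);
}

/* Number of parent links between x and its root */
int depth(int x) {
    int d = 0;
    while (up[x] >= 0) { x = up[x]; d++; }
    return d;
}
```

The cost of either Find is proportional to `depth(x)`. For e (index 4) two links are followed, which is the longest path in this forest.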

  20. Example Union • Union(x, y): Just hang one root from the other! • Union(c, a) hangs root a under root c, merging {c, f} and {a, b, d, e} into {a, b, d, e, c, f}; {g} and {h, i} are unchanged • Array up afterwards (only up[0] changes, from -1 to 2):

          index  0(a)  1(b)  2(c)  3(d)  4(e)  5(f)  6(g)  7(h)  8(i)
          up        2     0    -1     0     1     2    -1    -1     7

  21. Implementing Union(x, y) • Running time? O(1) • E.g. Union(2, 0), i.e. Union(c, a), sets up[0] = 2 and yields the sets {g}, {h, i}, {a, b, d, e, c, f}

          #include <assert.h>

          #define N 9
          int up[N];

          /* Joins two sets. x and y must be distinct roots; y is hung under x. */
          void Union(int x, int y) {
              assert(up[x] < 0);
              assert(up[y] < 0);
              up[y] = x;
          }
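Because Union asserts that both arguments are roots, callers that only know two arbitrary elements need to run Find first. The sketch below shows one way to do that. `union_elements` is a hypothetical wrapper that does not appear in the slides, and the array is the reconstructed example forest (indices 0..8 for a..i).

```c
#include <assert.h>

#define N 9
int up[N] = { -1, 0, -1, 0, 1, 2, -1, -1, 7 };

int find_root(int x) {
    while (up[x] >= 0) x = up[x];
    return x;
}

/* The slide's Union: x and y must be roots; y is hung under x */
void union_roots(int x, int y) {
    assert(up[x] < 0);
    assert(up[y] < 0);
    up[y] = x;
}

/* Hypothetical wrapper that accepts any two elements.
   Returns 1 if two sets were merged, 0 if a and b were already together. */
int union_elements(int a, int b) {
    int ra = find_root(a), rb = find_root(b);
    if (ra == rb) return 0;    /* also avoids hanging a root under itself */
    union_roots(ra, rb);
    return 1;
}
```

Here `union_elements(5, 4)` (f and e) resolves to `union_roots(2, 0)`, the same Union(c, a) as in the slide's example.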

  22. MakeSets(): Creating initial sets • Initially every element is the root of its own one-node up-tree: {a} {b} {c} {d} {e} {f} {g} {h} {i}

          #define N 9
          int up[N];

          /* Make initial sets: every element is its own root */
          void MakeSets(void) {
              int i;
              for (i = 0; i < N; i++) {
                  up[i] = -1;
              }
          }

  23. Detailed Example • Initial sets: {a} {g} {h} {i} {b} {e} {f} {c} {d} • Union(b, e) hangs e under b • Result: {a} {c} {g} {h} {i} {d} {f} {b, e}

  24. Detailed Example • Union(a, d) hangs d under a • Result: {c} {g} {h} {i} {f} {b, e} {a, d}

  25. Detailed Example • Union(a, b) hangs b (with its child e) under a • Result: {c} {g} {h} {i} {f} {a, d, b, e}

  26. Detailed Example • Union(h, i) hangs i under h • Result: {c} {g} {f} {a, d, b, e} {h, i}

  27. Detailed Example • Union(c, f) hangs f under c • Result: {g} {c, f} {h, i} {a, d, b, e}

  28. Detailed Example • Union(c, a) hangs a (with its subtree d, b, e) under c • Result: {g} {h, i} {a, d, b, e, c, f} • Q: Can we do a better job on this union for faster finds in the future?

  29. Implementation of Find & Union • Find running time: O(MaxHeight) • Union running time: O(1) • Height depends on previous unions • Best case: Union(1, 2), Union(1, 4), Union(1, 5), … keeps every tree at height 1, so Find is O(1) • Worst case: Union(2, 1), Union(3, 2), Union(4, 3), … builds a chain, so Find is O(N) • Q: Can we do better?

          #include <assert.h>

          #define N 9
          int up[N];

          /* Returns the set id (root) of x */
          int Find(int x) {
              if (up[x] < 0) return x;
              return Find(up[x]);
          }

          /* Joins two sets. x and y must be distinct roots; y is hung under x. */
          void Union(int x, int y) {
              assert(up[x] < 0);
              assert(up[y] < 0);
              up[y] = x;
          }
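The best and worst cases can be reproduced with two union orders. This is a sketch under assumed names (`build_star`, `build_chain`, `depth`), with elements numbered from 0 rather than 1.

```c
#include <assert.h>

#define N 8
int up[N];

void make_sets(void) {
    for (int i = 0; i < N; i++) up[i] = -1;
}

/* Plain Union: x and y must be roots; y is hung under x */
void union_roots(int x, int y) {
    assert(up[x] < 0 && up[y] < 0);
    up[y] = x;
}

/* Number of parent links between x and its root */
int depth(int x) {
    int d = 0;
    while (up[x] >= 0) { x = up[x]; d++; }
    return d;
}

/* Best case: Union(0,1), Union(0,2), ... hangs everything directly under 0 */
void build_star(void) {
    make_sets();
    for (int i = 1; i < N; i++) union_roots(0, i);
}

/* Worst case: Union(1,0), Union(2,1), ... hangs each old root under a new one */
void build_chain(void) {
    make_sets();
    for (int i = 1; i < N; i++) union_roots(i, i - 1);
}
```

After `build_star()` every non-root element has depth 1. After `build_chain()` element 0 has depth N-1, so a Find on it follows N-1 links.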

  30. Let’s look back at our example • Before: {g}, {h, i}, {c, f}, {a, d, b, e} • Union(c, a) hangs the taller tree rooted at a under c, giving {g}, {h, i}, {a, d, b, e, c, f} • Q: Can we do a better job on this union for faster finds in the future? • How can we make the new tree shallow?

  31. Speeding up Find: Union-by-Size • Idea: In Union, always make the root of the larger tree the new root (union-by-size) • Initial sets: {g}, {h, i}, {c, f}, {a, d, b, e} • After plain Union(c, a): a hangs under c, and the tree for {a, d, b, e, c, f} has height 3 • After Union(c, a) with union-by-size: c hangs under a (the root of the larger tree), and the tree for {a, d, b, e, c, f} has height 2

  32. Trick for Storing Size Information • Instead of storing -1 in root, store up-tree size as negative value in root node • Sets: {g}, {h, i}, {c, f}, {a, b, d, e} • Array up:

          index  0(a)  1(b)  2(c)  3(d)  4(e)  5(f)  6(g)  7(h)  8(i)
          up       -4     0    -2     0     1     2    -1    -2     7

  33. Implementing Union-by-Size • Running time? O(1)

          #include <assert.h>

          #define N 9
          int up[N];

          /* Joins two sets. Assumes x and y are distinct roots.
             Roots store the negated size, so the smaller value is the bigger tree. */
          void Union(int x, int y) {
              assert(up[x] < 0);
              assert(up[y] < 0);
              if (up[x] < up[y]) {   /* x is bigger. Join y to x */
                  up[x] += up[y];
                  up[y] = x;
              } else {               /* y is bigger (or equal). Join x to y */
                  up[y] += up[x];
                  up[x] = y;
              }
          }
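To see union-by-size at work, the same Union(c, a) can be replayed on the size-annotated example array. The array values are reconstructed from the slides' example, and the names `union_by_size` and `depth` are illustrative.

```c
#include <assert.h>

#define N 9
/* Roots hold the negated tree size: a = -4, c = -2, g = -1, h = -2 */
int up[N] = { -4, 0, -2, 0, 1, 2, -1, -2, 7 };

/* x and y must be distinct roots; the smaller tree is hung under the larger */
void union_by_size(int x, int y) {
    assert(up[x] < 0);
    assert(up[y] < 0);
    assert(x != y);
    if (up[x] < up[y]) {   /* x has more nodes */
        up[x] += up[y];
        up[y] = x;
    } else {               /* y has at least as many nodes */
        up[y] += up[x];
        up[x] = y;
    }
}

/* Number of parent links between x and its root */
int depth(int x) {
    int d = 0;
    while (up[x] >= 0) { x = up[x]; d++; }
    return d;
}
```

With `union_by_size(2, 0)` the root c (size 2) is hung under a (size 4), so `up[0]` becomes -6 and no element ends up deeper than 2. On a tie, this version hangs x under y.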

  34. Running Time for Find with Union-by-Size • Finds are O(MaxHeight) for a forest of up-trees containing N nodes • Theorem: Number of nodes in an up-tree of height h using union-by-size is ≥ 2^h • Pick up-tree with MaxHeight • Then, 2^MaxHeight ≤ N • MaxHeight ≤ log2 N • Find takes O(log N) • Proof by induction • Base case: h = 0, tree has 2^0 = 1 node • Induction hypothesis: Assume true for all h < h′ • Induction step: A tree of height h′ is first formed when a tree T2 of height h′-1 is hung under the root of a tree T1 that is at least as large • T2 has ≥ 2^(h′-1) nodes by the induction hypothesis, and T1 has at least as many nodes as T2 • So, total nodes ≥ 2^(h′-1) + 2^(h′-1) = 2^h′ • Therefore, true for all h
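The induction step can be written as one chain of inequalities. In this sketch, $T_2$ denotes the tree of height $h'-1$ that gets hung under the root of $T_1$, and $T$ is the resulting tree of height $h'$. These symbols are introduced here for illustration and are not notation from the slides.

```latex
\begin{align*}
|T| &= |T_1| + |T_2|
      && \text{(the union hangs } T_2 \text{ under the root of } T_1\text{)} \\
    &\ge 2\,|T_2|
      && \text{(union-by-size: } |T_1| \ge |T_2|\text{)} \\
    &\ge 2 \cdot 2^{h'-1} = 2^{h'}
      && \text{(induction hypothesis applied to } T_2\text{)}
\end{align*}
% Applying the theorem to the tallest tree in a forest of N nodes:
\[
  N \;\ge\; 2^{\mathrm{MaxHeight}}
  \quad\Longrightarrow\quad
  \mathrm{MaxHeight} \;\le\; \log_2 N .
\]
```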

  35. Union-by-Height • Textbook describes alternative strategy of Union-by-height • Keep track of height of each up-tree in the root nodes • Union makes root of up-tree with greater height the new root • Same results and similar implementation as Union-by-Size • Find is O(log N) and Union is O(1)

  36. Can we make Find go faster? • Can we make Find(g) do something so that future Find(g) calls will run faster? • Right now, M Find(g) calls run in total O(M log N) time • Can we reduce this to O(M)? • Idea: Make Find have side-effects so that future Finds will run faster • [Figure: up-trees for the sets {h, i}, {c, f}, {a, b, d, e, g}]

  37. Introducing Path Compression • Path Compression: Point everything along path of a Find to root • Reduces height of entire access path to 1 • Finds get faster! • [Figure: the forest {h, i}, {c, f}, {a, b, d, e, g} before and after Find(g)]

  38. Another Path Compression Example • [Figure: the forest {c, f}, {a, b, d, h, e, i, g} before and after Find(g)]

  39. Implementing Path Compression • Path Compression: Point everything along path of a Find to root • Reduces height of entire access path to 1 • Finds get faster! • Running time: O(MaxHeight) • But, what happens to the tree height over time? • It gets smaller • What’s the total running time if we do M Finds? • Turns out this is O(M * InvAckermann(M, N)) when path compression is combined with union-by-size

          #define N …
          int up[N];

          /* Returns the set id of x and points every node on the path at the root */
          int Find(int x) {
              if (up[x] < 0) return x;
              int root = Find(up[x]);
              up[x] = root;   /* Point to the root */
              return root;
          }
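The effect of path compression is easiest to see on a chain. The sketch below uses hypothetical example data (a five-element chain, not taken from the slides) and also shows a two-pass iterative variant, `find_compress_iter`. That variant is an alternative added here, not the slides' code. It produces the same links without recursion, which can matter for very deep trees.

```c
#include <assert.h>

#define N 5
int up[N];

/* Hypothetical example data: the chain 4 -> 3 -> 2 -> 1 -> 0 */
void reset_chain(void) {
    up[0] = -1;
    for (int i = 1; i < N; i++) up[i] = i - 1;
}

/* Recursive Find with path compression, as on the slide */
int find_compress(int x) {
    if (up[x] < 0) return x;
    int root = find_compress(up[x]);
    up[x] = root;
    return root;
}

/* Two-pass iterative variant with the same effect */
int find_compress_iter(int x) {
    int root = x;
    while (up[root] >= 0) root = up[root];   /* pass 1: locate the root */
    while (up[x] >= 0) {                     /* pass 2: repoint the path */
        int parent = up[x];
        up[x] = root;
        x = parent;
    }
    return root;
}
```

After one call on element 4, every element on the path (1 through 4) points directly at the root 0, so a repeated Find on any of them follows a single link.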

  40. Running time of Find with Path Compression • What’s the total running time if we do M Finds? • Turns out this is O(M * InvAckermann(M, N)) • InvAckermann(M, N) <= 4 for all practical values of M and N • So, total running time of M Finds <= 4*M = O(M) • Meaning that the amortized running time of Find with path compression is O(1) for all practical purposes

  41. Summary of Disjoint Set ADT • The Disjoint Set ADT allows us to represent objects that fall into different equivalence classes or sets • Two main operations: Union of two classes and Find class name for a given element • Up-Tree data structure allows efficient array implementation • Unions take O(1) worst case time, Finds can take O(N) • Union-by-Size (or by-Height) reduces worst case time for Find to O(log N) • If we use both Union-by-Size/Height & Path Compression • Any sequence of M Union/Find operations results in O(1) amortized time per operation (for all practical purposes)
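Putting the summary together, a compact sketch that combines union-by-size with path compression might look as follows. The names `find_set`, `union_sets`, and `set_size` are illustrative. `union_sets` is a convenience wrapper that accepts any two elements rather than two roots.

```c
#include <assert.h>

#define N 9
int up[N];   /* parent index, or negated set size at a root */

void make_sets(void) {
    for (int i = 0; i < N; i++) up[i] = -1;
}

/* Find with path compression */
int find_set(int x) {
    if (up[x] < 0) return x;
    return up[x] = find_set(up[x]);
}

/* Merge the sets containing a and b, smaller tree under larger */
void union_sets(int a, int b) {
    int x = find_set(a), y = find_set(b);
    if (x == y) return;                 /* already in the same set */
    if (up[x] < up[y]) { up[x] += up[y]; up[y] = x; }
    else               { up[y] += up[x]; up[x] = y; }
}

/* Number of elements in the set containing x */
int set_size(int x) {
    return -up[find_set(x)];
}
```

Replaying the six unions of the detailed example (with 0..8 standing for a..i) leaves a, b, c, d, e, f in one set of size 6, h and i in a set of size 2, and g alone.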

  42. Applications of Disjoint Set ADT • Disjoint sets can be used to represent: • Cities on a map (disjoint sets of connected cities) • Electrical components on chip • Computers connected in a network • Groups of people related to each other by blood • Textbook example: Maze generation using Unions/Finds: • Start with walls everywhere and each cell in a set by itself • Knock down walls randomly and Union cells that become connected • Use Find to find out if two cells are already connected • Terminate when starting and ending cell are in same set i.e. connected (or when all cells are in same set)
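The maze-generation idea can be sketched as below. This is one possible reading of the steps on the slide, not the textbook's code: the grid size, the array names, and the choice to visit interior walls in shuffled order and stop when all cells are in one set are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 5
#define CELLS (ROWS * COLS)
#define WALLS (ROWS * (COLS - 1) + (ROWS - 1) * COLS)

int up[CELLS];         /* union-by-size forest over the cells */
int wall_a[WALLS];     /* the two cells separated by each interior wall */
int wall_b[WALLS];
int wall_open[WALLS];  /* 1 once the wall has been knocked down */

int find_set(int x) {
    if (up[x] < 0) return x;
    return up[x] = find_set(up[x]);
}

void union_roots(int x, int y) {
    if (up[x] < up[y]) { up[x] += up[y]; up[y] = x; }
    else               { up[y] += up[x]; up[x] = y; }
}

/* Builds a maze and returns the number of walls knocked down */
int generate_maze(unsigned seed) {
    int w = 0, removed = 0;
    for (int i = 0; i < CELLS; i++) up[i] = -1;   /* each cell alone */
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int cell = r * COLS + c;
            if (c + 1 < COLS) { wall_a[w] = cell; wall_b[w] = cell + 1;    wall_open[w++] = 0; }
            if (r + 1 < ROWS) { wall_a[w] = cell; wall_b[w] = cell + COLS; wall_open[w++] = 0; }
        }
    }
    srand(seed);
    for (int i = WALLS - 1; i > 0; i--) {         /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        int ta = wall_a[i], tb = wall_b[i];
        wall_a[i] = wall_a[j]; wall_b[i] = wall_b[j];
        wall_a[j] = ta;        wall_b[j] = tb;
    }
    for (int i = 0; i < WALLS; i++) {
        int x = find_set(wall_a[i]), y = find_set(wall_b[i]);
        if (x == y) continue;   /* cells already connected: keep the wall */
        union_roots(x, y);      /* knock the wall down and merge the sets */
        wall_open[i] = 1;
        removed++;
    }
    return removed;
}
```

Since a wall is removed only when it joins two different sets, exactly CELLS - 1 walls come down and there is a single path between any two cells, including the start and end cells.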

  43. Disjoint Set ADT Declaration & Operations

          class DisjointSet {
          private:
              int *up;   // Up links array
              int N;     // Number of elements
          public:
              DisjointSet(int n);               // Creates n one-element sets
              ~DisjointSet() { delete[] up; }   // Array form matches new int[N]
              int Find(int x);
              void Union(int x, int y);
          };

  44. Operations: DisjointSet, Find

          /* Create N sets */
          DisjointSet::DisjointSet(int n) {
              N = n;
              up = new int[N];
              for (int i = 0; i < N; i++) up[i] = -1;
          }

          /* Returns the set id of x, compressing the path on the way back */
          int DisjointSet::Find(int x) {
              if (up[x] < 0) return x;
              int root = Find(up[x]);
              up[x] = root;   /* Point to the root */
              return root;
          }

  45. Operations: Union (by size)

          #include <cassert>

          /* Joins two sets. Assumes x & y are distinct roots */
          void DisjointSet::Union(int x, int y) {
              assert(up[x] < 0);
              assert(up[y] < 0);
              if (up[x] < up[y]) {   // x is bigger. Join y to x
                  up[x] += up[y];
                  up[y] = x;
              } else {               // y is bigger (or equal). Join x to y
                  up[y] += up[x];
                  up[x] = y;
              }
          }
