WEEK 5 The Disjoint Set Class I CE222 – Data Structures & Algorithms II Chapter 8.1, 8.2, 8.3, 8.4, 8.5 (based on the book by M. A. Weiss, Data Structures and Algorithm Analysis in C++, 3rd edition, 2006)
OUTLINE
• Definitions
• Dynamic Equivalence Problem
• Operations on Disjoint Sets
• Smart Union Algorithms
• Path compression
Next Week
• Worst Case Analysis
• Example
CE 222-Data Structures & Algorithms II, Izmir University of Economics
DEFINITIONS
• A set is a collection of objects.
• Set A is a subset of set B if all elements of A are in B.
• Subsets are also sets.
• The union of two sets A and B is the set C that consists of all elements that are in A or in B.
• Two sets are mutually disjoint if they have no element in common.
DEFINITIONS
• A partition of a set is a collection of mutually disjoint subsets whose union is the set itself.
EXAMPLE: S = {1,2,3,4}, A = {1,2}, B = {3,4}, C = {2,3,4}, D = {4}
• Is {A, B} a partition of S? YES
• Is {A, C} a partition of S? NO (A and C share the element 2)
• Is {A, D} a partition of S? NO (the union of A and D does not contain 3)
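The partition test in the example can be sketched in code. This is only an illustration under the slide's definition (mutually disjoint parts whose union is the whole set); the function names `isPartition` and `checkSlideExample` are hypothetical and not part of the course material.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Returns true if `parts` is a partition of `s` in the sense of the slide:
// the parts are mutually disjoint and their union is exactly `s`.
bool isPartition(const std::set<int>& s, const std::vector<std::set<int>>& parts) {
    std::set<int> seen;
    for (const std::set<int>& part : parts) {
        for (int x : part) {
            if (s.count(x) == 0) return false;         // part is not a subset of s
            if (!seen.insert(x).second) return false;  // x is in two parts: not disjoint
        }
    }
    return seen == s;  // the union of the parts must cover s
}

// The three questions from the slide, checked with assertions.
void checkSlideExample() {
    const std::set<int> S{1, 2, 3, 4}, A{1, 2}, B{3, 4}, C{2, 3, 4}, D{4};
    assert(isPartition(S, {A, B}));
    assert(!isPartition(S, {A, C}));  // A and C share the element 2
    assert(!isPartition(S, {A, D}));  // 3 is not covered
}
```

Calling `checkSlideExample()` reproduces the YES / NO / NO answers above.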
DEFINITIONS
• A relation R is defined on a set S if, for every pair of elements (a, b) with a, b ∈ S, a R b is either true or false. If a R b is true, then we say that a is related to b.
• An equivalence relation is a relation R that satisfies three properties:
  • (reflexive) a R a, for all a ∈ S.
  • (symmetric) a R b if and only if b R a.
  • (transitive) a R b and b R c implies that a R c.
• An equivalence relation partitions a set into distinct equivalence classes.
The Dynamic Equivalence Problem
• How can we decide, for any a and b, whether a is related to b?
  Answer: store the relation in a two-dimensional array of Boolean variables. A lookup then takes constant time.
• What happens if the relation is not explicitly defined, but is given only implicitly through some related pairs?
• Relations: a1~a3, a3~a5, a4~a5
The Dynamic Equivalence Problem
• The equivalence class of an element a ∈ S is the subset of S that contains all the elements that are related to a.
• To decide whether two members are related, we only need to check whether the two are in the same equivalence class.
• Take the five-element set {a1, a2, a3, a4, a5} and the relations a1~a3, a3~a5, a5~a4. Is a1 related to a4?
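One way to answer the question is to give every element a class label and merge labels as relations arrive. The sketch below is a naive illustration, not the course's implementation: the name `classLabels` is hypothetical, and the elements a1..a5 are assumed to be mapped to the indices 0..4.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Labels each of the n elements with the id of its equivalence class.
// For every given relation a ~ b, the classes of a and b are merged by
// relabelling all members of b's class with a's label.
std::vector<int> classLabels(int n, const std::vector<std::pair<int, int>>& relations) {
    std::vector<int> label(n);
    for (int i = 0; i < n; ++i) label[i] = i;  // each element starts in its own class
    for (const auto& r : relations) {
        assert(0 <= r.first && r.first < n && 0 <= r.second && r.second < n);
        const int keep = label[r.first];
        const int drop = label[r.second];
        if (keep == drop) continue;            // already in the same class
        for (int& l : label)
            if (l == drop) l = keep;
    }
    return label;
}
```

With the relations above written as index pairs (0,2), (2,4), (4,3), the elements a1 and a4 end up with the same label, so a1 is related to a4, while a2 keeps a label of its own.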
The Dynamic Equivalence Problem
• Each equivalence class may be represented by a single object: the representative object.
• For the relations given (a1~a3, a3~a5, a5~a4), the equivalence classes are {a1, a3, a4, a5} and {a2}.
Operations on Disjoint Sets
Union (add operation, e.g., add the relation a~b)
• Check whether a and b are already related, i.e., whether they are in the same equivalence class.
• If not, merge the two equivalence classes containing a and b into a new equivalence class.
Find
• Return the name (pointer or index of the representative) of the set containing a given element.
A simple Implementation: Example
Consider the following disjoint sets on the ten decimal digits. For each step, the find results shown are the values BEFORE the union is performed.
• UnionSets(1,2): adds the relation 1~2
• UnionSets(2,4): find(4)=>4 and find(2)=>1
• UnionSets(5,6): find(5)=>5 and find(6)=>6
• UnionSets(6,7): find(6)=>5 and find(7)=>7
• UnionSets(2,9): find(2)=>1 and find(9)=>9
• UnionSets(5,1): find(5)=>5 and find(1)=>1
• UnionSets(3,0): find(3)=>3 and find(0)=>0
• UnionSets(0,8): find(0)=>3 and find(8)=>8
• UnionSets(3,5): find(3)=>3 and find(5)=>5
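The sequence of unions can be replayed in code. The sketch below assumes that every digit starts in its own set and that UnionSets(i, j) attaches the root of j's set under the root of i's set, which matches the find values listed on the slides. The names `TraceSets` and `runSlideExample` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Minimal parent-array structure: parent[i] == -1 marks a root.
struct TraceSets {
    std::vector<int> parent;
    explicit TraceSets(int n) : parent(n, -1) {}

    int find(int i) const {
        assert(0 <= i && i < static_cast<int>(parent.size()));
        while (parent[i] != -1) i = parent[i];
        return i;
    }
    void unionSets(int i, int j) {
        const int ri = find(i), rj = find(j);
        if (ri != rj) parent[rj] = ri;  // second set is appended to the first
    }
};

// Replays the nine unions of the example and returns the final parent array.
std::vector<int> runSlideExample() {
    TraceSets s(10);
    const int ops[9][2] = {{1, 2}, {2, 4}, {5, 6}, {6, 7}, {2, 9},
                           {5, 1}, {3, 0}, {0, 8}, {3, 5}};
    for (const auto& op : ops) s.unionSets(op[0], op[1]);
    return s.parent;
}
```

Under these assumptions all ten digits end up in one set whose root is 3, and the parent array for the digits 0..9 is 3, 5, 1, -1, 1, 3, 5, 5, 3, 1.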
Tree Implementation
For simplicity, we will assume we are creating disjoint sets with N integers. We will define an array:
    int *parent;
    void initialize(int N) {
        parent = new int[N];
        for (int i = 0; i < N; ++i) {
            parent[i] = -1;
        }
    }
If parent[i] == -1, then i is a root node. Initially, each integer is in its own set.
Tree Implementation: FIND and UNION
    // ITERATIVE
    int find(int i) {
        while (parent[i] != -1)
            i = parent[i];
        return i;
    }
    // RECURSIVE
    int find(int i) const {
        if (parent[i] == -1)
            return i;
        else
            return find(parent[i]);
    }
    void UnionSets(int i, int j) {
        i = find(i);        // root of i
        j = find(j);        // root of j
        if (i != j)
            parent[j] = i;  // 2nd set is appended to 1st set
    }
Tree Implementation: Time Complexity
    // ITERATIVE
    int find(int i) {            // worst case O(N)
        while (parent[i] != -1)
            i = parent[i];
        return i;
    }
    void UnionSets(int i, int j) {
        i = find(i);
        j = find(j);
        if (i != j)
            parent[j] = i;       // O(1) once the two roots are known
    }
• If we have "u" Union operations (counting only the link step) and "f" Find operations, then the complexity is O(u + f*N).
• M consecutive operations could take O(MN) time in the worst case.
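The O(N) worst case for find can be made concrete with a sequence of unions that builds a chain. The sketch below is an illustration only. It uses free functions over a `std::vector<int>` instead of the slide's global array, and the names `buildChain` and `findSteps` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Root of i in the plain tree implementation (no smart union, no compression).
int findRoot(const std::vector<int>& parent, int i) {
    while (parent[i] != -1) i = parent[i];
    return i;
}

// Same rule as the slide: the root of j's tree is attached under the root of i's tree.
void unionSets(std::vector<int>& parent, int i, int j) {
    i = findRoot(parent, i);
    j = findRoot(parent, j);
    if (i != j) parent[j] = i;
}

// Number of parent links that find(i) has to follow.
int findSteps(const std::vector<int>& parent, int i) {
    int steps = 0;
    while (parent[i] != -1) { i = parent[i]; ++steps; }
    return steps;
}

// UnionSets(1,0), UnionSets(2,1), ..., UnionSets(n-1,n-2) yields the chain
// 0 -> 1 -> 2 -> ... -> n-1, so element 0 sits at depth n-1.
std::vector<int> buildChain(int n) {
    assert(n > 0);
    std::vector<int> parent(n, -1);
    for (int k = 1; k < n; ++k) unionSets(parent, k, k - 1);
    return parent;
}
```

For a chain built over n elements, `findSteps(parent, 0)` is n-1, which is the linear behaviour behind the O(MN) bound.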
Array Implementation: Example
Array Implementation: Time Complexity
    int find(int i) {                 // O(1)
        return array[i];
    }
    void Initialize(int N) {
        array = new int[N + 1];       // elements are numbered 1..N
        for (int e = 1; e <= N; e++)
            array[e] = e;
    }
    void UnionSets(int i, int j) {    // O(N); N is assumed to be stored alongside array
        int rooti = find(i);
        int rootj = find(j);
        for (int k = 1; k <= N; k++)
            if (array[k] == rootj)
                array[k] = rooti;
    }
• If we have "u" Union and "f" Find operations, then the complexity is O(u*N + f).
• M consecutive operations could take O(MN) time in the worst case.
C++ Implementation from Text Book
    class DisjSets {
    public:
        DisjSets(int numElements) : s(numElements) {
            for (int j = 0; j < s.size(); j++)
                s[j] = -1;
        }
        int find(int x) const {
            if (s[x] < 0)
                return x;
            else
                return find(s[x]);
        }
        void unionSets(int root1, int root2) {
            s[root2] = root1;
        }
    private:
        vector<int> s;   // an array with varying size
    };
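Note that `unionSets` in this class links its two arguments directly, so it expects both to be roots of different trees. A caller that starts from arbitrary elements would first run `find` on each. The sketch below restates the class compactly and adds a hypothetical helper, `unionElements`, which is not part of the textbook class.

```cpp
#include <cassert>
#include <vector>

// Compact restatement of the class above.
class DisjSets {
public:
    explicit DisjSets(int numElements) : s(numElements, -1) {}
    int find(int x) const { return s[x] < 0 ? x : find(s[x]); }
    void unionSets(int root1, int root2) { s[root2] = root1; }  // both must be roots
private:
    std::vector<int> s;
};

// Hypothetical helper: merges the sets containing a and b.
// Returns false if a and b were already in the same set.
bool unionElements(DisjSets& ds, int a, int b) {
    const int ra = ds.find(a);
    const int rb = ds.find(b);
    if (ra == rb) return false;      // linking a root to itself would lose the root
    ds.unionSets(ra, rb);
    assert(ds.find(a) == ds.find(b));
    return true;
}
```

The check for equal roots matters: calling `unionSets(r, r)` would set `s[r] = r`, after which `find` would never terminate for that tree.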
Linked List Implementation of Disjoint Sets
After Union(f, c)
Linked List Implementation of Disjoint Sets
• Each set is represented by a linked list.
• The first object in each linked list serves as its set's representative.
• Each object in the linked list contains
  • a set member,
  • a pointer to the object containing the next set member,
  • a pointer back to the representative.
• Each list maintains two pointers: head, to the representative, and tail, to the last object in the list.
• Within each linked list, the objects may appear in any order (subject to our assumption that the first object in each list is the representative).
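The list layout above can be sketched with index arrays instead of raw pointers. This is an illustration under several assumptions: elements are the integers 0..n-1, the head of a list is its representative, the shorter list is appended to the longer one (the weighted-union heuristic analysed later in these slides), and the class name `ListSets` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class ListSets {
public:
    explicit ListSets(int n) : next_(n, -1), rep_(n), tail_(n), size_(n, 1) {
        for (int i = 0; i < n; ++i) { rep_[i] = i; tail_[i] = i; }
    }

    // O(1): every object stores a link back to its representative.
    int find(int x) const {
        assert(0 <= x && x < static_cast<int>(rep_.size()));
        return rep_[x];
    }

    // Appends the shorter list to the longer one.
    // Returns how many representative links had to be updated.
    int unionSets(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return 0;
        if (size_[ra] < size_[rb]) std::swap(ra, rb);  // ra now heads the longer list
        next_[tail_[ra]] = rb;                         // splice rb's list after ra's tail
        tail_[ra] = tail_[rb];
        size_[ra] += size_[rb];
        int updated = 0;
        for (int x = rb; x != -1; x = next_[x]) { rep_[x] = ra; ++updated; }
        return updated;
    }

private:
    std::vector<int> next_;  // next object in the same list, -1 at the end
    std::vector<int> rep_;   // representative (head) of the object's list
    std::vector<int> tail_;  // last object of the list; valid at representatives only
    std::vector<int> size_;  // list length; valid at representatives only
};
```

Here find is a single array lookup, while the cost of a union is the length of the list that gets relabelled.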
Smart Union Algorithms
• Union by size
  • Make the smaller tree a subtree of the larger.
  • With union by size, the depth of any node is never more than log N, so a find operation is O(log N) and a sequence of M operations is O(M log N).
  • The worst-case trees are binomial trees.
• Union by height (union by rank)
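Union by size can be sketched on top of the parent array by letting each root store the negated size of its tree. The class name `SizeSets` and the helper methods `depth` and `size` are hypothetical and only serve to inspect the result.

```cpp
#include <cassert>
#include <vector>

class SizeSets {
public:
    explicit SizeSets(int n) : s(n, -1) {}  // a root stores -(number of nodes in its tree)

    int find(int x) const {
        assert(0 <= x && x < static_cast<int>(s.size()));
        while (s[x] >= 0) x = s[x];
        return x;
    }

    void unionSets(int a, int b) {
        const int r1 = find(a), r2 = find(b);
        if (r1 == r2) return;
        if (s[r2] < s[r1]) {   // r2's tree is larger (its value is more negative)
            s[r2] += s[r1];
            s[r1] = r2;        // smaller tree becomes a subtree of the larger
        } else {
            s[r1] += s[r2];
            s[r2] = r1;
        }
    }

    // Number of links between x and its root.
    int depth(int x) const {
        int d = 0;
        while (s[x] >= 0) { x = s[x]; ++d; }
        return d;
    }

    int size(int x) const { return -s[find(x)]; }

private:
    std::vector<int> s;
};
```

With this rule, the union sequence (1,0), (2,1), (3,2), ... that produces a chain in the plain implementation keeps every node at depth at most 1, because each new singleton is hung under the larger tree.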
Smart Union Algorithms
• Union by height (union by rank)
    /* Make the shallower tree a subtree of the deeper one. */
    /* Each root stores -height - 1. */
    void DisjSets::unionSets(int root1, int root2) {
        if (s[root2] < s[root1])        // root2 is deeper
            s[root1] = root2;
        else {
            if (s[root1] == s[root2])   // same height: the result is one level taller
                s[root1]--;
            s[root2] = root1;
        }
    }
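A small self-contained variant makes the height bookkeeping easy to check. The sketch below follows the same encoding (a root stores -height - 1) but, as an assumption, accepts arbitrary elements and calls find itself. The class name `HeightSets` and the method `height` are hypothetical.

```cpp
#include <cassert>
#include <vector>

class HeightSets {
public:
    explicit HeightSets(int n) : s(n, -1) {}  // -1 encodes a root of height 0

    int find(int x) const {
        assert(0 <= x && x < static_cast<int>(s.size()));
        while (s[x] >= 0) x = s[x];
        return x;
    }

    void unionSets(int a, int b) {
        const int root1 = find(a), root2 = find(b);
        if (root1 == root2) return;
        if (s[root2] < s[root1]) {            // root2 is deeper
            s[root1] = root2;
        } else {
            if (s[root1] == s[root2]) --s[root1];  // equal heights: result grows by one
            s[root2] = root1;
        }
    }

    // Height of the tree containing x, decoded from the value stored at the root.
    int height(int x) const { return -s[find(x)] - 1; }

private:
    std::vector<int> s;
};
```

The height only grows when two trees of equal height are merged. Merging a singleton into a tree of height 2, for example, leaves the height at 2.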
Smart Union Algorithms: Example
Analysis of Smart Union Algorithms
Suppose each list also includes the length of the list and that we always append the smaller list onto the longer (the weighted-union heuristic), with ties broken arbitrarily.
Theorem 2.1: Using the linked-list representation of disjoint sets and the weighted-union heuristic, a sequence of m operations on n objects takes O(m + n lg n) time.
Analysis of Smart Union Algorithms: Theorem 2.1
Proof:
• Consider a fixed object x. We know that each time x's representative pointer was updated, x must have started in the smaller set. The first time x's representative pointer was updated, therefore, the resulting set must have had at least 2 members. Similarly, the next time x's representative pointer was updated, the resulting set must have had at least 4 members. Continuing on, we observe that for any k ≤ n, after x's representative pointer has been updated ⌈log k⌉ times, the resulting set must have at least k members.
• Since the largest set has at most n members, each object's representative pointer has been updated at most ⌈log n⌉ times over all the UNION operations. The total time used in updating the n objects is thus O(n log n).
• The time for the entire sequence of m operations follows easily. Each MAKE-SET and FIND-SET operation takes O(1) time, and there are O(m) of them. The total time for the entire sequence is thus O(m + n log n).
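The doubling argument in the proof can be written out as a short chain of inequalities. The notation S_j(x), for the set containing x right after its j-th representative update, and T(m, n), for the total running time, is introduced here only for illustration and does not appear on the slides.

```latex
\begin{align*}
|S_j(x)| &\ge 2\,|S_{j-1}(x)| \ge 2^{j}
  && \text{($x$ was in the smaller set, so each update at least doubles its set)}\\
2^{j} &\le |S_j(x)| \le n \;\Longrightarrow\; j \le \log_2 n
  && \text{(no set has more than $n$ members)}\\
\text{total pointer updates} &\le n \log_2 n = O(n \log n)
  && \text{(at most $\log_2 n$ updates for each of the $n$ objects)}\\
T(m, n) &= O(m) + O(n \log n) = O(m + n \log n)
\end{align*}
```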
Path compression
    int Find(int x) {
        if (parent[x] < 0)
            return x;
        else
            return parent[x] = Find(parent[x]);
    }
• Any single find can still be O(log N), but later finds on the same path are faster.
• "u" Unions, "f" Finds: O(u + f α(f, u))
• α(f, u) is a functional inverse of Ackermann's function.
• N-1 Unions, O(N) Finds: "almost linear" total time
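The effect of path compression is easiest to see on a deliberately bad tree. The sketch below is an illustration only: it pairs the compressing find with a plain (not smart) union so that a chain can be built first, and the names `CompressedSets` and `depth` are hypothetical.

```cpp
#include <cassert>
#include <vector>

class CompressedSets {
public:
    explicit CompressedSets(int n) : parent(n, -1) {}

    // Find with path compression: every node on the search path
    // is re-attached directly to the root.
    int find(int x) {
        assert(0 <= x && x < static_cast<int>(parent.size()));
        if (parent[x] < 0) return x;
        return parent[x] = find(parent[x]);
    }

    // Plain union (assumption made for this demo): root of b goes under root of a.
    void unionSets(int a, int b) {
        const int ra = find(a), rb = find(b);
        if (ra != rb) parent[rb] = ra;
    }

    // Number of links between x and its root, without modifying the tree.
    int depth(int x) const {
        int d = 0;
        while (parent[x] >= 0) { x = parent[x]; ++d; }
        return d;
    }

private:
    std::vector<int> parent;
};
```

After building the chain 0 -> 1 -> ... -> 7 with unionSets(k, k-1) for k = 1..7, element 0 is at depth 7. A single `find(0)` returns 7 and leaves every node on that path at depth 1, so later finds on the same elements follow one link.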