This recitation discusses the implementation of disjoint set structures for set operations such as finding which set an object belongs to and merging two sets. It includes performance analysis and different approaches to improve efficiency.
Disjoint Set Structures for Operations over Sets (Reference: textbook, pp. 175-180)
CS2223 Recitation 3
March 30, 2005
Song Wang
Problem Description
• Given:
  • A set S with N objects, identified by the numbers 1 to N.
  • Disjoint partitions (subsets) of the set S.
    • Every item belongs to one partition.
    • No item belongs to more than one partition.
• What to do:
  • Find: given an object, find which set contains it.
  • Merge: given two sets, merge them into one set.
• Why:
  • These are basic and frequently used functions for set operations such as union and intersection.
  • Consequently, they are important for many other algorithms, such as finding the minimum spanning tree.
Preliminaries
• Data structure for a set: a tree
• Example: the parent (root) node denotes each set; one choice is to use the smallest object as the root.
[Figure: objects 1 to 9 drawn as three trees labeled Set 1, Set 2, and Set 3]
Preliminaries II
• Degraded linked list: an array that records only the parent of each node.
[Figure: two arrays indexed 1 to 9, the parent array and a variant labeled "Some adaptation"]
Solution 1: find1()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 2 3 3 2 3 1 1 1
• find1(7): 1, so 7 belongs to set 1
• find1(2): 2, so 2 belongs to set 2
Function find1(x)
    return set[x]
Solution 1: merge1()
Merge set 1 and 2:
Index:  1 2 3 4 5 6 7 8 9
Before: 1 2 3 3 2 3 1 1 1
After:  1 1 3 3 1 3 1 1 1
Procedure merge1(a,b)
    i <- min(a, b)
    j <- max(a, b)
    for k <- 1 to N do    (scan the whole array)
        if set[k] = j then set[k] <- i
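Solution 1 can be sketched in Python as follows. This is a minimal illustration rather than the recitation's own code: the list name `set_of` and the unused slot at index 0 (so that objects stay numbered 1 to N) are assumptions made for the example.

```python
# Sketch of Solution 1. set_of[x] holds the label of the set containing x.
# Index 0 is an unused placeholder so that objects are numbered 1..N.
N = 9
set_of = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]

def find1(x):
    """Return the label of the set containing x with one array lookup."""
    return set_of[x]

def merge1(a, b):
    """Merge the sets labeled a and b by relabeling every member of the
    larger label; this scans all N entries."""
    i, j = min(a, b), max(a, b)
    for k in range(1, N + 1):
        if set_of[k] == j:
            set_of[k] = i

merge1(1, 2)   # objects 2 and 5 are relabeled from 2 to 1
```

After the merge, `find1(5)` returns 1, and every later find is still a single lookup; the cost has been moved entirely into the scan inside `merge1`.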
Performance Analysis of find1() and merge1()
• Case study: n find operations and at most N-1 merge operations (n is comparable to N).
• Function find1 takes constant time: Θ(1)
• Procedure merge1 takes linear time: Θ(N)
• Total: n·Θ(1) + (N-1)·Θ(N) = Θ(N²), or equivalently Θ(n²)
Can We Do Better?
• Merge set 1 and 2:
[Figure: Set 1, Set 2, and Set 3 before the merge, and two possible trees for the merged Set 1]
Solution 2: merge2()
Merge set 1 and 2:
Index:  1 2 3 4 5 6 7 8 9
Before: 1 2 3 3 2 3 1 1 1
After:  1 1 3 3 2 3 1 1 1
Procedure merge2(a,b)
    if a < b then set[b] <- a
    else set[a] <- b
• a and b are the roots of the two trees. The smaller root becomes the parent, which guarantees that the root of each tree is its smallest element.
Solution 2: find2()
Index: 1 2 3 4 5 6 7 8 9
Array: 1 1 3 3 2 3 1 1 1
• find2(5): 1. We need to traverse the whole path from node 5 to the root node 1 (5, then 2, then 1).
Function find2(x)
    r <- x
    while set[r] != r do    (only for a root is set[r] = r)
        r <- set[r]
    return r
[Figure: the tree for Set 1, with root 1 and nodes 2, 5, 7, 8, 9]
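A minimal Python sketch of Solution 2, starting from the same nine-object example. The list name `parent` and the unused slot at index 0 are assumptions for illustration; the arguments of `merge2` are assumed to be roots, which is why the example passes the results of `find2`.

```python
# Sketch of Solution 2. parent[x] is the parent of x; a root is its own parent.
# Index 0 is an unused placeholder so that objects are numbered 1..9.
parent = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]

def find2(x):
    """Follow parent links from x until reaching a root."""
    r = x
    while parent[r] != r:
        r = parent[r]
    return r

def merge2(a, b):
    """Link two roots a and b; the smaller label stays the root."""
    if a < b:
        parent[b] = a
    else:
        parent[a] = b

merge2(find2(1), find2(2))   # only parent[2] changes
```

Here the merge touches a single array entry, but `find2(5)` now has to walk from 5 to 2 to 1 before it can answer.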
Performance Analysis of find2() and merge2()
• Case study: n find operations and at most N-1 merge operations (n is comparable to N).
• Function find2 takes linear time: Θ(N) in the worst case.
• Procedure merge2 takes constant time: Θ(1)
• Total: n·Θ(N) + (N-1)·Θ(1) = Θ(N²), or equivalently Θ(n²)
• No improvement!
What is the Problem?
• The worst case: a linear tree, built by merge2(5,6), merge2(4,5), ..., merge2(1,2).
• find2(6)? It must follow every link from node 6 up to the root 1.
• The height of the tree is essential for performance.
[Figure: the chain of nodes 1 to 6 growing with each merge]
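The worst case can be reproduced with a short Python sketch. The helper names `build_chain` and `find2_steps` are hypothetical, introduced only to count how many parent links a find follows.

```python
def build_chain(n):
    """Apply merge2(n-1, n), merge2(n-2, n-1), ..., merge2(1, 2) to n
    singletons. Each call links the larger root under the smaller one."""
    parent = list(range(n + 1))      # index 0 unused; every node is a root
    for a in range(n - 1, 0, -1):
        b = a + 1
        parent[b] = a                # merge2 rule: smaller label becomes parent
    return parent

def find2_steps(parent, x):
    """Return (root, number of parent links followed from x)."""
    steps = 0
    r = x
    while parent[r] != r:
        r = parent[r]
        steps += 1
    return r, steps

chain = build_chain(6)
root, steps = find2_steps(chain, 6)  # walks 6, 5, 4, 3, 2, 1
```

With six nodes the find follows five links; with N nodes in a chain it follows N-1, which is the linear worst case for find2.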
How to Avoid a Bad Merge Tree
• Example: Merge(1,4)
[Figure: trees over objects 1 to 7, before and after Merge(1,4)]
Who Is Whose Subtree?
• Tree t1 has height h1 and tree t2 has height h2.
• If h1 < h2: t1 becomes a subtree of t2, and the merged tree's height is h2.
• If h1 > h2: t2 becomes a subtree of t1, and the merged tree's height is h1.
• If h1 = h2: t1 becomes a subtree of t2, and the merged tree's height is h1 + 1.
• The root of the tree is no longer always the smallest node!
Theorem 5.9.1, p. 177
• A tree containing k nodes, built by merging with the height rule, has height at most ⌊log k⌋ (logarithm base 2).
• Proof by induction on k.
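The induction can be sketched as follows. This is an outline under the assumption that every tree is built from single-node trees using the height rule; the symbols $a$, $b$, $h_a$, $h_b$ are introduced here for the two trees being merged and are not from the slides.

```latex
\textbf{Claim.} A tree with $k$ nodes has height $h \le \lfloor \log_2 k \rfloor$.

\textbf{Base case.} $k = 1$: a single node has height $0 = \lfloor \log_2 1 \rfloor$.

\textbf{Inductive step.} Let $k \ge 2$ and assume the claim for all smaller trees.
The tree was formed by merging trees with $a$ and $b$ nodes, $a + b = k$,
$1 \le a \le b$, of heights $h_a \le \lfloor \log_2 a \rfloor$ and
$h_b \le \lfloor \log_2 b \rfloor$.

\begin{itemize}
  \item If $h_a \ne h_b$, then
        $h = \max(h_a, h_b) \le \lfloor \log_2 b \rfloor \le \lfloor \log_2 k \rfloor$.
  \item If $h_a = h_b$, then, since $a \le k/2$,
        $h = h_a + 1 \le \lfloor \log_2 a \rfloor + 1
           = \lfloor \log_2 2a \rfloor \le \lfloor \log_2 k \rfloor$.
\end{itemize}
```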
Solution 3: merge3()
Procedure merge3(a,b)
    if height[a] = height[b] then
        height[a] <- height[a] + 1
        set[b] <- a
    else if height[a] > height[b] then
        set[b] <- a
    else
        set[a] <- b
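A minimal Python sketch of merging by height. The arrays `parent` and `height`, the unused index 0, and the particular sequence of merges are assumptions chosen for illustration; `merge3` expects two distinct roots.

```python
N = 9
parent = list(range(N + 1))   # index 0 unused; every object starts as a root
height = [0] * (N + 1)        # height[r] is meaningful only when r is a root

def find2(x):
    """Follow parent links from x until reaching a root."""
    r = x
    while parent[r] != r:
        r = parent[r]
    return r

def merge3(a, b):
    """Link two distinct roots; the tree that is not taller goes underneath."""
    if height[a] == height[b]:
        height[a] += 1        # equal heights: the merged tree grows by one
        parent[b] = a
    elif height[a] > height[b]:
        parent[b] = a
    else:
        parent[a] = b

merge3(find2(1), find2(2))    # equal heights 0: 2 goes under 1, height[1] = 1
merge3(find2(3), find2(4))    # equal heights 0: 4 goes under 3, height[3] = 1
merge3(find2(1), find2(3))    # equal heights 1: 3 goes under 1, height[1] = 2
merge3(find2(5), find2(1))    # 5 is shorter, so it goes under 1; height stays 2
```

The last call shows the point of the rule: although 5 is passed first, the shorter tree is attached under the taller one, so the five-node tree keeps height 2.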
Performance Analysis of find2() and merge3()
• Case study: n find operations and at most N-1 merge operations (n is comparable to N).
• Function find2 takes less than linear time: Θ(log N) in the worst case.
• Procedure merge3 takes constant time: Θ(1)
• Total: n·Θ(log N) + (N-1)·Θ(1) = Θ(n log n)
• Some improvement
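The logarithmic bound on the height can be checked empirically. The sketch below is an illustration, not part of the recitation: the function name `max_depth_with_merge3`, the random merge order, and the 0-based numbering are all assumptions.

```python
import random

def max_depth_with_merge3(n, seed=0):
    """Merge n singletons in random order using the height rule and
    return the largest depth of any node in the final tree."""
    rng = random.Random(seed)
    parent = list(range(n))          # objects are numbered 0..n-1 here
    height = [0] * n
    roots = list(range(n))
    while len(roots) > 1:
        a, b = rng.sample(roots, 2)  # two distinct roots
        if height[a] == height[b]:
            height[a] += 1
            parent[b] = a
            roots.remove(b)
        elif height[a] > height[b]:
            parent[b] = a
            roots.remove(b)
        else:
            parent[a] = b
            roots.remove(a)

    def depth(x):
        """Number of parent links between x and its root."""
        d = 0
        while parent[x] != x:
            x = parent[x]
            d += 1
        return d

    return max(depth(x) for x in range(n))

n = 1000
bound = n.bit_length() - 1           # floor(log2(n)), computed with integers
deepest = max_depth_with_merge3(n)   # never exceeds bound
```

Whatever the merge order, no find2 call on this structure follows more than `bound` links, which is where the Θ(log N) worst case comes from.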
Path Compression in find3()
• Intuitive explanation: the more children each node has (higher fan-out), the lower the tree.
• Example: find3(20)
[Figure: the tree before and after find3(20)]
Solution 3: find3()
Function find3(x)
    r <- x
    while set[r] != r do    (first traversal of the path: find the root)
        r <- set[r]
    i <- x
    while i != r do    (second traversal: connect the nodes on the path to the root)
        j <- set[i]
        set[i] <- r
        i <- j
    return r
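Path compression can be sketched in Python as below. The starting array is an assumed worst-case chain (6 under 5 under 4 and so on down to root 1), chosen so the effect is easy to see; index 0 is again an unused placeholder.

```python
# A chain 6 -> 5 -> 4 -> 3 -> 2 -> 1; index 0 is unused.
parent = [0, 1, 1, 2, 3, 4, 5]

def find3(x):
    """Find the root of x, then point every node on the path at the root."""
    r = x
    while parent[r] != r:      # first pass: locate the root
        r = parent[r]
    i = x
    while i != r:              # second pass: relink the path to the root
        j = parent[i]
        parent[i] = r
        i = j
    return r

root = find3(6)                # afterwards every node is a child of the root
```

The first call still pays for the whole path, but it flattens the tree, so later finds on any of these nodes follow a single link.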
Performance Analysis of find3() and merge3()
• Case study: n find operations and at most N-1 merge operations (n is comparable to N).
• Function find3 takes only a little more than constant time per call, averaged over the whole sequence of operations.
• Procedure merge3 takes constant time: Θ(1)
• Total: close to Θ(n)
• Best one!
Summary
• find1() and merge1(): best for find, worst for merge (height = 1, always)
• find2() and merge2(): best for merge, worst for find (height = N, worst case)
• Mixing the above: find2() and merge3() (height = lg N, worst case)
• Mixing the above: find3() and merge3() (height close to 1), best for both