
Data Structures & Algorithms Union-Find Example

This example demonstrates the use of the Union-Find data structure to efficiently solve the dynamic connectivity problem. It provides step-by-step instructions on how to develop an algorithm, models the problem and its constraints, and implements the necessary operations. The example also highlights the limitations of quadratic algorithms and introduces the Quick-Find and Quick-Union data structures.


Presentation Transcript


  1. Data Structures & Algorithms Union-Find Example Richard Newman

  2. Steps to Develop an Algorithm • Define the problem – model it • Determine constraints • Find or create an algorithm to solve it • Evaluate algorithm – speed, space, etc. • If algorithm isn’t satisfactory, why not? • Try to fix algorithm • Iterate until solution found (or give up)

  3. Dynamic Connectivity Problem • Given a set of N elements • Support two operations: • Connect two elements • Given two elements, is there a path between them?

  4. Example Connect (4, 3) Connect (3, 8) Connect (6, 5) Connect (9, 4) Connect (2, 1) Are 0 and 7 connected? (No) Are 8 and 9 connected? (Yes) [Figure: elements 0 1 2 3 4 5 6 7 8 9]

  5. Example (cont’d) Connect (5, 0) Connect (7, 2) Connect (6, 1) Connect (1, 0) Are 0 and 7 connected? (Yes) Now consider a problem with 10,000 elements and 15,000 connections… [Figure: elements 0 1 2 3 4 5 6 7 8 9]

  6. Modeling the Elements Various interpretations of the elements: • Pixels in a digital photo • Computers in a network • Socket pins on a PC board • Transistors in a VLSI design • Variable names in a C++ program • Locations on a map • Friends in a social network • … Convenient to just number 0 to N-1 Use as array index, suppress details

  7. Modeling the Connections Assume “is connected to” is an equivalence relation • Reflexive: a is connected to a • Symmetric: if a is connected to b, then b is connected to a • Transitive: if a is connected to b, and b is connected to c, then a is connected to c

  8. Connected Components • A connected component is a maximal set of elements that are mutually connected (i.e., an equivalence class) • Example on elements 0 to 9: {0} {1,2} {3,4,8,9} {5,6} {7}

  9. Implementing the Operations Recall – connect two elements, and answer whether two elements have a path between them • Find: in which component is element a? • Union: replace the components containing elements a and b with their union • Connected: are elements a and b in the same component?

  10. Example • Before: {0} {1,2} {3,4,8,9} {5,6} {7} • Union(1,6) • Components? • After: {0} {1,2,5,6} {3,4,8,9} {7}

  11. Union-Find Data Type Goal: Design an efficient data structure for union-find • Number of elements can be huge • Number of operations can be huge • Union and find operations can be intermixed • public class UF: UF(int N); void union(int a, int b); int find(int a); boolean connected(int a, int b);

  12. Dynamic Connectivity Client • Read in number of elements N from stdin • Repeat: • Read in pair of integers from stdin • If not yet connected, connect them and print out pair • Pseudocode: read int N; while stdin is not empty: read in pair of ints a and b; if not connected(a, b): union(a, b); print out a and b
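The client loop above can be sketched in Java roughly as follows. This is a minimal sketch, not the lecture's code: the names `UnionFind`, `NaiveUF`, `ConnectivityClient`, and `run` are hypothetical, and `NaiveUF` is only a placeholder implementation so the example compiles on its own.

```java
import java.util.Scanner;

// Hypothetical interface mirroring the operations listed on the UF slide.
interface UnionFind {
    void union(int a, int b);
    int find(int a);
    boolean connected(int a, int b);
}

// Placeholder implementation (an assumption for this sketch): it relabels
// a whole component on every union. Any other implementation could be swapped in.
final class NaiveUF implements UnionFind {
    private final int[] id;

    NaiveUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // each element starts alone
    }

    public int find(int a) { return id[a]; }

    public boolean connected(int a, int b) { return id[a] == id[b]; }

    public void union(int a, int b) {
        int from = id[a];
        int to = id[b];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == from) id[i] = to;
        }
    }
}

final class ConnectivityClient {
    // Reads N, then pairs of ints; returns the pairs that created a new
    // connection, one pair per line.
    static String run(Scanner in) {
        int n = in.nextInt();
        UnionFind uf = new NaiveUF(n);
        StringBuilder out = new StringBuilder();
        while (in.hasNextInt()) {
            int a = in.nextInt();
            int b = in.nextInt();
            if (!uf.connected(a, b)) {
                uf.union(a, b);
                out.append(a).append(' ').append(b).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(new Scanner(System.in)));
    }
}
```

Returning the output as a string rather than printing inside the loop is a design choice made here only so the behaviour is easy to check; pairs that are already connected are silently skipped.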

  13. Quick-Find • Data Structure • Integer array id[] of length N • Interpretation: id[a] is the id of the component containing a i: 0 1 2 3 4 5 6 7 8 9 id[i]: 0 1 1 4 4 5 5 7 4 4 0 1 2 3 4 5 6 7 8 9

  14. Quick-Find • Data Structure • Integer array id[] of length N • Interpretation: id[a] is the id of the component containing a • Find: what is the id of a? • Connected: do a and b have the same id? • Union: Change all the entries in id that have the same id as a to be the id of b.

  15. Quick-Find i: 0 1 2 3 4 5 6 7 8 9 id: 0 1 1 4 4 5 5 7 4 4 Union(1,6) 0 1 2 3 4 5 6 7 8 9 i: 0 1 2 3 4 5 6 7 8 9 id: 0 5 5 4 4 5 5 7 4 4 It works – so is there a problem? Well, there may be many values to change, and many to search!
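A minimal Java sketch of Quick-Find as described on these slides might look like this. The class name `QuickFindUF` and the helper `ids()` are assumptions made for illustration; the union rule is the one stated above (every entry with a's id is changed to b's id).

```java
final class QuickFindUF {
    private final int[] id; // id[i] is the component id of element i

    QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // N array accesses to initialize
    }

    // Find: one array access.
    int find(int a) { return id[a]; }

    // Connected: two array accesses.
    boolean connected(int a, int b) { return id[a] == id[b]; }

    // Union: scans the whole array, so it costs time proportional to N.
    void union(int a, int b) {
        int aid = id[a];
        int bid = id[b];
        if (aid == bid) return;
        for (int i = 0; i < id.length; i++) {
            if (id[i] == aid) id[i] = bid;
        }
    }

    // Hypothetical helper: returns a copy of the id array for inspection.
    int[] ids() { return id.clone(); }
}
```

Starting from the array on the slide (0 1 1 4 4 5 5 7 4 4), calling `union(1, 6)` rewrites the two entries that hold id 1, which gives the second array shown above.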

  16. Quick-Find • Quick-Find operation times • Initialization takes time O(N) • Union takes time O(N) • Find takes time O(1) • Connected takes time O(1) • Union is too slow – it takes O(N^2) array accesses to process N union operations on N elements

  17. Quadratic Algos Do Not Scale! • Rough Standards (for now) • 10^9 operations per second • 10^9 words of memory • Touch all words in about 1 second (roughly a truism since 1950!) • Huge problem for Quick-Find: • 10^9 union commands on 10^9 elements • Takes more than 10^18 operations • This is 30+ years of computer time!
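The "30+ years" figure can be checked with a quick calculation, assuming the rough standard of one billion operations per second and about 3.16 × 10^7 seconds in a year:

```latex
\[
\frac{10^{18}\ \text{operations}}{10^{9}\ \text{operations/s}} = 10^{9}\ \text{s},
\qquad
\frac{10^{9}\ \text{s}}{3.16 \times 10^{7}\ \text{s/year}} \approx 31.7\ \text{years}.
\]
```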

  18. Quadratic Algos Do Not Scale! • They do not keep pace with technology • New computer may be 10x as fast • But it has 10x as much memory • Want to solve problems 10x as big • With a quadratic algorithm, it takes … 10x as long!
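The "10x as long" conclusion can be written out as a short calculation. The symbols here are assumptions for illustration: c is a machine-independent constant, s is the machine speed, and T(N) is the running time of a quadratic algorithm on a problem of size N.

```latex
\[
T_{\text{old}} = \frac{c\,N^{2}}{s},
\qquad
T_{\text{new}} = \frac{c\,(10N)^{2}}{10\,s} = \frac{100\,c\,N^{2}}{10\,s} = 10\,T_{\text{old}}.
\]
```

The problem grows by a factor of 100 in work while the machine only gains a factor of 10 in speed.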

  19. Quick-Union • Data Structure • Integer array id[] of length N • Interpretation: id[a] is the parent of a • Component is root of a = id[id[…id[a]…]] (fixed point) i: 0 1 2 3 4 5 6 7 8 9 id[i]: 0 1 1 3 3 5 5 7 3 4 0 1 3 5 7 8 4 6 2 9

  20. Quick-Union • Data Structure • Find: What is root of tree of a? • Connected: Do a and b have the same root? • Union: Set id of root of b’s tree to be root of a’s tree i: 0 1 2 3 4 5 6 7 8 9 id[i]: 0 1 1 3 3 5 5 7 3 4 [Figure: forest with nodes 0 1 3 5 7 8 4 6 2 9]

  21. Quick-Union • Find 9 • Connected 8, 9: • Union 7,5 i: 0 1 2 3 4 5 6 7 8 9 id[i]: 0 1 1 3 3 5 5 7 3 4 5 0 1 3 5 7 8 4 6 2 9 Only ONE value changes! = FAST

  22. Quick-Union • Quick-Union operation times (worst case) • Initialization takes time O(N) • Union takes time O(N) (must find two roots) • Find takes time O(N) • Connected takes time O(N) • Now union AND find are too slow – it takes O(N^2) array accesses to process N operations on N elements
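A minimal Java sketch of Quick-Union, under the parent-pointer interpretation given on the slides, could look like this. The class name `QuickUnionUF` is an assumption. The union direction follows the rule stated on slide 20 (root of b's tree is attached under root of a's tree); attaching in the other direction would be equally valid.

```java
final class QuickUnionUF {
    private final int[] id; // id[i] is the parent of i; a root satisfies id[i] == i

    QuickUnionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i; // every element starts as its own root
    }

    // Find: follow parent pointers until reaching the fixed point (the root).
    // Worst case this walks a chain of length N.
    int find(int a) {
        while (a != id[a]) a = id[a];
        return a;
    }

    boolean connected(int a, int b) { return find(a) == find(b); }

    // Union: only ONE array entry changes, but two roots must be found first.
    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        id[rootB] = rootA; // harmless self-assignment if the roots are already equal
    }
}
```

The worst case arises when repeated unions build one long chain, which is what the observations on the next slide refer to.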

  23. Quick-Find/Quick-Union • Observations: • Problem with Quick-Find is unions • May take N array accesses • Trees are flat, but too expensive to keep them flat! • Problem with Quick-Union • Trees may get tall • Find (and hence, connected and union) may take N array accesses

  24. Weighted Quick-Union • Make Quick-Union trees stay short! • Keep track of tree size • Join smaller tree into larger tree • May alternatively do union by height/rank • Need to keep track of “weight” [Figure: trees a and b. Quick-Union may hang the larger tree under the smaller one, but we always want the smaller tree hung under the larger one]
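Union by size can be sketched in Java as follows. This is a sketch under assumptions: the class name `WeightedQuickUnionUF`, the `size` array as the "weight", and the `depth` helper (added only to inspect tree height) are not from the slides.

```java
final class WeightedQuickUnionUF {
    private final int[] parent; // parent[i] == i for a root
    private final int[] size;   // size[r] = number of nodes in the tree rooted at r

    WeightedQuickUnionUF(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int find(int a) {
        while (a != parent[a]) a = parent[a];
        return a;
    }

    boolean connected(int a, int b) { return find(a) == find(b); }

    // Always attach the root of the smaller tree under the root of the larger one.
    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) return;
        if (size[rootA] < size[rootB]) {
            parent[rootA] = rootB;
            size[rootB] += size[rootA];
        } else {
            parent[rootB] = rootA;
            size[rootA] += size[rootB];
        }
    }

    // Hypothetical helper: number of links from a to its root.
    int depth(int a) {
        int d = 0;
        while (a != parent[a]) {
            a = parent[a];
            d++;
        }
        return d;
    }
}
```

Tracking height or rank instead of size, as the slide mentions, would only change what the second array stores and how it is updated.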

  25. Weighted Quick-Union • Weighted Quick-Union operation times • Initialization takes time O(N) • Union takes time O(1) (given roots) • Find takes time O(depth of a) • Connected takes time O(max {depth of a, b}) • Proposition: Depth of any node x is at most lg N Pf: What causes depth of x to increase?

  26. Weighted Quick-Union • Proposition: Depth of any node x is at most lg N Pf: What causes depth of x to increase? Only a union, and only when x is in the smaller tree. So x’s tree must at least double in size each time a union increases x’s depth, which can happen at most lg N times. (Why?)
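One way to spell out the counting step in the proof, using s_k (an assumed symbol) for the size of the tree containing x after its depth has increased k times:

```latex
\[
s_0 \ge 1, \qquad s_{k} \ge 2\,s_{k-1} \;\Longrightarrow\; s_k \ge 2^{k},
\qquad
2^{k} \le s_k \le N \;\Longrightarrow\; k \le \lg N .
\]
```

The doubling holds because x's tree is merged into a tree at least as large as itself whenever x's depth grows, and no tree can contain more than N elements.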

  27. WQU with Path Compression • After performing find • Set the parent of every node along the path to the root • Time order is the same for the find (just traverse twice) • One-pass Variant • Set every other node’s parent to its grandparent • No reason NOT to do this – other than a bit of laziness • Huge benefits – tree is almost flat!
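Both variants can be sketched in Java on top of union by size. The class name `WeightedQuickUnionPC`, the method name `findHalving` for the one-pass variant, and the `parentOf` helper are assumptions for illustration.

```java
final class WeightedQuickUnionPC {
    private final int[] parent;
    private final int[] size;

    WeightedQuickUnionPC(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Two-pass compression: first locate the root, then point every node
    // on the path directly at it.
    int find(int a) {
        int root = a;
        while (root != parent[root]) root = parent[root];
        while (a != root) {
            int next = parent[a];
            parent[a] = root;
            a = next;
        }
        return root;
    }

    // One-pass variant: every other node on the path is pointed at its grandparent.
    int findHalving(int a) {
        while (a != parent[a]) {
            parent[a] = parent[parent[a]];
            a = parent[a];
        }
        return a;
    }

    boolean connected(int a, int b) { return find(a) == find(b); }

    void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) return;
        if (size[rootA] < size[rootB]) {
            parent[rootA] = rootB;
            size[rootB] += size[rootA];
        } else {
            parent[rootB] = rootA;
            size[rootA] += size[rootB];
        }
    }

    // Hypothetical helper for inspecting the tree shape.
    int parentOf(int a) { return parent[a]; }
}
```

The one-pass variant needs no second traversal and no extra variables, which is why it is often preferred in practice even though it flattens the path less aggressively.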

  28. WQU with Path Compression • Theorem: Starting from an empty data structure, any sequence of M union and find operations on N elements takes O(N + M lg* N) time • Proof: Difficult! (lg* N is the number of times you have to take lg of N to get to 1 or below) • Performance: lg* is almost constant! • And, in theory, no linear-time algorithm exists!

  29. Lg* Function • Performance: lg* is almost constant!
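To see how slowly lg* grows, here is a small Java sketch that computes it for positive integers. The names `IteratedLog`, `ceilLg`, and `lgStar` are assumptions. It iterates the ceiling of lg in integer arithmetic, which gives the same count as the real-valued definition for integer inputs and avoids floating-point rounding.

```java
final class IteratedLog {
    // Ceiling of lg n for n >= 1, computed with integer bit operations.
    static int ceilLg(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // Number of times lg must be applied to n before the result is at most 1.
    static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = ceilLg(n);
            count++;
        }
        return count;
    }
}
```

With this definition lg* is 0 for 1, 1 for 2, 2 for 4, 3 for 16, 4 for 65536, and 5 for every larger value up to 2^65536, which covers any input size that could be stored in practice.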

  30. UF Summary

  31. Example • Huge problem: 1 billion nodes, 10 billion edges • WQU/PC reduces time from 3000 years to 1 minute! • Faster computer won’t help! • Better algorithm will! • WQU/PC on cell phone in Java beats QF on supercomputer!

  32. Applications • Percolation • N by N grid, each space vacant or occupied • Grid percolates if top is connected to bottom by vacant spaces • For large N, the vacancy fraction at which the grid starts to percolate is about 0.6, known by simulation • Models • Electrical systems • Fluid flow • Social networks
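The percolation test itself maps directly onto union-find. The following Java sketch is one possible formulation, not the lecture's code: the class name `PercolationCheck`, the method `percolates`, and the two virtual nodes standing for the top and bottom edges are assumptions made for illustration.

```java
final class PercolationCheck {
    private final int[] parent;
    private final int[] size;

    private PercolationCheck(int count) {
        parent = new int[count];
        size = new int[count];
        for (int i = 0; i < count; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Find with one-pass path compression.
    private int find(int a) {
        while (a != parent[a]) {
            parent[a] = parent[parent[a]];
            a = parent[a];
        }
        return a;
    }

    // Union by size.
    private void union(int a, int b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA == rootB) return;
        if (size[rootA] < size[rootB]) {
            parent[rootA] = rootB;
            size[rootB] += size[rootA];
        } else {
            parent[rootB] = rootA;
            size[rootA] += size[rootB];
        }
    }

    // vacant[r][c] is true when the cell in row r, column c is vacant.
    // Returns true if some path of adjacent vacant cells joins top row to bottom row.
    static boolean percolates(boolean[][] vacant) {
        int n = vacant.length;
        int top = n * n;        // virtual node joined to every vacant top-row cell
        int bottom = n * n + 1; // virtual node joined to every vacant bottom-row cell
        PercolationCheck uf = new PercolationCheck(n * n + 2);
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                if (!vacant[r][c]) continue;
                int cell = r * n + c;
                if (r == 0) uf.union(cell, top);
                if (r == n - 1) uf.union(cell, bottom);
                if (r + 1 < n && vacant[r + 1][c]) uf.union(cell, cell + n);
                if (c + 1 < n && vacant[r][c + 1]) uf.union(cell, cell + 1);
            }
        }
        return uf.find(top) == uf.find(bottom);
    }
}
```

A simulation of the threshold would call a check like this repeatedly on random grids with different vacancy fractions; the virtual nodes reduce the final question to a single connected query.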

  33. Next – Lecture 3 • Read Chapter 2 • Empirical analysis • Asymptotic analysis of algorithms • Basic recurrences
