I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis
Pankaj K. Agarwal, Lars Arge, and Ke Yi
Duke University, University of Aarhus
The Union-Find Problem
• A universe of N elements: x1, x2, …, xN
• Initially N singleton sets: {x1}, {x2}, …, {xN}
• Each set has a representative
• Maintain the partition under
  • Union(xi, xj): joins the sets containing xi and xj
  • Find(xi): returns the representative of the set containing xi
The Solution
[Figure: each set is stored as a rooted tree whose root is the set's representative; Union(d, h) is shown with link-by-rank, and Find(n) with path compression.]
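For reference, the in-memory structure on this slide can be sketched in Python as follows. This is a minimal illustration of link-by-rank plus path compression, not code from the paper; the class name `UnionFind` and its method names are our own, and the element letters are taken from the slide's figure.

```python
class UnionFind:
    """In-memory union-find with link-by-rank and path compression."""

    def __init__(self, elements):
        # Every element starts as the representative of its own singleton set.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # First pass: walk up to the root, which is the set's representative.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # redundant union: x and y are already in the same set
        # Link by rank: hang the root of lower rank below the root of higher rank.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True


uf = UnionFind("dhipbjaflzsrckegmn")
uf.union("d", "h")
uf.union("h", "n")
print(uf.find("n") == uf.find("d"))  # True
```

Every operation follows parent pointers, which is cheap in RAM but turns into scattered disk accesses once the structure no longer fits in memory.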
Complexity
• O(N α(N)) time for a sequence of N union and find operations [Tarjan 75]
  • α(•): the inverse Ackermann function, which grows extremely slowly
• Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
• Batched (off-line) version: the entire sequence is known in advance
  • Can be improved to linear time on a RAM [Gabow and Tarjan 85]
  • Not possible on a pointer machine [Tarjan 79]
Simple and good, as long as the entire data structure fits in memory
The I/O Model
• Main memory of size M
• One I/O transfers B items between memory and disk
• Disk of infinite size
Our Results
• An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) expected I/Os
  • Same as sorting
  • Optimal in the worst case
• A practical algorithm using O(sort(N) log(N/M)) I/Os
• Applications to terrain analysis
  • Topological persistence: O(sort(N)) I/Os
  • Contour trees: O(sort(N)) I/Os
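To get a feel for the sorting bound, the snippet below evaluates (N/B) log_{M/B}(N/B) with constant factors dropped. The function name `sort_bound` and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def sort_bound(n, m, b):
    """sort(N) = (N/B) * log_{M/B}(N/B) block transfers, constant factors dropped.

    n: number of items, m: items that fit in main memory, b: items per block.
    """
    blocks = n / b
    return blocks * math.log(blocks, m / b)


# Hypothetical parameters: 2^30 items, 2^27 fit in memory, 2^10 items per block.
n, m, b = 2**30, 2**27, 2**10
print(f"sort(N) ~ {sort_bound(n, m, b):,.0f} I/Os vs N = {n:,}")
```

With these numbers sort(N) is about 1.2 million I/Os, while a structure that paid roughly one random I/O per operation would need on the order of N, about 1.07 billion.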
I/O-Efficient Batched Union-Find
• Assumption: no redundant unions
  • Each union must join two different sets
  • This assumption is removed later
• Two-stage algorithm
  • Convert to interval union-find: compute an order on the elements such that each union joins two adjacent sets
  • Solve batched interval union-find
Union Graph (a tree if there are no redundant unions)
1: Union(d, g)
2: Union(a, c)
3: Union(r, b)
4: Union(a, e)
5: Union(e, i)
6: Union(r, a)
7: Union(a, d)
8: Union(d, h)
9: Union(b, f)
[Figure: two equivalent union trees for this sequence, with each edge labeled by the time stamp of its union.]
Transforming the Union Tree
[Figure: the union tree is transformed step by step into an equivalent tree; in the result, r has children b (3), a (6), d (7), h (8), and f (9), a has children c (2), e (4), and i (5), and d has child g (1).]
• Weights along every root-to-leaf path decrease
Formulating as a Batched Problem
[Figure: the original union tree next to the transformed tree.]
• For each edge, find the lowest ancestor edge with a higher weight
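A brute-force, in-memory sketch of this batched query on the example union tree (rooted at r) is below. The re-hanging rule is our reading of the slides, stated as an assumption: each node moves below the lower endpoint of its lowest heavier ancestor edge, or below the root if no such edge exists. On this example it reproduces the transformed tree from the previous slide. The function name `transform_union_tree` is hypothetical.

```python
def transform_union_tree(parent, weight, root):
    """Brute-force version of the batched query (O(N * depth) time, in memory).

    parent[v]: v's parent in the union tree; weight[v]: time stamp of edge (parent[v], v).
    Returns the new parent of every non-root node (assumed re-hanging rule).
    """
    new_parent = {}
    for v, w in weight.items():
        u = parent[v]
        # Climb while the edge above u is lighter than v's edge (time stamps are distinct).
        while u != root and weight[u] < w:
            u = parent[u]
        # u is now the lower endpoint of the lowest heavier ancestor edge, or the root.
        new_parent[v] = u
    return new_parent


# Union tree built from unions 1-9 on the slide, rooted at r.
parent = {"a": "r", "b": "r", "c": "a", "d": "a", "e": "a",
          "f": "b", "g": "d", "h": "d", "i": "e"}
weight = {"a": 6, "b": 3, "c": 2, "d": 7, "e": 4,
          "f": 9, "g": 1, "h": 8, "i": 5}
print(transform_union_tree(parent, weight, "r"))
```

The walk up the tree is exactly what the I/O-efficient algorithm avoids; the geometric reformulation that follows answers all these queries together.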
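Cast in a Geometry Setting
• Take an Euler tour of the union tree; each edge becomes a horizontal segment spanning the tour positions of the subtree below it
  • x: positions in the tour
  • y: weight
• Computable in O(sort(N)) I/Os [Chiang et al. 95]
[Figure: the union tree and its edges drawn as segments, with weights 1 to 9 on the y-axis.]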
• For each edge, find the lowest ancestor edge with a higher weight
• Equivalently: for each segment, find the shortest segment above it that contains it
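The equivalence can be checked on the example with a small in-memory sketch. We assume a DFS that assigns one tour position on entering and one on leaving each node, so ancestor segments strictly contain descendant segments; "containing" then means "ancestor edge" and "shortest" means "lowest". The quadratic search stands in for distribution sweeping, and the function names are hypothetical.

```python
def edge_segments(children, weight, root):
    """Map each edge (parent(v), v) to a horizontal segment (x1, x2, y).

    x1, x2: tour positions where the DFS enters and leaves v; y: the edge weight.
    Segments of ancestor edges strictly contain those of descendant edges.
    """
    enter, leave = {}, {}
    clock = 0

    def visit(v):
        nonlocal clock
        enter[v] = clock
        clock += 1
        for c in children.get(v, []):
            visit(c)
        leave[v] = clock
        clock += 1

    visit(root)
    return {v: (enter[v], leave[v], weight[v]) for v in weight}


def shortest_containing_above(segments):
    """Brute force, O(N^2): for each segment, the shortest higher segment containing it."""
    answer = {}
    for v, (x1, x2, y) in segments.items():
        best, best_len = None, None
        for u, (a1, a2, h) in segments.items():
            if h > y and a1 < x1 and x2 < a2 and (best is None or a2 - a1 < best_len):
                best, best_len = u, a2 - a1
        answer[v] = best  # None: no heavier ancestor edge, so v hangs off the root
    return answer


children = {"r": ["a", "b"], "a": ["c", "d", "e"], "b": ["f"],
            "d": ["g", "h"], "e": ["i"]}
weight = {"a": 6, "b": 3, "c": 2, "d": 7, "e": 4,
          "f": 9, "g": 1, "h": 8, "i": 5}
print(shortest_containing_above(edge_segments(children, weight, "r")))
```

Each answer is the child endpoint of the found segment's edge; it agrees with the tree-walking version of the query.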
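Distribution Sweeping
[Figure: the plane is divided into M/B vertical slabs; some segments are checked at the current level, and the rest are checked recursively within their slabs.]
• Total cost: O(sort(N)) I/Os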
In-Order Traversal
[Figure: the transformed tree, in which weights along every root-to-leaf path decrease.]
• At u, with children u1, …, uk in increasing order of weight:
  • Recursively visit the subtree at u1
  • Return u
  • For i = 2, …, k: recursively visit the subtree at ui
• Resulting order for the example: b r c a e i g d h f
• Claim: this traversal produces the right order
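The traversal rule above translates directly into a short recursive Python sketch, run here on the transformed example tree. The function name `in_order` is ours, and the recursion depth equals the tree depth, so this is for small in-memory inputs only.

```python
def in_order(children, weight, root):
    """In-order traversal of a tree whose weights decrease along root-to-leaf paths.

    children[u]: u's children; weight[c]: weight of the edge from c's parent to c.
    """
    order = []

    def visit(u):
        kids = sorted(children.get(u, []), key=lambda c: weight[c])
        if kids:
            visit(kids[0])   # subtree of the lightest child first
        order.append(u)      # then u itself
        for c in kids[1:]:   # then the other subtrees, in increasing weight
            visit(c)

    visit(root)
    return order


# The transformed example tree from the slides.
children = {"r": ["b", "a", "d", "h", "f"], "a": ["c", "e", "i"], "d": ["g"]}
weight = {"b": 3, "a": 6, "d": 7, "h": 8, "f": 9,
          "c": 2, "e": 4, "i": 5, "g": 1}
print(" ".join(in_order(children, weight, "r")))  # b r c a e i g d h f
```

In this order each of the nine unions joins two adjacent runs: for example, Union(d, g) joins the neighbors g and d, and Union(r, a) joins the runs "b r" and "c a e i".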
Solving Interval Union-Find
• Union: x = its two operands, y = its time stamp
• Find: x = its operand, y = its time stamp; the answer is the set's representative
• Four instances of batched ray shooting: O(sort(N)) I/Os
Handling Redundant Unions
• The union tree becomes a general graph
• Compute the minimum spanning tree, with the time stamps as edge weights
  • O(sort(N)) I/Os (randomized) [Chiang et al. 95]
  • O(sort(N) log log B) I/Os (deterministic) [Arge et al. 04]
  • Deterministic O(sort(N)) I/Os if the graph is planar
• Only MST edges are non-redundant
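Why the MST is the right filter can be seen from a small in-memory illustration (not the I/O-efficient MST algorithms cited above): scanning the unions in time-stamp order and keeping exactly those that connect two different components is Kruskal's algorithm with time stamps as weights. The function name and the example sequence are hypothetical.

```python
def non_redundant_unions(unions):
    """Time stamps (1-based, as on the slides) of the unions that join two different sets.

    Scanning unions in time order and keeping those that connect two different
    components is Kruskal's algorithm with the time stamps as edge weights, so
    the kept unions are exactly the minimum spanning forest of the union graph.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for t, (x, y) in enumerate(unions, start=1):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry
            kept.append(t)
    return kept


# Hypothetical sequence: the third union, Union(a, c), is redundant.
print(non_redundant_unions([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))  # [1, 2, 4]
```

Dropping the other unions changes nothing, since each of them would join a set with itself.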
Applications
• Topological Persistence
• Contour Trees
Application: Topological Persistence
• Introduced by Edelsbrunner et al. 2000
• Measures the importance of topological features on a surface
  • Feature extraction
  • Topological de-noising
• Many applications
  • Surface modeling
  • Shape analysis
  • Terrain analysis
  • Computational biology
Formulated as Batched Union-Find
• The terrain is represented as a triangulated mesh
• Consider minimum-saddle pairs
• Sweep the vertices in increasing order of height; when reaching
  • a minimum or maximum: do nothing
  • a regular point u: issue union(u, v) for a lower neighbor v
  • a saddle u: let v and w be vertices from the two connected pieces of u's lower link; issue find(v), find(w), union(u, v), union(u, w)
[Figure: the lower link of a saddle.]
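The sweep can be illustrated with the standard in-memory algorithm for minimum-saddle pairs, sketched below under several assumptions. It is not the paper's batched algorithm: it unions u with every lower neighbor instead of one vertex per lower-link piece, and it stores the lowest vertex of each component so that, when two components meet, the one with the higher (younger) minimum is paired with the merge vertex (the elder rule). A 1-D path stands in for the mesh, so the merge vertex is a local maximum here rather than a saddle; all names are hypothetical.

```python
def persistence_pairs(heights, adj):
    """0-dimensional persistence by an upward sweep with union-find (elder rule).

    heights[v]: height of vertex v; adj[v]: neighbours of v in the mesh.
    Returns (minimum, merge vertex) pairs for every minimum except the global one.
    """
    n = len(heights)
    order = sorted(range(n), key=lambda v: (heights[v], v))  # ties broken by index
    parent = list(range(n))
    comp_min = list(range(n))  # meaningful at roots: lowest vertex of the component
    seen = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for u in order:
        seen[u] = True
        # Components of already-swept (lower) neighbours, oldest minimum first.
        roots = sorted({find(v) for v in adj[u] if seen[v]},
                       key=lambda r: (heights[comp_min[r]], comp_min[r]))
        if not roots:
            continue  # u is a local minimum and starts a new component
        oldest = roots[0]
        for r in roots[1:]:
            pairs.append((comp_min[r], u))  # the younger minimum dies at u
            parent[r] = oldest
        parent[u] = oldest
    return pairs


# A 1-D "terrain" (a path of 5 vertices) used as a tiny stand-in for a mesh.
heights = [0, 4, 1, 5, 2]
adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(persistence_pairs(heights, adj))  # [(2, 1), (4, 3)]
```

The persistence of a pair is the height difference of its two vertices, here 3 for both pairs.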
Experiment 1: Random Union-Find
• 128 MB memory
[Figure: results of the random union-find experiment.]
Experiment 2: Topological Persistence on Terrain Data
• Neuse River Basin of North Carolina: ~0.5 billion points
• 128 MB memory
• Entire data set (~0.5 billion points): the in-memory (IM) algorithm fails, and the external-memory (EM) algorithm takes 10 hours
Summary
• An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
  • Optimal in the worst case
• A practical algorithm using O(sort(N) log(N/M)) I/Os
• Applications to terrain analysis
  • Topological persistence: O(sort(N)) I/Os
  • Contour trees: O(sort(N)) I/Os
• Open question
  • On-line case: can we get below O(N α(N)) I/Os?
Previous Results (Contour Trees)
• Directly maintain contours
  • O(N log N) time [van Kreveld et al. 97]
  • Needs union-split-find for circular lists
  • Does not extend to higher dimensions
• Two sweeps maintaining components, then merge
  • O(N log N) time [Carr et al. 03]
  • Extends to arbitrary dimensions
Join Tree and Split Tree
[Figure: join tree and split tree of an example terrain with nodes at heights 1 to 9; the qualified nodes are highlighted.]
Final Contour Tree
• Combining the join tree and split tree into the contour tree is hard to BATCH!
[Figure: join tree, split tree, and the resulting contour tree for the example.]
Another Characterization
• Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
• Now we can BATCH!
[Figure: nodes u, v, and w marked in the join tree, split tree, and contour tree.]
Map to Rectangles
• Can be solved in O(sort(N)) I/Os (practical, too)
[Figure: nodes u, v, and w of the join tree and split tree mapped to rectangles.]
Label Nodes with Intervals
• Using an Euler tour (O(sort(N)) I/Os)
[Figure: tree nodes at heights 1 to 9 labeled with their intervals.]
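Interval labels turn "descendant of v" and "ancestor of u" into coordinate comparisons, which is what makes the characterization a geometric query. The sketch below shows only that step, as a brute-force search; the exact rectangle mapping on the slides cannot be recovered from the text. We assume pre-order DFS labels in place of the Euler tour and treat each node as its own descendant and ancestor. The function names and the toy trees (heights equal to node labels) are hypothetical.

```python
def dfs_intervals(children, root):
    """Pre-order DFS labels: y lies in x's subtree iff first[x] <= first[y] <= last[x]."""
    first, last = {}, {}
    counter = 0

    def visit(x):
        nonlocal counter
        first[x] = counter
        counter += 1
        for c in children.get(x, []):
            visit(c)
        last[x] = counter - 1

    visit(root)
    return first, last


def in_subtree(x, y, labels):
    """True if x is in the subtree rooted at y (x == y counts, an assumption)."""
    first, last = labels
    return first[y] <= first[x] <= last[y]


def highest_w(u, v, join_labels, split_labels, height):
    """Brute force: highest w that is a descendant of v in the join tree
    and an ancestor of u in the split tree."""
    candidates = [w for w in height
                  if in_subtree(w, v, join_labels) and in_subtree(u, w, split_labels)]
    return max(candidates, key=lambda w: height[w], default=None)


# Hypothetical toy trees on nodes 1..5, where each node's height equals its label.
height = {x: x for x in range(1, 6)}
join = dfs_intervals({5: [4], 4: [3, 2], 2: [1]}, 5)
split = dfs_intervals({1: [2], 2: [3], 3: [4, 5]}, 1)
print(highest_w(5, 4, join, split, height))  # 3
```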