External-Memory MST

External-Memory MST (Arge, Brodal, Toma)

Minimum-Spanning Tree • Given a weighted, undirected graph G=(V,E), the minimum-spanning tree (MST) problem is the problem of finding a spanning tree for G of minimum weight. • Assumptions: • G is connected; • No two edges in G have the same weight.

External-Memory Graph Algorithms • Standard two-level I/O model with a single disk: • N = V + E • M= number of vertices/edges that can fit into internal memory. • B= number of vertices/edges per disk block. • The graph is given as a list of edges sorted by vertex.
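The later slides state costs as scan-type terms such as O(E/B) and as sort(·) without spelling the latter out. A minimal sketch, assuming the standard external-memory sorting bound sort(N) = (N/B)·log_{M/B}(N/B) and ignoring constant factors; the function names `scan_ios` and `sort_ios` are illustrative only:

```python
import math

def scan_ios(n: int, B: int) -> int:
    """I/Os needed to read n items sequentially: ceil(n / B)."""
    return math.ceil(n / B)

def sort_ios(n: int, M: int, B: int) -> float:
    """Standard sorting bound up to constants: (n/B) * log_{M/B}(n/B).

    Assumes M >= 2B so that the logarithm base M/B is greater than 1.
    """
    blocks = n / B
    if blocks <= 1:
        return 1.0
    return blocks * max(1.0, math.log(blocks, M / B))
```

For realistic M and B the logarithmic factor is a small constant, which is why O(sort(E)) is far below the O(E) I/Os of a one-access-per-edge algorithm.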

External-Memory Graph Algorithms (2) • For MST and CC (connected components), randomized algorithms using O(sort(E)) I/Os are known.

Prim’s Algorithm • [Figure: example graph on vertices a to f with distinct edge weights 1 to 9, shown together with the priority queue of vertices maintained by the algorithm.]

Prim’s Algorithm (2) • Prim’s algorithm cannot be implemented efficiently in external memory: • It is not guaranteed that even the priority queue alone fits in memory. • Thus, we cannot in general get the current vertex priority without using an I/O. • A direct implementation leads to an Ω(E) I/O algorithm.

Prim’s Algorithm (3) • Modification: store edges in the priority queue instead of vertices. • Any two edges have distinct weights. • [Figure: step-by-step run on the example graph, showing the priority-queue contents after each extraction. Edges (weight): {b,a} (1), {c,d} (2), {a,c} (3), {d,e} (4), {b,c} (5), {b,d} (6), {a,f} (7), {c,e} (8), {e,f} (9). An edge whose endpoints are both in the tree appears in the queue twice, e.g. {b,c} (5) and {c,b} (5).]

Modified Prim Algorithm • The correctness follows directly from the correctness of the original algorithm (the “blue rule” still applies). • Efficiency: • At least one I/O per vertex is needed to read its adjacency list => O(V + E/B) I/Os. • The O(E) operations on the external priority queue can be performed in O(sort(E)) I/Os. • Thus in total we have O(V + sort(E)) I/Os.
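The edge-based variant can be sketched in memory as follows. This is an illustration under stated assumptions, not the authors' code: `heapq` stands in for the external priority queue, and the duplicate test relies on the slide's assumption that all weights are distinct, so two consecutive minima with the same weight must be the two copies of one edge whose endpoints are both already in the tree. The names `prim_edge_pq` and `build_adj` are hypothetical.

```python
import heapq

EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4), ("b", "c", 5),
         ("b", "d", 6), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]

def build_adj(edges):
    """Adjacency lists: vertex -> list of (neighbor, weight)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def prim_edge_pq(adj, source):
    """Prim with edges in the queue; no 'is v in the tree?' lookup is needed."""
    pq = [(w, source, x) for x, w in adj[source]]
    heapq.heapify(pq)
    mst = []
    while pq:
        w, u, v = heapq.heappop(pq)
        if pq and pq[0][0] == w:
            # Second copy of the same edge: both endpoints are in the tree.
            heapq.heappop(pq)
            continue
        mst.append((u, v, w))          # v is new; (u, v) is an MST edge
        for x, w2 in adj[v]:           # one adjacency-list read per vertex
            if w2 != w:                # skip the edge we just arrived on
                heapq.heappush(pq, (w2, v, x))
    return mst

mst = prim_edge_pq(build_adj(EDGES), "a")
```

On the example graph this selects the edges of weight 1, 3, 2, 4 and 7, in that order, and discards the pairs of weight 5, 6, 8 and 9.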

Boruvka’s Algorithm (1) Select for each vertex the minimum-weight edge incident to it. (2) Contract the graph and return to (1). • [Figure: the example graph on vertices a to f; the selected edges are {b,a}, {c,d}, {d,e} and {a,f}.]

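Boruvka’s Algorithm (continued) (1) Select for each vertex the minimum-weight edge incident to it. (2) Contract the graph and return to (1). • [Figure: after one step the graph is contracted to two supervertices, abf and cde, joined by parallel edges of weights 3, 5, 6 and 9.]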
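For reference, the two steps can be sketched as a plain in-memory loop. This is only an illustration of the select-and-contract idea, assuming distinct weights and a connected graph; it relabels components naively instead of using the sort-and-scan contraction developed on the following slides, and the name `boruvka` is hypothetical.

```python
EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4), ("b", "c", 5),
         ("b", "d", 6), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]

def boruvka(vertices, edges):
    """Return (mst_edges, number_of_rounds) for a connected graph."""
    comp = {v: v for v in vertices}        # current supervertex of each vertex
    mst, rounds = [], 0
    while True:
        best = {}                          # supervertex -> lightest leaving edge
        for a, b, w in edges:
            ca, cb = comp[a], comp[b]
            if ca == cb:
                continue                   # self-loop after contraction
            for c in (ca, cb):
                if c not in best or w < best[c][2]:
                    best[c] = (a, b, w)
        if not best:
            break                          # a single supervertex remains
        rounds += 1
        for a, b, w in set(best.values()):
            ca, cb = comp[a], comp[b]
            if ca != cb:                   # always true with distinct weights
                mst.append((a, b, w))
                for v in vertices:         # naive contraction by relabeling
                    if comp[v] == cb:
                        comp[v] = ca
    return mst, rounds

mst, rounds = boruvka("abcdef", EDGES)
```

On the example graph the first round selects the edges of weight 1, 2, 4 and 7, and the second round adds the edge of weight 3.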

External-Memory Boruvka’s Step • For each vertex v, let C(v) be the other endpoint of the lightest edge incident to v. • Let G’ be the graph obtained by taking only the edges of the form (v, C(v)) for each v. • Let G’d be the graph obtained by directing each edge (v, C(v)) in G’ from C(v) to v. • The goal is to contract each connected component of G’ into a single vertex.
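The definitions of C(v) and G’d can be sketched with a single pass over an edge list sorted by vertex. The sketch assumes the list holds every undirected edge once per endpoint as (v, u, w), which is one reading of "sorted by vertex"; `lightest_edges` and `directed_gd` are illustrative names.

```python
from itertools import groupby

EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4), ("b", "c", 5),
         ("b", "d", 6), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]

# Edge list with one entry per endpoint, sorted by the first vertex.
EDGE_LIST = sorted([(a, b, w) for a, b, w in EDGES] + [(b, a, w) for a, b, w in EDGES])

def lightest_edges(edge_list):
    """One scan: for each vertex v return C(v) and the weight of (v, C(v))."""
    C = {}
    for v, group in groupby(edge_list, key=lambda e: e[0]):
        _, u, w = min(group, key=lambda e: e[2])
        C[v] = (u, w)
    return C

def directed_gd(C):
    """Edges of G'd as (C(v), v, weight), i.e. directed from C(v) to v."""
    return sorted((u, v, w) for v, (u, w) in C.items())

C = lightest_edges(EDGE_LIST)
GD = directed_gd(C)
```

Here C(a) = b and C(b) = a, so the edge of weight 1 appears in G’d in both directions; the same happens for the edge of weight 2 between c and d.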

Unique Representatives • In each connected component of G’d: • Each vertex has indegree 1. • The weights of the edges along any root-to-leaf path are increasing. • There is exactly one cycle: the two oppositely directed copies of the component’s minimum-weight edge. Removing one of the two copies makes its head a root.

External-Memory Boruvka’s Step (2) • The roots can be easily identified, and we can choose them to be the unique representatives of the components in G’. • We would like to replace each edge (u, v) with an edge (ur, vr), where ur and vr are the unique representatives of the components containing u and v respectively. • Then we can remove parallel edges and self-loops, and obtain the contracted graph.

External-Memory Boruvka’s Step (3) • [Figure: the example graph G, the selected-edge graph G’, and its directed version G’d with roots b and c.] • L: (b,a) (1); (a,f) (7) / (c,d) (2); (d,e) (4) / (d,e) (4) / (a,f) (7). • Priority queue: initialized with each vertex that is an immediate successor of a root vertex, here a (1) [b] and d (2) [c]; later it holds e (4) [c] and f (7) [b]. • Output: b → b, c → c, a → b, d → c, f → b, e → c.
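The propagation of representatives can be sketched as below. This is one reading of the slide, not a confirmed reconstruction: L is assumed to hold, for every edge entering a vertex v, the edges leaving v, keyed by the weight of the entering edge, and `heapq` stands in for the external priority queue. Because weights increase along every root-to-leaf path, the queue is extracted in increasing weight order and L is consumed by one sequential scan. The dictionary `in_edge` is an in-memory shortcut for the two sorted copies of the edge list.

```python
import heapq

def propagate_representatives(tree_edges, roots):
    """tree_edges: (parent, child, w) edges of G'd with the cycles removed."""
    in_edge = {c: (p, w) for p, c, w in tree_edges}
    # L entry: (weight of edge into p, child x of p, weight of edge p -> x)
    L = sorted((in_edge[p][1], c, w) for p, c, w in tree_edges if p in in_edge)
    root_set = set(roots)
    # Queue entries: (weight of entering edge, vertex, representative)
    pq = [(w, c, p) for p, c, w in tree_edges if p in root_set]
    heapq.heapify(pq)
    out = [(r, r) for r in roots]
    pos = 0
    while pq:
        w, v, rep = heapq.heappop(pq)
        out.append((v, rep))
        while pos < len(L) and L[pos][0] == w:   # edges leaving v
            _, x, w_out = L[pos]
            heapq.heappush(pq, (w_out, x, rep))  # pass the label forward
            pos += 1
    return out

TREE = [("b", "a", 1), ("a", "f", 7), ("c", "d", 2), ("d", "e", 4)]
OUT = propagate_representatives(TREE, ["b", "c"])
```

With roots b and c as in the figure, the output maps a and f to b, and d and e to c.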

External-Memory Boruvka’s Step (4) To finish the contraction: • Sort the output of the previous phase and E by the first component. Then scan the two lists simultaneously, replacing each edge (v, u) in E with (vr, u). • Sort the output and E by the second component, and then scan the two lists, replacing each edge (vr, u) in E with (vr, ur). • Sort E by both components and by weight, and with a single scan remove duplicate edges and self-loops.
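The three sort-and-scan passes can be sketched as follows, with Python's `sorted` standing in for an external sort. Two details are assumptions beyond the slide: edge endpoints are normalized to (min, max) so that parallel edges become adjacent, and among parallel edges the lightest copy is kept, which is the one relevant for the MST. The names `merge_relabel` and `contract` are illustrative.

```python
EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4), ("b", "c", 5),
         ("b", "d", 6), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]
REP = [("a", "b"), ("b", "b"), ("c", "c"), ("d", "c"), ("e", "c"), ("f", "b")]

def merge_relabel(edges, rep, pos):
    """Replace component `pos` of every edge by its representative.

    Both lists are sorted on the vertex and then scanned in parallel;
    every vertex is assumed to occur in `rep`.
    """
    edges = sorted(edges, key=lambda e: e[pos])
    rep = sorted(rep)
    out, j = [], 0
    for e in edges:
        while rep[j][0] < e[pos]:      # advance the second scan pointer
            j += 1
        relabeled = list(e)
        relabeled[pos] = rep[j][1]
        out.append(tuple(relabeled))
    return out

def contract(edges, rep):
    edges = merge_relabel(edges, rep, 0)
    edges = merge_relabel(edges, rep, 1)
    # Drop self-loops, normalize orientation, sort by endpoints then weight.
    edges = sorted((min(a, b), max(a, b), w) for a, b, w in edges if a != b)
    out = []
    for a, b, w in edges:
        if not out or out[-1][:2] != (a, b):
            out.append((a, b, w))      # first copy in sorted order is lightest
    return out

CONTRACTED = contract(EDGES, REP)
```

On the example, the four parallel edges of weights 3, 5, 6 and 9 between the two supervertices collapse to the single edge of weight 3.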

Boruvka’s Step - I/O efficiency 1. Lightest incident edges can be collected in O(E/B) I/Os in a simple scan of the edge-list representation of G (we assume it is sorted). 2. Detection of cycles in G’d can be done in O(sort(V)) I/Os: • sort the collected edges by weight and find duplicates in a single scan. • remove edges to break cycles and identify unique representatives.
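The cycle-detection step can be sketched as a sort by weight followed by one scan. Since weights are distinct, an adjacent pair with equal weight is the same edge selected from both endpoints, i.e. the 2-cycle of its component. Which endpoint becomes the root is arbitrary; this sketch picks the smaller label, whereas the figure in these slides uses b and c. The name `break_cycles` is hypothetical.

```python
GD = [("b", "a", 1), ("a", "b", 1), ("a", "f", 7),
      ("d", "c", 2), ("c", "d", 2), ("d", "e", 4)]   # edges C(v) -> v

def break_cycles(gd_edges):
    """Return (tree_edges, roots) after removing one edge of every 2-cycle."""
    by_weight = sorted(gd_edges, key=lambda e: e[2])
    tree, roots = [], []
    i = 0
    while i < len(by_weight):
        p, c, w = by_weight[i]
        if i + 1 < len(by_weight) and by_weight[i + 1][2] == w:
            root = min(p, c)                   # arbitrary choice of root
            other = c if root == p else p
            roots.append(root)
            tree.append((root, other, w))      # keep only root -> other
            i += 2
        else:
            tree.append((p, c, w))
            i += 1
    return tree, roots

TREE, ROOTS = break_cycles(GD)
```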

Boruvka’s Step - I/O efficiency (2) 3. The list L contains each edge in G’d at most twice, and can be constructed in O(sort(V)) I/Os: • sort one instance of the list of edges by the second component. • sort another instance by the first component. • create the structure of L in a single scan and sort it by weight. 4. The PQ can be initialized in a similar way in O(sort(V)) I/Os.

Boruvka’s Step - I/O efficiency (3) 5. We perform a total of V insertions into the PQ and V extract-min operations. These can be performed in O(sort(V)) I/Os. 6. Replacing the edges of G with the unique representatives is done using a few sorting and scanning operations as described before. Here the entire edge list is sorted, and thus O(sort(E)) I/Os are needed. Total: O(E/B + sort(V) + sort(E)) = O(sort(E)) I/Os.

Results So Far • Modified Prim: O(V + sort(E)) I/Os. • Modified Boruvka: O(sort(E) · lg V) I/Os. • Combined: contract G until V ≤ E/B using Boruvka’s steps, then run Prim on the result: O(sort(E) · lg(V·B/E)) I/Os. • It is possible to perform the lg(V·B/E) Boruvka’s steps using lg lg(V·B/E) superphases requiring O(sort(E)) I/Os each.

Yet a better MST algorithm: Superphase Algorithm • At superphase i: • Let $N_i = 2^{(3/2)^i}$ (so $N_{i+1} = N_i \cdot \sqrt{N_i}$). • Let $G_i = (V_i, E_i)$ be the graph prior to superphase i. • Let $E_i' \subseteq E_i$ be the set that contains, for each vertex, the $\sqrt{N_i}$ lightest edges incident to it. • Let the blocking value of a vertex be the weight of the $(\sqrt{N_i}+1)$-th lightest edge incident to it (or infinity if no such edge exists). • $E_i'$ and the blocking values can be found with $O(\mathrm{sort}(E_i))$ I/Os as described earlier.
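The selection of the light edges and the blocking values can be sketched with one pass over the edge list grouped by vertex. The parameter `k` plays the role of √N_i; the slides do not say how to round it, so it is left to the caller. The edge list is assumed to contain each undirected edge once per endpoint, and `select_light_edges` is an illustrative name.

```python
import math
from itertools import groupby

EDGES = [("a", "b", 1), ("c", "d", 2), ("a", "c", 3), ("d", "e", 4), ("b", "c", 5),
         ("b", "d", 6), ("a", "f", 7), ("c", "e", 8), ("e", "f", 9)]
EDGE_LIST = sorted([(a, b, w) for a, b, w in EDGES] + [(b, a, w) for a, b, w in EDGES])

def select_light_edges(edge_list, k):
    """Return (light, blocking): the k lightest edges per vertex and, for each
    vertex, the weight of its (k+1)-th lightest edge (inf if there is none)."""
    light, blocking = [], {}
    for v, group in groupby(edge_list, key=lambda e: e[0]):
        incident = sorted(group, key=lambda e: e[2])
        light.extend(incident[:k])
        blocking[v] = incident[k][2] if len(incident) > k else math.inf
    return light, blocking

LIGHT, BLOCKING = select_light_edges(EDGE_LIST, 2)
```

With k = 2 on the example graph, vertex a keeps its edges of weight 1 and 3 and gets blocking value 7, while f has only two incident edges and therefore blocking value infinity.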

Superphase Algorithm • At superphase i, perform $\log\sqrt{N_i}$ contraction phases on $G_i' = (V_i, E_i')$ as described before, but now select the lightest edge incident to a vertex only if its weight is smaller than the vertex’s blocking value. • After a single contraction, the blocking value of a supervertex is set to the minimum of the blocking values of the contracted vertices. • After that, the remaining edges of $E_i'$ contain all edges of $E_i$ incident to a supervertex v with weight smaller than the blocking value of v. • Thus only edges that actually belong to the MST are contracted.

Superphase Algorithm (2) But how many vertices remain after each superphase? • The blocking value might prevent us from selecting an edge for a supervertex v. But if so, then: • The blocking value of v is the blocking value of some vertex u in $V_i$, and v must already contain the $\sqrt{N_i}$ edges incident to u in $E_i'$. • Thus v must be the contraction of at least $\sqrt{N_i}$ vertices from $V_i$. • If no blocking value prevents us from selecting an edge for v, then after $\log\sqrt{N_i}$ phases v must be the contraction of at least $2^{\log\sqrt{N_i}} = \sqrt{N_i}$ vertices.

Superphase Algorithm (3) • It can be proved by induction on i that $V_i \le 2V / N_i$: • For i = 0, $N_0 = 2$ and $V_0 = V$. • $V_{i+1} \le V_i / \sqrt{N_i} \le (2V / N_i) / \sqrt{N_i} = 2V / N_{i+1}$. • Conclusion: $E_i' \le V_i \sqrt{N_i} \le 2V / \sqrt{N_i}$. • Thus, in order to reduce the number of vertices by a factor of $\sqrt{N_i}$ we have used so far: $O(\mathrm{sort}(E_i) + \mathrm{sort}(E_i') \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(E) + \mathrm{sort}(V / \sqrt{N_i}) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(E))$ I/Os.

Superphase Algorithm (4) • In order to finish a superphase, we need to reincorporate the edges of $E_i$ that were not selected into $E_i'$: • During the contraction phases, maintain a list C of pairs $(v, v_s)$ for $v \in V_i$, where $v_s$ is the supervertex currently containing v. • Use the output of the Boruvka’s step, as described earlier, in order to update C: • Sort C by the second component and the output by the first component, and scan them simultaneously. • This is done using $O(\mathrm{sort}(V_i))$ I/Os. • In total, in order to maintain C, we use: $O(\mathrm{sort}(V_i) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(V / N_i) \cdot \log\sqrt{N_i}) = O(\mathrm{sort}(V))$ I/Os.

Superphase Algorithm – I/O Efficiency • $E_i'$ and the blocking values are computed in $O(\mathrm{sort}(E_i))$ I/Os. • Each superphase takes $O(\mathrm{sort}(E))$ I/Os. • Maintaining the list C during the superphase is done with $O(\mathrm{sort}(V))$ I/Os. • Given C, the edges in $E_i \setminus E_i'$ can be reincorporated in $O(\mathrm{sort}(E))$ I/Os, as we did in the single contraction algorithm. • Finally, in order to reduce V to E/B, $\log_{3/2} \lg(V \cdot B / E)$ superphases are needed. • Total: $O(\mathrm{sort}(E) \cdot \lg\lg(V \cdot B / E))$ I/Os.
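The superphase count follows from the definition of $N_i$ and the bound $V_i \le 2V/N_i$. A short derivation, keeping the factor 2 that the slide's count drops (it changes the result only by an additive constant):

```latex
\begin{align*}
V_i \le \frac{2V}{N_i} \le \frac{E}{B}
  &\iff N_i = 2^{(3/2)^i} \ge \frac{2VB}{E} \\
  &\iff \left(\tfrac{3}{2}\right)^{i} \ge \lg\frac{2VB}{E} = 1 + \lg\frac{VB}{E} \\
  &\iff i \ge \log_{3/2}\!\left(1 + \lg\frac{VB}{E}\right)
   = O\!\left(\lg\lg\frac{VB}{E}\right).
\end{align*}
```

Multiplying this number of superphases by the $O(\mathrm{sort}(E))$ cost of each one gives the stated total.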
