Learn about fundamental graph problems, techniques, and data structures with a focus on I/O-efficient algorithms for massive data sets. Topics covered include list ranking, tree algorithms, connectivity, and more.
I/O-Efficient Graph Algorithms Norbert Zeh Duke University EEF Summer School on Massive Data Sets Århus, Denmark June 26 – July 1, 2002
Motivation For theoreticians: • Graph problems are neat, often difficult, hence interesting For practitioners: • Massive graphs arise in GIS, web modelling, ... • Problems in computational geometry can be expressed as graph problems • Many abstract problems best viewed as graph problems • Extreme: Pointer-based data structures = graphs with extra information at their nodes
Outline Fundamental graph problems • List ranking • Algorithms for trees • Euler tour • Tree labelling • Graph searching • BFS/DFS • Connectivity • Connected components • Minimum spanning tree • Single source shortest paths
Outline • Techniques and data structures • Graph contraction • Time-forward processing • Tournament tree • Buffered repository tree • Lower bounds • List ranking • Connectivity • Planar graphs
Introduction and “Simple” Problems List ranking Euler tour Tree labelling Evaluating directed acyclic graphs Greedy graph algorithms
List Ranking [Figure: a six-element list with weights 3, 1, 5, 2, 3, 1 and resulting ranks 3, 4, 9, 11, 14, 15.]
Why Is List Ranking Non-Trivial? • The internal memory algorithm spends Ω(N) I/Os in the worst case. [Figure: a 16-element list whose consecutive elements are stored in different disk blocks.]
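An Efficient List Ranking Algorithm • Assume an independent set of size at least N/3 can be found efficiently (in O(sort(N)) I/Os). [Figure: the list with weights 3, 1, 5, 2, 3, 1 is compressed to weights 3, 1, 7, 4, ranked as 3, 4, 11, 15, and expanded again to ranks 3, 4, 9, 11, 14, 15.]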
An Efficient List Ranking Algorithm • Compressing L: • Sort elements in L \ I • Sort elements in I by their successor pointers • Scan the two lists to update the label of succ(v), for every element v ∈ I • The I/O-complexity of this procedure is O(sort(N)). Theorem: A list of size N can be ranked in O(sort(N)) I/Os.
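The compress, recurse, and expand scheme can be sketched in memory as follows. This is only an illustration of the recursion, not the external-memory implementation: dictionaries stand in for the sorted lists, the independent set is picked greedily in element-ID order, and the names `rank_list`, `succ`, and `weight` are assumptions made for this sketch.

```python
def rank_list(head, succ, weight):
    """Return rank[v] = total weight from the head of the list through v.

    succ maps each element to its successor (None for the tail).
    Element IDs are assumed to be comparable (e.g. integers).
    """
    if len(succ) <= 2:                      # small list: rank it directly
        ranks, total, v = {}, 0, head
        while v is not None:
            total += weight[v]
            ranks[v] = total
            v = succ[v]
        return ranks

    pred = {w: v for v, w in succ.items() if w is not None}

    # Greedy independent set in ID order: no two chosen elements are adjacent.
    indep = set()
    for v in sorted(succ):
        if pred.get(v) not in indep and succ[v] not in indep:
            indep.add(v)

    # Compress: bridge each chosen element out, pushing its weight to succ(v).
    new_succ = {v: s for v, s in succ.items() if v not in indep}
    new_weight = {v: w for v, w in weight.items() if v not in indep}
    for v in indep:
        s, p = succ[v], pred.get(v)
        if s is not None:
            new_weight[s] += weight[v]
        if p is not None:
            new_succ[p] = s
    new_head = succ[head] if head in indep else head

    ranks = rank_list(new_head, new_succ, new_weight)

    # Expand: the predecessor of a removed element was kept, so its rank is known.
    for v in indep:
        p = pred.get(v)
        ranks[v] = weight[v] + (ranks[p] if p is not None else 0)
    return ranks
```

On the six-element example with weights 3, 1, 5, 2, 3, 1 this reproduces the ranks 3, 4, 9, 11, 14, 15.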
The Euler Tour Technique Goal: Given a tree T, represent it by a list L so that certain computations on T can be performed by ranking L.
The Euler Tour Technique Theorem: Given the adjacency lists of the vertices in T, an Euler tour can be constructed in O(scan(N)) I/Os. • Let {v,w1},…,{v,wr} be the edges incident to v • Then succ((wi,v)) = (v,wi+1), where wr+1 = w1
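The successor rule translates almost literally into code. A minimal sketch, assuming the tree is given as a dictionary `adj` from each vertex to its neighbor list (the name `euler_tour` is chosen for illustration):

```python
def euler_tour(adj):
    """Return succ: directed edge -> next directed edge on the Euler tour.

    adj maps every tree vertex to the list of its neighbors w1, ..., wr.
    """
    succ = {}
    for v, nbrs in adj.items():
        r = len(nbrs)
        for i, w in enumerate(nbrs):
            # succ((w_i, v)) = (v, w_{i+1}), indices taken cyclically
            succ[(w, v)] = (v, nbrs[(i + 1) % r])
    return succ


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
succ = euler_tour(adj)
edge = ("r", "a")
for _ in range(len(succ)):      # one full traversal of the tour
    print(edge)
    edge = succ[edge]
```

Each adjacency list is read once, which is why a single scan suffices in the external-memory setting.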
Rooting a Tree • Choosing a vertex r as the root of a tree T defines parent-child relationships between adjacent nodes • Rooting tree T = computing for every edge {v,w} who is the parent and who is the child • v = p(w) if and only if rank((v,w)) < rank((w,v)) Theorem: A tree can be rooted in O(sort(N)) I/Os.
Computing a Preorder Numbering Theorem: A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os. • preorder#(v) = rank((p(v),v)) [Figure: Euler tour of a tree with edge weights 0 and 1 and the resulting ranks.]
Computing Subtree Sizes Theorem: The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.
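The tree labellings can all be read off a ranked Euler tour. The sketch below is an assumption-laden illustration: it walks the tour in memory instead of calling an I/O-efficient list ranking routine, gives weight 1 to edges (p(v),v) and weight 0 to edges (v,p(v)), numbers the root 0, and uses size(v) = rank((v,p(v))) - rank((p(v),v)) + 1, which is one standard way to obtain subtree sizes and is not stated on the slides. The name `label_tree` is hypothetical, and the tree is assumed to have at least two vertices.

```python
def label_tree(adj, root):
    """Return (parent, preorder, size) for the tree adj rooted at root."""
    # Euler tour: succ((w_i, v)) = (v, w_{i+1}), indices taken cyclically.
    succ = {}
    for v, nbrs in adj.items():
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % len(nbrs)])

    # Weighted ranking of the tour, starting with an edge that leaves the root.
    rank, forward, count = {}, [], 0
    edge = (root, adj[root][0])
    while edge not in rank:
        v, w = edge
        if (w, v) not in rank:          # first visit: parent-to-child edge
            count += 1                  # weight 1; child-to-parent edges weigh 0
            forward.append(edge)
        rank[edge] = count
        edge = succ[edge]

    parent, preorder, size = {}, {root: 0}, {root: len(adj)}
    for v, w in forward:
        parent[w] = v                                   # (v,w) precedes (w,v)
        preorder[w] = rank[(v, w)]                      # preorder#(w) = rank((p(w),w))
        size[w] = rank[(w, v)] - rank[(v, w)] + 1       # descendants of w, plus w
    return parent, preorder, size
```

In external memory the `while` loop would be replaced by list ranking, and the final loop by a sort that brings each edge together with its reversal.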
Evaluating a Directed Acyclic Graph • More general: Given a labelling φ, compute a labelling ψ so that ψ(v) is computed from φ(v) and ψ(u1),…,ψ(ur), where u1,…,ur are v’s in-neighbors [Figure: a DAG whose vertices carry 0/1 labels.]
Time-Forward Processing • Assume nodes are given in topologically sorted order. • Use priority queue Q to send data along the edges. [Figure: step-by-step contents of Q while a 12-vertex DAG is evaluated; entries are triples such as (6,1,0) and (6,5,1).]
Time-Forward Processing Analysis: • Vertex set + adjacency lists scanned • O(scan(|V| + |E|)) I/Os • Priority queue: • Every edge inserted into and deleted from Q exactly once • O(|E|) priority queue operations • O(sort(|E|)) I/Os Theorem: A directed acyclic graph G = (V,E) can be evaluated in O(sort(|V| + |E|)) I/Os.
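A minimal in-memory sketch of time-forward processing, assuming vertices are numbered 0..n-1 in topological order and that queue entries are (target, source, value) triples. Python's `heapq` stands in for an external-memory priority queue, and the names `evaluate_dag`, `out_edges`, `phi`, and `combine` are chosen for illustration.

```python
import heapq


def evaluate_dag(n, out_edges, phi, combine):
    """Evaluate a DAG whose vertices 0..n-1 are topologically sorted.

    out_edges[v] lists the out-neighbors of v (all larger than v).
    combine(phi[v], inputs) computes the output label of v from its input
    label and the list of labels received from its in-neighbors.
    """
    queue = []                  # entries: (target vertex, source vertex, value)
    psi = [None] * n
    for v in range(n):
        # Everything addressed to v is now at the front of the queue.
        inputs = []
        while queue and queue[0][0] == v:
            _, _, value = heapq.heappop(queue)
            inputs.append(value)
        psi[v] = combine(phi[v], inputs)
        # Send the result forward in time along every out-edge.
        for w in out_edges[v]:
            heapq.heappush(queue, (w, v, psi[v]))
    return psi


# Edges 0->2, 1->2, 1->3, 2->3; each vertex adds its own label to its inputs.
labels = evaluate_dag(4, [[2], [2, 3], [3], []], [1, 2, 3, 4],
                      lambda label, inputs: label + sum(inputs))
print(labels)   # [1, 2, 6, 12]
```

Every edge is pushed once and popped once, matching the O(|E|) priority queue operations in the analysis.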
Maximal Independent Set (MIS) Algorithm GREEDYMIS:
I ← ∅
for every vertex v ∈ G do
    if no neighbor of v is in I then
        Add v to I
    end if
end for
Observation: It suffices to consider all neighbors of v which have been visited in a previous iteration.
Maximal Independent Set (MIS) Theorem: A maximal independent set of a graph G = (V,E) can be computed in O(sort(|V|+|E|)) I/Os. [Figure: example graph with vertices numbered 1–11.]
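GREEDYMIS fits the time-forward processing pattern: a vertex that joins I notifies its higher-numbered neighbors through the priority queue, so each vertex only needs to check whether a message is waiting for it. A sketch under the assumption that vertices are numbered 0..n-1 and processed in that order (`greedy_mis` is a name chosen here):

```python
import heapq


def greedy_mis(n, edges):
    """Return a maximal independent set of the graph on vertices 0..n-1."""
    later = [[] for _ in range(n)]      # neighbors visited after each vertex
    for u, v in edges:
        if u != v:
            later[min(u, v)].append(max(u, v))

    queue, mis = [], []
    for v in range(n):
        blocked = False
        while queue and queue[0] == v:  # some earlier neighbor is already in I
            heapq.heappop(queue)
            blocked = True
        if not blocked:
            mis.append(v)
            for w in later[v]:          # tell later neighbors that v is in I
                heapq.heappush(queue, w)
    return mis
```

Only vertices that enter I send messages, so at most |E| entries pass through the queue.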
Large Independent Set of a List Corollary: An independent set of size at least N/3 for a list L of size N can be found in O(sort(N)) I/Os. • Every vertex in an MIS I prevents at most two other vertices from being in I • Hence every MIS has size at least N/3.
Graph Connectivity Connected components Minimum spanning tree
Connectivity: A Semi-External Algorithm Analysis: • Scan vertex set to load vertices into main memory • Scan edge set to carry out algorithm • O(scan(|V| + |E|)) I/Os Theorem: The connected components of a graph can be computed in O(scan(|V| + |E|)) I/Os, provided that |V| ≤ M.
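The slide does not say which in-memory structure the semi-external algorithm keeps. One natural choice, assumed here, is a union-find forest over the vertex set, updated while the edges stream past once. The names `connected_components` and `edge_stream` are illustrative.

```python
def connected_components(vertices, edge_stream):
    """Label each vertex with a representative of its component.

    Only the vertex set is held in memory; edge_stream is consumed once.
    """
    vertices = list(vertices)
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edge_stream:                # single scan over the edge set
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return {v: find(v) for v in vertices}
```

Because `edge_stream` can be any iterator, the edges never need to fit in memory, which is the point of the |V| ≤ M condition.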
Connectivity: The General Case Idea: • If |V| ≤ M • Use semi-external algorithm • If |V| > M • Identify simple connected subgraphs of G • Contract these subgraphs to obtain graph G’ = (V’,E’) with |V’| ≤ c|V|, c < 1 • Recursively compute connected components of G’ • Obtain labelling of connected components of G from labelling of components of G’
Connectivity: The General Case [Figure: example graph on vertices a–n and its contraction to a graph on vertices A–E.]
Connectivity: The General Case Main steps: • Find smallest neighbors (easy) • Compute connected components of graph H induced by selected edges • Contract each component into a single vertex (easy) • Call the procedure recursively • Copy label of every vertex v ∈ G’ to all vertices in G represented by v (easy)
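The five steps can be sketched in memory as follows. Assumptions made for this sketch: vertex names are comparable, the components of H are found by following the selected edges until they reach a 2-cycle (which always happens when every vertex picks its smallest neighbor), and the recursion runs until no edges remain rather than switching to the semi-external algorithm. The function name is hypothetical.

```python
def components_by_contraction(vertices, edges):
    """Label every vertex with a representative of its connected component."""
    vertices = list(vertices)
    edges = {(u, v) for u, v in edges if u != v}    # drop self-loops
    if not edges:
        return {v: v for v in vertices}

    # Step 1: every non-isolated vertex selects the edge to its smallest neighbor.
    smallest = {}
    for u, v in edges:
        smallest[u] = min(smallest.get(u, v), v)
        smallest[v] = min(smallest.get(v, u), u)

    # Step 2: components of H. Each one contains exactly one pair of vertices
    # that selected each other; the smaller of the two names the component.
    def representative(v):
        if v not in smallest:
            return v                                # isolated vertex
        while smallest[smallest[v]] != v:
            v = smallest[v]
        return min(v, smallest[v])

    label = {v: representative(v) for v in vertices}

    # Step 3: contract each component of H into a single vertex.
    contracted_vertices = set(label.values())
    contracted_edges = {(label[u], label[v]) for u, v in edges}

    # Step 4: recurse on the contracted graph.
    sub = components_by_contraction(contracted_vertices, contracted_edges)

    # Step 5: copy the label of each contracted vertex back to its members.
    return {v: sub[label[v]] for v in vertices}
```

Every non-isolated vertex ends up in a component of H with at least two vertices, so the number of non-isolated vertices at least halves per call.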
Connectivity: The General Case • Every connected component of H has size at least 2 • |V’| ≤ |V|/2 • O(log(|V|/M)) recursive calls Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.
Connectivity: The General Case • Later: BFS in O(|V| + sort(|E|)) I/Os • Can be used to identify connected components • When |V| = |E|/B, algorithm takes O(sort(|E|)) I/Os • Can stop recursion after O(log(|V|B/|E|)) recursive calls Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.
Minimum Spanning Tree (MST) Observation: Connectivity algorithm can be augmented to produce a spanning tree of G.
Minimum Spanning Tree (MST) To obtain a minimum spanning tree: • Choose edge of minimum weight incident to v • Some book-keeping: • The weight of an edge e in the compressed graph = the min weight of all edges represented by e • When “e is added” to T, actually add this minimum edge
Minimum Spanning Tree (MST) Theorem: A MST of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.
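The contraction-based MST computation is essentially Borůvka's algorithm. The sketch below makes two simplifying assumptions: instead of materialising the compressed graph it keeps the original edge list and maps endpoints to their current super-vertices through a union-find forest (which has the same effect as the book-keeping rule, since the cheapest original edge is always the one added), and ties are broken by comparing whole `(weight, u, v)` tuples. The name `boruvka_mst` is illustrative.

```python
def boruvka_mst(n, edges):
    """Return the MST edges of a graph on vertices 0..n-1.

    edges is a list of distinct (weight, u, v) tuples.
    """
    parent = list(range(n))                 # super-vertex forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    while True:
        # Each super-vertex selects its minimum-weight incident edge.
        cheapest = {}
        for e in edges:
            _, u, v = e
            cu, cv = find(u), find(v)
            if cu == cv:
                continue                    # edge lies inside a super-vertex
            for c in (cu, cv):
                if c not in cheapest or e < cheapest[c]:
                    cheapest[c] = e
        if not cheapest:
            return sorted(mst)
        # Contract along the selected edges, recording the original edges.
        for e in set(cheapest.values()):
            _, u, v = e
            cu, cv = find(u), find(v)
            if cu != cv:
                parent[cu] = cv
                mst.append(e)
```

For a disconnected input the result is a minimum spanning forest.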
A Fast MST Algorithm • Idea: • Assume MST can be computed in O(|V| + sort(|E|)) I/Os • Again recursion can be stopped after O(log(|V|B/|E|)) iterations • Prim’s algorithm:
A Fast MST Algorithm • Maintain superset of blue edges in priority queue Q • When edge {v,w} of minimum weight is retrieved, test whether v,w are both in T • Yes ⇒ discard edge • No ⇒ add edge to MST and add all edges incident to w to Q, except {v,w} (assuming that w ∉ T) Problem: How to test whether v,w ∈ T.
A Fast MST Algorithm • If v,w ∈ T, but {v,w} ∉ T, then both v and w have inserted edge {v,w} into Q • There are two copies of {v,w} in Q • They are consecutive • Perform two DELETEMIN operations, retrieving edges {v,w} and {y,z} • If {v,w} = {y,z}, discard both • Otherwise, add {v,w} to T and re-insert {y,z}
A Fast MST Algorithm Analysis: • O(|V| + scan(|E|)) I/Os for retrieving adjacency lists • O(sort(|E|)) I/Os for priority queue operations Theorem: A MST of a graph G = (V,E) can be found in O(|V| + sort(|E|)) I/Os. Corollary: A MST of a graph G = (V,E) can be found in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.
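The duplicate-detection trick can be sketched as follows. Assumptions made for this sketch: edge weights are distinct, the graph is given as in-memory adjacency lists, each queue entry records which endpoint inserted it so the new tree vertex can be identified, and peeking at the next entry replaces the second DELETEMIN followed by a re-insert. The name `prim_mst` is illustrative.

```python
import heapq


def prim_mst(adj, source):
    """Prim's algorithm without a 'visited' lookup.

    adj[v] is a list of (weight, neighbor) pairs; weights are distinct.
    """
    def entry(weight, inserter, other):
        # Both copies of an edge share the first three fields, so they end
        # up next to each other in the priority queue.
        return (weight, min(inserter, other), max(inserter, other), inserter)

    queue = [entry(w, source, x) for w, x in adj[source]]
    heapq.heapify(queue)
    tree = []
    while queue:
        first = heapq.heappop(queue)
        if queue and queue[0][:3] == first[:3]:
            heapq.heappop(queue)        # two copies: both endpoints are in T
            continue
        weight, lo, hi, inserter = first
        new_vertex = hi if inserter == lo else lo
        tree.append((weight, lo, hi))
        # The new vertex inserts all its edges except the one just used.
        for w, x in adj[new_vertex]:
            if (w, min(new_vertex, x), max(new_vertex, x)) != first[:3]:
                heapq.heappush(queue, entry(w, new_vertex, x))
    return tree
```

No membership test on T is ever performed, which is what removes the random accesses from the external-memory version.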
Graph Contraction and Sparse Graphs • A graph G = (V,E) is sparse if for any graph H obtainable from G through a series of edge contractions, |E(H)| = O(|V(H)|). • For a sparse graph, the number of vertices and edges in G reduces by a constant factor in each iteration of the connectivity and MST algorithms. Theorem:The connected components or a MST of a sparse graph with N vertices can be computed in O(sort(N)) I/Os.
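The O(sort(N)) bound for sparse graphs comes from a geometric sum. A sketch of the calculation, assuming the standard bound sort(N) = Θ((N/B) log_{M/B}(N/B)), that iteration i runs on a graph with at most c^i N vertices and edges for some constant c < 1, and that it costs O(sort(c^i N)) I/Os:

```latex
\sum_{i \ge 0} \mathrm{sort}(c^i N)
  \;\le\; \sum_{i \ge 0} c^i \,\mathrm{sort}(N)
  \;=\; \frac{1}{1-c}\,\mathrm{sort}(N)
  \;=\; O(\mathrm{sort}(N)).
```

The first inequality uses sort(cN) ≤ c · sort(N) for c < 1, which holds because the logarithmic factor does not grow when the input shrinks.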
Three Techniques for Graph Algorithms • Time-forward processing: • Express graph problems as evaluation problems of DAGs • Graph contraction: • Reduce the size of G while maintaining the properties of interest • Solve problem recursively on compressed graph • Construct solution for G from solution for compressed graph • Bootstrapping: • Switch to generally less efficient algorithm as soon as (part of the) input is small enough