I/O-Efficient Graph Algorithms

Norbert Zeh, Duke University
EEF Summer School on Massive Data Sets, Århus, Denmark, June 26 – July 1, 2002

Motivation
For theoreticians:
• Graph problems are neat, often difficult, hence interesting
For practitioners:
• Massive graphs arise in GIS, web modelling, ...
• Problems in computational geometry can be expressed as graph problems
• Many abstract problems are best viewed as graph problems
• Extreme: pointer-based data structures = graphs with extra information at their nodes

Outline
Fundamental graph problems
• List ranking
• Algorithms for trees
  • Euler tour
  • Tree labelling
• Graph searching
  • BFS/DFS
• Connectivity
  • Connected components
  • Minimum spanning tree
• Single source shortest paths

Outline
• Techniques and data structures
  • Graph contraction
  • Time-forward processing
  • Tournament tree
  • Buffered repository tree
• Lower bounds
  • List ranking
  • Connectivity
• Planar graphs

Introduction and “Simple” Problems
• List ranking
• Euler tour
• Tree labelling
• Evaluating directed acyclic graphs
• Greedy graph algorithms

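List Ranking
[Figure: a list of six elements with weights 3, 1, 5, 2, 3, 1; the rank of each element is the sum of the weights up to and including it: 3, 4, 9, 11, 14, 15.]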

Why Is List Ranking Non-Trivial?
• The internal memory algorithm spends Ω(N) I/Os in the worst case.
[Figure: a 16-element list stored in blocks of four so that consecutive list elements lie in different blocks.]

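An Efficient List Ranking Algorithm
• Assume an independent set of size at least N/3 can be found efficiently (in O(sort(N)) I/Os).
[Figure: the list with weights 3, 1, 5, 2, 3, 1 is compressed by removing an independent set, giving weights 3, 1, 7, 4; the compressed list is ranked recursively (3, 4, 11, 15), and the removed elements are reinserted to obtain the ranks 3, 4, 9, 11, 14, 15.]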

An Efficient List Ranking Algorithm
• Compressing L:
  • Sort elements in L \ I
  • Sort elements in I by their successor pointers
  • Scan the two lists to update the label of succ(v), for every element v ∈ I
• The I/O-complexity of this procedure is O(sort(N)).
• The compressed list has at most 2N/3 elements, so T(N) ≤ T(2N/3) + O(sort(N)) = O(sort(N)).
Theorem: A list of size N can be ranked in O(sort(N)) I/Os.
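The contraction scheme can be sketched in a few lines of Python. This is an in-memory illustration only, not the I/O-efficient implementation: dictionaries stand in for the sorted files, the function name `rank_list` is hypothetical, and the independent set is chosen greedily in order of element ID, which is an assumption made for this sketch.

```python
def rank_list(succ, weight):
    """Rank a linked list: rank(v) = total weight from the head through v.

    succ maps each element to its successor (None for the tail);
    weight maps each element to its weight. Element IDs must be sortable.
    """
    nodes = list(succ)
    if len(nodes) <= 2:
        # Base case: small enough to rank by walking the list directly.
        ranks, total = {}, 0
        has_pred = {s for s in succ.values() if s is not None}
        v = next((u for u in nodes if u not in has_pred), None)
        while v is not None:
            total += weight[v]
            ranks[v] = total
            v = succ[v]
        return ranks

    pred = {s: v for v, s in succ.items() if s is not None}

    # Greedy independent set: take v unless a list neighbor was already taken.
    ind = set()
    for v in sorted(nodes):
        if pred.get(v) not in ind and succ[v] not in ind:
            ind.add(v)

    # Compress: bypass every v in ind and add its weight to succ(v).
    new_succ, new_weight = {}, {}
    for v in nodes:
        if v in ind:
            continue
        s = succ[v]
        if s in ind:
            s = succ[s]  # succ(s) is not in ind because ind is independent
        new_succ[v] = s
        p = pred.get(v)
        new_weight[v] = weight[v] + (weight[p] if p in ind else 0)

    # Recurse on the compressed list, then reinsert the removed elements.
    ranks = rank_list(new_succ, new_weight)
    for v in ind:
        p = pred.get(v)
        ranks[v] = weight[v] + (ranks[p] if p is not None else 0)
    return ranks


# The six-element example list with weights 3, 1, 5, 2, 3, 1.
example_succ = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: None}
example_weight = {1: 3, 2: 1, 3: 5, 4: 2, 5: 3, 6: 1}
example_ranks = rank_list(example_succ, example_weight)
# example_ranks == {1: 3, 2: 4, 3: 9, 4: 11, 5: 14, 6: 15}
```

In the external-memory setting each dictionary lookup above would be replaced by the sort-and-scan steps listed on the slide.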

The Euler Tour Technique
Goal: Given a tree T, represent it by a list L so that certain computations on T can be performed by ranking L.

The Euler Tour Technique
Theorem: Given the adjacency lists of the vertices in T, an Euler tour can be constructed in O(scan(N)) I/Os.
• Let {v,w1},…,{v,wr} be the edges incident to v
• Then succ((wi,v)) = (v,wi+1), where wr+1 = w1
[Figure: vertex v with neighbors w1, w2, w3, w4.]
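The successor rule can be illustrated with a short Python sketch. It assumes the tree is given as a dictionary of adjacency lists and has at least two vertices; the function names are hypothetical, and the code runs in memory rather than by scanning files.

```python
def euler_tour_successors(adj):
    """Return succ over directed edges: succ[(w_i, v)] = (v, w_{i+1})."""
    succ = {}
    for v, nbrs in adj.items():
        r = len(nbrs)
        for i, w in enumerate(nbrs):
            # Indices wrap around, so the last neighbor is followed by the first.
            succ[(w, v)] = (v, nbrs[(i + 1) % r])
    return succ


def euler_tour(adj, root):
    """Break the circular tour at the root and return it as a list of edges."""
    succ = euler_tour_successors(adj)
    start = (root, adj[root][0])
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]
    return tour


# Small example tree: r has children a and b, and a has child c.
example_adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
example_tour = euler_tour(example_adj, "r")
```

Every tree edge appears twice in the tour, once in each direction, so a tree with N vertices yields a list of 2(N - 1) elements.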

Rooting a Tree
• Choosing a vertex r as the root of a tree T defines parent-child relationships between adjacent nodes
• Rooting tree T = computing for every edge {v,w} who is the parent and who is the child
• v = p(w) if and only if rank((v,w)) < rank((w,v))
Theorem: A tree can be rooted in O(sort(N)) I/Os.

Computing a Preorder Numbering
Theorem: A preorder numbering of a rooted tree T can be computed in O(sort(N)) I/Os.
• preorder#(v) = rank((p(v),v))
[Figure: Euler tour of a rooted tree with edge weights 0 and 1, and the resulting preorder numbers.]
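Rooting and preorder numbering can both be read off the Euler tour, as the following in-memory Python sketch suggests. It assumes that edges from parent to child carry weight 1 and edges from child to parent carry weight 0, and that the root receives number 0; these conventions, the function name, and the dictionary representation are assumptions of this sketch.

```python
def root_and_preorder(adj, root):
    """Return (parent, preorder) for the tree adj rooted at root.

    adj maps each vertex to its adjacency list; the tree needs >= 2 vertices.
    """
    # Build the Euler tour: succ[(w_i, v)] = (v, w_{i+1}), indices cyclic.
    succ = {}
    for v, nbrs in adj.items():
        for i, w in enumerate(nbrs):
            succ[(w, v)] = (v, nbrs[(i + 1) % len(nbrs)])
    start = (root, adj[root][0])
    tour, e = [start], succ[start]
    while e != start:
        tour.append(e)
        e = succ[e]

    # Unweighted ranks are the positions in the tour.
    pos = {edge: i for i, edge in enumerate(tour)}

    parent, preorder = {root: None}, {root: 0}
    weighted_rank = 0
    for v, w in tour:
        if pos[(v, w)] < pos[(w, v)]:
            # (v, w) comes before (w, v), so v = p(w); this edge has weight 1.
            parent[w] = v
            weighted_rank += 1
            preorder[w] = weighted_rank  # preorder#(w) = rank((p(w), w))
    return parent, preorder


example_adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
example_parent, example_preorder = root_and_preorder(example_adj, "r")
```

The running counter plays the role of the weighted list ranking; in external memory it would be computed by the list ranking algorithm instead of a sequential walk.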

Computing Subtree Sizes
Theorem: The nodes of T can be labelled with their subtree sizes in O(sort(N)) I/Os.
[Figure: example tree labelled with Euler tour ranks and subtree sizes.]

Evaluating a Directed Acyclic Graph
• More general: Given a labelling f, compute a labelling y so that y(v) is computed from f(v) and y(u1),…,y(ur), where u1,…,ur are v’s in-neighbors
[Figure: example DAG with vertex labels 0 and 1.]

Time-Forward Processing
• Assume nodes are given in topologically sorted order.
• Use priority queue Q to send data along the edges.
[Figure: step-by-step evaluation of a 12-vertex DAG; at each step Q holds one triple per edge in transit, for example (6,1,0) and (6,5,1).]


Time-Forward Processing
Analysis:
• Vertex set + adjacency lists scanned
  • O(scan(|V| + |E|)) I/Os
• Priority queue:
  • Every edge inserted into and deleted from Q exactly once
  • O(|E|) priority queue operations
  • O(sort(|E|)) I/Os
Theorem: A directed acyclic graph G = (V,E) can be evaluated in O(sort(|V| + |E|)) I/Os.
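A minimal Python sketch of time-forward processing follows. It assumes the vertices are numbered 1..n in topological order, so every edge leads from a smaller to a larger number, and it uses the in-memory `heapq` module in place of an external priority queue. The function name, the triple layout (target, source, value), and the `combine` callback are assumptions of this sketch.

```python
import heapq


def evaluate_dag(n, out_edges, f, combine):
    """Compute y(v) = combine(f(v), [y(u) for in-neighbors u]) for v = 1..n.

    out_edges maps a vertex to the list of its out-neighbors.
    """
    queue = []  # entries are (target, source, value), ordered by target
    y = {}
    for v in range(1, n + 1):
        # Everything addressed to v is at the front of the queue right now.
        inputs = []
        while queue and queue[0][0] == v:
            _, _, value = heapq.heappop(queue)
            inputs.append(value)
        y[v] = combine(f[v], inputs)
        # Send y(v) forward in time to every out-neighbor of v.
        for w in out_edges.get(v, []):
            heapq.heappush(queue, (w, v, y[v]))
    return y


# Example: length of the longest path ending at each vertex, counted in vertices.
example_out = {1: [2, 3], 2: [4], 3: [4]}
example_f = {1: 1, 2: 1, 3: 1, 4: 1}
example_y = evaluate_dag(4, example_out, example_f,
                         lambda fv, ins: fv + max(ins, default=0))
```

Each edge is pushed once and popped once, which matches the O(|E|) priority queue operations counted in the analysis.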


Maximal Independent Set (MIS)
Algorithm GREEDYMIS:
1. I ← ∅
2. for every vertex v ∈ G do
3.   if no neighbor of v is in I then
4.     Add v to I
5.   end if
6. end for
Observation: It suffices to consider all neighbors of v which have been visited in a previous iteration.


Maximal Independent Set (MIS)
Theorem: A maximal independent set of a graph G = (V,E) can be computed in O(sort(|V|+|E|)) I/Os.
[Figure: GREEDYMIS run on an example graph with vertices 1 to 11.]
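GREEDYMIS fits the time-forward processing pattern: a vertex that joins I notifies its higher-numbered neighbors through the priority queue. The Python sketch below assumes vertices are visited in increasing order of their sortable IDs and uses `heapq` as an in-memory stand-in for the external priority queue; the function name is hypothetical.

```python
import heapq


def greedy_mis(adj):
    """Return a maximal independent set of the undirected graph adj.

    adj maps each vertex to the list of its neighbors.
    """
    queue = []  # holds the IDs of vertices that have a neighbor in the set
    independent = []
    for v in sorted(adj):
        # Any message addressed to v means an earlier neighbor is in the set.
        blocked = False
        while queue and queue[0] == v:
            heapq.heappop(queue)
            blocked = True
        if not blocked:
            independent.append(v)
            # Only neighbors visited later need to be told.
            for w in adj[v]:
                if w > v:
                    heapq.heappush(queue, w)
    return independent


# Path 1 - 2 - 3 - 4 - 5.
example_path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
example_mis = greedy_mis(example_path)
# example_mis == [1, 3, 5]
```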

Large Independent Set of a List
Corollary: An independent set of size at least N/3 for a list L of size N can be found in O(sort(N)) I/Os.
• Every vertex in an MIS I prevents at most two other vertices from being in I
• Hence every MIS has size at least N/3

Graph Connectivity
• Connected components
• Minimum spanning tree


Connectivity: A Semi-External Algorithm
Analysis:
• Scan vertex set to load vertices into main memory
• Scan edge set to carry out algorithm
• O(scan(|V| + |E|)) I/Os
Theorem: The connected components of a graph can be computed in O(scan(|V| + |E|)) I/Os, provided that |V| ≤ M.
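The algorithm itself did not survive in this transcript. One common way to realize a semi-external connectivity algorithm is a union-find structure over the vertices held in memory while the edges are streamed past once; the Python sketch below takes that approach as an assumption and is not necessarily the method shown on the original slide. The function name is hypothetical.

```python
def semi_external_components(vertices, edge_stream):
    """Label every vertex with the smallest vertex ID in its component.

    vertices must fit in memory; edge_stream is read once, front to back.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Locate the root, then point every vertex on the path at it.
        root = v
        while parent[root] != root:
            root = parent[root]
        while parent[v] != root:
            parent[v], v = root, parent[v]
        return root

    for u, w in edge_stream:
        ru, rw = find(u), find(w)
        if ru != rw:
            # Attach the larger root to the smaller so labels are minima.
            parent[max(ru, rw)] = min(ru, rw)

    return {v: find(v) for v in vertices}


example_labels = semi_external_components(
    range(1, 7), iter([(1, 2), (2, 3), (4, 5)]))
```

Only the `parent` table is kept in memory, which is why the approach needs |V| ≤ M, while the edge set is touched by a single scan.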

Connectivity: The General Case
Idea:
• If |V| ≤ M
  • Use semi-external algorithm
• If |V| > M
  • Identify simple connected subgraphs of G
  • Contract these subgraphs to obtain graph G’ = (V’,E’) with |V’| ≤ c|V|, c < 1
  • Recursively compute connected components of G’
  • Obtain labelling of connected components of G from labelling of components of G’

Connectivity: The General Case
[Figure: example graph on vertices a to n whose selected subgraphs are contracted into vertices A to E.]

Connectivity: The General Case
Main steps:
• Find smallest neighbors (easy)
• Compute connected components of graph H induced by selected edges
• Contract each component into a single vertex (easy)
• Call the procedure recursively
• Copy label of every vertex v ∈ G’ to all vertices in G represented by v (easy)

Connectivity: The General Case
• Every connected component of H has size at least 2
  • |V’| ≤ |V|/2
  • O(log(|V|/M)) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.
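The five main steps of the contraction algorithm can be traced in a small Python sketch. It is an in-memory illustration under several assumptions: the helper `label_components` (a union-find routine) stands in both for the semi-external base case and for the computation of the components of H, the parameter `memory` plays the role of M, and all names are hypothetical.

```python
def label_components(vertices, edges):
    """Union-find labelling: each vertex gets the smallest ID in its component."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            parent[max(ru, rw)] = min(ru, rw)
    return {v: find(v) for v in vertices}


def contract_and_label(vertices, edges, memory=2):
    """Connected components by repeated contraction."""
    vertices = list(vertices)
    edges = [(u, w) for u, w in edges if u != w]
    if len(vertices) <= memory or not edges:
        return label_components(vertices, edges)  # base case

    # Step 1: every non-isolated vertex selects the edge to its smallest neighbor.
    smallest = {}
    for u, w in edges:
        smallest[u] = min(smallest.get(u, w), w)
        smallest[w] = min(smallest.get(w, u), u)

    # Step 2: connected components of the graph H of selected edges.
    h_label = label_components(vertices, smallest.items())

    # Step 3: contract each component of H into a single vertex.
    new_vertices = set(h_label.values())
    new_edges = set()
    for u, w in edges:
        a, b = h_label[u], h_label[w]
        if a != b:
            new_edges.add((min(a, b), max(a, b)))

    # Step 4: recurse on the contracted graph.
    g_label = contract_and_label(new_vertices, new_edges, memory)

    # Step 5: copy each contracted vertex's label to the vertices it represents.
    return {v: g_label[h_label[v]] for v in vertices}


example_labels = contract_and_label(
    range(1, 8), [(1, 2), (2, 3), (3, 4), (5, 6)])
```

Isolated vertices select no edge and simply keep their own label; the halving argument on the slide applies to the vertices that do have neighbors.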

Connectivity: The General Case
• Later: BFS in O(|V| + sort(|E|)) I/Os
  • Can be used to identify connected components
  • When |V| = |E|/B, algorithm takes O(sort(|E|)) I/Os
  • Can stop recursion after O(log(|V|B/|E|)) recursive calls
Theorem: The connected components of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.

Minimum Spanning Tree (MST)
Observation: Connectivity algorithm can be augmented to produce a spanning tree of G.
[Figure: the example graph on vertices a to n with contracted vertices A to E, showing the edges selected for the spanning tree.]

Minimum Spanning Tree (MST)
To obtain a minimum spanning tree:
• Choose edge of minimum weight incident to v
• Some book-keeping:
  • The weight of an edge e in the compressed graph = the min weight of all edges represented by e
  • When “e is added” to T, add in fact this minimum edge
[Figure: vertex v with neighbors a, b, c, d and edge weights 1, 3, 4, 5.]

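Minimum Spanning Tree (MST)
Theorem: An MST of a graph G = (V,E) can be computed in O(sort(|V|) + sort(|E|) log(|V|/M)) I/Os.
[Figure: the example graph on vertices a to n with contracted vertices A to E and the resulting spanning tree.]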

A Fast MST Algorithm
• Idea:
  • Assume MST can be computed in O(|V| + sort(|E|)) I/Os
  • Again recursion can be stopped after O(log(|V|B/|E|)) iterations
• Prim’s algorithm:

A Fast MST Algorithm
• Maintain superset of blue edges in priority queue Q
• When edge {v,w} of minimum weight is retrieved, test whether v,w are both in T
  • Yes → discard edge
  • No → Add edge to MST and add all edges incident to w to Q, except {v,w} (assuming that w ∉ T)
Problem: How to test whether v,w ∈ T.

A Fast MST Algorithm
• If v,w ∈ T, but {v,w} ∉ T, then both v and w have inserted edge {v,w} into Q
  • There are two copies of {v,w} in Q
  • They are consecutive (assuming distinct edge weights)
• Perform two DELETEMIN operations, retrieving edges {v,w} and {y,z}
  • If {v,w} = {y,z}, discard both
  • Otherwise, add {v,w} to T and re-insert {y,z}

A Fast MST Algorithm
Analysis:
• O(|V| + scan(|E|)) I/Os for retrieving adjacency lists
• O(sort(|E|)) I/Os for priority queue operations
Theorem: An MST of a graph G = (V,E) can be found in O(|V| + sort(|E|)) I/Os.
Corollary: An MST of a graph G = (V,E) can be found in O(sort(|V|) + sort(|E|) log(|V|B/|E|)) I/Os.
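The duplicate-edge test can be tried out in a short Python sketch of Prim's algorithm that keeps no table of visited vertices. It assumes a connected, simple graph with pairwise distinct edge weights, uses `heapq` as an in-memory stand-in for the external priority queue, and peeks at the queue minimum instead of deleting and re-inserting the second edge; the function name and the entry layout (weight, inserting endpoint, other endpoint) are assumptions of this sketch.

```python
import heapq


def prim_without_visited_set(adj, source):
    """Return the MST edges as (v, w, weight), grown from source.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    """
    # Each entry records the endpoint that inserted it (already in T) first.
    queue = [(weight, source, w) for w, weight in adj[source]]
    heapq.heapify(queue)
    tree = []
    while queue:
        weight, v, w = heapq.heappop(queue)
        if queue and queue[0][0] == weight:
            # A second copy with the same weight is the same edge inserted
            # by w, so both endpoints are in T: discard both copies.
            heapq.heappop(queue)
            continue
        # Only one copy exists, so w is new: add the edge and expand w.
        tree.append((v, w, weight))
        for x, wx in adj[w]:
            if x != v:  # every edge incident to w except {v, w}
                heapq.heappush(queue, (wx, w, x))
    return tree


example_graph = {
    1: [(2, 1), (3, 3)],
    2: [(1, 1), (3, 2)],
    3: [(2, 2), (1, 3), (4, 4)],
    4: [(3, 4)],
}
example_tree = prim_without_visited_set(example_graph, 1)
```

On this example the edge of weight 3 between vertices 1 and 3 is inserted by both endpoints and is discarded as soon as its two copies reach the front of the queue.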

Graph Contraction and Sparse Graphs
• A graph G = (V,E) is sparse if for any graph H obtainable from G through a series of edge contractions, |E(H)| = O(|V(H)|).
• For a sparse graph, the number of vertices and edges in G reduces by a constant factor in each iteration of the connectivity and MST algorithms.
Theorem: The connected components or an MST of a sparse graph with N vertices can be computed in O(sort(N)) I/Os.

Three Techniques for Graph Algorithms
• Time-forward processing:
  • Express graph problems as evaluation problems of DAGs
• Graph contraction:
  • Reduce the size of G while maintaining the properties of interest
  • Solve problem recursively on compressed graph
  • Construct solution for G from solution for compressed graph
• Bootstrapping:
  • Switch to generally less efficient algorithm as soon as (part of the) input is small enough
