Finding Strongly Connected Components and Topological Sort in Parallel using O(log² n) reachability queries

Warren Schudy, Brown University. Work done while interning at Google Research, Mountain View.

A Scheduling Problem
• To-do List
• Topological sort (TS)
• Strongly connected components (SCC)
• Reachability
• SCC application in scientific computing requiring parallelism [McLendon et al. 01]

Previous Results for TS, SCC and Reachability
• Assume sparse graph with n vertices using 1 ≤ p ≤ n^(4/3) processors:
• Question: is reachability fundamentally easier to parallelize than SCC and TS?
(In Transactions of Information Processing Society of Japan ’99 ’04, Akio, Masahiro and Ryozo claim runtime n/p for TS & SCC)

Answer: no (up to logs)
• Our main result: a reduction of SCC and TS to O(log² n) reachability queries
• Remainder of talk focuses on SCC problem

A simple SCC algorithm
• Choose a random vertex s ∈ V
• Determine SCC(s) and output it
• Determine the vertices Desc(s) reachable from s
• Recurse (in parallel) on:
  • Desc(s) \ SCC(s)
  • V \ Desc(s)
[Figure: pivot s with the regions SCC(s), Desc(s) and V \ Desc(s)]
(Similar to [Coppersmith, Fleischer, Hendrickson and Pinar ’05])
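The steps above can be sketched in Python as follows. This is a minimal serial illustration, not the talk's implementation: the helper names `reach`, `reverse_graph` and `simple_scc` are hypothetical, the graph is assumed to be a dict mapping integer vertices to lists of successors, plain graph search stands in for the parallel reachability query, and a work list stands in for the parallel recursion.

```python
import random
from collections import defaultdict


def reach(adj, sources, allowed):
    """Vertices reachable from `sources` using only vertices in `allowed`."""
    seen = {s for s in sources if s in allowed}
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def reverse_graph(adj):
    """Adjacency lists of the graph with every edge reversed."""
    radj = defaultdict(list)
    for u, succs in adj.items():
        for v in succs:
            radj[v].append(u)
    return radj


def simple_scc(adj, vertices, rng=None):
    """Single-pivot SCC decomposition; returns a list of vertex sets."""
    rng = rng or random.Random(0)
    radj = reverse_graph(adj)
    sccs = []
    work = [set(vertices)]              # subproblems (run in parallel in the talk)
    while work:
        sub = work.pop()
        if not sub:
            continue
        s = rng.choice(sorted(sub))     # random pivot
        desc = reach(adj, [s], sub)     # Desc(s) inside the subgraph
        anc = reach(radj, [s], sub)     # vertices that can reach s
        scc = desc & anc                # SCC(s)
        sccs.append(scc)
        work.append(desc - scc)         # Desc(s) \ SCC(s)
        work.append(sub - desc)         # V \ Desc(s)
    return sccs
```

For example, on the graph `{0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: [0]}` the sketch returns the three components {0, 1, 2}, {3, 4} and {5} in some order.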

High-runtime instance
[Figure: pivot s and its descendant set Desc(s)]

Algorithmic Idea
• If this algorithm divided the graph roughly in two, recursion depth would be log n
• So pick many source vertices instead of 1 (number chosen later)
[Figure: descendant sets of the chosen source vertices]

Outputting SCCs
• Make one pivot vertex s special, and output its SCC
• Recurse on the blue, green, yellow and unreached subgraphs
[Figure: pivot s with SCC(s), Desc(s) and the descendant sets of the other sources shown as colored regions]

Our Multipivot Algorithm
• Permute the vertices randomly
• Determine the smallest s such that vertices {1, 2, …, s} together reach at least half the edges (binary search)
• Output SCC(s)
• Recurse on:
  • V \ Desc(1…s)
  • (Desc(1…s-1) ∩ Desc(s)) \ SCC(s)
  • Desc(1…s-1) \ Desc(s)
  • Desc(s) \ (Desc(1…s-1) ∪ SCC(s))
[Figure: example with s = 4 and vertices 1, 2, 3; regions SCC(s), Desc(s) and Desc(1…s-1)]
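One way to read the multipivot steps is the serial Python sketch below. It rests on several assumptions that are not stated on the slide: an edge counts as "reached" when its tail is reached, the second and fourth recursion sets are taken to be (Desc(1…s-1) ∩ Desc(s)) \ SCC(s) and Desc(s) \ (Desc(1…s-1) ∪ SCC(s)), and graph search stands in for the parallel reachability query. All function names are hypothetical.

```python
import random
from collections import defaultdict


def reach(adj, sources, allowed):
    """Vertices reachable from `sources` using only vertices in `allowed`."""
    seen = {s for s in sources if s in allowed}
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def count_edges(adj, tails, allowed):
    """Edges of the subgraph induced by `allowed` whose tail lies in `tails`."""
    return sum(1 for u in tails for v in adj.get(u, ()) if v in allowed)


def multipivot_scc(adj, vertices, rng=None):
    """Multipivot SCC decomposition; returns a list of vertex sets."""
    rng = rng or random.Random(0)
    radj = defaultdict(list)
    for u, succs in adj.items():
        for v in succs:
            radj[v].append(u)
    sccs = []
    work = [set(vertices)]                   # subproblems (parallel in the talk)
    while work:
        sub = work.pop()
        if not sub:
            continue
        order = sorted(sub)
        rng.shuffle(order)                   # random permutation of the vertices
        total = count_edges(adj, sub, sub)
        lo, hi = 1, len(order)               # binary search for the smallest s
        while lo < hi:
            mid = (lo + hi) // 2
            reached = reach(adj, order[:mid], sub)
            if 2 * count_edges(adj, reached, sub) >= total:
                hi = mid
            else:
                lo = mid + 1
        s = lo
        pivot = order[s - 1]
        d_prev = reach(adj, order[:s - 1], sub)      # Desc(1...s-1)
        d_s = reach(adj, [pivot], sub)               # Desc(s)
        scc = d_s & reach(radj, [pivot], sub)        # SCC(s)
        sccs.append(scc)
        work.append(sub - (d_prev | d_s))            # V \ Desc(1...s)
        work.append((d_prev & d_s) - scc)
        work.append(d_prev - d_s)
        work.append(d_s - (d_prev | scc))
    return sccs
```

Each pass removes the non-empty set SCC(s), so every subproblem is strictly smaller than its parent and the loop terminates. The binary search is valid because the set reached by the first k vertices only grows with k.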

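Runtime Analysis
• Three of the subproblems each contain less than half the edges, by definition of s
• The remaining one, Desc(s) \ (Desc(1…s-1) ∪ SCC(s)), may contain almost all the edges, but will contain only some of the transitive closure edges (due to the random order)
[Figure: same example (s = 4; vertices 1, 2, 3; regions SCC(s), Desc(s), Desc(1…s-1)), with labels k and k²]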

Key Lemma 9
• Number of edges in transitive closure of Desc(s) \ (Desc(1…s-1) ∪ SCC(s)) is at most 3/4 that of the parent subgraph V
• Correction for proof of Lemma 10: "vertices strictly between g(v) and v" other than v after
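A rough way to see how the lemma leads to the O(log² n) bound is sketched below. It is a simplification rather than the talk's proof: it treats the 3/4 bound as if it held at every recursive step, and the symbols m (edges), t (transitive closure edges) and d (recursion depth) are introduced here for illustration only.

```latex
% Along any root-to-leaf path of the recursion, each step either halves the
% edge count m or shrinks the transitive-closure edge count t by a factor 3/4.
% Both quantities are at most n^2 and neither ever increases, so
\[
  d \;\le\; \log_2 m + \log_{4/3} t + O(1)
    \;\le\; \log_2 n^2 + \log_{4/3} n^2 + O(1)
    \;=\; O(\log n).
\]
% Each level spends O(log n) reachability queries on the binary search for s,
% so the total number of queries is
\[
  O(\log n) \cdot O(\log n) \;=\; O(\log^2 n).
\]
```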

Open question
• Are there better parallel algorithms for reachability?
• E.g. can reachability on a 3-regular digraph be computed in o(√n) time using n processors? The algorithm of Ullman & Yannakakis takes Õ(√n) time.

Acknowledgements & Questions
• D. Sivakumar
• Claire Mathieu
• Maurice Herlihy
• Glencora Borradaile

**Extra slides**

Combining Reachability Queries on subgraphs
[Figure labels: Super-source, Sources, Subgraph 1, Subgraph 2]
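The slide's labels suggest that queries on several disjoint subgraphs can be merged into one single-source query by adding a super-source. The Python sketch below is one possible reading of that idea, not something the slide states in detail: edges crossing between subgraphs are dropped, a new vertex is linked to every source, and one search from it answers all queries. The function name `combined_reach` and its arguments are hypothetical.

```python
def combined_reach(adj, parts, sources_per_part):
    """Answer one reachability query per disjoint subgraph with a single search.

    parts:            list of disjoint vertex sets (the subgraphs)
    sources_per_part: list of source lists, one per subgraph
    Returns a list of reached sets, one per subgraph.
    """
    label = {v: i for i, part in enumerate(parts) for v in part}
    # Keep only edges whose endpoints lie in the same subgraph.
    graph = {
        u: [v for v in adj.get(u, ()) if label.get(v) == label[u]]
        for u in label
    }
    super_source = object()              # fresh vertex, distinct from all others
    graph[super_source] = [
        s
        for i, srcs in enumerate(sources_per_part)
        for s in srcs
        if label.get(s) == i             # ignore sources outside their subgraph
    ]
    # One single-source search from the super-source.
    seen = {super_source}
    stack = [super_source]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    seen.discard(super_source)
    return [seen & set(part) for part in parts]
```

For instance, with `adj = {0: [1], 1: [2], 2: [3], 3: []}`, subgraphs `{0, 1}` and `{2, 3}`, and sources `[0]` and `[3]`, the crossing edge 1 → 2 is ignored and the result is `[{0, 1}, {3}]`.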
