This lecture introduces the concept of parallel graph algorithms and their implementation considerations for sequential graph programs. It covers operator formulation, scheduling, and delta computation, enabling multiple implementations suitable for different inputs and architectures.
Spring 2015: Implementing Parallel Graph Algorithms
Lecture 2: Introduction
Roman Manevich, Ben-Gurion University
Graph Algorithms are Ubiquitous
• Computer graphics
• Computational biology
• Social networks
Agenda
• Operator formulation of graph algorithms
• Implementation considerations for sequential graph programs
• Optimistic parallelization of graph algorithms
• Introduction to the Galois system
Main Idea
• Define a high-level abstraction of graph algorithms in terms of
  • Operator
  • Schedule
  • Delta
• Given a new algorithm, describe it as a composition of these elements
• This enables many implementations
  • Find one suitable for the typical input and architecture
Example: Single-Source Shortest-Path
[Figure: weighted example graph with source S and nodes A through G; the edge (A,C) is highlighted]
• Problem formulation
  • Compute the shortest distance from source node S to every other node
• Many algorithms
  • Bellman-Ford (1957)
  • Dijkstra (1959)
  • Chaotic relaxation (Miranker 1969)
  • Delta-stepping (Meyer et al. 1998)
• Common structure
  • Each node has a label dist with the known shortest distance from S
• Key operation
  • relax-edge(u,v), shown here for edge (A,C) with weight w_AC:
      if dist(A) + w_AC < dist(C)
        dist(C) = dist(A) + w_AC
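The relax-edge operation above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `relax_edge`, the dictionary-based `dist` labels and the edge weights are assumptions chosen for the example.

```python
import math

def relax_edge(dist, u, v, w):
    """Apply relax-edge(u, v) for an edge of weight w.

    Returns True if dist[v] was lowered, False otherwise.
    """
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False

# Hypothetical labels and weights (not the graph from the slide).
dist = {"S": 0, "A": math.inf, "C": math.inf}
changed_first = relax_edge(dist, "S", "A", 5)   # lowers dist["A"] to 5
changed_second = relax_edge(dist, "A", "C", 2)  # lowers dist["C"] to 7
changed_third = relax_edge(dist, "A", "C", 2)   # nothing left to improve
```

Returning a flag that says whether the label changed is a design choice made here because every schedule needs to know when new work has been created.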
Dijkstra’s Algorithm
[Figure: example graph with nodes labeled by their current dist; priority queue snapshots: <B,5> <C,3>; <B,5> <E,6>; <B,5> <D,7>]
Scheduling of relaxations:
• Use a priority queue of nodes, ordered by label dist
• Iterate over nodes u in priority order
• On each step: relax all neighbors v of u
  • Apply relax-edge to all (u,v)
Chaotic Relaxation
[Figure: example graph with an unordered worklist of edges: (C,D) (B,C) (S,A) (C,E)]
Scheduling of relaxations:
• Use an unordered set of edges
• Iterate over edges (u,v) in any order
• On each step:
  • Apply relax-edge to edge (u,v)
Algorithms as Scheduled Operators
• Graph Algorithm = Operator(s) + Schedule

Dijkstra-style:
  Q = PQueue[Node]
  Q.enqueue(S)
  while Q ≠ ∅ {
    u = Q.pop
    foreach (u,v,w) {
      if d(u) + w < d(v) {
        d(v) := d(u) + w
        Q.enqueue(v)
      }
    }
  }

Chaotic-Relaxation (w is the weight of edge (u,v)):
  W = Set[Edge]
  W ∪= (S,y) : y ∈ Nbrs(S)
  while W ≠ ∅ {
    (u,v) = W.get
    if d(u) + w < d(v) {
      d(v) := d(u) + w
      foreach y ∈ Nbrs(v)
        W.add(v,y)
    }
  }
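The two programs above share one operator and differ only in their schedule. The sketch below makes that concrete in Python. The graph, the function names and the adjacency-list representation are assumptions made for illustration; the stale-entry check in the priority-queue version is also an addition, used because `heapq` has no decrease-key operation.

```python
import heapq
import math

def relax_edge(dist, u, v, w):
    """Shared operator: lower dist[v] through edge (u, v) if possible."""
    if dist[u] + w < dist[v]:
        dist[v] = dist[u] + w
        return True
    return False

def sssp_dijkstra_style(graph, source):
    """Schedule 1: nodes are processed in priority order of their dist label."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale queue entry (assumption: lazy deletion)
        for v, w in graph[u]:
            if relax_edge(dist, u, v, w):
                heapq.heappush(queue, (dist[v], v))
    return dist

def sssp_chaotic(graph, source):
    """Schedule 2: edges are taken from an unordered set in any order."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    worklist = {(source, v, w) for v, w in graph[source]}
    while worklist:
        u, v, w = worklist.pop()
        if relax_edge(dist, u, v, w):
            worklist.update((v, y, wy) for y, wy in graph[v])
    return dist

# Hypothetical graph with non-negative weights: node -> [(neighbor, weight)].
graph = {
    "S": [("A", 5), ("B", 2)],
    "A": [("C", 2)],
    "B": [("A", 1), ("C", 7)],
    "C": [],
}
by_priority = sssp_dijkstra_style(graph, "S")
by_chaos = sssp_chaotic(graph, "S")
```

Both functions compute the same labels; only the amount of redundant work may differ between schedules.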
Deconstructing Schedules
• Graph Algorithm = Operators + Schedule
  • Operators: what should be done
  • Schedule: how it should be done
• Schedule
  • Order activity processing
    • Static schedule: code structure (loops)
    • Dynamic schedule: priority in work queue
  • Identify new activities: operator delta
• Unordered/ordered algorithms
  • “TAO of parallelism”, PLDI’11
Example
• Graph Algorithm = Operators + Schedule
• Schedule
  • Order activity processing: static, dynamic
  • Identify new activities

Dijkstra-style:
  Q = PQueue[Node]
  Q.enqueue(S)
  while Q ≠ ∅ {
    u = Q.pop
    foreach (u,v,w) {
      if d(u) + w < d(v) {
        d(v) := d(u) + w
        Q.enqueue(v)
      }
    }
  }

Chaotic-Relaxation:
  W = Set[Edge]
  W ∪= (S,y) : y ∈ Nbrs(S)
  while W ≠ ∅ {
    (u,v) = W.get
    if d(u) + w < d(v) {
      d(v) := d(u) + w
      foreach y ∈ Nbrs(v)
        W.add(v,y)
    }
  }
SSSP in Elixir
Graph type:
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
Operator:
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔ [ bd = ad + w ]
Fixpoint statement:
  sssp = iterate relax ≫ schedule
Operators
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔ [ bd = ad + w ]
  sssp = iterate relax ≫ schedule
• Redex pattern: the nodes(...) and edges(...) clauses
• Guard: bd > ad + w
• Update: bd = ad + w
[Figure: edge a → b with weight w; labels (ad, bd) before and (ad, ad + w) after, applied if bd > ad + w]
Fixpoint Statement
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔ [ bd = ad + w ]
  sssp = iterate relax ≫ schedule
• iterate relax: apply operator until fixpoint
• ≫ schedule: scheduling expression
Scheduling Examples
  Graph [ nodes(node : Node, dist : int)
          edges(src : Node, dst : Node, wt : int) ]
  relax = [ nodes(node a, dist ad)
            nodes(node b, dist bd)
            edges(src a, dst b, wt w)
            bd > ad + w ] ➔ [ bd = ad + w ]
  sssp = iterate relax ≫ schedule

Scheduling expressions:
• Locality enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
• Dijkstra-style: metric ad ≫ group b

Dijkstra-style program:
  q = new PrQueue
  q.enqueue(SRC)
  while (! q.empty) {
    a = q.dequeue
    for each e = (a,b,w) {
      if dist(a) + w < dist(b) {
        dist(b) = dist(a) + w
        q.enqueue(b)
      }
    }
  }
Operator Delta Inference
• Parallel Graph Algorithm = Operators + Schedule
• Schedule
  • Order activity processing: static schedule, dynamic schedule
  • Identify new activities
Problem Statement
• Many graph programs have the form
    until no change do { apply operator }
• Naïve implementation: keep looking for places where the operator can be applied to make a change
• Problem: too slow
• Incremental implementation: after applying an operator, find the smallest set of future active elements and schedule them (add to worklist)
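The difference between the naïve and the incremental implementation can be seen on a small example. The sketch below is an illustration only: it uses SSSP relaxation as the operator, a hypothetical chain graph, and a hand-written delta (after lowering dist[v], only the edges leaving v are rescheduled) rather than an inferred one.

```python
import math

def sssp_naive(nodes, edges, source):
    """Rescan every edge until a full pass makes no change."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    attempts = 0
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            attempts += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist, attempts

def sssp_incremental(nodes, edges, source):
    """Keep a worklist; after an update, schedule only the operator's delta."""
    out_edges = {n: [] for n in nodes}
    for u, v, w in edges:
        out_edges[u].append((u, v, w))
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    worklist = list(out_edges[source])
    attempts = 0
    while worklist:
        u, v, w = worklist.pop()
        attempts += 1
        if dist[u] + w < dist[v]:
            dist[v] = dist[u] + w
            worklist.extend(out_edges[v])  # delta: edges leaving v
    return dist, attempts

# Hypothetical chain 0 -> 1 -> 2 -> 3 -> 4, edges listed in the worst order
# for the naive scan.
nodes = [0, 1, 2, 3, 4]
edges = [(3, 4, 1), (2, 3, 1), (1, 2, 1), (0, 1, 1)]
naive_dist, naive_attempts = sssp_naive(nodes, edges, 0)
incr_dist, incr_attempts = sssp_incremental(nodes, edges, 0)
```

On this chain the naïve loop tries the operator 20 times (five passes over four edges) while the worklist version tries it 4 times, and both reach the same fixpoint.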
Identifying the Delta of an Operator
[Figure: relax1 applied to edge (a,b), with "?" annotations]
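Delta Inference Example
[Figure: edge (a,b) with weight w1 and edge (c,b) with weight w2; relax1 is applied to (a,b), relax2 is the candidate on (c,b)]
Query program passed to the SMT solver:
  assume (da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)
Result: (c,b) does not become active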
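Delta Inference Example – Active
[Figure: edge (a,b) with weight w1 followed by edge (b,c) with weight w2; relax1 is applied to (a,b), relax2 is the candidate on (b,c)]
Query program passed to the SMT solver:
  assume (da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)
Result: apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a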
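The two query programs can be explored without an SMT solver by enumerating small integer values. This is a stand-in for the solver, not the technique used in the lecture: a bounded search can find a counterexample, but finding none is only evidence, not a proof. The function names and the value range are assumptions.

```python
from itertools import product

def incoming_edge_counterexample(values):
    """Query 1: can relax1 on (a,b) make relax2 on (c,b) applicable?"""
    for da, db, dc, w1, w2 in product(values, repeat=5):
        # assume relax1 fires on (a,b) and relax2 was inactive on (c,b)
        if da + w1 < db and not (dc + w2 < db):
            db_post = da + w1
            if dc + w2 < db_post:  # the assertion would be violated
                return (da, db, dc, w1, w2)
    return None

def outgoing_edge_counterexample(values):
    """Query 2: can relax1 on (a,b) make relax2 on (b,c) applicable?"""
    for da, db, dc, w1, w2 in product(values, repeat=5):
        # assume relax1 fires on (a,b) and relax2 was inactive on (b,c)
        if da + w1 < db and not (db + w2 < dc):
            db_post = da + w1
            if db_post + w2 < dc:  # the assertion would be violated
                return (da, db, dc, w1, w2)
    return None

small_values = range(5)  # hypothetical bound for the search
incoming = incoming_edge_counterexample(small_values)
outgoing = outgoing_edge_counterexample(small_values)
```

The search finds no violation for the incoming edge (c,b) and does find one for the outgoing edge (b,c), which agrees with the two results stated above: lowering db can only activate edges leaving b.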
Influence Patterns
[Figure: the ways an edge (c,d) can overlap an edge (a,b): b=c; a=d; a=c; b=d; a=c and b=d; a=d and b=c]
Example: Triangle Counting
• How many triangles exist in a graph
  • Or for each node
• Useful for estimating the community structure of a network
Triangles Pseudo-code
  …
  for a : nodes do
    for b : nodes do
      for c : nodes do
        …
        if edges(a,b)
          if edges(b,c)
            if edges(c,a)
              if a < b
                if b < c
                  if a < c
                    triangles++
  fi
  …
Example: Triangles
Statement kinds: iterators, graph conditions, scalar conditions
  for a : nodes do            (iterators)
    for b : nodes do
      for c : nodes do
        if edges(a,b)         (graph conditions)
          if edges(b,c)
            if edges(c,a)
              if a < b        (scalar conditions)
                if b < c
                  if a < c
                    triangles++
  fi
  …
Triangles: Reordering
Statement kinds: iterators, graph conditions, scalar conditions
  for a : nodes do
    for b : nodes do
      if edges(a,b)
        if a < b
          for c : nodes do
            if edges(b,c)
              if a < c
                if b < c
                  if edges(c,a)
                    triangles++
  fi
  …
Triangles: Implementation Selection
Statement kinds: iterators, graph conditions, scalar conditions

Tile:
  for x : nodes do
    if edges(x,y)
  ⇩
  for x : Succ(y) do

Reordered program:
  for a : nodes do
    for b : nodes do
      if edges(a,b)
        if a < b
          for c : nodes do
            if edges(b,c)
              if a < c
                if b < c
                  if edges(c,a)
                    triangles++
  fi
  …

Reordering + implementation selection:
  for a : nodes do
    for b : Succ(a) do
      if a < b
        for c : Succ(b) do
          if a < c
            if b < c
              if edges(c,a)
                triangles++
  fi
  …
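The effect of reordering and implementation selection can be checked on a small graph. The sketch below is an illustration under assumptions: the graph is hypothetical and undirected (each edge is stored in both directions), `edges` is a set of pairs, and `succ` is a precomputed successor map.

```python
def count_triangles_naive(nodes, edges):
    """Three nested node iterators followed by all conditions."""
    triangles = 0
    for a in nodes:
        for b in nodes:
            for c in nodes:
                if (a, b) in edges and (b, c) in edges and (c, a) in edges:
                    if a < b and b < c and a < c:
                        triangles += 1
    return triangles

def count_triangles_succ(nodes, succ, edges):
    """Conditions hoisted next to their iterators; edge tests on (a,b) and
    (b,c) replaced by iteration over successor lists."""
    triangles = 0
    for a in nodes:
        for b in succ[a]:
            if a < b:
                for c in succ[b]:
                    if a < c and b < c and (c, a) in edges:
                        triangles += 1
    return triangles

# Hypothetical undirected graph with triangles {1,2,3} and {2,3,4}.
undirected = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
edges = set(undirected) | {(v, u) for u, v in undirected}
nodes = [1, 2, 3, 4]
succ = {n: sorted(v for u, v in edges if u == n) for n in nodes}

naive_count = count_triangles_naive(nodes, edges)
succ_count = count_triangles_succ(nodes, succ, edges)
```

Both versions count each triangle once because of the a < b < c conditions; the second one only visits node pairs that are actually connected.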
Parallelism is Everywhere
• Texas Advanced Computing Center
• Laptops
• Cell-phones
Minimum Spanning Tree Problem
[Figure: weighted graph on nodes a through g; edge weights 1, 2, 3, 4, 4, 5, 6, 7]
Boruvka’s Minimum Spanning Tree Algorithm
[Figure: the example graph before and after one step; node a is merged with its lightest neighbor lt = c, giving the merged node "a,c"]
• Build MST bottom-up
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
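A sequential version of the loop above can be sketched as follows. This is an illustration under assumptions: the graph is connected and undirected, it is stored as a dictionary of neighbor-to-weight dictionaries, the example weights are hypothetical, and the function records only the weights of the chosen edges (original endpoints are lost once nodes are merged).

```python
def boruvka_mst_weights(adj):
    """Return the weights of the MST edges picked by repeated merging.

    adj maps each node to {neighbor: weight}; it is copied, not modified.
    """
    g = {u: dict(nbrs) for u, nbrs in adj.items()}
    mst = []
    while len(g) > 1:
        a = next(iter(g))                  # pick an arbitrary node
        lt = min(g[a], key=g[a].get)       # its lightest neighbor
        mst.append(g[a][lt])               # add edge a-lt to the MST
        # Merge lt into a: drop the edge a-lt, then move lt's edges to a.
        del g[a][lt]
        del g[lt][a]
        for n, w in g[lt].items():
            if n in g[a]:
                w = min(w, g[a][n])        # keep the lighter parallel edge
            g[a][n] = w
            g[n][a] = w
            del g[n][lt]
        del g[lt]
    return mst

# Hypothetical graph: a square a-b-c-d with one diagonal a-c.
adj = {
    "a": {"b": 1, "d": 4, "c": 5},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3, "a": 5},
    "d": {"c": 3, "a": 4},
}
mst_weights = boruvka_mst_weights(adj)
```

Each step is safe because the lightest edge leaving a single node always belongs to some minimum spanning tree of the current (contracted) graph.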
Parallelism in Boruvka
[Figure: weighted graph on nodes a through g; edge weights 1, 2, 3, 4, 4, 5, 6, 7]
• Build MST bottom-up
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
Non-conflicting Iterations
[Figure: weighted graph on nodes a through g; edge weights 1, 2, 3, 4, 4, 5, 6, 7]
• Build MST bottom-up
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
Non-conflicting Iterations
[Figure: the graph after two independent merges, with merged nodes "a,c" and "f,g"; remaining edge weights 1, 3, 4, 6, 7]
• Build MST bottom-up
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
Conflicting Iterations
[Figure: weighted graph on nodes a through g; edge weights 1, 2, 3, 4, 4, 5, 6, 7]
• Build MST bottom-up
    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node
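Whether two Boruvka iterations conflict can be approximated by comparing the sets of nodes they touch. The sketch below is a simplification made for illustration: it assumes a step on node a touches a, its neighbors, its lightest neighbor lt and lt's neighbors, and it treats any overlap as a conflict. The chain graph and the function names are hypothetical.

```python
def neighborhood(adj, a):
    """Nodes read or written by one Boruvka step starting at node a."""
    lt = min(adj[a], key=adj[a].get)
    return {a, lt} | set(adj[a]) | set(adj[lt])

def conflicts(adj, a1, a2):
    """Two steps conflict if their neighborhoods share a node."""
    return not neighborhood(adj, a1).isdisjoint(neighborhood(adj, a2))

# Hypothetical chain u1 - u2 - ... - u7 with weights 1..6.
adj = {
    "u1": {"u2": 1},
    "u2": {"u1": 1, "u3": 2},
    "u3": {"u2": 2, "u4": 3},
    "u4": {"u3": 3, "u5": 4},
    "u5": {"u4": 4, "u6": 5},
    "u6": {"u5": 5, "u7": 6},
    "u7": {"u6": 6},
}
far_apart = conflicts(adj, "u1", "u7")   # neighborhoods at opposite ends
close_by = conflicts(adj, "u1", "u3")    # both steps touch u2
```

Steps started at u1 and u7 touch disjoint parts of the chain and could run in parallel; steps started at u1 and u3 both need u2 and cannot.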
How to parallelize graph algorithms
• The TAO of Parallelism in Graph Algorithms / PLDI 2011
• Optimistic parallelization
• Implemented by the Galois system
Operator Formulation of Algorithms
[Figure legend: active node, neighborhood]
• Active element
  • Site where computation is needed
• Operator
  • Computation at active element
  • Activity: application of operator to active element
• Neighborhood
  • Set of nodes/edges read/written by activity
  • Usually distinct from the neighbors in the graph
• Ordering: scheduling constraints on execution order of activities
  • Unordered algorithms: no semantic constraints, but performance may depend on schedule
  • Ordered algorithms: problem-dependent order
• Amorphous data-parallelism
  • Multiple active elements can be processed in parallel subject to neighborhood and ordering constraints
• Parallel program = Operator + Schedule + Parallel data structure
  • What is that? Who implements it?
Optimistic Parallelization in Galois
[Figure: graph with three concurrent iterations i1, i2, i3]
• Programming model
  • Client code has sequential semantics
  • Library of concurrent data structures
• Parallel execution model
  • Activities executed speculatively
• Runtime conflict detection
  • Each node/edge has an associated exclusive lock
  • Graph operations acquire locks on read/written nodes/edges
  • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
  • All locks released at the end
• Runtime book-keeping (source of overhead)
  • Locking
  • Undo actions
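The execution model above (lock on first touch, roll back on conflict, release all locks at the end) can be simulated in a single thread. This is a sketch under assumptions, not Galois code: locks are entries in a plain dictionary, an activity is a list of label updates, and the names `run_speculatively` and `Conflict` are made up for the example.

```python
class Conflict(Exception):
    """Raised when a needed lock is owned by another activity."""

def run_speculatively(labels, owner, activity, updates):
    """Apply (node, value) updates in order; undo everything on conflict.

    labels: node -> value, owner: node -> activity holding its lock.
    Returns True if the activity committed, False if it was rolled back.
    """
    undo_log = []   # (node, old value) pairs, in the order of the changes
    acquired = []   # locks taken by this activity
    try:
        for node, value in updates:
            if node not in owner:
                owner[node] = activity
                acquired.append(node)
            elif owner[node] != activity:
                raise Conflict(node)
            undo_log.append((node, labels[node]))
            labels[node] = value
        return True
    except Conflict:
        for node, old in reversed(undo_log):  # reverse actions, reverse order
            labels[node] = old
        return False
    finally:
        for node in acquired:                 # all locks released at the end
            del owner[node]

labels = {"x": 1, "y": 2, "z": 3}
owner = {"z": "other"}                        # z is locked by someone else
updates = [("x", 10), ("y", 20), ("z", 30)]
first_try = run_speculatively(labels, owner, "t1", updates)
labels_after_abort = dict(labels)
owner.clear()                                 # the other activity finishes
second_try = run_speculatively(labels, owner, "t1", updates)
```

The first attempt changes x and y before hitting the lock on z, so both changes have to be undone; this undo log is the book-keeping overhead the slide refers to.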
Cautious Operators
• When an iteration aborts before completing its work, we need to undo all of its changes
  • Log each change to the graph and, upon abort, apply reverse actions in reverse order
  • Expensive to maintain
  • Not supported by Galois systems for C++
• How can we avoid maintaining rollback data?
• An operator is cautious if it never performs changes before acquiring all locks
  • In this case, upon abort there are no changes to be undone
  • Can ensure an operator is cautious by adding code to acquire locks before making any changes
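A cautious version of one Boruvka merge step might look like the sketch below. It is a single-threaded simulation written for illustration: the `LockTable` class and explicit `acquire` calls are assumptions (in Galois, locks are taken implicitly by graph operations), and the graph is a hypothetical chain. All locks are acquired in a read-only phase, so an abort needs no undo log.

```python
import copy

class Conflict(Exception):
    """Raised when a lock is already owned by another activity."""

class LockTable:
    def __init__(self):
        self.owner = {}

    def acquire(self, node, activity):
        holder = self.owner.setdefault(node, activity)
        if holder != activity:
            raise Conflict(f"{node} is locked by {holder}")

    def release_all(self, activity):
        for node in [n for n, o in self.owner.items() if o == activity]:
            del self.owner[node]

def cautious_merge(adj, locks, activity, a):
    """Merge a with its lightest neighbor; return the edge weight or None."""
    try:
        # Phase 1 (read-only): lock the whole neighborhood.
        locks.acquire(a, activity)
        for n in adj[a]:
            locks.acquire(n, activity)
        lt = min(adj[a], key=adj[a].get)
        for n in adj[lt]:
            locks.acquire(n, activity)
        # Failsafe point: no new locks are needed from here on.
        # Phase 2: mutate the graph.
        min_w = adj[a].pop(lt)
        del adj[lt][a]
        for n, w in adj[lt].items():
            w = min(w, adj[a].get(n, w))
            adj[a][n] = w
            adj[n][a] = w
            del adj[n][lt]
        del adj[lt]
        return min_w
    except Conflict:
        return None  # aborted in phase 1: nothing was changed
    finally:
        locks.release_all(activity)

adj = {"u1": {"u2": 1}, "u2": {"u1": 1, "u3": 2},
       "u3": {"u2": 2, "u4": 3}, "u4": {"u3": 3}}
locks = LockTable()
locks.acquire("u3", "other")                 # another activity holds u3
before = copy.deepcopy(adj)
blocked = cautious_merge(adj, locks, "t1", "u1")
unchanged_after_abort = (adj == before)
locks.release_all("other")
merged = cautious_merge(adj, locks, "t1", "u1")
```

The blocked attempt fails while locking u3 and leaves the graph untouched; the retry succeeds and removes u2 from the graph.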
Failsafe Points
  foreach (Node a : wl) {
    Set<Node> aNghbrs = g.neighbors(a);
    Node lt = null;
    for (Node n : aNghbrs) {
      minW,lt = minWeightEdge((a,lt), (a,n));
    }
    g.removeEdge(a, lt);
    Set<Node> ltNghbrs = g.neighbors(lt);
    for (Node n : ltNghbrs) {
      Edge e = g.getEdge(lt, n);
      Weight w = g.getEdgeData(e);
      Edge an = g.getEdge(a, n);
      if (an != null) {
        Weight wan = g.getEdgeData(an);
        if (wan.compareTo(w) < 0)
          w = wan;
        g.setEdgeData(an, w);
      } else {
        g.addEdge(a, n, w);
      }
    }
    g.removeNode(lt);
    mst.add(minW);
    wl.add(a);
  }

[Figure: schematic loop body "foreach (Node a : wl) { … }" split into a region where the lockset grows, the failsafe point, and a region where the lockset is stable]

Program point P is failsafe if, for every future program point Q, the lock set at Q is already contained in the lock set acquired at P:
  ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
Is this Code Cautious?
  foreach (Node a : wl) {
    Set<Node> aNghbrs = g.neighbors(a);
    Node lt = null;
    for (Node n : aNghbrs) {
      minW,lt = minWeightEdge((a,lt), (a,n));
    }
    g.removeEdge(a, lt);
    Set<Node> ltNghbrs = g.neighbors(lt);
    for (Node n : ltNghbrs) {
      Edge e = g.getEdge(lt, n);
      Weight w = g.getEdgeData(e);
      Edge an = g.getEdge(a, n);
      if (an != null) {
        Weight wan = g.getEdgeData(an);
        if (wan.compareTo(w) < 0)
          w = wan;
        g.setEdgeData(an, w);
      } else {
        g.addEdge(a, n, w);
      }
    }
    g.removeNode(lt);
    mst.add(minW);
    wl.add(a);
  }

[Figure: lockset grows / failsafe point / lockset stable; locked nodes: a, …, lt]
Answer: No. g.removeEdge(a, lt) modifies the graph before g.neighbors(lt) acquires the locks on the neighbors of lt.
Rewrite as Cautious Operator
  foreach (Node a : wl) {
    Set<Node> aNghbrs = g.neighbors(a);
    Node lt = null;
    for (Node n : aNghbrs) {
      minW,lt = minWeightEdge((a,lt), (a,n));
    }
    g.neighbors(lt);
    g.removeEdge(a, lt);
    Set<Node> ltNghbrs = g.neighbors(lt);
    for (Node n : ltNghbrs) {
      Edge e = g.getEdge(lt, n);
      Weight w = g.getEdgeData(e);
      Edge an = g.getEdge(a, n);
      if (an != null) {
        Weight wan = g.getEdgeData(an);
        if (wan.compareTo(w) < 0)
          w = wan;
        g.setEdgeData(an, w);
      } else {
        g.addEdge(a, n, w);
      }
    }
    g.removeNode(lt);
    mst.add(minW);
    wl.add(a);
  }

[Figure: lockset grows up to the added call g.neighbors(lt), which is the failsafe point; lockset stable afterwards; locked nodes: a, …, lt]
So far
• Operator formulation of graph algorithms
• Implementation considerations for sequential graph programs
• Optimistic parallelization of graph algorithms
• Introduction to the Galois system
Next steps
• Divide into groups
• Algorithm proposal
  • Due date: 15/4
  • Phrase algorithm in terms of operator formulation
  • Define delta if necessary
  • Submit proposal with description of algorithm + pseudo-code
  • LaTeX template will be on web-site soon
• Lecture on 15/4 on implementing your algorithm via Galois