200 likes | 348 Views
A Lock-free Multi-threaded Algorithm for the Max-flow Problem. Bo Hong Electrical and Computer Engineering Department Drexel University bohong@coe.drexel.edu http://www.ece.drexel.edu/faculty/bohong. The Max-flow Problem. 4. 6. 5. 8. 4. a. S. t. c. b. d. 7. 6. 4. 3. Find:
E N D
A Lock-free Multi-threaded Algorithm for the Max-flow Problem Bo Hong Electrical and Computer Engineering Department Drexel University bohong@coe.drexel.edu http://www.ece.drexel.edu/faculty/bohong
The Max-flow Problem 4 6 5 8 4 a S t c b d 7 6 4 3 • Find: • maximum flow from s to t • Subject to: • edge capacity constraints • zero net-flow for u є V- {s,t} Bo Hong
Existing algorithms • Sequential Algorithms • Augmenting Path • Ford-Fulkerson , pseudo-polynomial • Edmonds and Karp, O(|V|∙|E|2) • Dinitz, O(|V|2∙|E|) • Preflow Push • Karzanov, O(|V|3) • Push-Relabel • Goldberg, O(|V|2∙|E|), with dynamic trees O(|V| ∙ |E| ∙ log(|V|2∙|E|) ) • Parallel Algorithms • Shiloach, etc. O(|V|2 ∙log|V| ) with |V|-processor PRAM • Goldberg, O(|V|2 ∙log|V| ) with |V|-processor PRAM • Anderson, etc. Global relabeling • Bader, etc. Gap relabeling Bo Hong
Two vertex properties 3 1 a a b c b c t t S S d d 3 0 Excessive flow: the net flow into a vertex e.g. e(c) = 5 Every vertex has an integer valued height e.g. h(c) = 2 Bo Hong
Existing parallel algorithms a a b c b c t t S S d d • Push: • applicable when e(a)>0 and • there exists cf(a,v) > 0 and h(v)=h(a)-1 • Actions: • Lock a and v • a->v still pushable? • d = min( e(a), cf(a,v) ) • e(a) = e(a) – d • e(v) = e(v) + d • cf(a,v) = cf(a,v) – d • cf(v,a) = cf(v,a) + d • Unlock a and v • Lift: • applicable when e(c)>0 and • all cf(c,x) > 0 implies h(x) ≥ h(c) • Actions: • Lock v • v = lowest such vertex x • h(c) = h(v) + 1 • Unlock v Bo Hong
Impact of locking Locks protect shared accesses But locks are expensive • P1 • Lock • x ← x+1 • Unlock • P2 • Lock • x ← x+1 • Unlock l n 16 l 14 n l l n 12 n l Read x Increase 1 Update x Actual l 10 l T n l n Lock acquisition time (us) n 8 n l 6 n l n 4 Read x Increase 1 Update x l time n Ideal l l 2 n l u u u uuuuuuuuuuuu l n n n n s u l 0 3 5 7 9 11 13 15 Number of processors Bo Hong
New lock-free algoritm: model of the architecture • SMP computer with multiple processors sharing the memory • Multi-processor systems • Multi-core systems • Supports atomic ‘fetch-and-add’ instruction • Supports sequential consistency • P1 • x ← x+c1 • … • x ← x+c2 • P2 • x ← x+c3 • … • x ← x+c4 • Eventual result • x ← x+c1+c2+c3+c4 not matter how exactly the instructions were interleaved. Bo Hong
New algorithm: two basic lock-free operations a a b c b c t t S S d d • Push: • applicable when e(a)>0 and • there exists cf(a,x) > 0 and h(x)<h(a) • Actions: • v = lowest such vertex x • d = min( e(a), cf(a,v) ) • e(a) = e(a) – d • e(v) = e(v) + d • cf(a,v) = cf(a,v) – d • cf(v,a) = cf(v,a) + d • Lift: • applicable when e(c)>0 and • all cf(c,x) > 0 implies h(x) ≥ h(c) • Actions: • v = lowest such vertex x • h(c) = h(v) + 1 Bo Hong
The algorithm a b c t S d Initialize h(u), e(u), and f(u,v) h(s) = |V| h(u) = 0 for u є V – {s} f(s,u) = c(s,u) e(u) = c(s,u) f(u,v) = 0, otherwise While there exists applicable push or lift operations execute the push or lift operations asynchronously Bo Hong
Asynchronous execution of the basic operations • while e(u) > 0 • e’ = e(u) • h’ = ∞ • for each (u,v) s.t. cf(u, v) > 0 • if h(v) < h’ • h’ = h(v) • v’ = v • if h(u) > h’ • d = min ( e’, cf(u, v’) ) • cf(u, v’) = cf(u, v’) + d • cf(v’, u) = cf(v’, u) – d • e(u) = e(u) – d • e(v’) = e(v’) + d • else • h(u) = h’ + 1 P1 • while e(u) > 0 • e’ = e(u) • h’ = ∞ • for each (u,v) s.t. cf(u, v) > 0 • if h(v) < h’ • h’ = h(v) • v’ = v • if h(u) > h’ • d = min ( e’, cf(u, v’) ) • cf(u, v’) = cf(u, v’) + d • cf(v’, u) = cf(v’, u) – d • e(u) = e(u) – d • e(v’) = e(v’) + d • else • h(u) = h’ + 1 P2 Bo Hong
Seems rather chaotic? . . . Not really P1 P2 • while e(u) > 0 • e’ = e(u) • h’ = ∞ • for each (u,v) s.t. cf(u, v) > 0 • if h(v) < h’ • h’ = h(v) • v’ = v • if h(u) > h’ • d = min ( e’, cf(u, v’) ) • cf(u, v’) = cf(u, v’) + d • cf(v’, u) = cf(v’, u) – d • e(u) = e(u) – d • e(v’) = e(v’) + d • else • h(u) = h’ + 1 time or time Bo Hong
An invariant property of the algorithm As long as cf(u,v) and e(u) are updated atomically, we always have h(u) ≤ h(v) + 1 for any cf(u,v) > 0, no matter how the threads are interleaved. Bo Hong
Optimality of the algorithm • If any e(u) > 0, then the algorithm will not terminate Property of the push and lift operations • If the algorithm terminates, then there is no path from s to t in the residual graph Proof by contradiction, if such path exists, then the invariant property of function f has to be broken • If the algorithm terminates, it finds a maximum flow Termination implies all e(u)=0, meaning this is a feasible flow. No path from s to t, by max-flow min-cut theorem, it has to be a maximum flow Bo Hong
Convergence of the algorithm (complexity bound) • For any u s.t. e(u) > 0, there exists a path from u to s in the residual graph Property of network flow • The height of any vertex is less than 2|V| - 1 The longest path can have at most |V| vertices • The total number of lift operations is bound by 2|V|2-|V| Bound by the height of vertices • The total number of saturated pushes is bound by (2|V|-1)∙|E| Bound by the total number of lift operations • The total number of un-saturated pushes is bound by 4|V|2 ∙|E| Bound by the number of lift and saturated pushes • Therefore the algorithm terminates with O(|V|2 ∙|E|) operations Bo Hong
Lock-free termination detection • The algorithm terminates when e(u) = 0 for all u є V – {s,t} • e(u) = 0 at a single thread is insufficient to terminate the thread • An elegant solution: • The net flow out of source s decreases monotonically • The net flow into sink t increases monotonically • When the two values become equal, we must have e(u) = 0 for all u є V – {s,t}, a necessary and sufficient termination condition. Bo Hong
Experimental results • Execution results on 2-way SMP with 3.2GHz Intel Xeon Processors • 4-thread results obtained when hyper-threading was enabled Scalability of the Lock-Free Algorithm Comparison Against Classical Lock-Based Algorithm Bo Hong
Summary and future work • Developed a lock-free multi-threaded algorithm for the max-flow problem • having the same complexity bound as existing parallel algorithms • eliminated lock usages thereby improving thread-level parallelism • 20% improvement over existing lock-based parallel algorithms • Results indicate the effectiveness of algorithmic method in reducing synchronization overheads • Future work • Load balancing across the threads: vertex to thread assignment, static or dynamic or hybrid? • Optimize cache usages • Reduce the number of operations via global and gap relabling • What if edge capacities are floating-point? Bo Hong