Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Ozgur Sumer, U. Chicago; Umut Acar, MPI-SWS; Alexander Ihler, UC Irvine; Ramgopal Mettu, UMass Amherst
Graphical models • Structured (negative) energy function • Goal: find the minimum-energy (MAP) configuration • Examples • Stereo depth • Protein design & prediction • Weighted constraint satisfaction problems
[Figures: a pairwise factor graph, Markov random field, and Bayesian network over variables A, B, C; stereo image pair → MRF model → depth map]
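The "structured energy" idea on this slide can be made concrete with a tiny brute-force sketch. This is an illustration only: the variable names A, B, C and the random costs are hypothetical, and enumeration is exactly what dual decomposition avoids at scale.

```python
import itertools

import numpy as np

# Pairwise energy E(x) = sum_i th_i(x_i) + sum_(i,j) th_ij(x_i, x_j)
# on the chain A - B - C with binary variables (hypothetical random costs).
rng = np.random.default_rng(0)
unary = {v: rng.standard_normal(2) for v in "ABC"}
pair = {("A", "B"): rng.standard_normal((2, 2)),
        ("B", "C"): rng.standard_normal((2, 2))}

def energy(x):
    """Total energy of a configuration x, e.g. {'A': 0, 'B': 1, 'C': 0}."""
    return (sum(unary[v][x[v]] for v in "ABC")
            + sum(M[x[i], x[j]] for (i, j), M in pair.items()))

# MAP inference = the minimum-energy configuration (enumeration on a toy
# model; the talk's point is doing this at scale without enumeration).
configs = [dict(zip("ABC", vals))
           for vals in itertools.product([0, 1], repeat=3)]
x_map = min(configs, key=energy)
```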
Dual decomposition methods • Decompose the graph into smaller subproblems • Solve each independently; gives an optimistic bound • Exact if all copies agree • Enforce the lost equality constraints via Lagrange multipliers
[Figure: original graph vs. its decomposition into subproblems with duplicated variables]
Dual decomposition methods: same bound by different names • Dual decomposition (Komodakis et al. 2007) • TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007) • Soft arc consistency (Cooper & Schiex 2004)
Dual decomposition methods
[Figure: energy scale showing that relaxed (decomposed) problems bound the MAP energy from below, while consistent solutions bound it from above]
Optimizing the bound • Subgradient descent • Find each subproblem's optimal configuration • Adjust entries for mismatched solutions
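The two subgradient steps above can be sketched on a toy model. This is a minimal illustration, not the paper's implementation: the 3-cycle, the random edge costs, and the 1/(1+t) step-size schedule are all hypothetical choices. Because each variable's multipliers sum to zero across the edges that contain it, the sum of subproblem minima is always a valid optimistic (lower) bound on the MAP energy.

```python
import itertools

import numpy as np

# Toy dual decomposition over single edges of a 3-cycle, binary variables.
edges = [(0, 1), (1, 2), (0, 2)]
rng = np.random.default_rng(0)
theta = {e: rng.standard_normal((2, 2)) for e in edges}

# One Lagrange-multiplier vector per (edge, endpoint) copy; they start at
# zero and stay zero-sum across the edges containing each variable.
lam = {e: {k: np.zeros(2) for k in e} for e in edges}

def solve_edge(e):
    """Exactly minimise one edge subproblem (a 2x2 cost table)."""
    i, j = e
    cost = theta[e] + lam[e][i][:, None] + lam[e][j][None, :]
    xi, xj = np.unravel_index(np.argmin(cost), cost.shape)
    return {i: xi, j: xj}, cost[xi, xj]

for t in range(200):
    step = 1.0 / (1 + t)                       # diminishing step size
    sols = {e: solve_edge(e)[0] for e in edges}
    for k in range(3):                         # subgradient step per variable
        owners = [e for e in edges if k in e]
        avg = np.mean([np.eye(2)[sols[e][k]] for e in owners], axis=0)
        for e in owners:                       # push copies toward agreement
            lam[e][k] += step * (np.eye(2)[sols[e][k]] - avg)

bound = sum(solve_edge(e)[1] for e in edges)   # optimistic bound on the MAP
exact = min(sum(theta[i, j][x[i], x[j]] for i, j in edges)
            for x in itertools.product([0, 1], repeat=3))
assert bound <= exact + 1e-9
```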
Equivalent decompositions • Any collection of tree-structured parts gives an equivalent bound • Two extreme cases • Set of all individual edges • Single "covering tree" of all edges, with variables duplicated
[Figure: original graph, its "edges" decomposition, and a covering tree]
Speeding up inference • Parallel updates • Easy to solve subproblems in parallel (e.g., Komodakis et al. 2007) • Adaptive updates
Some complications… • Example: Markov chain • Can pass messages in parallel, but… • If xn depends on x1, it still takes O(n) time • Slow "convergence rate" • Larger subproblems are more "efficient" • Smaller subproblems are easily parallel & adaptive • Similar effects in message passing • Residual splash (Gonzalez et al. 2009) x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
Cluster trees • Alternative means of parallel computation • Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006) • Simple chain model • Normally, eliminate variables "in order" (dynamic programming) • Each calculation depends on all previous results x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
Cluster trees • Alternative means of parallel computation • Eliminate variables in an alternative order • Eliminate some intermediate (degree-2) nodes • Balanced: depth O(log n)
[Figure: balanced cluster tree of depth O(log n) over the chain x1---x2---…---x10]
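The O(log n) depth claim can be seen by viewing chain elimination as a fold of min-plus matrix products, which are associative: any bracketing gives the same answer, and the balanced bracketing has logarithmic depth with independent (parallelizable) products at each level. A minimal sketch, assuming the chain's costs are already folded into pairwise matrices (the data here is hypothetical):

```python
import numpy as np

def minplus(A, B):
    """Min-plus (tropical) product: out[a, c] = min_b A[a, b] + B[b, c]."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def balanced_fold(mats):
    """Combine the chain's edge-cost matrices in a balanced binary tree.

    Each pass pairs up adjacent matrices, so the tree has depth O(log n)
    and the products within a level are independent: the cluster-tree
    view of eliminating a chain.
    """
    while len(mats) > 1:
        nxt = [minplus(mats[i], mats[i + 1])
               for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:                # odd leftover passes through
            nxt.append(mats[-1])
        mats = nxt
    return mats[0].min()

rng = np.random.default_rng(1)
mats = [rng.integers(0, 10, (3, 3)).astype(float) for _ in range(9)]

seq = mats[0]                   # ordinary left-to-right elimination (the DP)
for M in mats[1:]:
    seq = minplus(seq, M)
assert abs(balanced_fold(mats) - seq.min()) < 1e-9
```

Associativity is what makes the reordering legal; min-plus is not commutative, so the fold must still respect the chain's left-to-right adjacency, which the pairing above does.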
Adapting to changes
[Figure: the chain x1---x2---…---x10 before and after a change to one of its factors]
Adapting to changes • 1st pass: update O(log n) cluster functions • 2nd pass: mark changed configurations, repeat decoding: O(m log(n/m)) (n = sequence length; m = number of changes)
[Figure: the chain x1---x2---…---x10 with the clusters affected by the change highlighted]
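The O(log n)-per-change first pass can be sketched as a segment tree over min-plus products: a changed factor invalidates only its leaf-to-root path. This is an illustrative toy, not the paper's cluster-tree implementation (in particular the adaptive decoding pass is not shown); the class name is hypothetical, and leaves are padded to a power of two with min-plus identity matrices because min-plus is non-commutative.

```python
import numpy as np

def minplus(A, B):
    """Min-plus product: out[a, c] = min_b A[a, b] + B[b, c]."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def _identity(k):
    """Min-plus identity: 0 on the diagonal, +inf elsewhere."""
    I = np.full((k, k), np.inf)
    np.fill_diagonal(I, 0.0)
    return I

class ClusterChain:
    """Balanced tree over a chain's edge-cost matrices (toy cluster tree)."""

    def __init__(self, mats):
        k = mats[0].shape[0]
        n = 1
        while n < len(mats):             # pad to a power of two
            n *= 2
        self.n = n
        leaves = list(mats) + [_identity(k)] * (n - len(mats))
        self.tree = [None] * n + leaves  # tree[n + i] is leaf i
        for i in range(n - 1, 0, -1):
            self.tree[i] = minplus(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, M):
        """Replace factor i; recompute only its O(log n) ancestors."""
        i += self.n
        self.tree[i] = M
        while i > 1:                     # walk the leaf-to-root path
            i //= 2
            self.tree[i] = minplus(self.tree[2 * i], self.tree[2 * i + 1])

    def min_cost(self):
        return self.tree[1].min()
```

A quick check against the full O(n) dynamic program: build the tree, change one factor with `update`, and both should agree with recomputing the whole chain from scratch.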
Experiments • Random synthetic problems • Random, irregular but "grid-like" connectivity • Stereo depth images • Superpixel representation, giving irregular graphs • Compare "edges" and "covering tree" decompositions • 32-core Intel Xeon, Cilk++ implementation
Synthetic problems • Larger subproblems improve the convergence rate • Adaptivity helps significantly • Cluster trees add some overhead, but expose parallelism
[Plots: bound vs. time for the different decompositions and update schemes]
Synthetic models • Performance as a function of problem size
Stereo depth • Time to convergence for different problems
Conclusions • Fast methods for dual decomposition • Parallel computation • Adaptive updating • Subproblem choice • Small problems: highly parallel, easily adaptive • Large problems: better convergence rates • Cluster trees • Alternative form for parallel & adaptive updates • Benefits of both large & small subproblems