Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Ozgur Sumer, U. Chicago; Umut Acar, MPI-SWS; Alexander Ihler, UC Irvine; Ramgopal Mettu, UMass Amherst
Graphical models • Structured (negative) energy function • Goal: find the minimum-energy (MAP) configuration • Examples • Stereo depth • Protein design & prediction • Weighted constraint satisfaction problems
[Figures: a pairwise factor graph, Markov random field, and Bayesian network over variables A, B, C; stereo image pair → MRF model → depth map]
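The "structured energy" idea on this slide can be made concrete with a tiny brute-force sketch. This is an illustration only: the variable names A, B, C and the random costs are hypothetical, and enumeration is exactly what dual decomposition avoids at scale.

```python
import itertools

import numpy as np

# Pairwise energy E(x) = sum_i th_i(x_i) + sum_(i,j) th_ij(x_i, x_j)
# on the chain A - B - C with binary variables (hypothetical random costs).
rng = np.random.default_rng(0)
unary = {v: rng.standard_normal(2) for v in "ABC"}
pair = {("A", "B"): rng.standard_normal((2, 2)),
        ("B", "C"): rng.standard_normal((2, 2))}

def energy(x):
    """Total energy of a configuration x, e.g. {'A': 0, 'B': 1, 'C': 0}."""
    return (sum(unary[v][x[v]] for v in "ABC")
            + sum(M[x[i], x[j]] for (i, j), M in pair.items()))

# MAP inference = the minimum-energy configuration (enumeration on a toy
# model; the talk's point is doing this at scale without enumeration).
configs = [dict(zip("ABC", vals))
           for vals in itertools.product([0, 1], repeat=3)]
x_map = min(configs, key=energy)
```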
Dual decomposition methods • Decompose the graph into smaller subproblems • Solve each independently; gives an optimistic bound • Exact if all copies agree • Enforce the lost equality constraints via Lagrange multipliers
[Figure: original graph vs. its decomposition into subproblems with duplicated variables]
Dual decomposition methods: same bound by different names • Dual decomposition (Komodakis et al. 2007) • TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007) • Soft arc consistency (Cooper & Schiex 2004)
Dual decomposition methods
[Figure: energy scale showing that relaxed (decomposed) problems bound the MAP energy from below, while consistent solutions bound it from above]
Optimizing the bound • Subgradient descent • Find each subproblem's optimal configuration • Adjust entries for mismatched solutions
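The two subgradient steps above can be sketched on a toy model. This is a minimal illustration, not the paper's implementation: the 3-cycle, the random edge costs, and the 1/(1+t) step-size schedule are all hypothetical choices. Because each variable's multipliers sum to zero across the edges that contain it, the sum of subproblem minima is always a valid optimistic (lower) bound on the MAP energy.

```python
import itertools

import numpy as np

# Toy dual decomposition over single edges of a 3-cycle, binary variables.
edges = [(0, 1), (1, 2), (0, 2)]
rng = np.random.default_rng(0)
theta = {e: rng.standard_normal((2, 2)) for e in edges}

# One Lagrange-multiplier vector per (edge, endpoint) copy; they start at
# zero and stay zero-sum across the edges containing each variable.
lam = {e: {k: np.zeros(2) for k in e} for e in edges}

def solve_edge(e):
    """Exactly minimise one edge subproblem (a 2x2 cost table)."""
    i, j = e
    cost = theta[e] + lam[e][i][:, None] + lam[e][j][None, :]
    xi, xj = np.unravel_index(np.argmin(cost), cost.shape)
    return {i: xi, j: xj}, cost[xi, xj]

for t in range(200):
    step = 1.0 / (1 + t)                       # diminishing step size
    sols = {e: solve_edge(e)[0] for e in edges}
    for k in range(3):                         # subgradient step per variable
        owners = [e for e in edges if k in e]
        avg = np.mean([np.eye(2)[sols[e][k]] for e in owners], axis=0)
        for e in owners:                       # push copies toward agreement
            lam[e][k] += step * (np.eye(2)[sols[e][k]] - avg)

bound = sum(solve_edge(e)[1] for e in edges)   # optimistic bound on the MAP
exact = min(sum(theta[i, j][x[i], x[j]] for i, j in edges)
            for x in itertools.product([0, 1], repeat=3))
assert bound <= exact + 1e-9
```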
Equivalent decompositions • Any collection of tree-structured parts gives an equivalent bound • Two extreme cases • Set of all individual edges • Single "covering tree" of all edges, with variables duplicated
[Figure: original graph, its "edges" decomposition, and a covering tree]
Speeding up inference • Parallel updates • Easy to solve subproblems in parallel (e.g., Komodakis et al. 2007) • Adaptive updates
Some complications… • Example: Markov chain • Can pass messages in parallel, but… • If xn depends on x1, it still takes O(n) time • Slow "convergence rate" • Larger subproblems are more "efficient" • Smaller subproblems are easily parallel & adaptive • Similar effects in message passing • Residual splash (Gonzalez et al. 2009) x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
Cluster trees • Alternative means of parallel computation • Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006) • Simple chain model • Normally, eliminate variables "in order" (dynamic programming) • Each calculation depends on all previous results x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
Cluster trees • Alternative means of parallel computation • Eliminate variables in an alternative order • Eliminate some intermediate (degree-2) nodes • Balanced: depth O(log n)
[Figure: balanced cluster tree of depth O(log n) over the chain x1---x2---…---x10]
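The O(log n) depth claim can be seen by viewing chain elimination as a fold of min-plus matrix products, which are associative: any bracketing gives the same answer, and the balanced bracketing has logarithmic depth with independent (parallelizable) products at each level. A minimal sketch, assuming the chain's costs are already folded into pairwise matrices (the data here is hypothetical):

```python
import numpy as np

def minplus(A, B):
    """Min-plus (tropical) product: out[a, c] = min_b A[a, b] + B[b, c]."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def balanced_fold(mats):
    """Combine the chain's edge-cost matrices in a balanced binary tree.

    Each pass pairs up adjacent matrices, so the tree has depth O(log n)
    and the products within a level are independent: the cluster-tree
    view of eliminating a chain.
    """
    while len(mats) > 1:
        nxt = [minplus(mats[i], mats[i + 1])
               for i in range(0, len(mats) - 1, 2)]
        if len(mats) % 2:                # odd leftover passes through
            nxt.append(mats[-1])
        mats = nxt
    return mats[0].min()

rng = np.random.default_rng(1)
mats = [rng.integers(0, 10, (3, 3)).astype(float) for _ in range(9)]

seq = mats[0]                   # ordinary left-to-right elimination (the DP)
for M in mats[1:]:
    seq = minplus(seq, M)
assert abs(balanced_fold(mats) - seq.min()) < 1e-9
```

Associativity is what makes the reordering legal; min-plus is not commutative, so the fold must still respect the chain's left-to-right adjacency, which the pairing above does.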
Adapting to changes
[Figure: the chain x1---x2---…---x10 before and after a change to one of its factors]
Adapting to changes • 1st pass: update O(log n) cluster functions • 2nd pass: mark changed configurations, repeat decoding: O(m log(n/m)) (n = sequence length; m = number of changes)
[Figure: the chain x1---x2---…---x10 with the clusters affected by the change highlighted]
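The O(log n)-per-change first pass can be sketched as a segment tree over min-plus products: a changed factor invalidates only its leaf-to-root path. This is an illustrative toy, not the paper's cluster-tree implementation (in particular the adaptive decoding pass is not shown); the class name is hypothetical, and leaves are padded to a power of two with min-plus identity matrices because min-plus is non-commutative.

```python
import numpy as np

def minplus(A, B):
    """Min-plus product: out[a, c] = min_b A[a, b] + B[b, c]."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

def _identity(k):
    """Min-plus identity: 0 on the diagonal, +inf elsewhere."""
    I = np.full((k, k), np.inf)
    np.fill_diagonal(I, 0.0)
    return I

class ClusterChain:
    """Balanced tree over a chain's edge-cost matrices (toy cluster tree)."""

    def __init__(self, mats):
        k = mats[0].shape[0]
        n = 1
        while n < len(mats):             # pad to a power of two
            n *= 2
        self.n = n
        leaves = list(mats) + [_identity(k)] * (n - len(mats))
        self.tree = [None] * n + leaves  # tree[n + i] is leaf i
        for i in range(n - 1, 0, -1):
            self.tree[i] = minplus(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, M):
        """Replace factor i; recompute only its O(log n) ancestors."""
        i += self.n
        self.tree[i] = M
        while i > 1:                     # walk the leaf-to-root path
            i //= 2
            self.tree[i] = minplus(self.tree[2 * i], self.tree[2 * i + 1])

    def min_cost(self):
        return self.tree[1].min()
```

A quick check against the full O(n) dynamic program: build the tree, change one factor with `update`, and both should agree with recomputing the whole chain from scratch.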
Experiments • Random synthetic problems • Random, irregular but "grid-like" connectivity • Stereo depth images • Superpixel representation, giving irregular graphs • Compare "edges" and "covering tree" decompositions • 32-core Intel Xeon, Cilk++ implementation
Synthetic problems • Larger subproblems improve the convergence rate • Adaptivity helps significantly • Cluster trees add some overhead, but expose parallelism
[Plots: bound vs. time for the different decompositions and update schemes]
Synthetic models • Performance as a function of problem size
Stereo depth • Time to convergence for different problems
Conclusions • Fast methods for dual decomposition • Parallel computation • Adaptive updating • Subproblem choice • Small problems: highly parallel, easily adaptive • Large problems: better convergence rates • Cluster trees • Alternative form for parallel & adaptive updates • Benefits of both large & small subproblems