
Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers


Presentation Transcript


  1. Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers Ozgur Sumer, U. Chicago Umut Acar, MPI-SWS Alexander Ihler, UC Irvine Ramgopal Mettu, UMass Amherst

  2. Graphical models • Structured (negative) energy function f(x; θ) = Σ_α θ_α(x_α); in the pairwise case f(x; θ) = Σ_i θ_i(x_i) + Σ_(i,j) θ_ij(x_i, x_j) • Goal: MAP inference, x* = argmax_x f(x; θ) [Figure: the same model drawn as a factor graph, a Markov random field, and a Bayesian network over variables A, B, C]

  3. Graphical models • Example: stereo depth (a stereo image pair defines an MRF whose MAP configuration gives the depth map) [Figure: stereo image pair, MRF model, recovered depth]

  4. Graphical models • Example: protein design & prediction

  5. Graphical models • Example: weighted constraint satisfaction problems
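To make the objective concrete, here is a minimal sketch of a pairwise energy function and brute-force MAP on a three-variable model; the variable names, state sizes, and random potential values are all hypothetical, not from the deck:

    import itertools
    import numpy as np

    # Hypothetical pairwise model over variables A, B, C with 2 states each.
    theta_node = {v: np.random.rand(2) for v in "ABC"}
    theta_edge = {e: np.random.rand(2, 2) for e in [("A", "B"), ("B", "C")]}

    def f(x):
        # Structured (negative) energy: sum of node and pairwise potentials.
        val = sum(theta_node[v][x[v]] for v in theta_node)
        val += sum(theta_edge[u, w][x[u], x[w]] for (u, w) in theta_edge)
        return val

    # Goal: MAP inference, x* = argmax_x f(x) -- here by exhaustive search.
    assignments = (dict(zip("ABC", xs)) for xs in itertools.product(range(2), repeat=3))
    x_star = max(assignments, key=f)

Exhaustive search is exponential in general; the rest of the deck is about getting the same answer through decomposition.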

  6. Dual decomposition methods [Figure: original graph]

  7. Dual decomposition methods • Decompose graph into smaller subproblems • Solve each independently; optimistic bound • Exact if all copies agree [Figure: original graph and its decomposition]

  8. Dual decomposition methods • Decompose graph into smaller subproblems • Solve each independently; optimistic bound • Exact if all copies agree • Enforce lost equality constraints via Lagrange multipliers [Figure: original graph and its decomposition]

  9. Dual decomposition methods: same bound by different names • Dual decomposition (Komodakis et al. 2007) • TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007) • Soft arc consistency (Cooper & Schiex 2004) [Figure: original graph and its decomposition]

  10. Dual decomposition methods [Figure: energy axis; relaxed problems upper-bound the MAP energy, while consistent solutions lower-bound it; original graph and its decomposition shown alongside]
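Spelling out the bound behind the last few slides, as a sketch in standard dual-decomposition notation (the potential split Σ_t θ^t = θ and the multiplier names are mine, not the deck's):

    \max_x f(x;\theta) \;=\; \max_x \sum_t f_t(x;\theta^t)
      \;\le\; \sum_t \max_{x^t} f_t(x^t;\theta^t) \;=\; L(\theta^1,\dots,\theta^T),
      \qquad \text{where } \textstyle\sum_t \theta^t = \theta .

Equality holds exactly when all maximizing copies x^t agree; enforcing the dropped constraints x^t = x with Lagrange multipliers λ^t reparameterizes each θ^t to θ^t + λ^t, and minimizing L over λ tightens the bound.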

  11. Optimizing the bound: subgradient descent • Find each subproblem’s optimal configuration • Adjust entries for mis-matched solutions

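A minimal sketch of this subgradient step, for subproblems that share one variable. All names here are hypothetical, and solve_subproblem stands in for an exact tree-MAP solver that the sketch does not implement:

    import numpy as np

    def one_hot(i, size):
        v = np.zeros(size)
        v[i] = 1.0
        return v

    def subgradient_step(thetas, shared_var, step):
        # thetas: one dict of potential tables per subproblem (copies of shared_var).
        # 1. Find each subproblem's optimal configuration (exact on trees);
        #    solve_subproblem is a hypothetical tree-MAP solver.
        sols = [solve_subproblem(th) for th in thetas]
        states = [s[shared_var] for s in sols]
        if len(set(states)) == 1:
            return True                      # all copies agree: bound is tight
        # 2. Adjust entries for mis-matched solutions: push each copy's
        #    potential away from its own argmax and toward the consensus.
        k = len(thetas[0][shared_var])
        avg = np.mean([one_hot(s, k) for s in states], axis=0)
        for th, s in zip(thetas, states):
            th[shared_var] -= step * (one_hot(s, k) - avg)
        return False

With a diminishing step size, repeating this update drives the copies toward agreement, at which point the decoded solution is a certified MAP.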

  14. Equivalent decompositions • Any collection of tree-structured parts covering the graph attains the same optimal bound • Two extreme cases • Set of all individual edges • Single “covering tree” of all edges, with variables duplicated [Figure: original graph, “edges” decomposition, covering tree]

  15. Speeding up inference • Parallel updates • Easy to solve subproblems in parallel (e.g. Komodakis et al. 2007) • Adaptive updates

  16. Some complications… • Example: Markov chain • Can pass messages in parallel, but… • If x_n depends on x_1, it takes O(n) time anyway • Slow “convergence rate” • Larger problems are more “efficient” • Smaller problems are easily parallel & adaptive • Similar effects in message passing • Residual splash (Gonzalez et al. 2009) [Figure: chain x1 -- x2 -- … -- x10]

  17. Cluster trees • Alternative means of parallel computation • Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006) • Simple chain model • Normally, eliminate variables “in order” (dynamic programming) • Each calculation depends on all previous results [Figure: chain x1 -- x2 -- … -- x10, eliminated left to right]


  21. Cluster trees • Alternative means of parallel computation • Eliminate variables in an alternative order • Eliminate some intermediate (degree-2) nodes [Figure: chain x1 -- x2 -- … -- x10]

  22. Cluster trees • Alternative means of parallel computation • Eliminate variables in an alternative order • Eliminate some intermediate (degree-2) nodes • Balanced: depth log(n) [Figure: balanced cluster tree of depth log n over the chain x1 -- … -- x10]
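One way to read the balanced elimination above: on a chain, each edge (with a node potential folded in) is a max-plus matrix, MAP elimination is a max-plus matrix product, and a balanced combination order evaluates it in depth O(log n) rather than O(n). A sketch with hypothetical random potentials:

    import numpy as np

    def maxplus(A, B):
        # Max-plus matrix product: C[i, k] = max_j A[i, j] + B[j, k].
        return (A[:, :, None] + B[None, :, :]).max(axis=1)

    def eliminate_balanced(mats):
        # Combine adjacent factors pairwise, like a balanced cluster tree:
        # O(log n) rounds instead of the O(n) left-to-right elimination.
        while len(mats) > 1:
            mats = [maxplus(mats[i], mats[i + 1]) if i + 1 < len(mats) else mats[i]
                    for i in range(0, len(mats), 2)]
        return mats[0]

    # Chain x1 -- … -- x10 with 3 states; mats[i] folds theta_i and theta_{i,i+1}.
    mats = [np.random.rand(3, 3) for _ in range(9)]
    print(eliminate_balanced(mats).max())   # MAP value (max negative energy)

Because max-plus multiplication is associative, every round's products are independent and can run in parallel, which is the source of the speedup.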

  23. Adapting to changes [Figure: chain x1 -- x2 -- … -- x10 before and after a local change]


  26. Adapting to changes • 1st pass: update O(log n) cluster functions • 2nd pass: mark changed configurations and repeat decoding: O(m log(n/m)), where n = sequence length and m = # of changes [Figure: chain x1 -- … -- x10 before and after the change]

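A sketch of the first pass of the adaptive update above, for the chain case: keep the balanced combination explicit as a tree (here a simple segment tree over the max-plus factors), so changing one factor recomputes only the O(log n) cluster functions on its root path. The class and its power-of-two restriction are my simplifications, not the paper's data structure, and the second (decoding) pass is not shown:

    import numpy as np

    def maxplus(A, B):   # max-plus product, as in the earlier sketch
        return (A[:, :, None] + B[None, :, :]).max(axis=1)

    class ClusterTree:
        # Balanced tree of max-plus products over a chain's factor matrices.
        # For simplicity this sketch assumes a power-of-two number of factors.
        def __init__(self, mats):
            self.n = len(mats)
            assert self.n & (self.n - 1) == 0
            self.tree = [None] * self.n + list(mats)       # leaves at tree[n:]
            for i in range(self.n - 1, 0, -1):             # internal cluster functions
                self.tree[i] = maxplus(self.tree[2 * i], self.tree[2 * i + 1])

        def update(self, leaf, new_mat):
            # A changed factor invalidates only its root path: O(log n) recomputes.
            i = leaf + self.n
            self.tree[i] = new_mat
            while i > 1:
                i //= 2
                self.tree[i] = maxplus(self.tree[2 * i], self.tree[2 * i + 1])

        def map_value(self):
            return self.tree[1].max()                      # root holds the full product

    ct = ClusterTree([np.random.rand(3, 3) for _ in range(8)])
    ct.update(5, np.random.rand(3, 3))   # e.g. one reparameterized potential per iteration
    print(ct.map_value())

This matches the subgradient setting well: each dual iteration reparameterizes only the mismatched potentials, so most cluster functions are untouched between iterations.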

  28. Experiments • Random synthetic problems • Random, irregular but “grid-like” connectivity • Stereo depth images • Superpixel representation • Irregular graphs • Compare “edges” and “covering tree” decompositions • 32-core Intel Xeon, Cilk++ implementation

  29–31. Synthetic problems • Larger subproblems improve the convergence rate • Adaptivity helps significantly, at the cost of some cluster-tree overhead • Parallelism gives a further speedup [Figure: convergence plots, annotated with the cluster overhead and the parallel speedup]

  32. Synthetic models [Figure: performance as a function of problem size]

  33–35. Stereo depth [Figures: stereo depth results on example images]

  36. Stereo depth • Time to convergence for different problems

  37. Conclusions • Fast methods for dual decomposition • Parallel computation • Adaptive updating • Subproblem choice • Small problems: highly parallel, easily adaptive • Large problems: better convergence rates • Cluster trees • Alternative form for parallel & adaptive updates • Benefits of both large & small subproblems
