Probabilistic Inference, Lecture 6 – Part 1
M. Pawan Kumar (pawan.kumar@ecp.fr)
Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/
Tree Re-Weighted Message Passing (TRW) vs. Dual Decomposition (DD)
Dual of the LP Relaxation (Wainwright et al., 2001)
[Figure: the 3x3 grid MRF over Va..Vi is decomposed into six trees (the three row chains and the three column chains), with optimal values q*(1), ..., q*(6) under the reparameterization.]
Dual of LP: max Σi q*(i)
TRW (Kolmogorov, 2006)
Initialize θi, taking care of the reparameterization constraint
REPEAT
  Choose a random variable Va
  Compute the min-marginals of Va for all trees
  Node-average the min-marginals
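The node-averaging step can be written as a reparameterization of each tree's unary potential for Va. Below is a minimal sketch in Python, assuming uniformly weighted trees and that min_marginals[i] holds tree i's min-marginal vector for Va (both names are illustrative):

```python
import numpy as np

# Minimal sketch of the TRW node-averaging step for one variable Va.
# `min_marginals[i]` is the vector of min-marginals of Va in tree i;
# `theta[i]` is tree i's current unary potential for Va. All trees are
# assumed to carry equal weight 1/m.

def node_average(theta, min_marginals):
    m = len(min_marginals)
    avg = sum(min_marginals) / m  # uniform average of the min-marginals
    for i in range(m):
        # Adding a per-label offset to Va's unary shifts tree i's
        # min-marginals by exactly that offset, so they become `avg`.
        # With uniform weights the offsets sum to zero across trees,
        # leaving the total energy unchanged.
        theta[i] = theta[i] + (avg - min_marginals[i])
    return theta
```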
DD
maxλi minx,xi Σi gi(xi) + Σi λiT(xi − x)   s.t. xi ∈ C
KKT condition: Σi λi = 0
DD
Using Σi λi = 0, the joint variable x drops out:
maxλi minxi Σi ( gi(xi) + λiTxi )   s.t. xi ∈ C
DD (Komodakis et al., 2007)
Initialize λi0 = 0
REPEAT
  Compute supergradients: si = argminxi ( gi(xi) + (λit)Txi )
  Project onto Σi λi = 0: pi = si − Σj sj / m
  Update the dual variables: λit+1 = λit + ηt pi
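The update loop is short enough to sketch end to end. The following Python sketch uses toy subproblems whose MAP estimate is a per-variable argmin over reparameterized unaries; the names (`solve_subproblem`, `dual_decomposition`) and the diminishing step size ηt = 1/(t+1) are illustrative choices, not the lecture's:

```python
import numpy as np

def solve_subproblem(unary):
    # MAP estimate of an independent-variable toy subproblem: pick, per
    # variable, the label with the smaller reparameterized unary cost.
    labels = np.argmin(unary, axis=1)
    y = np.zeros_like(unary, dtype=float)
    y[np.arange(len(labels)), labels] = 1.0  # indicator encoding
    return y

def dual_decomposition(unaries, n_iters=50):
    m = len(unaries)                        # number of subproblems
    lam = [np.zeros_like(u, dtype=float) for u in unaries]
    for t in range(n_iters):
        # Supergradients: subproblem minimizers under the current lambda.
        s = [solve_subproblem(unaries[i] + lam[i]) for i in range(m)]
        mean_s = sum(s) / m
        # Projection keeps sum_i lambda_i = 0.
        p = [s[i] - mean_s for i in range(m)]
        eta = 1.0 / (t + 1)                 # diminishing learning rate
        for i in range(m):
            lam[i] = lam[i] + eta * p[i]
    return lam
```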
TRW
[Figure: reparameterized potentials for three subproblems, each defined over a pair of the variables Va, Vb, Vc with labels l0 and l1; the subproblem optimal values are 6.5, 6.5 and 7.]
Subproblem minimizers: f1(a) = 0, f1(b) = 0; f2(b) = 0, f2(c) = 0; f3(c) = 0, f3(a) = 0
Every subproblem assigns l0 to each of its variables, so all minimizers agree: Strong Tree Agreement
DD
Optimal LP solution:
  ya;0 ya;1 yb;0 yb;1 yc;0 yc;1
   1    0    1    0    1    0
Values of yab;ik not shown, but we know yab;ik = ya;i yb;k
Supergradients
The subproblem minimizers, as indicator vectors ('-' marks a variable absent from that subproblem):
       sa;0 sa;1 sb;0 sb;1 sc;0 sc;1
  s1:   1    0    1    0    -    -
  s2:   -    -    1    0    1    0
  s3:   1    0    -    -    1    0

Projected Supergradients
Since the minimizers already agree at every shared variable, all projections are zero:
       pa;0 pa;1 pb;0 pb;1 pc;0 pc;1
  p1:   0    0    0    0    -    -
  p2:   -    -    0    0    0    0
  p3:   0    0    -    -    0    0
Objective
The subproblem objectives stay at 6.5, 6.5 and 7: no further increase in the dual objective

DD
With all projected supergradients equal to zero, the dual variables no longer change: Strong Tree Agreement implies DD stops
TRW
[Figure: a second set of potentials for the three subproblems over Va, Vb, Vc; the subproblem optimal values are 4, 0 and 4.]
Subproblem minimizers: f1(a) = 1, f1(b) = 1; f3(c) = 1, f3(a) = 1; subproblem 2 has two minimizers, (f2(b) = 1, f2(c) = 0) and (f2(b) = 0, f2(c) = 1)
Each pair of subproblems can be made to agree, but no single choice of minimizers agrees everywhere: Weak Tree Agreement
DD
Optimal LP solution: ya;0 = 0, ya;1 = 1, yb;0 = 0, yb;1 = 1 (the entries for Vc are not shown)
Values of yab;ik not shown, but we know yab;ik = ya;i yb;k
Supergradients
       sa;0 sa;1 sb;0 sb;1 sc;0 sc;1
  s1:   0    1    0    1    -    -
  s2:   -    -    0    1    1    0
  s3:   0    1    -    -    0    1

Projected Supergradients
       pa;0 pa;1 pb;0 pb;1 pc;0  pc;1
  p1:   0    0    0    0    -     -
  p2:   -    -    0    0    0.5  -0.5
  p3:   0    0    -    -   -0.5   0.5
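To make the projection concrete: as the tables indicate, the average at each variable is taken over the subproblems that contain it (the '-' entries mark absent variables). A few lines of Python reproduce the projected values above:

```python
import numpy as np

# Reproduce the projected supergradients from the table. Each subproblem
# covers only some of Va, Vb, Vc, so the mean at a variable runs over the
# subproblems that contain it.
s = {
    1: {"a": np.array([0., 1.]), "b": np.array([0., 1.])},
    2: {"b": np.array([0., 1.]), "c": np.array([1., 0.])},
    3: {"a": np.array([0., 1.]), "c": np.array([0., 1.])},
}

for var in "abc":
    subs = [i for i in s if var in s[i]]
    mean = sum(s[i][var] for i in subs) / len(subs)
    for i in subs:
        p = s[i][var] - mean
        print(f"p{i},{var} = {p}")  # e.g. p2,c = [ 0.5 -0.5], p3,c = [-0.5  0.5]
```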
Update with Learning Rate ηt = 1
λit+1 = λit + pi, using the projected supergradients above
Objective
[Figure: the potentials after the update.]
The subproblem objectives become −0.5, 4 and 4.3: a decrease in the dual objective (supergradient ascent is not monotonic)
Supergradients
       sa;0 sa;1 sb;0 sb;1 sc;0 sc;1
  s1:   0    1    0    1    -    -
  s2:   -    -    1    0    0    1
  s3:   0    1    -    -    1    0

Projected Supergradients
       pa;0 pa;1 pb;0  pb;1 pc;0  pc;1
  p1:   0    0   -0.5   0.5  -     -
  p2:   -    -    0.5  -0.5 -0.5   0.5
  p3:   0    0    -     -    0.5  -0.5
Update with Learning Rate ηt = 1/2
λit+1 = λit + ηt pi, using the projected supergradients above
Updated Subproblems
[Figure: the potentials after the second update.]
Objective
The subproblem objectives become 0, 4.25 and 4.25: an increase in the dual objective. DD goes beyond TRW

DD
Continuing in this manner, DD provides the optimal dual objective
Comparison
                  TRW              DD
  Speed           Fast             Slow
  Converges to    Local maximum    Global maximum
  Requires        Min-marginals    MAP estimates

Other forms of subproblems, tighter relaxations and sparse high-order potentials are also possible in the TRW framework, but easier in the DD framework.
Subproblems
[Figure: the 3x3 grid MRF over Va..Vi split into subproblems.]
Binary labeling problem: black edges are submodular, red edges are supermodular

Subproblems
[Figure: the supermodular (red) edges are placed in their own subproblems, separate from the submodular part of the grid.]
Because the λ updates only reparameterize the unary potentials, a submodular subproblem remains submodular over the iterations
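To recall the condition being preserved: a pairwise potential over binary variables is submodular iff θ(0,0) + θ(1,1) ≤ θ(0,1) + θ(1,0), and adding λ terms to the unaries leaves it untouched. A small sketch:

```python
# A binary pairwise potential theta (2x2 table) is submodular iff
# theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0].
# DD updates only shift unary terms, so this inequality, and hence
# solvability of the subproblem by graph cuts, is preserved.
def is_submodular(theta):
    return theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]

assert is_submodular([[0, 1], [1, 0]])       # attractive edge: submodular
assert not is_submodular([[1, 0], [0, 1]])   # repulsive edge: supermodular
```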
Tighter Relaxations
[Figure: the grid decomposed into its four unit 4-cycles: (Va,Vb,Vd,Ve), (Vb,Vc,Ve,Vf), (Vd,Ve,Vg,Vh), (Ve,Vf,Vh,Vi).]
Choosing 4-cycles as subproblems gives a relaxation that is tight for the above 4-cycles: LP-S + cycle inequalities
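A decomposition into every unit 4-cycle of a grid is easy to enumerate; a minimal sketch (the function name is illustrative):

```python
def four_cycles(height, width):
    """Yield the unit 4-cycles of a height x width grid as tuples of
    (row, col) nodes: top-left, top-right, bottom-left, bottom-right."""
    for r in range(height - 1):
        for c in range(width - 1):
            yield ((r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1))

# For the 3x3 grid Va..Vi this yields the four cycles listed above:
# (Va,Vb,Vd,Ve), (Vb,Vc,Ve,Vf), (Vd,Ve,Vg,Vh), (Ve,Vf,Vh,Vi).
print(list(four_cycles(3, 3)))
```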
High-Order Potentials
[Figure: the decomposition now includes a subproblem for a high-order clique, e.g. C = {Vb, Vc, Ve, Vf}.]
High-Order Potentials
A high-order potential assigns a value θc;y to each labeling y of the clique C (here {Vb, Vc, Ve, Vf})
Subproblem: miny θc;y + λTy
Enumerating all labelings costs O(h^|C|), where h is the number of labels!!
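The brute-force cost is easy to see in code; a sketch, assuming `theta` maps each labeling tuple to its potential value and `lam[a][l]` is the dual variable for clique variable a taking label l:

```python
import itertools

# Brute-force solution of the clique subproblem min_y theta[y] + lambda^T y,
# enumerating all h^|C| labelings -- exactly the cost the slide warns about.
def clique_subproblem_bruteforce(theta, lam, n_vars, n_labels):
    best, best_y = float("inf"), None
    for y in itertools.product(range(n_labels), repeat=n_vars):
        cost = theta[y] + sum(lam[a][y[a]] for a in range(n_vars))
        if cost < best:
            best, best_y = cost, y
    return best_y, best
```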
Sparse High-Order Potentials
For many potentials the table θc;y is sparse: for example, the potential takes one value when Σa ya;0 = 0 and another value when Σa ya;0 > 0
The subproblem miny θc;y + λTy then no longer needs O(h^|C|) enumeration
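For the two-case potential above, the subproblem splits into two easy minimizations; a minimal sketch, assuming binary labels and hypothetical values `theta_none` (no clique variable takes label 0) and `theta_some` (at least one does):

```python
# Efficient subproblem for the sparse potential that takes value theta_none
# when no clique variable is labeled 0, and theta_some otherwise.
# lam[a][l] is the dual variable for clique variable a, label l (binary).
def sparse_clique_subproblem(theta_none, theta_some, lam):
    n = len(lam)
    # Case 1: sum_a y_a;0 = 0, i.e. every variable takes label 1.
    cost_none = theta_none + sum(lam[a][1] for a in range(n))
    # Case 2: at least one variable takes label 0. Start from the
    # unconstrained per-variable minimum; if that labeling uses no label 0,
    # add the cheapest cost of flipping one variable to label 0.
    cost_free = sum(min(lam[a]) for a in range(n))
    if all(lam[a][0] >= lam[a][1] for a in range(n)):
        cost_free += min(lam[a][0] - lam[a][1] for a in range(n))
    cost_some = theta_some + cost_free
    return min(cost_none, cost_some)  # O(n) instead of O(2^n)
```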
Sparse High-Order Potentials
Many useful potentials are sparse:
  Pn Potts model
  Uniqueness constraints
  Covering constraints
  Pattern-based potentials
And now you can solve them efficiently!!
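As one more worked case (a sketch with hypothetical argument names, not the lecture's code): the Pn Potts model assigns γl when all clique variables take label l and γmax ≥ γl otherwise, so its DD subproblem reduces to h + 1 candidates, an O(n·h) computation:

```python
# DD subproblem for a Pn Potts clique: theta(y) = gamma[l] if all variables
# take label l, gamma_max otherwise (gamma_max >= gamma[l] for all l).
# lam[a][l] is the dual variable for clique variable a, label l.
def pn_potts_subproblem(gamma, gamma_max, lam):
    n, h = len(lam), len(gamma)
    # All-equal labelings: one candidate per label l.
    best = min(gamma[l] + sum(lam[a][l] for a in range(n)) for l in range(h))
    # "Mixed" case: variables are free, at cost gamma_max. If the
    # unconstrained minimizer happens to be all-equal in label l, the
    # candidate above is already cheaper (gamma_max >= gamma[l]), so no
    # explicit "not all equal" constraint is needed.
    mixed = gamma_max + sum(min(lam[a]) for a in range(n))
    return min(best, mixed)
```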