
Probabilistic Inference Lecture 6 – Part 1


Presentation Transcript


1. Probabilistic Inference, Lecture 6 – Part 1. M. Pawan Kumar, pawan.kumar@ecp.fr. Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/

2. Questions? Next Lecture!!

3. Tree Re-Weighted Message Passing (TRW) vs. Dual Decomposition (DD)

4. Dual of the LP Relaxation (Wainwright et al., 2001). [Figure: a 3×3 grid of variables Va–Vi decomposed into six tree subproblems (three rows, three columns), with optimal tree energies q*(1), …, q*(6).] Dual of LP: max Σ_i q*(i), the maximum of the sum of the optimal tree energies, taken over tree parameters θ^i that sum to the original parameters.

5. TRW (Kolmogorov, 2006). Initialize the θ^i, taking care of the reparameterization constraint. REPEAT: choose a random variable Va; compute the min-marginals of Va for all trees; node-average the min-marginals.
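Slide 5's loop is easiest to see on a toy problem. Below is a minimal sketch, assuming two chain subproblems that share Va and uniform tree weights; the helper names and the random potentials are illustrative, not from the lecture.

```python
# Minimal sketch of one TRW node-averaging step on toy chain subproblems.
import numpy as np

def chain_min_marginals(unary, pairwise, node):
    """Min-marginals of variable `node` in a chain, via forward/backward passes."""
    n = len(unary)
    fwd = [unary[0]]
    for i in range(1, n):
        fwd.append(unary[i] + np.min(fwd[-1][:, None] + pairwise[i - 1], axis=0))
    bwd = [np.zeros_like(unary[-1])]
    for i in range(n - 2, -1, -1):
        bwd.insert(0, np.min(pairwise[i] + bwd[0][None, :], axis=1))
    return fwd[node] + bwd[node]

def average_shared_node(trees):
    """One TRW step: equalize Va's min-marginals across all trees."""
    mms = [chain_min_marginals(u, p, i) for (u, p, i) in trees]
    avg = np.mean(mms, axis=0)
    for (u, p, i), m in zip(trees, mms):
        u[i] += avg - m   # corrections sum to zero, so the sum of the theta^i is preserved
    return avg

# Toy example: two 2-variable chains sharing Va (index 0 in both), two labels.
rng = np.random.default_rng(0)
tree1 = ([rng.normal(size=2), rng.normal(size=2)], [rng.normal(size=(2, 2))], 0)
tree2 = ([rng.normal(size=2), rng.normal(size=2)], [rng.normal(size=(2, 2))], 0)
print(average_shared_node([tree1, tree2]))
```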

6. DD. max_{λ_i} min_{x, x_i} Σ_i g_i(x_i) + Σ_i λ_i^T (x_i − x), s.t. x_i ∈ C. KKT condition: Σ_i λ_i = 0.

7. DD. max_{λ_i} min_{x_i} Σ_i g_i(x_i) + Σ_i λ_i^T x_i, s.t. x_i ∈ C. (The inner minimization over the unconstrained x is unbounded unless Σ_i λ_i = 0; under that KKT condition the −Σ_i λ_i^T x term vanishes, leaving this simpler form.)

8. DD (Komodakis et al., 2007). Initialize λ_i^0 = 0. REPEAT: compute the supergradients s_i = argmin_{x_i} (g_i(x_i) + (λ_i^t)^T x_i); project them: p_i = s_i − Σ_j s_j / m; update the dual variables: λ_i^{t+1} = λ_i^t + η_t p_i.
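A hedged sketch of slide 8's loop, on the simplest possible decomposition: m subproblems that each pick a label for one shared variable with h labels. The cost vectors g, the 1/t step-size schedule, and all names are toy assumptions, not the lecture's setup.

```python
import numpy as np

def dd_supergradient(g, iters=200, eta0=1.0):
    """g[i] is subproblem i's cost vector over the h labels of the shared variable."""
    m, h = g.shape
    lam = np.zeros((m, h))                         # lambda_i^0 = 0
    for t in range(1, iters + 1):
        s = np.eye(h)[np.argmin(g + lam, axis=1)]  # one-hot argmin per subproblem
        p = s - s.mean(axis=0)                     # projection keeps sum_i lambda_i = 0
        lam += (eta0 / t) * p                      # lambda_i^{t+1} = lambda_i^t + eta_t p_i
    return lam

# Toy usage: three subproblems; label l1 is globally cheapest (total 5 vs 7).
g = np.array([[1.0, 3.0], [4.0, 0.0], [2.0, 2.0]])
lam = dd_supergradient(g)
print(np.argmin(g + lam, axis=1))   # subproblems pushed toward agreeing on l1
```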

9. TRW. [Figure: toy problem over Va, Vb, Vc with labels l0, l1, decomposed into tree subproblems; the potentials are shown in the figure, with subproblem values 6.5, 6.5, and 7.]

10. TRW. [Same figure as slide 9.] Each tree's optimal labeling: f1(a) = 0, f1(b) = 0; f2(b) = 0, f2(c) = 0; f3(c) = 0, f3(a) = 0. All trees agree on every shared variable: Strong Tree Agreement.

11. DD. [Same figure.] Optimal LP solution:
    ya;0  ya;1  yb;0  yb;1  yc;0  yc;1
     1     0     1     0     -     -
Values of yab;ik not shown, but we know yab;ik = ya;i yb;k.

12. Supergradients. [Same figure.] One row per tree; '-' where the tree does not contain the variable:
            sa;0  sa;1  sb;0  sb;1  sc;0  sc;1
    tree 1:  1     0     1     0     -     -
    tree 2:  -     -     1     0     1     0
    tree 3:  1     0     -     -     1     0

13. Projected Supergradients. [Same figure.] All projected supergradients are zero:
            pa;0  pa;1  pb;0  pb;1  pc;0  pc;1
    tree 1:  0     0     0     0     -     -
    tree 2:  -     -     0     0     0     0
    tree 3:  0     0     -     -     0     0

14. Objective. [Same figure; subproblem values 6.5, 6.5, and 7.] No further increase in the dual objective.

15. DD. [Same figure; subproblem values 6.5, 6.5, and 7.] No further increase in the dual objective: Strong Tree Agreement implies that DD stops.
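Why agreement stops DD: when every subproblem returns the same optimal labeling, the indicator vectors s_i coincide on every shared coordinate, so each projection p_i = s_i − Σ_j s_j / m is zero and the λ update halts. A tiny numeric check (the one-hot encoding is an assumed convention):

```python
import numpy as np

# Three trees all pick label l0 for the shared variable (strong agreement).
s = np.eye(2)[[0, 0, 0]]      # one-hot supergradients, shape (3, 2)
p = s - s.mean(axis=0)        # projection: p_i = s_i - average
print(np.allclose(p, 0.0))    # True -> lambda stops changing, DD halts
```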

16. TRW. [Figure: a second toy problem over Va, Vb, Vc with labels l0, l1; subproblem values 4, 0, and 4, i.e. dual objective 8.]

17. TRW. [Same figure as slide 16.] Tree optima: f1(a) = 1, f1(b) = 1; f2(b) = 1, f2(c) = 0 or f2(b) = 0, f2(c) = 1; f3(c) = 1, f3(a) = 1. Tree 2 has two optimal labelings, one of which is consistent with the others: Weak Tree Agreement.

18. DD. [Same figure.] Optimal LP solution:
    ya;0  ya;1  yb;0  yb;1  yc;0  yc;1
     0     1     0     1     -     -
Values of yab;ik not shown, but we know yab;ik = ya;i yb;k.

19. Supergradients. [Same figure.]
            sa;0  sa;1  sb;0  sb;1  sc;0  sc;1
    tree 1:  0     1     0     1     -     -
    tree 2:  -     -     0     1     1     0
    tree 3:  0     1     -     -     0     1

20. Projected Supergradients. [Same figure.]
            pa;0  pa;1  pb;0  pb;1  pc;0  pc;1
    tree 1:  0     0     0     0     -     -
    tree 2:  -     -     0     0     0.5  -0.5
    tree 3:  0     0     -     -    -0.5   0.5

21. Update with Learning Rate η_t = 1. [Same figure.] The update λ_i^{t+1} = λ_i^t + η_t p_i is applied with the projected supergradients from slide 20.

22. Objective. [Figure: the reparameterized potentials after the step; subproblem values −0.5, 4, and 4.3.] The dual objective drops from 8 to 7.8: a decrease in the dual objective.

23. Supergradients. [Same updated figure.]
            sa;0  sa;1  sb;0  sb;1  sc;0  sc;1
    tree 1:  0     1     0     1     -     -
    tree 2:  -     -     1     0     0     1
    tree 3:  0     1     -     -     1     0

24. Projected Supergradients. [Same figure.]
            pa;0  pa;1  pb;0  pb;1  pc;0  pc;1
    tree 1:  0     0    -0.5   0.5   -     -
    tree 2:  -     -     0.5  -0.5  -0.5   0.5
    tree 3:  0     0     -     -     0.5  -0.5

25. Update with Learning Rate η_t = 1/2. [Same figure.] The same update rule, now with step size 1/2, is applied to the projected supergradients above.

26. Updated Subproblems. [Figure: the reparameterized potentials after the second update.]

27. Objective. [Same figure; subproblem values 0, 4.25, and 4.25.] The dual objective rises to 8.5: an increase in the dual objective. DD goes beyond TRW.

28. DD. [Same figure; subproblem values 0, 4.25, and 4.25.] Increase in the dual objective: DD provides the optimal dual objective.

29. Comparison.
    TRW                       DD
    Fast                      Slow
    Local maximum             Global maximum
    Requires min-marginals    Requires MAP estimates
Other forms of subproblems, tighter relaxations, and sparse high-order potentials are easier in the DD framework (also possible in the TRW framework).

30. Subproblems. [Figure: the 3×3 grid over Va–Vi split into subproblems.] Binary labeling problem; black edges are submodular, red edges are supermodular.

31. Subproblems. [Figure: the same grid split so that each subproblem contains only submodular (black) edges.] Each subproblem remains submodular over the iterations, since the dual updates modify only the unary potentials.

32. Tighter Relaxations. [Figure: the grid covered by its 4-cycles, e.g. (Va, Vb, Ve, Vd), (Vb, Vc, Vf, Ve), (Vd, Ve, Vh, Vg), (Ve, Vf, Vi, Vh).] Choosing 4-cycle subproblems gives a relaxation that is tight for these cycles: LP-S plus cycle inequalities.

33. High-Order Potentials. [Figure: the grid over Va–Vi decomposed into subproblems, one of which is a high-order clique.]

34. High-Order Potentials. [Figure: a clique over Vb, Vc, Ve, Vf; the potential θ_{c;y} assigns a value to each labeling y of the clique.] Subproblem: min_y θ_{c;y} + λ^T y, which naively costs O(h^|C|)!!
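A literal implementation of this subproblem enumerates every labeling of the clique, which is exactly the O(h^|C|) cost the slide warns about. A sketch, assuming θ_{c;y} is given as a dictionary from labeling tuples to values:

```python
import itertools
import numpy as np

def clique_subproblem(theta, lam, n_vars, n_labels):
    """Minimize theta[y] + sum_a lam[a, y[a]] over all labelings y of the clique."""
    best_val, best_y = np.inf, None
    for y in itertools.product(range(n_labels), repeat=n_vars):  # h^|C| terms
        val = theta[y] + sum(lam[a, y[a]] for a in range(n_vars))
        if val < best_val:
            best_val, best_y = val, y
    return best_val, best_y

# Toy usage: a random potential over a 4-variable binary clique.
rng = np.random.default_rng(0)
theta = {y: rng.normal() for y in itertools.product(range(2), repeat=4)}
print(clique_subproblem(theta, rng.normal(size=(4, 2)), 4, 2))
```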

35. Sparse High-Order Potentials. [Figure: the same clique; the potential takes one value on labelings with Σ_a ya;0 = 0 and another on labelings with Σ_a ya;0 > 0.] The subproblem min_y θ_{c;y} + λ^T y costs O(h^|C|) only in the worst case: sparsity can be exploited.
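If the potential equals a constant θ_max everywhere except on a short list of special labelings (as in the Σ_a ya;0 pattern above), the minimization splits into a scan over that list plus an unconstrained per-variable minimum on the constant region. A sketch under that assumption; the names and the collision handling are simplifications:

```python
import numpy as np

def sparse_clique_subproblem(special, theta_max, lam):
    """special maps a few labeling tuples to their own potential values;
    every other labeling has the constant potential theta_max."""
    n_vars = lam.shape[0]
    best_val, best_y = np.inf, None
    # (a) scan the short list of special labelings explicitly
    for y, th in special.items():
        val = th + sum(lam[a, y[a]] for a in range(n_vars))
        if val < best_val:
            best_val, best_y = val, y
    # (b) on the constant region, min_y lambda^T y decomposes per variable
    y_free = tuple(np.argmin(lam, axis=1))
    val = theta_max + lam.min(axis=1).sum()
    # sketch only: if y_free collides with a special labeling, a careful
    # implementation would fall back to the next-best free labeling
    if y_free not in special and val < best_val:
        best_val, best_y = val, y_free
    return best_val, best_y

# Toy usage: penalize every labeling except the two all-equal ones.
lam = np.random.default_rng(1).normal(size=(4, 2))
special = {(0, 0, 0, 0): 0.0, (1, 1, 1, 1): 0.0}
print(sparse_clique_subproblem(special, theta_max=10.0, lam=lam))
```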

36. Sparse High-Order Potentials. Many useful potentials are sparse: the Pn Potts model, uniqueness constraints, covering constraints, and pattern-based potentials. And now you can solve them efficiently!!
