Probabilistic Inference, Lecture 7
M. Pawan Kumar, pawan.kumar@ecp.fr
Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/

Recap
Loopy Belief Propagation
Initialize all messages to 1.
In some order of edges, update messages:
Mab;k = Σi ψa(li) ψab(li,lk) Πn≠b Mna;i
Repeat until convergence: rate of change in messages < threshold.
Convergence is not guaranteed!
Loopy Belief Propagation
B’a(i) = ψa(li) Πn Mna;i
B’ab(i,j) = ψa(li) ψb(lj) ψab(li,lj) Πn≠b Mna;i Πn≠a Mnb;j
Normalize to compute beliefs Ba(i), Bab(i,j).
At convergence: Σj Bab(i,j) = Ba(i)
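The recap above can be sketched in code. Below is a minimal NumPy illustration of the message update and belief computation on a two-variable model (a tree, so BP is exact and converges after one pass); the potential values are made up for illustration:

```python
import numpy as np

# Two variables with two labels each; potential values are made up.
psi = [np.array([0.7, 0.3]), np.array([0.4, 0.6])]   # unary psi_a(l_i)
psi_ab = np.array([[0.9, 0.1], [0.2, 0.8]])          # pairwise psi_ab(l_i, l_k)

# Initialize all messages to 1, then update until they stop changing.
# M_{ab;k} = sum_i psi_a(l_i) psi_ab(l_i, l_k) prod_{n != b} M_{na;i}
M = {(0, 1): np.ones(2), (1, 0): np.ones(2)}
for _ in range(50):
    new = {(0, 1): psi_ab.T @ psi[0],   # node 0 has no neighbour besides 1
           (1, 0): psi_ab @ psi[1]}     # node 1 has no neighbour besides 0
    if all(np.allclose(new[e], M[e]) for e in M):
        break
    M = new

# Beliefs: B_a(i) is proportional to psi_a(l_i) prod_n M_{na;i}; normalize.
B0 = psi[0] * M[(1, 0)]; B0 /= B0.sum()
B1 = psi[1] * M[(0, 1)]; B1 /= B1.sum()
# Pairwise belief: B_ab(i,k) proportional to psi_a(l_i) psi_b(l_k) psi_ab(l_i, l_k)
Bab = np.outer(psi[0], psi[1]) * psi_ab
Bab /= Bab.sum()

# At convergence the pairwise beliefs marginalize to the unary ones.
assert np.allclose(Bab.sum(axis=1), B0)
```

Since the graph is a tree, the final beliefs are the exact marginals; on a loopy graph the same updates would only give approximations.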
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation Yedidia, Freeman and Weiss, 2000
Exponential Family
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z = exp(-Q(v)) / Z
A(θ) = log Z
ψa(li) = exp(-θa(i))
ψab(li,lk) = exp(-θab(i,k))
Energy Q(v) = Σa θa(va) + Σa,b θab(va,vb)
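As a quick sanity check of the correspondence ψa(li) = exp(-θa(i)), the product-of-potentials form and the exp(-Q(v))/Z form of P(v) can be compared by brute-force enumeration on a toy model (all θ values below are made up):

```python
import itertools
import numpy as np

# Toy model: two variables, two labels; theta values are made up.
theta = [np.array([0.5, 1.0]), np.array([0.2, 0.8])]
theta_01 = np.array([[0.0, 1.5], [1.5, 0.0]])

# psi_a(l_i) = exp(-theta_a(i)) and psi_ab(l_i, l_k) = exp(-theta_ab(i, k))
psi = [np.exp(-t) for t in theta]
psi_01 = np.exp(-theta_01)

def Q(v):  # energy Q(v) = sum_a theta_a(v_a) + sum_ab theta_ab(v_a, v_b)
    return theta[0][v[0]] + theta[1][v[1]] + theta_01[v[0], v[1]]

states = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-Q(v)) for v in states)   # so A(theta) = log Z

# The two forms of P(v) agree on every joint state.
for v in states:
    prod_form = psi[0][v[0]] * psi[1][v[1]] * psi_01[v[0], v[1]] / Z
    assert np.isclose(prod_form, np.exp(-Q(v)) / Z)
assert np.isclose(sum(np.exp(-Q(v)) / Z for v in states), 1.0)
```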
Exponential Family
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z = exp(-Q(v)) / Z
Approximate probability distribution B(v), where B(v) has a simpler form than P(v).
Minimize the KL divergence between B(v) and P(v).
Kullback-Leibler Divergence
D = Σv B(v) log(B(v) / P(v))
Kullback-Leibler Divergence D = ΣvB(v) log B(v) - ΣvB(v) log P(v)
Kullback-Leibler Divergence
D = Σv B(v) log B(v) + Σv B(v) Q(v) - (-log Z)
-log Z is the Helmholtz free energy, constant with respect to B.
Kullback-Leibler Divergence
Σv B(v) log B(v) + Σv B(v) Q(v)
The first term is the negative entropy -S(B); the second is the average energy U(B).
Their sum is the Gibbs free energy G(B) = U(B) - S(B).
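The identity behind these slides, KL(B||P) = G(B) - (-log Z) (Gibbs minus Helmholtz free energy), can be verified numerically by enumerating all states of a toy model; the θ values and the trial distribution B below are arbitrary:

```python
import itertools
import numpy as np

# A toy model: two variables, two labels; theta values are made up.
theta = [np.array([0.3, 1.2]), np.array([0.9, 0.1])]
theta_01 = np.array([[0.0, 2.0], [2.0, 0.0]])

def Q(v):  # energy Q(v) = sum_a theta_a(v_a) + sum_ab theta_ab(v_a, v_b)
    return theta[0][v[0]] + theta[1][v[1]] + theta_01[v[0], v[1]]

states = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-Q(v)) for v in states)
P = np.array([np.exp(-Q(v)) / Z for v in states])

# An arbitrary trial distribution B over the same four joint states.
B = np.array([0.1, 0.2, 0.3, 0.4])

kl = float(np.sum(B * np.log(B / P)))                  # KL(B || P)
avg_energy = sum(b * Q(v) for b, v in zip(B, states))  # U(B)
neg_entropy = float(np.sum(B * np.log(B)))             # -S(B)
gibbs = avg_energy + neg_entropy                       # G(B) = U(B) - S(B)

# KL(B || P) = G(B) - (-log Z): Gibbs minus Helmholtz free energy.
assert np.isclose(kl, gibbs + np.log(Z))
```

Since KL is non-negative, this also shows that the Gibbs free energy is minimized (down to -log Z) exactly when B = P.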
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Simpler Distribution One-node marginals Ba(i) Joint probability B(v) = Πa Ba(va)
Average Energy
Σv B(v) Q(v) = Σv B(v) (Σa θa(va) + Σa,b θab(va,vb))
Since B(v) = Πa Ba(va), the expectation factorizes:
= Σa Σi Ba(i) θa(i) + Σa,b Σi,k Ba(i) Bb(k) θab(i,k)
Negative Entropy
Σv B(v) log(B(v))
Since log B(v) = Σa log Ba(va), this simplifies to:
= Σa Σi Ba(i) log(Ba(i))
Mean-Field Free Energy ΣaΣiBa(i)θa(i) + Σa,bΣi,kBa(i)Bb(k)θab(i,k) + ΣaΣiBa(i)log(Ba(i))
Optimization Problem
minB Σa Σi Ba(i) θa(i) + Σa,b Σi,k Ba(i) Bb(k) θab(i,k) + Σa Σi Ba(i) log(Ba(i))
s.t. Σi Ba(i) = 1
KKT Condition
log(Ba(i)) = -θa(i) - Σb Σk Bb(k) θab(i,k) + λa - 1
Ba(i) = exp(-θa(i) - Σb Σk Bb(k) θab(i,k)) / Za
Optimization
Initialize Ba (random, uniform, or from domain knowledge).
Mark all random variables as unprocessed.
Until convergence:
Pick an unprocessed random variable Va.
Ba(i) = exp(-θa(i) - Σb Σk Bb(k) θab(i,k)) / Za
If Ba changes, mark the neighbors of Va as unprocessed.
Convergence is guaranteed!
Tutorial: Jaakkola, 2000 (one of several)
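The update loop above can be sketched as coordinate ascent in NumPy. This is a minimal illustration on a made-up three-variable chain (sweeping all variables each round rather than tracking an unprocessed set), not the lecture's exact pseudocode:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up three-variable chain, two labels per variable.
n_vars, n_labels = 3, 2
theta = rng.normal(size=(n_vars, n_labels))                 # unary theta_a(i)
theta_pair = {(0, 1): rng.normal(size=(n_labels, n_labels)),
              (1, 2): rng.normal(size=(n_labels, n_labels))}

def mean_field(theta, theta_pair, iters=200, tol=1e-10):
    B = np.full(theta.shape, 1.0 / theta.shape[1])  # uniform init
    for _ in range(iters):
        B_old = B.copy()
        for a in range(theta.shape[0]):
            # B_a(i) = exp(-theta_a(i) - sum_b sum_k B_b(k) theta_ab(i,k)) / Z_a
            field = theta[a].copy()
            for (u, w), t in theta_pair.items():
                if u == a:
                    field += t @ B[w]       # sum_k B_w(k) theta_aw(i, k)
                elif w == a:
                    field += t.T @ B[u]     # sum_k B_u(k) theta_ua(k, i)
            B[a] = np.exp(-field)
            B[a] /= B[a].sum()              # the 1/Z_a normalization
        if np.abs(B - B_old).max() < tol:   # beliefs stopped changing
            break
    return B

B = mean_field(theta, theta_pair)
assert np.allclose(B.sum(axis=1), 1.0)
```

Each update is the closed-form KKT solution for one Ba with the others fixed, so every sweep decreases (or leaves unchanged) the mean-field free energy, which is why convergence is guaranteed, though only to a local optimum.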
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Simpler Distribution
One-node marginals Ba(i)
Two-node marginals Bab(i,k)
The joint probability is hard to write down in general, but not for trees:
B(v) = Πa,b Bab(va,vb) / Πa Ba(va)^(n(a)-1)
n(a) = number of neighbors of Va
Pearl, 1988
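On a tree this factorization is exact, which can be checked by brute force: compute the true marginals of a small chain by enumeration, then reconstruct the joint from them (the potentials are random, made up for illustration):

```python
import itertools
import numpy as np

# A three-variable chain V0 - V1 - V2 (a tree); potentials are made up.
rng = np.random.default_rng(1)
psi = rng.uniform(0.1, 1.0, size=(3, 2))                # unary potentials
psi_pair = {(0, 1): rng.uniform(0.1, 1.0, size=(2, 2)),
            (1, 2): rng.uniform(0.1, 1.0, size=(2, 2))}

states = list(itertools.product([0, 1], repeat=3))
def unnorm(v):  # unnormalized probability of a joint assignment
    p = np.prod([psi[a, v[a]] for a in range(3)])
    for (a, b), t in psi_pair.items():
        p *= t[v[a], v[b]]
    return p
Z = sum(unnorm(v) for v in states)
P = {v: unnorm(v) / Z for v in states}

# Exact one- and two-node marginals by enumeration.
B1 = {a: np.array([sum(P[v] for v in states if v[a] == i) for i in range(2)])
      for a in range(3)}
B2 = {e: np.array([[sum(P[v] for v in states if (v[e[0]], v[e[1]]) == (i, k))
                    for k in range(2)] for i in range(2)])
      for e in psi_pair}

# On a tree: B(v) = prod_edges B_ab(v_a, v_b) / prod_nodes B_a(v_a)^(n(a)-1)
n = {0: 1, 1: 2, 2: 1}  # number of neighbours of each node
for v in states:
    recon = np.prod([B2[e][v[e[0]], v[e[1]]] for e in psi_pair])
    recon /= np.prod([B1[a][v[a]] ** (n[a] - 1) for a in range(3)])
    assert np.isclose(recon, P[v])
```

On a loopy graph the same formula no longer reproduces P, which is exactly why the Bethe entropy below is only an approximation for general MRFs.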
Average Energy
Σv B(v) Q(v) = Σv B(v) (Σa θa(va) + Σa,b θab(va,vb))
= Σa Σi Ba(i) θa(i) + Σa,b Σi,k Bab(i,k) θab(i,k)
Rewriting the unary terms via the pairwise marginals (Σk Bab(i,k) = Ba(i)):
= -Σa (n(a)-1) Σi Ba(i) θa(i) + Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k))
n(a) = number of neighbors of Va
Negative Entropy
Σv B(v) log(B(v))
= -Σa (n(a)-1) Σi Ba(i) log(Ba(i)) + Σa,b Σi,k Bab(i,k) log(Bab(i,k))
Exact for a tree; approximate for a general MRF.
Bethe Free Energy
-Σa (n(a)-1) Σi Ba(i) (θa(i) + log(Ba(i)))
+ Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k) + log(Bab(i,k)))
Exact for a tree; approximate for a general MRF.
Optimization Problem
minB -Σa (n(a)-1) Σi Ba(i) (θa(i) + log(Ba(i))) + Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k) + log(Bab(i,k)))
s.t. Σk Bab(i,k) = Ba(i)
Σi,k Bab(i,k) = 1
Σi Ba(i) = 1
KKT Condition log(Bab(i,k)) = -(θa(i)+θb(k)+θab(i,k)) + λab(k) + λba(i) + μab - 1 λab(k) = log(Mab;k)
Optimization
BP tries to optimize the Bethe free energy, but it may not converge.
Convergent alternatives exist.
Yuille and Rangarajan, 2003
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Local Free Energy
Cluster of variables c (figure: 4-cycle on V1, V2, V3, V4)
Gc = Σvc Bc(vc) (log(Bc(vc)) + Σd⊆c θd(vd))
G1 = Σv1 B1(v1) (log(B1(v1)) + θ1(v1))
G12 = Σv1,v2 B12(v1,v2) (log(B12(v1,v2)) + θ1(v1) + θ2(v2) + θ12(v1,v2))
G1234 = Σv1,...,v4 B1234(v1,v2,v3,v4) (log(B1234(v1,v2,v3,v4)) + θ1(v1) + θ2(v2) + θ3(v3) + θ4(v4) + θ12(v1,v2) + θ13(v1,v3) + θ24(v2,v4) + θ34(v3,v4))
Sum of Local Free Energies
(figure: 4-cycle on V1, V2, V3, V4)
Sum of the free energies of all pairwise clusters: G12 + G13 + G24 + G34
This overcounts each of G1, G2, G3, G4 once, so subtract them:
G12 + G13 + G24 + G34 - G1 - G2 - G3 - G4
This is exactly the Bethe approximation!
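That the corrected sum of local free energies equals the Bethe free energy from the earlier slide is an algebraic identity, holding even for inconsistent beliefs. A numerical check on the 4-cycle, with random made-up θ and beliefs:

```python
import numpy as np

rng = np.random.default_rng(2)
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]   # the 4-cycle from the slides

# Made-up potentials and arbitrary normalized beliefs (two labels each).
theta1 = {a: rng.normal(size=2) for a in nodes}
theta2 = {e: rng.normal(size=(2, 2)) for e in edges}
def norm(x): return x / x.sum()
B1 = {a: norm(rng.uniform(size=2)) for a in nodes}
B2 = {e: norm(rng.uniform(size=(2, 2))) for e in edges}

def pair_theta(e):  # theta_a(i) + theta_b(k) + theta_ab(i, k)
    a, b = e
    return theta1[a][:, None] + theta1[b][None, :] + theta2[e]

def G_node(a):   # G_a = sum_i B_a(i) (log B_a(i) + theta_a(i))
    return float(np.sum(B1[a] * (np.log(B1[a]) + theta1[a])))

def G_edge(e):   # G_ab = sum_ik B_ab(i,k) (log B_ab(i,k) + theta terms)
    return float(np.sum(B2[e] * (np.log(B2[e]) + pair_theta(e))))

# Corrected sum of local free energies: each G_a was overcounted once.
lhs = sum(G_edge(e) for e in edges) - sum(G_node(a) for a in nodes)

# Bethe free energy as on the earlier slide; n(a) = 2 for every cycle node.
n = {a: 2 for a in nodes}
rhs = (-sum((n[a] - 1) * np.sum(B1[a] * (theta1[a] + np.log(B1[a])))
            for a in nodes)
       + sum(np.sum(B2[e] * (pair_theta(e) + np.log(B2[e]))) for e in edges))
assert np.isclose(lhs, rhs)
```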
Kikuchi Approximations V1 V2 V4 V3 Use bigger clusters G1234
Kikuchi Approximations / Generalized Belief Propagation
(figure: 3x2 grid on V1, ..., V6)
Use bigger, overlapping clusters: G1245 + G2356 - G25 (the overlap G25 is counted twice, so subtract it once).
Deriving message passing from the KKT conditions gives Generalized Belief Propagation.