Probabilistic Inference, Lecture 7
M. Pawan Kumar, pawan.kumar@ecp.fr
Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/

Recap
Loopy Belief Propagation
Initialize all messages to 1.
In some order of edges, update messages:
Mab;k = Σi ψa(li) ψab(li,lk) Πn≠b Mna;i
Repeat until convergence: rate of change in messages < threshold.
Convergence is not guaranteed!
Loopy Belief Propagation
B’a(i) = ψa(li) Πn Mna;i
B’ab(i,j) = ψa(li) ψb(lj) ψab(li,lj) Πn≠b Mna;i Πn≠a Mnb;j
Normalize to compute beliefs Ba(i), Bab(i,j).
At convergence: Σj Bab(i,j) = Ba(i)
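The recap above can be sketched in code. Below is a minimal NumPy illustration of the message update and belief computation on a two-variable model (a tree, so BP is exact and converges after one pass); the potential values are made up for illustration:

```python
import numpy as np

# Two variables with two labels each; potential values are made up.
psi = [np.array([0.7, 0.3]), np.array([0.4, 0.6])]   # unary psi_a(l_i)
psi_ab = np.array([[0.9, 0.1], [0.2, 0.8]])          # pairwise psi_ab(l_i, l_k)

# Initialize all messages to 1, then update until they stop changing.
# M_{ab;k} = sum_i psi_a(l_i) psi_ab(l_i, l_k) prod_{n != b} M_{na;i}
M = {(0, 1): np.ones(2), (1, 0): np.ones(2)}
for _ in range(50):
    new = {(0, 1): psi_ab.T @ psi[0],   # node 0 has no neighbour besides 1
           (1, 0): psi_ab @ psi[1]}     # node 1 has no neighbour besides 0
    if all(np.allclose(new[e], M[e]) for e in M):
        break
    M = new

# Beliefs: B_a(i) is proportional to psi_a(l_i) prod_n M_{na;i}; normalize.
B0 = psi[0] * M[(1, 0)]; B0 /= B0.sum()
B1 = psi[1] * M[(0, 1)]; B1 /= B1.sum()
# Pairwise belief: B_ab(i,k) proportional to psi_a(l_i) psi_b(l_k) psi_ab(l_i, l_k)
Bab = np.outer(psi[0], psi[1]) * psi_ab
Bab /= Bab.sum()

# At convergence the pairwise beliefs marginalize to the unary ones.
assert np.allclose(Bab.sum(axis=1), B0)
```

Since the graph is a tree, the final beliefs are the exact marginals; on a loopy graph the same updates would only give approximations.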
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation Yedidia, Freeman and Weiss, 2000
Exponential Family
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z = exp(-Q(v)) / Z
A(θ) = log Z
ψa(li) = exp(-θa(i))
ψab(li,lk) = exp(-θab(i,k))
Energy Q(v) = Σa θa(va) + Σa,b θab(va,vb)
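As a quick sanity check of the correspondence ψa(li) = exp(-θa(i)), the product-of-potentials form and the exp(-Q(v))/Z form of P(v) can be compared by brute-force enumeration on a toy model (all θ values below are made up):

```python
import itertools
import numpy as np

# Toy model: two variables, two labels; theta values are made up.
theta = [np.array([0.5, 1.0]), np.array([0.2, 0.8])]
theta_01 = np.array([[0.0, 1.5], [1.5, 0.0]])

# psi_a(l_i) = exp(-theta_a(i)) and psi_ab(l_i, l_k) = exp(-theta_ab(i, k))
psi = [np.exp(-t) for t in theta]
psi_01 = np.exp(-theta_01)

def Q(v):  # energy Q(v) = sum_a theta_a(v_a) + sum_ab theta_ab(v_a, v_b)
    return theta[0][v[0]] + theta[1][v[1]] + theta_01[v[0], v[1]]

states = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-Q(v)) for v in states)   # so A(theta) = log Z

# The two forms of P(v) agree on every joint state.
for v in states:
    prod_form = psi[0][v[0]] * psi[1][v[1]] * psi_01[v[0], v[1]] / Z
    assert np.isclose(prod_form, np.exp(-Q(v)) / Z)
assert np.isclose(sum(np.exp(-Q(v)) / Z for v in states), 1.0)
```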
Exponential Family
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z = exp(-Q(v)) / Z
Approximate probability distribution B(v), where B(v) has a simpler form than P(v).
Minimize the KL divergence between B(v) and P(v).
Kullback-Leibler Divergence
D = Σv B(v) log(B(v) / P(v))
Kullback-Leibler Divergence D = ΣvB(v) log B(v) - ΣvB(v) log P(v)
Kullback-Leibler Divergence
D = Σv B(v) log B(v) + Σv B(v) Q(v) - (-log Z)
-log Z is the Helmholtz free energy, constant with respect to B.
Kullback-Leibler Divergence
Σv B(v) log B(v) + Σv B(v) Q(v)
The first term is the negative entropy -S(B); the second is the average energy U(B).
Their sum is the Gibbs free energy G(B) = U(B) - S(B).
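The identity behind these slides, KL(B||P) = G(B) - (-log Z) (Gibbs minus Helmholtz free energy), can be verified numerically by enumerating all states of a toy model; the θ values and the trial distribution B below are arbitrary:

```python
import itertools
import numpy as np

# A toy model: two variables, two labels; theta values are made up.
theta = [np.array([0.3, 1.2]), np.array([0.9, 0.1])]
theta_01 = np.array([[0.0, 2.0], [2.0, 0.0]])

def Q(v):  # energy Q(v) = sum_a theta_a(v_a) + sum_ab theta_ab(v_a, v_b)
    return theta[0][v[0]] + theta[1][v[1]] + theta_01[v[0], v[1]]

states = list(itertools.product([0, 1], repeat=2))
Z = sum(np.exp(-Q(v)) for v in states)
P = np.array([np.exp(-Q(v)) / Z for v in states])

# An arbitrary trial distribution B over the same four joint states.
B = np.array([0.1, 0.2, 0.3, 0.4])

kl = float(np.sum(B * np.log(B / P)))                  # KL(B || P)
avg_energy = sum(b * Q(v) for b, v in zip(B, states))  # U(B)
neg_entropy = float(np.sum(B * np.log(B)))             # -S(B)
gibbs = avg_energy + neg_entropy                       # G(B) = U(B) - S(B)

# KL(B || P) = G(B) - (-log Z): Gibbs minus Helmholtz free energy.
assert np.isclose(kl, gibbs + np.log(Z))
```

Since KL is non-negative, this also shows that the Gibbs free energy is minimized (down to -log Z) exactly when B = P.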
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Simpler Distribution One-node marginals Ba(i) Joint probability B(v) = Πa Ba(va)
Average Energy
Σv B(v) Q(v) = Σv B(v) (Σa θa(va) + Σa,b θab(va,vb))
Since B(v) = Πa Ba(va), the expectation factorizes:
= Σa Σi Ba(i) θa(i) + Σa,b Σi,k Ba(i) Bb(k) θab(i,k)
Negative Entropy
Σv B(v) log(B(v))
Since log B(v) = Σa log Ba(va), this simplifies to:
= Σa Σi Ba(i) log(Ba(i))
Mean-Field Free Energy ΣaΣiBa(i)θa(i) + Σa,bΣi,kBa(i)Bb(k)θab(i,k) + ΣaΣiBa(i)log(Ba(i))
Optimization Problem
minB Σa Σi Ba(i) θa(i) + Σa,b Σi,k Ba(i) Bb(k) θab(i,k) + Σa Σi Ba(i) log(Ba(i))
s.t. Σi Ba(i) = 1
KKT Condition
log(Ba(i)) = -θa(i) - Σb Σk Bb(k) θab(i,k) + λa - 1
Ba(i) = exp(-θa(i) - Σb Σk Bb(k) θab(i,k)) / Za
Optimization
Initialize Ba (random, uniform, or from domain knowledge).
Mark all random variables as unprocessed.
Until convergence:
Pick an unprocessed random variable Va.
Ba(i) = exp(-θa(i) - Σb Σk Bb(k) θab(i,k)) / Za
If Ba changes, mark the neighbors of Va as unprocessed.
Convergence is guaranteed!
Tutorial: Jaakkola, 2000 (one of several)
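The update loop above can be sketched as coordinate ascent in NumPy. This is a minimal illustration on a made-up three-variable chain (sweeping all variables each round rather than tracking an unprocessed set), not the lecture's exact pseudocode:

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up three-variable chain, two labels per variable.
n_vars, n_labels = 3, 2
theta = rng.normal(size=(n_vars, n_labels))                 # unary theta_a(i)
theta_pair = {(0, 1): rng.normal(size=(n_labels, n_labels)),
              (1, 2): rng.normal(size=(n_labels, n_labels))}

def mean_field(theta, theta_pair, iters=200, tol=1e-10):
    B = np.full(theta.shape, 1.0 / theta.shape[1])  # uniform init
    for _ in range(iters):
        B_old = B.copy()
        for a in range(theta.shape[0]):
            # B_a(i) = exp(-theta_a(i) - sum_b sum_k B_b(k) theta_ab(i,k)) / Z_a
            field = theta[a].copy()
            for (u, w), t in theta_pair.items():
                if u == a:
                    field += t @ B[w]       # sum_k B_w(k) theta_aw(i, k)
                elif w == a:
                    field += t.T @ B[u]     # sum_k B_u(k) theta_ua(k, i)
            B[a] = np.exp(-field)
            B[a] /= B[a].sum()              # the 1/Z_a normalization
        if np.abs(B - B_old).max() < tol:   # beliefs stopped changing
            break
    return B

B = mean_field(theta, theta_pair)
assert np.allclose(B.sum(axis=1), 1.0)
```

Each update is the closed-form KKT solution for one Ba with the others fixed, so every sweep decreases (or leaves unchanged) the mean-field free energy, which is why convergence is guaranteed, though only to a local optimum.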
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Simpler Distribution
One-node marginals Ba(i)
Two-node marginals Bab(i,k)
The joint probability is hard to write down in general, but not for trees:
B(v) = Πa,b Bab(va,vb) / Πa Ba(va)^(n(a)-1)
n(a) = number of neighbors of Va
Pearl, 1988
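On a tree this factorization is exact, which can be checked by brute force: compute the true marginals of a small chain by enumeration, then reconstruct the joint from them (the potentials are random, made up for illustration):

```python
import itertools
import numpy as np

# A three-variable chain V0 - V1 - V2 (a tree); potentials are made up.
rng = np.random.default_rng(1)
psi = rng.uniform(0.1, 1.0, size=(3, 2))                # unary potentials
psi_pair = {(0, 1): rng.uniform(0.1, 1.0, size=(2, 2)),
            (1, 2): rng.uniform(0.1, 1.0, size=(2, 2))}

states = list(itertools.product([0, 1], repeat=3))
def unnorm(v):  # unnormalized probability of a joint assignment
    p = np.prod([psi[a, v[a]] for a in range(3)])
    for (a, b), t in psi_pair.items():
        p *= t[v[a], v[b]]
    return p
Z = sum(unnorm(v) for v in states)
P = {v: unnorm(v) / Z for v in states}

# Exact one- and two-node marginals by enumeration.
B1 = {a: np.array([sum(P[v] for v in states if v[a] == i) for i in range(2)])
      for a in range(3)}
B2 = {e: np.array([[sum(P[v] for v in states if (v[e[0]], v[e[1]]) == (i, k))
                    for k in range(2)] for i in range(2)])
      for e in psi_pair}

# On a tree: B(v) = prod_edges B_ab(v_a, v_b) / prod_nodes B_a(v_a)^(n(a)-1)
n = {0: 1, 1: 2, 2: 1}  # number of neighbours of each node
for v in states:
    recon = np.prod([B2[e][v[e[0]], v[e[1]]] for e in psi_pair])
    recon /= np.prod([B1[a][v[a]] ** (n[a] - 1) for a in range(3)])
    assert np.isclose(recon, P[v])
```

On a loopy graph the same formula no longer reproduces P, which is exactly why the Bethe entropy below is only an approximation for general MRFs.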
Average Energy
Σv B(v) Q(v) = Σv B(v) (Σa θa(va) + Σa,b θab(va,vb))
= Σa Σi Ba(i) θa(i) + Σa,b Σi,k Bab(i,k) θab(i,k)
Rewriting the unary terms via the pairwise marginals (Σk Bab(i,k) = Ba(i)):
= -Σa (n(a)-1) Σi Ba(i) θa(i) + Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k))
n(a) = number of neighbors of Va
Negative Entropy
Σv B(v) log(B(v))
= -Σa (n(a)-1) Σi Ba(i) log(Ba(i)) + Σa,b Σi,k Bab(i,k) log(Bab(i,k))
Exact for a tree; approximate for a general MRF.
Bethe Free Energy
-Σa (n(a)-1) Σi Ba(i) (θa(i) + log(Ba(i)))
+ Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k) + log(Bab(i,k)))
Exact for a tree; approximate for a general MRF.
Optimization Problem
minB -Σa (n(a)-1) Σi Ba(i) (θa(i) + log(Ba(i))) + Σa,b Σi,k Bab(i,k) (θa(i) + θb(k) + θab(i,k) + log(Bab(i,k)))
s.t. Σk Bab(i,k) = Ba(i)
Σi,k Bab(i,k) = 1
Σi Ba(i) = 1
KKT Condition log(Bab(i,k)) = -(θa(i)+θb(k)+θab(i,k)) + λab(k) + λba(i) + μab - 1 λab(k) = log(Mab;k)
Optimization
BP tries to optimize the Bethe free energy, but it may not converge.
Convergent alternatives exist.
Yuille and Rangarajan, 2003
Outline • Free Energy • Mean-Field Approximation • Bethe Approximation • Kikuchi Approximation
Local Free Energy
Cluster of variables c (figure: 4-cycle on V1, V2, V3, V4)
Gc = Σvc Bc(vc) (log(Bc(vc)) + Σd⊆c θd(vd))
G1 = Σv1 B1(v1) (log(B1(v1)) + θ1(v1))
G12 = Σv1,v2 B12(v1,v2) (log(B12(v1,v2)) + θ1(v1) + θ2(v2) + θ12(v1,v2))
G1234 = Σv1,...,v4 B1234(v1,v2,v3,v4) (log(B1234(v1,v2,v3,v4)) + θ1(v1) + θ2(v2) + θ3(v3) + θ4(v4) + θ12(v1,v2) + θ13(v1,v3) + θ24(v2,v4) + θ34(v3,v4))
Sum of Local Free Energies
(figure: 4-cycle on V1, V2, V3, V4)
Sum of the free energies of all pairwise clusters: G12 + G13 + G24 + G34
This overcounts each of G1, G2, G3, G4 once, so subtract them:
G12 + G13 + G24 + G34 - G1 - G2 - G3 - G4
This is exactly the Bethe approximation!
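That the corrected sum of local free energies equals the Bethe free energy from the earlier slide is an algebraic identity, holding even for inconsistent beliefs. A numerical check on the 4-cycle, with random made-up θ and beliefs:

```python
import numpy as np

rng = np.random.default_rng(2)
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]   # the 4-cycle from the slides

# Made-up potentials and arbitrary normalized beliefs (two labels each).
theta1 = {a: rng.normal(size=2) for a in nodes}
theta2 = {e: rng.normal(size=(2, 2)) for e in edges}
def norm(x): return x / x.sum()
B1 = {a: norm(rng.uniform(size=2)) for a in nodes}
B2 = {e: norm(rng.uniform(size=(2, 2))) for e in edges}

def pair_theta(e):  # theta_a(i) + theta_b(k) + theta_ab(i, k)
    a, b = e
    return theta1[a][:, None] + theta1[b][None, :] + theta2[e]

def G_node(a):   # G_a = sum_i B_a(i) (log B_a(i) + theta_a(i))
    return float(np.sum(B1[a] * (np.log(B1[a]) + theta1[a])))

def G_edge(e):   # G_ab = sum_ik B_ab(i,k) (log B_ab(i,k) + theta terms)
    return float(np.sum(B2[e] * (np.log(B2[e]) + pair_theta(e))))

# Corrected sum of local free energies: each G_a was overcounted once.
lhs = sum(G_edge(e) for e in edges) - sum(G_node(a) for a in nodes)

# Bethe free energy as on the earlier slide; n(a) = 2 for every cycle node.
n = {a: 2 for a in nodes}
rhs = (-sum((n[a] - 1) * np.sum(B1[a] * (theta1[a] + np.log(B1[a])))
            for a in nodes)
       + sum(np.sum(B2[e] * (pair_theta(e) + np.log(B2[e]))) for e in edges))
assert np.isclose(lhs, rhs)
```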
Kikuchi Approximations V1 V2 V4 V3 Use bigger clusters G1234
Kikuchi Approximations / Generalized Belief Propagation
(figure: 3x2 grid on V1, ..., V6)
Use bigger, overlapping clusters: G1245 + G2356 - G25 (the overlap G25 is counted twice, so subtract it once).
Deriving message passing from the KKT conditions gives Generalized Belief Propagation.