
Belief Propagation and its Generalizations




  1. Belief Propagation and its Generalizations Shane Oldenburger

  2. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  3. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  4. Recall from the Jointree Algorithm • We separate evidence e into: • e+: denotes evidence pertaining to ancestors • e-: denotes evidence pertaining to descendants • BEL(X) = P(X|e) = P(X|e+,e-) = P(e-|X,e+)·P(X|e+)/P(e-|e+) = α·P(e-|X)·P(X|e+) = α·λ(X)·π(X) • π: messages from parents • λ: messages from children • α: normalization constant

  5. Pearl’s Belief Propagation Algorithm: Initialization • Nodes with evidence • λ(xi) = 1 where xi = ei; 0 otherwise • π(xi) = 1 where xi = ei; 0 otherwise • Nodes with no parents • π(xi) = p(xi) //prior probabilities • Nodes with no children • λ(xi) = 1

  6. Pearl’s BP algorithm Iterate • For each X: • If all π messages from parents of X have arrived, combine into π(X) • If all λ messages from children of X have arrived, combine into λ(X) • If π(X) has been computed and all λ messages other than from Yi have arrived, calculate and send message πX→Yi to child Yi • If λ(X) has been computed and all π messages other than from Ui have arrived, calculate and send message λX→Ui to parent Ui • Compute BEL(X) = α·λ(X)·π(X)
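To make the schedule concrete, here is a minimal sum-product sketch on a three-node chain. It uses the undirected pairwise form (beliefs are normalized products of local evidence and incoming messages) rather than Pearl's directed π/λ bookkeeping, and all potential tables are made-up illustrative numbers:

```python
import itertools

psi = [[1.0, 0.5], [0.5, 1.0]]              # psi(xi, xj), same table on both edges
phi = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]  # local evidence phi_i(xi)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

# Each message is sent once all of the sender's other messages have arrived:
# forward pass (1->2, 2->3), then backward pass (3->2, 2->1).
m12 = normalize([sum(phi[0][a] * psi[a][b] for a in range(2)) for b in range(2)])
m23 = normalize([sum(phi[1][a] * m12[a] * psi[a][b] for a in range(2)) for b in range(2)])
m32 = normalize([sum(phi[2][b] * psi[a][b] for b in range(2)) for a in range(2)])
m21 = normalize([sum(phi[1][b] * m32[b] * psi[a][b] for b in range(2)) for a in range(2)])

# BEL(X) = alpha * (local evidence) * (product of incoming messages)
bel = [normalize([phi[0][a] * m21[a] for a in range(2)]),
       normalize([phi[1][a] * m12[a] * m32[a] for a in range(2)]),
       normalize([phi[2][a] * m23[a] for a in range(2)])]

# Brute-force marginals for comparison (exact, since the chain is a tree).
weights = {x: phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
              * psi[x[0]][x[1]] * psi[x[1]][x[2]]
           for x in itertools.product(range(2), repeat=3)}
Z = sum(weights.values())
exact = [[sum(w for x, w in weights.items() if x[i] == a) / Z for a in range(2)]
         for i in range(3)]
```

Because the graph is a tree, the BP beliefs agree with the brute-force marginals to machine precision.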

  7. Example of data propagation in a simple tree

  8. BP properties • Exact for Polytrees • Only one path between any two nodes • Each node X separates graph into two disjoint graphs (e+, e-) • But most graphs of interest are not Polytrees – what do we do? • Exact inference • Cutset conditioning • Jointree method • Approximate inference • Loopy BP

  9. Loopy BP • In the simple tree example, a finite number of messages were passed • In a graph with loops, messages may be passed around indefinitely • Stop when beliefs converge • Stop after some number of iterations • Loopy BP tends to achieve good empirical results • Low-level computer vision problems • Error-correcting codes: Turbocodes, Gallager codes
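A sketch of the loopy variant: the same message update run repeatedly on a four-node cycle until the messages stop changing. The potentials are illustrative, and on a loopy graph the converged beliefs are only approximations of the true marginals:

```python
import itertools

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]        # a single loop
psi = [[1.0, 0.6], [0.6, 1.0]]                  # shared pairwise potential
phi = [[0.6, 0.4], [0.5, 0.5], [0.3, 0.7], [0.5, 0.5]]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

neighbors = {i: [] for i in range(n)}
for i, j in edges:
    neighbors[i].append(j)
    neighbors[j].append(i)

# All messages start uniform; iterate until convergence (or a cap).
m = {(i, j): [0.5, 0.5] for i in range(n) for j in neighbors[i]}
for _ in range(500):
    new = {}
    for (i, j) in m:
        vec = []
        for xj in range(2):
            total = 0.0
            for xi in range(2):
                prod = phi[i][xi] * psi[xi][xj]
                for k in neighbors[i]:
                    if k != j:               # exclude the recipient
                        prod *= m[(k, i)][xi]
                total += prod
            vec.append(total)
        new[(i, j)] = normalize(vec)
    delta = max(abs(a - b) for key in m for a, b in zip(m[key], new[key]))
    m = new
    if delta < 1e-12:
        break

beliefs = []
for i in range(n):
    b = [phi[i][xi] for xi in range(2)]
    for k in neighbors[i]:
        b = [b[xi] * m[(k, i)][xi] for xi in range(2)]
    beliefs.append(normalize(b))

# Exact marginals by enumeration, to see how close the fixed point is.
weights = {}
for x in itertools.product(range(2), repeat=n):
    w = 1.0
    for i, j in edges:
        w *= psi[x[i]][x[j]]
    for i in range(n):
        w *= phi[i][x[i]]
    weights[x] = w
Z = sum(weights.values())
exact = [[sum(w for x, w in weights.items() if x[i] == a) / Z for a in range(2)]
         for i in range(n)]
```

With these weak potentials the messages converge quickly and the beliefs land close to, but not exactly on, the true marginals.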

  10. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  11. Markov Random Fields • BP algorithms have been developed for many graphical models • Pairwise Markov Random Fields are used in this paper for ease of presentation • An MRF consists of “observable” nodes and “hidden” nodes • Since it is pairwise, each observable node is connected to exactly one hidden node, and each hidden node is connected to at most one observable node

  12. Markov Random Fields • Two hidden variables xi and xj are connected by a “compatibility function” ψij(xi, xj) • Hidden variable xi is connected to observable variable yi by an “evidence function” φi(xi, yi); with yi fixed by the evidence, this is written φi(xi) • The joint probability for a pairwise MRF is p({x}) = (1/Z) ∏(ij) ψij(xi, xj) ∏i φi(xi) • The BP algorithm for pairwise MRFs is similar to that for Bayesian networks
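The joint distribution above can be checked by direct enumeration on a tiny example (three binary hidden nodes in a chain; the ψ and φ tables are made-up illustrations):

```python
import itertools

edges = [(0, 1), (1, 2)]
psi = {e: [[1.0, 0.4], [0.4, 1.0]] for e in edges}   # compatibility functions
phi = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]           # evidence functions

def unnorm(x):
    """Unnormalized p({x}): product of all psi_ij and phi_i terms."""
    p = 1.0
    for (i, j) in edges:
        p *= psi[(i, j)][x[i]][x[j]]
    for i, xi in enumerate(x):
        p *= phi[i][xi]
    return p

states = list(itertools.product(range(2), repeat=3))
Z = sum(unnorm(x) for x in states)                   # partition function
p = {x: unnorm(x) / Z for x in states}               # normalized joint
```

Dividing by Z makes the values sum to one over all 2^3 configurations, as a probability distribution must.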

  13. Conversion between graphical models • We can limit ourselves to considering pairwise MRFs • Any pairwise MRF or BN can be converted to an equivalent “Factor graph” • Any factor graph can be converted into an equivalent pairwise MRF or BN

  14. An intermediary model • A factor graph is composed of • “variable” nodes, represented by circles • “function” nodes, represented by squares • Factor graphs are a generalization of Tanner graphs, where the “function” nodes are parity checks of their connected variables • A function node in a factor graph can be any arbitrary function of the variables connected to it

  15. From pairwise MRF to BN

  16. From BN to pairwise MRF

  17. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  18. Gibbs Free Energy • Gibbs free energy is the difference in the energy of a system from an initial state to a final state of some process (e.g. a chemical reaction) • For a chemical reaction, if the Gibbs free energy change is negative then the reaction is “spontaneous”, or “allowed” • If the Gibbs free energy change is positive, the reaction is not spontaneous

  19. Gibbs free energy • Instead of the energy difference of a chemical process, we want to define Gibbs free energy in terms of the difference between a target probability distribution p and an approximate probability distribution b • Define the “distance” between p({x}) and b({x}) as • D(b({x}) || p({x})) = Σ{x} b({x}) ln[b({x}) / p({x})] • This is known as the Kullback-Leibler distance • Boltzmann’s law: p({x}) = (1/Z) e^(-E({x})/T) • Generally assumed by statistical physicists • Here we will use Boltzmann’s law as our definition of “energy” E • T acts as a unit scale parameter; let T = 1 • Substituting Boltzmann’s law into our distance measure: • D(b({x}) || p({x})) = Σ{x} b({x}) E({x}) + Σ{x} b({x}) ln b({x}) + ln Z

  20. Gibbs free energy • Our distance measure • D(b({x}) || p({x})) = Σ{x} b({x}) E({x}) + Σ{x} b({x}) ln b({x}) + ln Z • We see the distance will be zero (b = p) when • G(b({x})) = Σ{x} b({x}) E({x}) + Σ{x} b({x}) ln b({x}) = U(b({x})) - S(b({x})) • is minimized; the minimum value is F = -ln Z • G: “Gibbs free energy” • F: “Helmholtz free energy” • U: “average energy” • S: “entropy”
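These identities are easy to verify numerically. The sketch below builds a small distribution p over four states, defines the energies E via Boltzmann's law with T = 1, and checks that D(b||p) = G(b) + ln Z and that G is minimized at b = p with value F = -ln Z (the trial distribution b is arbitrary):

```python
import math
import random

random.seed(0)
# A small distribution p over 4 states, built from positive weights.
w = [random.random() + 0.1 for _ in range(4)]
Z = sum(w)
p = [x / Z for x in w]
E = [-math.log(Z * pi) for pi in p]   # Boltzmann's law: p_i = (1/Z) exp(-E_i)

def G(b):
    U = sum(bi * Ei for bi, Ei in zip(b, E))            # average energy
    S = -sum(bi * math.log(bi) for bi in b if bi > 0)   # entropy
    return U - S

def KL(b):
    return sum(bi * math.log(bi / pi) for bi, pi in zip(b, p) if bi > 0)

b = [0.4, 0.3, 0.2, 0.1]   # an arbitrary trial distribution
```

Since KL divergence is non-negative and zero only at b = p, G(b) = D(b||p) - ln Z is bounded below by G(p) = -ln Z, the Helmholtz free energy.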

  21. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  22. Bethe approximation • We would like to derive the Gibbs free energy in terms of one- and two-node beliefs bi and bij • Due to the pairwise nature of pairwise MRFs, bi and bij are sufficient to compute the average energy U • U = -Σ(ij) Σxi,xj bij(xi,xj) ln ψij(xi,xj) - Σi Σxi bi(xi) ln φi(xi) • The exact marginal probabilities pi and pij yield the same form, so this average energy is exact if the one- and two-node beliefs are exact

  23. Bethe approximation • The entropy term is more problematic • Usually we must settle for an approximation • The entropy can be computed exactly if the joint belief can be explicitly expressed in terms of one- and two-node beliefs: • b({x}) = ∏(ij) bij(xi,xj) / ∏i bi(xi)^(qi-1), where qi = #neighbors of xi • Then the Bethe approximation to the entropy is • SBethe = -Σ(ij) Σxi,xj bij(xi,xj) ln bij(xi,xj) + Σi (qi-1) Σxi bi(xi) ln bi(xi) • For singly connected networks this is exact, and minimizing GBethe = U - SBethe yields the exact marginal probabilities p • For graphs with loops, this is only an approximation (but usually a good one)
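The exactness claim can be checked numerically. Assuming a three-node chain with illustrative potentials, evaluating GBethe = U - SBethe at the true marginals recovers F = -ln Z:

```python
import itertools
import math

psi = [[1.0, 0.3], [0.3, 1.0]]              # shared pairwise potential
phi = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]  # local evidence
edges = [(0, 1), (1, 2)]

def unnorm(x):
    p = phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
    for (i, j) in edges:
        p *= psi[x[i]][x[j]]
    return p

states = list(itertools.product(range(2), repeat=3))
Z = sum(unnorm(x) for x in states)
joint = {x: unnorm(x) / Z for x in states}

# Exact one- and two-node marginals by enumeration.
b1 = [[sum(p for x, p in joint.items() if x[i] == a) for a in range(2)]
      for i in range(3)]
b2 = {e: [[sum(p for x, p in joint.items() if (x[e[0]], x[e[1]]) == (a, c))
           for c in range(2)] for a in range(2)] for e in edges}

# G_Bethe = U - S_Bethe, with q_i = number of neighbors of node i.
q = [1, 2, 1]
U = -sum(b2[e][a][c] * math.log(psi[a][c])
         for e in edges for a in range(2) for c in range(2))
U -= sum(b1[i][a] * math.log(phi[i][a]) for i in range(3) for a in range(2))
S = -sum(b2[e][a][c] * math.log(b2[e][a][c])
         for e in edges for a in range(2) for c in range(2))
S += sum((q[i] - 1) * b1[i][a] * math.log(b1[i][a])
         for i in range(3) for a in range(2))
G_bethe = U - S
```

On a graph with loops the same computation would generally differ from -ln Z; the equality below holds because the chain is singly connected.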

  24. Equivalence of BP and Bethe • The Bethe approximation is exact for pairwise MRFs when the graph contains no loops, so the Bethe free energy is minimized by the correct marginals • BP gives correct marginals when the graph contains no loops • Thus, when there are no loops, the BP beliefs are the global minima of the Bethe free energy • We can say more: a set of beliefs gives a BP fixed point in any graph iff they are stationary points of the Bethe free energy • This can be shown by adding Lagrange multipliers to GBethe to enforce the marginalization constraints

  25. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  26. Kikuchi approximation • Kikuchi approximation is an improvement on and generalization of Bethe • With this association between BP and the Bethe approximation to Gibbs free energy, can we use better approximation methods to craft better BP algorithms?

  27. Cluster variational method • Free energy approximated as a sum of local free energies of sets of regions of nodes • “Cluster variational method” provides a way to select the set of regions • Begin with a basic set of clusters including every interaction and node • Subtract the free energies of over-counted intersection regions • Add back over-counted intersections of intersections, etc. • Bethe is a Kikuchi approximation where the basic clusters are set to the set of all pairs of hidden nodes

  28. Cluster variational method • Bethe regions involve one or two nodes • Define the local free energy of a single node: Gi(bi(xi)) = Σxi bi(xi) [ln bi(xi) + Ei(xi)] • Define the local free energy of a pair of nodes: Gij(bij(xi,xj)) = Σxi,xj bij(xi,xj) [ln bij(xi,xj) + Eij(xi,xj)] • Then for the regions corresponding to Bethe, GBethe = G12 + G23 + G45 + G56 + G14 + G25 + G36 - G1 - G3 - G4 - G6 - 2G2 - 2G5

  29. Cluster variational method • For the Kikuchi example shown below, regions involve four nodes • Extend the same logic as before • Define the local free energy of a four-node region, e.g. G1245(b1245(x1,x2,x4,x5)) = Σx1,x2,x4,x5 b1245(x1,x2,x4,x5) [ln b1245(x1,x2,x4,x5) + E1245(x1,x2,x4,x5)] • Then for the Kikuchi regions shown, GKikuchi = G1245 + G2356 - G25

  30. A more general example • Now we have basic regions [1245], [2356], [4578], [5689] • Intersection regions [25], [45], [56], [58], and • Intersection of intersection region [5] • Then we have GKikuchi = G1245 + G2356 + G4578 + G5689 - G25 - G45 - G56 - G58 + G5
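The over-counting bookkeeping in this example can be sanity-checked in a few lines: summing the counting numbers (+1 for the basic clusters and for [5], -1 for the four intersections) over all regions containing a given node should give exactly 1 for every node of the 3x3 grid:

```python
from collections import Counter

# Region -> counting number, for the example above.
regions = {(1, 2, 4, 5): 1, (2, 3, 5, 6): 1, (4, 5, 7, 8): 1, (5, 6, 8, 9): 1,
           (2, 5): -1, (4, 5): -1, (5, 6): -1, (5, 8): -1,
           (5,): 1}

count = Counter()
for nodes, c in regions.items():
    for v in nodes:
        count[v] += c
```

For instance node 5 appears in all four basic clusters (+4), all four intersections (-4), and the region [5] (+1), for a total of 1; corner nodes like 1 appear in exactly one basic cluster.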

  31. Outline • The BP algorithm • MRFs – Markov Random Fields • Gibbs free energy • Bethe approximation • Kikuchi approximation • Generalized BP

  32. Generalized BP • We show how to construct a GBP algorithm for this example • First find the intersections, intersections of intersections, etc. of the basic clusters • Basic: [1245], [2356], [4578], [5689] • Intersections: [25], [45], [56], [58] • Intersection of intersections: [5]

  33. Region Graph • Next, organize regions into the region graph • A hierarchy of regions and their “direct” subregions • “direct” subregions are subregions not contained in another subregion • e.g. [5] is a subregion of [1245], but it is not a direct one, since [5] is also a subregion of [25], which is itself a subregion of [1245]

  34. Messages • Construct messages from all regions r to direct subregions s • These correspond to the edges of the region graph • Consider the message from region [1245] to subregion [25] • It is a message from the nodes not in the subregion (1, 4) to those in the subregion (2, 5): m14→25

  35. Belief Equations • Construct belief equations for every region r • br({x}r) is proportional to each compatibility matrix and evidence term completely contained in r, times the messages into r • b5 = k φ5 [m2→5 m4→5 m6→5 m8→5] • b45 = k φ4 φ5 ψ45 [m12→45 m78→45 m2→5 m6→5 m8→5] • b1245 = k [φ1 φ2 φ4 φ5 ψ12 ψ14 ψ25 ψ45] [m36→25 m78→45 m6→5 m8→5]

  36. Belief Equations • b5 = k φ5 [m2→5 m4→5 m6→5 m8→5]

  37. Belief Equations • b45 = k φ4 φ5 ψ45 [m12→45 m78→45 m2→5 m6→5 m8→5]

  38. Belief Equations • b1245 = k [φ1 φ2 φ4 φ5 ψ12 ψ14 ψ25 ψ45] [m36→25 m78→45 m6→5 m8→5]

  39. Enforcing Marginalization • Now we need to enforce the marginalization condition relating each pair of regions that share an edge in the hierarchy • e.g. between [5] and [45]: b5(x5) = Σx4 b45(x4, x5)

  40. Message Update • Substituting the belief equations into the marginalization condition, we get the message update rule: m4→5(x5) ← k Σx4 φ4(x4) ψ45(x4,x5) m12→45(x4,x5) m78→45(x4,x5) • The collection of belief equations and message update rules defines our GBP algorithm
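As a sketch, here is a single evaluation of an update of this form: sum out x4 against the local evidence, the pairwise compatibility, and the incoming region messages, then normalize. The incoming message tables (indexed [x4][x5]) and potentials are made-up numbers purely for illustration:

```python
phi4 = [0.6, 0.4]                     # evidence function phi_4(x4)
psi45 = [[1.0, 0.5], [0.5, 1.0]]      # compatibility psi_45(x4, x5)
m_12_45 = [[0.7, 0.3], [0.4, 0.6]]    # message from [12] into [45], [x4][x5]
m_78_45 = [[0.5, 0.5], [0.2, 0.8]]    # message from [78] into [45], [x4][x5]

# m_4->5(x5) proportional to sum_x4 phi4 * psi45 * m_12->45 * m_78->45
raw = [sum(phi4[x4] * psi45[x4][x5] * m_12_45[x4][x5] * m_78_45[x4][x5]
           for x4 in range(2))
       for x5 in range(2)]
k = 1.0 / sum(raw)                    # normalization constant
m_4_5 = [k * r for r in raw]
```

In the full GBP algorithm this evaluation would be repeated for every region-graph edge until the messages converge, exactly as in ordinary loopy BP.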

  41. Complexity of GBP • Bad news: running time grows exponentially with the size of the basic clusters chosen • Good news: if the basic clusters encompass the shortest loops in the graphical model, usually nearly all of the error from BP is eliminated • This usually requires only a small additional amount of computation compared to standard BP
