Belief Propagation in a Continuous World
Andrew Frank, 11/02/2009
Joint work with Alex Ihler and Padhraic Smyth
Graphical Models
• Nodes represent random variables.
• Edges represent dependencies.
[Figure: example graphs over nodes A, B, C]
Markov Random Fields
[Figure: undirected graph over nodes A–E]
• A ⊥ C | B
• B ⊥ E | C, D
Factoring Probability Distributions
Independence relations ↔ factorization
[Figure: graph over nodes A, B, C, D]
p(A,B,C,D) ∝ f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Toy Example: A Day in Court
[Figure: graph over A, E, W, and V, with example states "G"/"I" attached to each node]
A, E, W ∈ {"Innocent", "Guilty"}
V ∈ {"Not guilty verdict", "Guilty verdict"}
Inference
• Most probable explanation: x* = argmax over (X1, …, Xn) of p(X1, …, Xn)
• Marginalization: p(Xi) = Σ over all Xj, j ≠ i, of p(X1, …, Xn)
(A brute-force example follows below.)
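To make both operations concrete, here is a brute-force sketch (my own, not from the talk) on the four-variable model from the factorization slide; the factor values are invented for illustration.

```python
import itertools

# Hypothetical binary factors for p(A,B,C,D) ∝ f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D).
f_A, f_B, f_C, f_D = [1.0, 2.0], [1.5, 0.5], [1.0, 1.0], [0.7, 1.3]
f_AB = [[2.0, 0.5], [0.5, 2.0]]      # indexed [a][b]
f_BC = [[1.0, 0.2], [0.2, 1.0]]
f_BD = [[1.0, 3.0], [3.0, 1.0]]

def unnorm(a, b, c, d):
    return f_A[a] * f_B[b] * f_C[c] * f_D[d] * f_AB[a][b] * f_BC[b][c] * f_BD[b][d]

states = list(itertools.product([0, 1], repeat=4))
Z = sum(unnorm(*s) for s in states)                                          # partition function
marg_B = [sum(unnorm(*s) for s in states if s[1] == v) / Z for v in (0, 1)]  # marginal p(B)
mpe = max(states, key=lambda s: unnorm(*s))                                  # most probable explanation
print("p(B) =", marg_B, "  MPE =", mpe)
```

Brute force enumerates all 2^4 states here; on a tree, BP recovers the same marginals without the exponential sum, which is the point of the next slides.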
Belief Propagation
[Figure: the court-example graph, with messages mAE(E) and mWE(E) flowing from A and W into E, and mEV(V) flowing from E to V]
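For reference, the standard sum-product message update that these arrows denote; the notation (ψ for node and edge potentials, Γ(s) for the neighbors of s) is mine, not the slides'.

```latex
\[
m_{s \to t}(x_t) \;=\; \sum_{x_s} \psi_s(x_s)\, \psi_{st}(x_s, x_t)
  \prod_{u \in \Gamma(s) \setminus t} m_{u \to s}(x_s),
\qquad
b_t(x_t) \;\propto\; \psi_t(x_t) \prod_{u \in \Gamma(t)} m_{u \to t}(x_t).
\]
```

In the court example, for instance, mAE(E) sums the product of A's local potential and the A–E compatibility over A's two states.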
Loopy BP
[Figure: a graph with a cycle over nodes A, B, C, D]
Does this work? Does it make any sense?
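Since the later slides build on loopy BP, here is a minimal, self-contained sketch of discrete sum-product loopy BP on the four-node cycle above. This is my own illustration, not the authors' code; the node and edge potentials are made up.

```python
import numpy as np

# Hypothetical 4-cycle over binary variables A, B, C, D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
unary = {n: np.array([1.0, 1.0]) for n in nodes}
unary["A"] = np.array([2.0, 1.0])                              # made-up local evidence on A
pair = {e: np.array([[2.0, 0.5], [0.5, 2.0]]) for e in edges}  # attractive couplings

def neighbors(n):
    return [v for (u, v) in edges if u == n] + [u for (u, v) in edges if v == n]

def psi(s, t):
    # Pairwise potential indexed as psi(s, t)[x_s, x_t].
    return pair[(s, t)] if (s, t) in pair else pair[(t, s)].T

# All directed messages start uniform.
msgs = {(s, t): np.ones(2) for s in nodes for t in neighbors(s)}

for _ in range(50):                                            # synchronous sweeps
    new = {}
    for (s, t) in msgs:
        incoming = np.prod([msgs[(u, s)] for u in neighbors(s) if u != t], axis=0)
        m = psi(s, t).T @ (unary[s] * incoming)                # sum over x_s
        new[(s, t)] = m / m.sum()                              # normalize for stability
    msgs = new

beliefs = {}
for t in nodes:
    b = unary[t] * np.prod([msgs[(u, t)] for u in neighbors(t)], axis=0)
    beliefs[t] = b / b.sum()
print(beliefs)
```

On a tree this recovers exact marginals; on the loopy graph it converges (here) to approximate beliefs, which is exactly the behavior the variational view below tries to explain.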
A Variational Perspective
• Reformulate the problem: approximate the true distribution P by the best member Q of a family of "tractable" distributions.
• Find Q to minimize the divergence.
Choose an Approximating Family
• Desired traits:
• Simple enough to enable easy computation
• Complex enough to represent P
e.g. Fully factored: Q(X1, X2, …, Xn) = f(X1) f(X2) … f(Xn)
Structured: Q retains some of P's dependency structure (e.g. a tree-structured factorization)
Choose a Divergence Measure
Common choices:
• Kullback-Leibler divergence
• Alpha (α) divergence
(Both are written out below.)
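For reference, the two divergences as given in the Minka (2005) report cited on the next slide; whether the slides used exactly this parameterization of the α-divergence is an assumption on my part.

```latex
\[
\mathrm{KL}(p \,\|\, q) \;=\; \int p(x)\, \log \frac{p(x)}{q(x)} \, dx
\]
\[
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha(1-\alpha)}
  \int \Big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha} q(x)^{1-\alpha} \Big)\, dx
\]
```

In the limits, α → 1 recovers KL(p‖q) and α → 0 recovers KL(q‖p).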
Behavior of α-Divergence
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Resulting Algorithms
Assuming a fully-factored form of Q, i.e. Q(X1, X2, …, Xn) = f(X1) f(X2) … f(Xn), we get…*
• Mean field, α = 0
• Belief propagation, α = 1
• Tree-reweighted BP, α ≥ 1
* By minimizing "local divergence".
Local vs. Global Minimization
Source: T. Minka. Divergence measures and message passing. Technical Report MSR-TR-2005-173, Microsoft Research, 2005.
Sensor Localization
[Figure: sensor network with nodes A, B, C]
Protein Side Chain Placement
[Figure: amino acid sequence RTDCYGN + protein structure]
Common Traits?
Both problems have a continuous state space.
Easy Solution: Discretize!
[Figure: a 2-D state space gridded at two resolutions]
• 10 bins per dimension → domain size d = 100
• 20 bins per dimension → domain size d = 400
• Each message: O(d²) (see the sketch below)
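A toy illustration of that message cost (my own sketch): after discretization, a message is a d-vector and each update is a d × d matrix–vector product.

```python
import numpy as np

d = 400                          # e.g. 20 bins per dimension of a 2-D state space
psi_st = np.random.rand(d, d)    # discretized pairwise potential, indexed [x_s, x_t]
h_s = np.random.rand(d)          # local potential times incoming messages at node s
m_st = psi_st.T @ h_s            # one discrete message update: O(d^2) multiply-adds
m_st /= m_st.sum()
```

Doubling the resolution per dimension quadruples d in 2-D, so the per-message cost grows sixteen-fold; this is the motivation for particle-based messages.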
Particle BP
We'd like to pass "continuous messages" mAB(B)…
[Figure: graph over A, B, C, D with message mAB(B) on edge A–B]
Instead, pass discrete messages over sets of particles:
{ b(i) } ~ WB(B), i.e. particles b(1), b(2), …, b(N)
mAB({ b(i) })
PBP: Computing the Messages
• Re-write as an expectation
• Finite-sample approximation
(Both steps are written out below.)
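A plausible reconstruction of the two equations referenced above, following the usual particle BP importance-sampling derivation (the notation is mine, not verbatim from the slides): the integral message is rewritten as an expectation under the proposal Ws, then estimated with N particles.

```latex
\[
m_{s \to t}(x_t)
 \;=\; \int \psi_{st}(x_s, x_t)\, \psi_s(x_s)
        \prod_{u \in \Gamma(s) \setminus t} m_{u \to s}(x_s)\; dx_s
 \;=\; \mathbb{E}_{x_s \sim W_s}\!\left[
        \frac{\psi_{st}(x_s, x_t)\, \psi_s(x_s)
              \prod_{u \in \Gamma(s)\setminus t} m_{u \to s}(x_s)}{W_s(x_s)} \right]
\]
\[
\hat m_{s \to t}(x_t) \;=\; \frac{1}{N} \sum_{i=1}^{N}
   \frac{\psi_{st}\big(x_s^{(i)}, x_t\big)\, \psi_s\big(x_s^{(i)}\big)
         \prod_{u \in \Gamma(s)\setminus t} m_{u \to s}\big(x_s^{(i)}\big)}
        {W_s\big(x_s^{(i)}\big)},
 \qquad x_s^{(i)} \sim W_s(x_s).
\]
```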
Choosing "Good" Proposals
[Figure: graph over A, B, C, D]
• The proposal should "match" the integrand.
• Sample from the belief (written out below).
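"Sample from the belief" presumably means taking the proposal proportional to the current belief estimate at node s, i.e. the local potential times all incoming messages (same notation as above; this reading is an assumption).

```latex
\[
W_s(x_s) \;\propto\; \psi_s(x_s) \prod_{u \in \Gamma(s)} m_{u \to s}(x_s)
\]
```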
Iteratively Refine Particle Sets
[Figure: pairwise potential f(xs, xt) over Xs and Xt, with steps (1)–(3) marked]
1. Draw a set of particles, {xs(i)} ~ Ws(xs).
2. Discrete inference over the particle discretization.
3. Adjust Ws(xs) and repeat (a sketch follows below).
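A high-level sketch of that loop (my own skeleton, not the authors' implementation). The callables psi_node, psi_edge, sample_proposal, and discrete_bp are hypothetical problem-specific hooks; discrete_bp could be the loopy sum-product routine sketched earlier.

```python
def particle_bp(nodes, edges, psi_node, psi_edge, sample_proposal, discrete_bp,
                n_iters=10):
    # (1) Draw an initial particle set {xs(i)} ~ Ws at every node.
    particles = {s: sample_proposal(s, belief=None, old=None) for s in nodes}
    beliefs = None
    for _ in range(n_iters):
        # (2) Discrete inference: evaluate the potentials on the current
        # particle values and run ordinary BP over that finite "grid".
        unary = {s: psi_node(s, particles[s]) for s in nodes}
        pairwise = {(s, t): psi_edge(s, t, particles[s], particles[t])
                    for (s, t) in edges}
        beliefs = discrete_bp(nodes, edges, unary, pairwise)
        # (3) Adjust Ws: e.g. redraw each node's particles guided by its
        # current belief, then repeat.
        particles = {s: sample_proposal(s, belief=beliefs[s], old=particles[s])
                     for s in nodes}
    return particles, beliefs
```

Swapping discrete_bp for mean field or tree-reweighted BP gives the α-variants compared on the results slide.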
Benefits of PBP
• No distributional assumptions.
• Easy accuracy/speed trade-off.
• Relies on an "embedded" discrete algorithm: belief propagation, mean field, tree-reweighted BP…
Exploring PBP: A Simple Example
[Figure: a pairwise model whose potentials depend on the distance ||xs – xt|| between neighboring variables xs and xt]
Continuous Ising Model Marginals
[Figure: exact marginals vs. approximate marginals* from mean field (PBP α = 0), BP (PBP α = 1), and TRW (PBP α = 1.5)]
* Run with 100 particles per node
Estimating the Partition Function
• Mean field provides a lower bound on Z.
• Tree-reweighted BP provides an upper bound on Z.
p(A,B,C,D) = (1/Z) f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Z = Σ_{A,B,C,D} f(A) f(B) f(C) f(D) f(A,B) f(B,C) f(B,D)
Conclusions
• BP and related algorithms are useful!
• Particle BP lets you handle continuous RVs.
• Extensions to BP can work with PBP, too.
Thank You!