Genome evolution: a sequence-centric approach Lecture 6: Belief propagation
(Probability, Calculus/Matrix theory, some graph theory, some statistics)
Probabilistic models: Simple Tree Models, HMMs and variants, PhyloHMM/DBN, Context-aware MM, Factor Graphs
Genome structure · Mutations · Population · Inferring Selection
Inference: DP, Sampling, Variational apx.
Parameter estimation: EM, Generalized EM (optimize free energy)
Refs: HMM, simple tree: Durbin; Basic BNs: Heckerman; Sampling: MacKay book; Variational: Jojic et al. paper; LBP: Yedidia, Freeman, Weiss
Simple Tree: Inference as message passing
[Figure: a tree whose leaves s hold the observed DATA; subtrees report "You are P(H|our data)" to a node, which can then say "I am P(H|all data)".]
Belief propagation in a factor graph
• Remember, a factor graph is defined by a set of random variables (indexed by i, j, k, ...) and a set of factors on groups of variables (indexed by a, b, ...): $P(x) = \frac{1}{Z} \prod_a f_a(x_a)$
• $x_a$ refers to an assignment of values to the inputs of factor a
• $Z = \sum_x \prod_a f_a(x_a)$ is the partition function (which is hard to compute)
• The BP algorithm is constructed by computing and updating messages:
• Messages from factors to variables: $m_{a \to i}(x_i)$, mapping every value attainable by $x_i$ to a real value
• Messages from variables to factors: $n_{i \to a}(x_i)$
• Think of messages as transmitting beliefs:
• a→i: given my other input variables, and ignoring your message, you are x
• i→a: given my other neighboring factors and my potential, and ignoring your message, you are x
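Message update rules (N(i) is the set of factors containing variable i, N(a) the set of inputs of factor a):
• Messages from variables to factors: $n_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)$
• Messages from factors to variables: $m_{a \to i}(x_i) = \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} n_{j \to a}(x_j)$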
Why is this different from the mean field algorithm? The algorithm proceeds by updating messages:
• Define the beliefs as approximating single-variable posteriors ($p(h_i|s)$): $b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$
Algorithm:
Initialize all messages to uniform
Iterate until no message changes:
  Update factor-to-variable messages
  Update variable-to-factor messages
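The algorithm above can be sketched in plain Python for small discrete factor graphs. This is a minimal illustration, not the lecture's implementation: the representation (a `card` dict of variable cardinalities and a list of `(scope, table)` factors with strictly positive entries), the helper names `loopy_bp` and `brute_force_marginals`, the synchronous schedule and the convergence tolerance are all assumptions. Messages are normalized after each update, which changes only their scale, not the beliefs.

```python
import itertools

def normalize(vec):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(vec)
    return [v / total for v in vec]

def loopy_bp(card, factors, max_iters=200, tol=1e-10):
    """Synchronous belief propagation on a discrete factor graph.

    card:    dict variable -> number of states
    factors: list of (scope, table); scope is a tuple of variables and table
             maps every assignment tuple (in scope order) to f_a(x_a) > 0
    Returns the variable beliefs b_i as dict variable -> list of probabilities.
    """
    nbrs = {v: [a for a, (scope, _) in enumerate(factors) if v in scope] for v in card}
    # Initialize all messages to uniform.
    m = {(a, v): normalize([1.0] * card[v]) for a, (scope, _) in enumerate(factors) for v in scope}
    n = dict(m)
    for _ in range(max_iters):
        # Factor -> variable: m_{a->i}(x_i) = sum_{x_a \ x_i} f_a(x_a) prod_{j != i} n_{j->a}(x_j)
        new_m = {}
        for a, (scope, table) in enumerate(factors):
            for pos, v in enumerate(scope):
                msg = [0.0] * card[v]
                for x_a, f in table.items():
                    prod = f
                    for q, u in enumerate(scope):
                        if q != pos:
                            prod *= n[(a, u)][x_a[q]]
                    msg[x_a[pos]] += prod
                new_m[(a, v)] = normalize(msg)
        # Variable -> factor: n_{i->a}(x_i) = prod_{c in N(i) \ a} m_{c->i}(x_i)
        new_n = {}
        for v, fs in nbrs.items():
            for a in fs:
                msg = [1.0] * card[v]
                for c in fs:
                    if c != a:
                        msg = [p * q for p, q in zip(msg, new_m[(c, v)])]
                new_n[(a, v)] = normalize(msg)
        change = max(abs(p - q) for key in new_m for p, q in zip(new_m[key], m[key]))
        m, n = new_m, new_n
        if change < tol:  # stop once no message changes (up to tol)
            break
    # Beliefs: b_i(x_i) proportional to prod_{a in N(i)} m_{a->i}(x_i)
    beliefs = {}
    for v, fs in nbrs.items():
        b = [1.0] * card[v]
        for a in fs:
            b = [p * q for p, q in zip(b, m[(a, v)])]
        beliefs[v] = normalize(b)
    return beliefs

def brute_force_marginals(card, factors):
    """Exact marginals by enumerating every joint assignment (tiny models only)."""
    names = sorted(card)
    marg = {v: [0.0] * card[v] for v in names}
    for x in itertools.product(*(range(card[v]) for v in names)):
        assign = dict(zip(names, x))
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(assign[u] for u in scope)]
        for v in names:
            marg[v][assign[v]] += w
    return {v: normalize(mv) for v, mv in marg.items()}

agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # hypothetical pairwise factor
card = {"A": 2, "B": 2, "C": 2}
chain = [(("A",), {(0,): 3.0, (1,): 1.0}), (("A", "B"), agree), (("B", "C"), agree)]
print(loopy_bp(card, chain)["C"], brute_force_marginals(card, chain)["C"])
loop = chain + [(("A", "C"), agree)]  # closing the loop makes the beliefs approximate
print(loopy_bp(card, loop)["C"])
```

On the chain A-B-C (a tree) the BP beliefs match the brute-force marginals, e.g. $b_C(0) = 19/36$; once the factor on (A, C) closes a loop, the same call returns approximate beliefs.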
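The update rules can be viewed as derived from:
1. A requirement on the variable beliefs: $b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)$
2. A requirement on the factor beliefs (beliefs on factor inputs): $b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} n_{i \to a}(x_i)$
3. The marginalization requirement: $\sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i)$
• Here's how: summing the factor belief over its other inputs gives $\sum_{x_a \setminus x_i} b_a(x_a) \propto n_{i \to a}(x_i) \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} n_{j \to a}(x_j)$, while, with $n_{i \to a}(x_i) = \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)$, the variable belief is $b_i(x_i) \propto n_{i \to a}(x_i)\, m_{a \to i}(x_i)$. Equating the two yields the factor-to-variable update rule.
• This is far from mean field, since, for example, the factor beliefs need not factorize: in general $b_a(x_a) \neq \prod_{i \in N(a)} b_i(x_i)$.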
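BP on Tree = Up-Down
[Figure: a tree with hidden nodes h1, h2, h3 and observed leaves s1, s2, s3, s4; edges labeled a to e; message-passing steps numbered 1 to 3.]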
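On a tree, the same message passing can be organized as one upward sweep (leaves to root) followed by one downward sweep (root to leaves), which yields exact posteriors. The sketch below assumes one reading of the figure: h3 is the root with children h1 and h2, h1 has leaves s1, s2 and h2 has leaves s3, s4. The function name `up_down`, the symmetric substitution table and the observed states are hypothetical; observed leaves are encoded as 0/1 node potentials.

```python
def up_down(children, root, node_pot, edge_pot, k=2):
    """Exact node marginals on a rooted tree via one upward and one downward pass.

    children: dict node -> list of child nodes
    node_pot: dict node -> k local potentials (a prior at the root, 0/1 evidence at leaves)
    edge_pot: dict (parent, child) -> k x k table psi[x_parent][x_child]
    """
    parent = {c: p for p, cs in children.items() for c in cs}
    order, stack = [], [root]  # pre-order: every node appears after its parent
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    inside, up = {}, {}
    for v in reversed(order):  # upward pass: leaves to root
        inside[v] = list(node_pot[v])
        for c in children.get(v, []):
            inside[v] = [a * b for a, b in zip(inside[v], up[c])]
        if v != root:
            psi = edge_pot[(parent[v], v)]
            up[v] = [sum(psi[xp][xv] * inside[v][xv] for xv in range(k)) for xp in range(k)]
    down = {root: [1.0] * k}
    for v in order:  # downward pass: root to leaves
        for c in children.get(v, []):
            outside = [down[v][x] * node_pot[v][x] for x in range(k)]
            for s in children[v]:
                if s != c:  # everything at v except c's own upward message
                    outside = [a * b for a, b in zip(outside, up[s])]
            psi = edge_pot[(v, c)]
            down[c] = [sum(outside[xp] * psi[xp][xc] for xp in range(k)) for xc in range(k)]
    marginals = {}
    for v in order:
        b = [inside[v][x] * down[v][x] for x in range(k)]
        marginals[v] = [p / sum(b) for p in b]
    return marginals

# Hypothetical example: the same symmetric substitution table on every edge.
mut = [[0.9, 0.1], [0.1, 0.9]]
children = {"h3": ["h1", "h2"], "h1": ["s1", "s2"], "h2": ["s3", "s4"]}
edges = {(p, c): mut for p, cs in children.items() for c in cs}
node_pot = {"h3": [0.5, 0.5], "h1": [1.0, 1.0], "h2": [1.0, 1.0]}
for leaf, obs in {"s1": 0, "s2": 0, "s3": 1, "s4": 0}.items():
    node_pot[leaf] = [1.0 if x == obs else 0.0 for x in range(2)]
print(up_down(children, "h3", node_pot, edges)["h1"])
```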
Loopy BP is not guaranteed to converge
[Figure: a small loopy model on variables X and Y with binary values 0 and 1.]
This is not a hypothetical scenario: it frequently happens when there is too much symmetry. For example, most mutational effects are double-stranded (they act on both strands) and are therefore symmetric, which can result in loops.
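The Bethe Free Energy
• LBP was introduced in several domains (BNs, coding) and is considered very practical in many cases.
• ...but unlike the variational approaches we studied before, it is not clear how it approximates the likelihood/partition function, even when it converges.
• In the early 2000s, Yedidia, Freeman and Weiss discovered a connection between the LBP algorithm and the Bethe free energy, developed by Hans Bethe in the 1930s to approximate the free energy of lattice models in statistical physics:
$F_{Bethe}(b) = \sum_a \sum_{x_a} b_a(x_a) \ln \frac{b_a(x_a)}{f_a(x_a)} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$, where $d_i = |N(i)|$ is the number of factors containing variable i.
Theorem: beliefs are LBP fixed points if and only if they are stationary points of the Bethe free energy (under the normalization and marginalization constraints).
• Compare to the variational free energy: $F(b) = \sum_x b(x) \ln \frac{b(x)}{\prod_a f_a(x_a)} = -\ln Z + KL(b \| p)$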
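To make the Bethe free energy concrete, the sketch below evaluates $F_{Bethe}(b) = \sum_a \sum_{x_a} b_a \ln(b_a/f_a) - \sum_i (d_i - 1) \sum_{x_i} b_i \ln b_i$ at the exact marginals of two small hypothetical models. On a tree (a chain) it equals $-\ln Z$ exactly; on a three-variable loop it does not, because the average energy is still exact but the entropy is only approximated. Function names and potentials are illustrative assumptions.

```python
import itertools
import math

def bethe_free_energy(card, factors, b_fac, b_var):
    """F_Bethe = sum_a sum_{x_a} b_a ln(b_a / f_a) - sum_i (d_i - 1) sum_{x_i} b_i ln b_i."""
    F = 0.0
    for (scope, table), b_a in zip(factors, b_fac):
        F += sum(p * math.log(p / table[x]) for x, p in b_a.items() if p > 0)
    for v in card:
        d_i = sum(v in scope for scope, _ in factors)  # number of factors touching v
        F -= (d_i - 1) * sum(p * math.log(p) for p in b_var[v] if p > 0)
    return F

def exact_beliefs(card, factors):
    """Exact factor marginals, variable marginals and ln Z by enumeration."""
    names = sorted(card)
    b_fac = [{x: 0.0 for x in table} for _, table in factors]
    b_var = {v: [0.0] * card[v] for v in names}
    Z = 0.0
    for x in itertools.product(*(range(card[v]) for v in names)):
        s = dict(zip(names, x))
        w = math.prod(table[tuple(s[u] for u in scope)] for scope, table in factors)
        Z += w
        for (scope, _), b_a in zip(factors, b_fac):
            b_a[tuple(s[u] for u in scope)] += w
        for v in names:
            b_var[v][s[v]] += w
    b_fac = [{x: w / Z for x, w in b_a.items()} for b_a in b_fac]
    b_var = {v: [w / Z for w in b] for v, b in b_var.items()}
    return b_fac, b_var, math.log(Z)

agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # hypothetical potentials
card = {"A": 2, "B": 2, "C": 2}
chain = [(("A",), {(0,): 3.0, (1,): 1.0}), (("A", "B"), agree), (("B", "C"), agree)]
triangle = [(("A", "B"), agree), (("B", "C"), agree), (("A", "C"), agree)]
for name, model in [("chain", chain), ("triangle", triangle)]:
    b_fac, b_var, log_Z = exact_beliefs(card, model)
    print(name, bethe_free_energy(card, model, b_fac, b_var) + log_Z)  # zero only for the tree
```

For the triangle the gap $F_{Bethe} + \ln Z$ equals the exact entropy minus the Bethe entropy, roughly 0.052 nats for these potentials.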
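Generalization: Region-based free energy
• Start with a factor graph (X, A)
• Introduce regions $R = (X_R, A_R)$ with $X_R \subseteq X$ and $A_R \subseteq A$ (if $a \in A_R$, all inputs of a are in $X_R$), and multipliers $c_R$
• We require that every factor and every variable is counted exactly once: $\sum_{R: a \in A_R} c_R = 1$ for every a, and $\sum_{R: i \in X_R} c_R = 1$ for every i
• We will work with valid region graphs:
Region average energy: $U_R(b_R) = -\sum_{x_R} b_R(x_R) \sum_{a \in A_R} \ln f_a(x_a)$
Region entropy: $H_R(b_R) = -\sum_{x_R} b_R(x_R) \ln b_R(x_R)$
Region free energy: $F_R(b_R) = U_R(b_R) - H_R(b_R)$
Region-based average energy: $U_{\mathcal{R}} = \sum_R c_R U_R(b_R)$
Region-based entropy: $H_{\mathcal{R}} = \sum_R c_R H_R(b_R)$
Region-based free energy: $F_{\mathcal{R}} = U_{\mathcal{R}} - H_{\mathcal{R}}$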
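Bethe regions are the factor neighborhoods (each factor together with its input variables, with $c_a = 1$) and the single-variable regions.
[Figure: factors a, b, c sharing variables; larger regions such as R_ac, R_a and R_bc.]
We compensate for the multiple counting of variables using the multiplicity constant $c_i = 1 - d_i$ of each single-variable region.
We can add larger regions (e.g. R_ac), as long as we update the multipliers of the regions they overlap.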
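The counting requirement on multipliers is easy to check mechanically. This sketch builds the Bethe regions (one region per factor with $c = 1$, one singleton region per variable with $c_i = 1 - d_i$) for a hypothetical three-factor loop, then shows that adding a merged region without updating the multipliers double-counts factors, and that replacing regions and adjusting multipliers restores validity. The `(variables, factors, multiplier)` tuple representation and the function names are assumptions.

```python
def bethe_regions(scopes):
    """Bethe region graph as (variables, factors, multiplier) tuples:
    one region per factor with c = 1, one singleton per variable with c_i = 1 - d_i."""
    regions = [(frozenset(scope), frozenset([a]), 1) for a, scope in enumerate(scopes)]
    for v in sorted({u for scope in scopes for u in scope}):
        d_v = sum(v in scope for scope in scopes)
        regions.append((frozenset([v]), frozenset(), 1 - d_v))
    return regions

def is_valid(regions, scopes):
    """Every factor and every variable must be counted exactly once:
    the multipliers of the regions containing it sum to 1."""
    variables = {u for scope in scopes for u in scope}
    factors_ok = all(sum(c for _, A, c in regions if a in A) == 1 for a in range(len(scopes)))
    vars_ok = all(sum(c for X, _, c in regions if v in X) == 1 for v in variables)
    return factors_ok and vars_ok

scopes = [(1, 2), (2, 3), (1, 3)]  # hypothetical factors a, b, c on a loop
regions = bethe_regions(scopes)
print(is_valid(regions, scopes))  # True

# Adding a larger region R_ac without touching the multipliers double-counts a and c ...
naive = regions + [(frozenset([1, 2, 3]), frozenset([0, 2]), 1)]
print(is_valid(naive, scopes))  # False

# ... replacing R_a and R_c by R_ac and updating the singleton multipliers repairs it.
repaired = [(frozenset([1, 2, 3]), frozenset([0, 2]), 1),
            (frozenset([2, 3]), frozenset([1]), 1),
            (frozenset([1]), frozenset(), 0),
            (frozenset([2]), frozenset(), -1),
            (frozenset([3]), frozenset(), -1)]
print(is_valid(repaired, scopes))  # True
```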
Multipliers compensate exactly for the average energy, not for the entropy
Claim: If the regions' beliefs are exact, then the region-based average energy is exact.
We cannot guarantee much about the region-based entropy:
Claim: the region-based entropy is exact when the model is a uniform distribution. Proof: exercise. This means that the entropy counts the correct number of degrees of freedom, e.g. for N binary variables, $H = N \ln 2$.
Definition: a region-based free energy approximation is said to be max-ent normal if its region-based entropy is maximized when the beliefs are uniform.
A non-max-ent approximation can minimize the region-based free energy by selecting erroneously high-entropy beliefs!
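Bethe regions are max-ent normal
Claim: The Bethe regions give a max-ent normal approximation (i.e. the region-based entropy is maximized by uniform beliefs).
Rewrite the Bethe entropy as $H_{Bethe} = \sum_a H_a - \sum_i (d_i - 1) H_i = \sum_i H_i - \sum_a I_a$, with $I_a = \sum_{i \in N(a)} H_i - H_a \ge 0$: the entropy term $\sum_i H_i$ is maximal on uniform beliefs, and each information term $I_a$ is non-negative and minimal (0) on uniform beliefs.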
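The decomposition $H_{Bethe} = \sum_i H_i - \sum_a I_a$ suggests a quick numerical check of max-ent normality: for beliefs taken as marginals of any joint distribution on a small loop, the Bethe entropy should never exceed $N \ln 2$, and it should reach that value on the uniform distribution. This sketch uses a hypothetical triangle of pairwise factors and random joints; it is an illustration, not a proof.

```python
import itertools
import math
import random

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginal(joint, scope):
    """Marginal of a joint distribution (dict: assignment tuple -> prob) on `scope`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in scope)
        out[key] = out.get(key, 0.0) + p
    return out.values()

def bethe_entropy(n_vars, scopes, joint):
    """H_Bethe = sum_a H(b_a) - sum_i (d_i - 1) H(b_i), with beliefs taken from `joint`."""
    h = sum(entropy(marginal(joint, s)) for s in scopes)
    for i in range(n_vars):
        d_i = sum(i in s for s in scopes)
        h -= (d_i - 1) * entropy(marginal(joint, (i,)))
    return h

scopes = [(0, 1), (1, 2), (0, 2)]  # hypothetical triangle of pairwise factors
states = list(itertools.product((0, 1), repeat=3))
uniform = {x: 1 / 8 for x in states}
print(bethe_entropy(3, scopes, uniform) / math.log(2))  # approximately 3, i.e. N ln 2

random.seed(0)
worst = 0.0
for _ in range(200):
    w = [random.random() for _ in states]
    total = sum(w)
    joint = {x: wi / total for x, wi in zip(states, w)}
    worst = max(worst, bethe_entropy(3, scopes, joint))
print(worst <= 3 * math.log(2))  # True
```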
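Example: A non-max-ent approximation
Start with a complete graph on 6 binary variables with pairwise factors.
Add all variable triplets, pairs and singletons as regions.
Generate multipliers (these guarantee that every factor and variable is counted once): triplets = 1 (20 overall), pairs = -3 (15 overall), singletons = 6 (6 overall).
Look at the consistent beliefs that put probability 1/2 on "all variables in the region are 0" and 1/2 on "all are 1". The region entropy (for any region) is ln 2. The total region-based entropy is $(20 \cdot 1 + 15 \cdot (-3) + 6 \cdot 6) \ln 2 = 11 \ln 2$.
We claimed before that the entropy of the uniform distribution will be exact: $6 \ln 2$. So these beliefs get a higher region-based entropy than the uniform ones.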
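The arithmetic of the example can be checked directly. The sketch assumes the consistent beliefs that put probability 1/2 on "all variables in the region equal 0" and 1/2 on "all equal 1" (which give every region entropy $\ln 2$), verifies that the multipliers count every pairwise factor and every variable once, and compares the resulting region-based entropy with the one obtained from uniform beliefs.

```python
import itertools
import math

N = 6
regions = (list(itertools.combinations(range(N), 3))
           + list(itertools.combinations(range(N), 2))
           + [(v,) for v in range(N)])
c = {3: 1, 2: -3, 1: 6}  # multiplier by region size: triplets, pairs, singletons

# Validity: each pairwise factor (i, j) and each variable is counted exactly once.
assert all(sum(c[len(R)] for R in regions if set(f) <= set(R)) == 1
           for f in itertools.combinations(range(N), 2))
assert all(sum(c[len(R)] for R in regions if v in R) == 1 for v in range(N))

def region_entropy(size, belief):
    """Entropy (nats) of a belief over `size` binary variables."""
    probs = [belief(x) for x in itertools.product((0, 1), repeat=size)]
    return -sum(p * math.log(p) for p in probs if p > 0)

def all_equal(x):
    # 1/2 on all-zeros, 1/2 on all-ones: consistent across overlapping regions
    return 0.5 if len(set(x)) == 1 else 0.0

def uniform(x):
    return 0.5 ** len(x)

H_region = sum(c[len(R)] * region_entropy(len(R), all_equal) for R in regions)
H_uniform = sum(c[len(R)] * region_entropy(len(R), uniform) for R in regions)
print(H_region / math.log(2), H_uniform / math.log(2))  # approximately 11 and 6
```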
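Inference as minimization of region-based free energy
We basically solve a variational problem: minimize $F_{\mathcal{R}}(\{b_R\})$ over the region beliefs,
while enforcing constraints on the regions' beliefs: each $b_R$ is normalized, and the beliefs of overlapping regions agree on their shared variables.
Unlike the structured variational approximation we discussed before, and although the beliefs are (pairwise) compatible, we can have locally optimal beliefs that do not represent a true global posterior distribution.
[Figure: a triangle of variables A, B, C with pairwise factors.]
For example, suppose the optimal region beliefs are identical to the factors, with A and B forced to agree, B and C forced to agree, and A and C forced to disagree. This is pairwise consistent, but cannot be the result of any joint distribution on the three variables (we have a negative feedback loop here).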
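Local consistency versus global realizability can be checked by brute force for the triangle. The sketch assumes one concrete negative feedback loop: A and B always agree, B and C always agree, A and C always disagree. Every pair belief has uniform single-variable marginals, yet no assignment is allowed by all three pairs at once, so no joint distribution can produce them (any joint would have to put all its mass on that empty set).

```python
import itertools

# Hypothetical pairwise beliefs on binary A, B, C (a negative feedback loop).
eq = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}
neq = {(0, 1): 0.5, (1, 0): 0.5, (0, 0): 0.0, (1, 1): 0.0}
pair_beliefs = {("A", "B"): eq, ("B", "C"): eq, ("A", "C"): neq}

def single_marginal(table, pos):
    """Marginal of a pair belief on the variable at position `pos`."""
    out = [0.0, 0.0]
    for x, p in table.items():
        out[x[pos]] += p
    return out

# Local (pairwise) consistency: all pair beliefs induce the same b_i for each variable.
marginals = {}
for (u, v), table in pair_beliefs.items():
    marginals.setdefault(u, []).append(single_marginal(table, 0))
    marginals.setdefault(v, []).append(single_marginal(table, 1))
locally_consistent = all(all(m == ms[0] for m in ms) for ms in marginals.values())

# Global realizability: a joint p(a, b, c) can only put mass on assignments that
# every pair belief allows; if there are none, no joint distribution matches.
support = [x for x in itertools.product((0, 1), repeat=3)
           if eq[x[0], x[1]] > 0 and eq[x[1], x[2]] > 0 and neq[x[0], x[2]] > 0]
print(locally_consistent, support)  # True []
```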
Inference as minimization of region-based free energy
Claim: When it converges, LBP finds a stationary point of the Bethe free energy (stable fixed points are local minima).
Proof idea: we have an optimization problem (minimum energy) with constraints (beliefs are consistent and add up to 1). We write down a Lagrangian that expresses both the minimization goal and the constraints, and show that its stationary points are exactly the points where the LBP update rules hold.
Important technical point: we shall assume that at the fixed point all beliefs are non-zero. This can be shown to hold if all factors are "soft" (contain no zero value for any assignment).
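The Bethe Lagrangian
$L = F_{Bethe} + \sum_a \gamma_a \Big(\sum_{x_a} b_a(x_a) - 1\Big) + \sum_i \gamma_i \Big(\sum_{x_i} b_i(x_i) - 1\Big) + \sum_i \sum_{a \in N(i)} \sum_{x_i} \lambda_{ai}(x_i) \Big(b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a)\Big)$
The three constraint terms express: large-region beliefs are normalized; variable-region beliefs are normalized; marginalization.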
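The Bethe Lagrangian
Take the derivatives with respect to each $b_a$ and $b_i$:
$\frac{\partial L}{\partial b_a(x_a)} = \ln b_a(x_a) + 1 - \ln f_a(x_a) + \gamma_a - \sum_{i \in N(a)} \lambda_{ai}(x_i) = 0$
$\frac{\partial L}{\partial b_i(x_i)} = -(d_i - 1)\big(\ln b_i(x_i) + 1\big) + \gamma_i + \sum_{a \in N(i)} \lambda_{ai}(x_i) = 0$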
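Bethe minima are LBP fixed points
So here are the conditions: $b_a(x_a) \propto f_a(x_a) \exp\big(\sum_{i \in N(a)} \lambda_{ai}(x_i)\big)$ and, for $d_i > 1$, $b_i(x_i) \propto \exp\big(\frac{1}{d_i - 1} \sum_{a \in N(i)} \lambda_{ai}(x_i)\big)$.
And we can solve them if we set $\lambda_{ai}(x_i) = \ln n_{i \to a}(x_i) = \ln \prod_{c \in N(i) \setminus a} m_{c \to i}(x_i)$.
Giving us: $b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} n_{i \to a}(x_i)$ and, since $\sum_{a \in N(i)} \lambda_{ai}(x_i) = (d_i - 1) \sum_{c \in N(i)} \ln m_{c \to i}(x_i)$, $b_i(x_i) \propto \prod_{c \in N(i)} m_{c \to i}(x_i)$.
We saw before that these conditions, together with the marginalization constraint, generate the update rules! So L minimum → LBP fixed point is proven. The other direction is quite direct (see Exercise).
LBP is in fact computing the Lagrange multipliers: a very powerful observation.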
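The correspondence between Lagrange multipliers and messages can be checked numerically: run LBP to convergence on a small loop, set $\lambda_{ai}(x_i) = \ln n_{i \to a}(x_i)$, build $b_a \propto f_a \exp(\sum_{i \in N(a)} \lambda_{ai})$ and $b_i \propto \prod_{a \in N(i)} m_{a \to i}$, and verify the marginalization constraint that the Lagrangian enforces. The model (a triangle with a unary factor on A), the fixed iteration count and the helper `run_lbp` are assumptions for illustration.

```python
import math

def run_lbp(card, factors, iters=500):
    """Synchronous LBP with normalized messages; returns m_{a->i}, n_{i->a} and N(i)."""
    nb = {v: [a for a, (sc, _) in enumerate(factors) if v in sc] for v in card}
    m = {(a, v): [1.0] * card[v] for a, (sc, _) in enumerate(factors) for v in sc}

    def n_msg(a, v):
        # n_{v->a}(x_v): product of the messages from v's other factors, normalized
        out = [1.0] * card[v]
        for c in nb[v]:
            if c != a:
                out = [p * q for p, q in zip(out, m[(c, v)])]
        return [p / sum(out) for p in out]

    for _ in range(iters):
        n = {key: n_msg(*key) for key in m}
        new_m = {}
        for a, (sc, table) in enumerate(factors):
            for k, v in enumerate(sc):
                out = [0.0] * card[v]
                for x, f in table.items():
                    out[x[k]] += f * math.prod(n[(a, u)][x[q]] for q, u in enumerate(sc) if q != k)
                new_m[(a, v)] = [p / sum(out) for p in out]
        m = new_m
    return m, {key: n_msg(*key) for key in m}, nb

agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # hypothetical potentials
card = {"A": 2, "B": 2, "C": 2}
factors = [(("A",), {(0,): 3.0, (1,): 1.0}),
           (("A", "B"), agree), (("B", "C"), agree), (("A", "C"), agree)]
m, n, nb = run_lbp(card, factors)

# Variable beliefs: b_i(x_i) proportional to prod_{a in N(i)} m_{a->i}(x_i)
b_var = {}
for v in card:
    b = [math.prod(m[(a, v)][x] for a in nb[v]) for x in range(card[v])]
    b_var[v] = [p / sum(b) for p in b]

# Factor beliefs from the multipliers lambda_{ai}(x_i) = ln n_{i->a}(x_i):
# b_a(x_a) proportional to f_a(x_a) * exp(sum_{i in N(a)} lambda_{ai}(x_i))
max_residual = 0.0
for a, (sc, table) in enumerate(factors):
    b_a = {x: f * math.exp(sum(math.log(n[(a, u)][x[q]]) for q, u in enumerate(sc)))
           for x, f in table.items()}
    z = sum(b_a.values())
    for k, v in enumerate(sc):
        # Marginalization constraint: sum_{x_a \ x_i} b_a(x_a) should equal b_i(x_i)
        marg = [0.0] * card[v]
        for x, p in b_a.items():
            marg[x[k]] += p / z
        max_residual = max(max_residual, max(abs(p - q) for p, q in zip(marg, b_var[v])))
print(max_residual)  # close to zero at a converged fixed point
```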
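Generalizing LBP for region graphs
A region graph is a directed graph on regions (subsets of the variables and factors of the factor graph), with valid multipliers as defined above:
• regions $(X_R, A_R)$ and multipliers $c_R$
• We require that every factor and every variable is counted exactly once: $\sum_{R: a \in A_R} c_R = 1$ and $\sum_{R: i \in X_R} c_R = 1$
• We will work with valid region graphs.
Notation: P(R) are the parents of R, D(R) the descendants of R, and P(D(R)) \ D(R) the parents of R's descendants that are neither R nor its descendants.
[Figure: region R with its parents P(R), its descendants D(R), and the other parents of its descendants.]
Parent-to-child beliefs: $b_R(x_R) \propto \prod_{a \in A_R} f_a(x_a) \prod_{P \in P(R)} m_{P \to R}(x_R) \prod_{D \in D(R)} \prod_{P' \in P(D) \setminus E(R)} m_{P' \to D}(x_D)$, where $E(R) = \{R\} \cup D(R)$.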
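Generalizing LBP for region graphs
Parent-to-child algorithm (D(R): descendants of R; P(R): parents of R; D(R)+R: R together with its descendants):
$m_{P \to R}(x_R) = \dfrac{\sum_{x_P \setminus x_R} \prod_{a \in A_P \setminus A_R} f_a(x_a) \prod_{(I,J) \in N(P,R)} m_{I \to J}(x_J)}{\prod_{(I,J) \in D(P,R)} m_{I \to J}(x_J)}$
• N(P,R) = pairs (I,J) with I not in D(P)+P, and J in D(P)+P but not in D(R)+R
• D(P,R) = pairs (I,J) with I in D(P)+P but not in D(R)+R, and J in D(R)+R
[Figure: regions P and R with the sets D(P)+P and D(R)+R, showing which messages I→J enter the numerator N(P,R) and the denominator D(P,R).]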
GLBP in practice
• LBP is very attractive for users: really simple to implement, very fast
• LBP's cost is limited by the number of region assignments $x_a$, which grows exponentially with the factor's degree or with the size of large regions
• GLBP will be powerful when large regions can capture significant dependencies that are not captured by individual factors: think of small positive loops or other symmetric effects
• LBP messages can be computed synchronously (factors→variables→factors…); other scheduling options may improve performance considerably
• LBP is just one (quite indirect) way to minimize Bethe energies. Other approaches are possible, some of which are guaranteed to converge
• The Bethe/region energy minimization can be further constrained to force the beliefs to be realizable. This gives rise to the concept of the Wainwright-Jordan marginal polytope and convex algorithms on it.