210 likes | 308 Views
Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes. Michal Rosen-Zvi Computer Science Division, UC Berkeley Michael I. Jordan Computer Science Division and the Statistics Department, UC Berkeley. Outline. Introduction The exponential family distributions
E N D
Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes Michal Rosen-Zvi Computer Science Division, UC Berkeley Michael I. Jordan Computer Science Division and the Statistics Department, UC Berkeley
Outline • Introduction • The exponential family distributions • Variational approximation • Gibbs sampling • Undirected graphs • New approach to Gibbs sampling • Belief propagation revisited • The FNA • Directed graphs if time allows
Estimating marginals of discrete random variables P(x|Q)=exp[QT f(x)-A(Q)] The features In a quadratic model: fij=xixj f i=xi Log partition function The exponential family form
Some definitions Marginalizing over the mth node mm=Sxi\xmP(x| ) x set of random variables • set of parameters mmn=Sxi\{xm xn}P(x| ) xi={0,1} binary
What set is this??? Variational presentation of the log-partition function A(Q)=ln Sxexp[QT f(x)] It is a convex function (A*)*= A A*(m)=supqR|I|{mTq- A(Q)} A(Q)=supmM{mTq- A*(m)}
Dual parameters = marginals m=Eq[f(x)] For discrete random variables, M, is the marginal ploytope defined by M:={m R|I| | p(·) s.t. Sxif(x)p(x)=m} CV Approximations: pseudo marginals, m
Mean field Factorizing the joint probability distribution P(x|m)=Pi P(xi |mi)= Pi [mi xi (1- mi )1-xi ] Only one Lagrange multiplier for each node but found by iterative algorithm – the numeric results might not be in the approx. M The objective function is not concave Pseudomarginals set is convex lies withinM
Mean field (cont.) P(x|m)=Pi P(xi |mi)= Pi [mi xi (1- mi )1-xi ] Appr. the canonical parm. + Padding with zeros A*(m)=supqR|I|{mTq- Ai(qi)} For pairwise and single nodes iter. qi = qi + qijmi/2 A(Q)=supmM{mTq- A*(m)} A(Q)=supmM{miqi+mimjqij - milnmi - (1-mi)ln(1-mi)} The objective function is not concave Pseudomarginals set is convex lies within M
Gibbs Sampling • Local updates according to the conditional probability, p(xi=1)= xN(i)p(xN(i))(jN(i)ijxj) (y)=exp(y)/[1+exp(y)] • The measure converges to the Gibbs distribution – the exponential form • All moments are calculated using samples from the equilibrium
p(xi=1)=xN(i)p(xN(i))(jN(i)ijxj) p(xi=1,xj=1)=… p(xi=1,xj=1,xk=1)=… Gibbs Sampling – dual space pt+1(xi=1)=(1-1/N)pt(xi=1)+ 1/NxN(i)pt(xN(i))(jN(i)ijxj) A set of 2N fixed point equations yields exact relations between marginals
Gibbs Sampling and Bethe app. p(xi=1)=xN(i)p(xN(i))(jN(i)ijxj) p(xN(i))= ~xijN(i)p(xixj)/p(xi)|N(i)|-1 mi=xN(i) xi jN(I)p(xixj)/p(xi)|N(i)|-1(jN(i)ijxj) mij=f(mi, mij’ij’) j’ stands for all neighbors of i and j.
Gibbs Sampling and the Factorized Neighbors Algorithm p(xi=1)=xN(i)p(xN(i))(jN(i)ijxj) p(xN(i))= ~jN(i)p(xj) The FNA: mi= xN(i)jN(i)p(xi)(jN(i)ijxj)
The F. N. A. • The approximation is less restricted than MF • The algorithm is not exact on trees • The approximation is more restricted than Bethe Some comparisons for graphs with N nodes M edges and up to n neighbors. Time complexity: MF: O(N) Bethe: O(M) FNA: O(Nexp(n)) Space complexity: MF: O(N) Bethe: O(M) FNA: O(N)
Directed Gibbs sampling and the parents factored app. • As soon as a node is chosen all its descendents are updated • The local updates are according to the parents’ current state. • Factorized parents assumption p(x(i))=j(i) p(xj) p(xi=1)=x(i)j(i) p(xj)(j(i)ijxj)
Directed Gibbs Sampling – dual space pt+1(xi=1)=(1-i)pt(xi=1)+ x(i)[1/Npt(x(i))+ (1/N-i) pt+1(x (i))](j(i)ijxj) p(xi=1)=x(i)p(x(i))(j(i)ijxj) p(xi=1,xj=1)=x(i)\j, x(j)p(x(i)\j, x(j)) (k(j)jkxk) (k(i)\jikxk+ ij)
Back to the CVM approach P(x|m)=Pi P(xi,x(i)|mi, (i))/… Padding with zeros to a higher space A*(m)=entropy of some approx. canonical set For pairwise and single nodes iterations: qi = qi qij =qij qi, (i) =0 A(Q)=supmM{mTq- A*(m)} The objective function is concave Pseudomarginals set is not necessarily within M
Numerical results parents fact. Evidence: x17=x18=x19=1 FPA makes use of the exact results in the evidence-free graph