A tutorial on MCMC: Markov Chain Monte Carlo
Problem (e.g. Bayesian inference): estimate the integral

$$ I = \int g(x)\,dx. $$

If {X_i} form a Markov chain with stationary probability π, then

$$ I \;\approx\; \frac{1}{N}\sum_{i=1}^{N}\frac{g(x_i)}{\pi(x_i)}. $$
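As a minimal illustration of this estimator (my own sketch, not from the slides), the snippet below estimates I = ∫ exp(-x^4) dx using draws from a standard normal π; for simplicity the x_i are drawn i.i.d. from π rather than produced by a Markov chain, but the weighted average is exactly the one above.

% Sketch (assumed example): estimate I = integral of g(x) dx with
% g(x) = exp(-x^4), using draws from pi = N(0,1).  The x_i here are
% i.i.d. from pi; an MCMC sampler targeting pi could supply them instead.
N = 1e5;
x = randn(N, 1);                        % samples from pi
g = exp(-x.^4);                         % integrand g(x_i)
pi_x = exp(-x.^2/2) / sqrt(2*pi);       % pi(x_i)
I_hat = mean(g ./ pi_x)                 % approx 2*gamma(5/4), about 1.81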
MCMC is then the problem of designing a Markov chain X_1, X_2, ..., X_N with a pre-specified stationary distribution π, so that the integral I can be accurately approximated in a reasonable amount of time.
Metropolis et al. (ca. 1953). From the current state x, propose y from a proposal distribution G(y|x) and accept the move with probability

$$ p = \min\!\left(1, \frac{\pi(y)}{\pi(x)}\right); $$

with probability 1 - p the chain stays at x. [Diagram: successive proposed moves x → y → y′ → ..., each accepted with its own probability p, p′, ...]
[Figure: two small chain diagrams, one with period 2 and one that is reducible, illustrating why aperiodicity and irreducibility are needed.]

Theorem: Metropolis works for any proposal distribution G such that G(y|x) = G(x|y), provided the Markov chain is irreducible and aperiodic.

Proof: the flow

$$ \pi(x)\,G(y|x)\,\min\!\left(1, \frac{\pi(y)}{\pi(x)}\right) = G(y|x)\,\min\big(\pi(x), \pi(y)\big) $$

is symmetric in x and y when G is symmetric, so detailed balance holds.

Note: it also works for a general G provided we change p a bit (Hastings' trick: use p = min(1, [π(y) G(x|y)] / [π(x) G(y|x)])).
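As a concrete sketch of the algorithm (my own illustration, not the tutorial's code): a random-walk Metropolis sampler with a Gaussian proposal, which is symmetric, so the acceptance probability reduces to min(1, π(y)/π(x)) and π need only be known up to a constant.

% Random-walk Metropolis sketch (assumed example).  The proposal
% y ~ N(x, sigma^2) is symmetric, so accept with prob min(1, pi(y)/pi(x)).
log_pi = @(x) -abs(x).^3 / 3;           % any unnormalized log target
N = 1e4; sigma = 1.0;
x = zeros(N, 1);                        % the chain, started at 0
for i = 2:N
    y = x(i-1) + sigma*randn;           % propose y from G(y|x) = N(x, sigma^2)
    if log(rand) < log_pi(y) - log_pi(x(i-1))
        x(i) = y;                       % accept the move
    else
        x(i) = x(i-1);                  % reject: stay at x
    end
end

Any symmetric G works here; an asymmetric G would need the Hastings correction mentioned above.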
Gibbs sampler. Take X = (W, Z) a vector. To sample X it is sufficient to sample cyclically from the conditionals (W|z) and (Z|w).

Gibbs is in fact a special case of Metropolis: take the proposals to be the exact conditionals; the acceptance probability is then identically 1, i.e. every proposed move is accepted.

Why it works: let f(w,z) be the joint density with conditionals u(w|z), v(z|w) and marginals g(w), h(z). Then

$$ \int u(w|z)\,h(z)\,dz = g(w), \qquad \int v(z|w)\,g(w)\,dw = h(z) \;\Longleftrightarrow\; T(g,h) = (g,h), $$

i.e. the pair of marginals (g, h) is a fixed point of the cycle of conditional updates T.
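A small Gibbs sketch (an assumed example, not the tutorial's model): for a bivariate standard normal (W, Z) with correlation ρ, both full conditionals are known in closed form, so the cycle of conditional draws is exact.

% Gibbs sketch (assumed example): sample (W,Z) from a bivariate standard
% normal with correlation rho by cycling through the exact conditionals
% W|z ~ N(rho*z, 1-rho^2) and Z|w ~ N(rho*w, 1-rho^2).
rho = 0.8; N = 1e4;
w = zeros(N,1); z = zeros(N,1);
for i = 2:N
    w(i) = rho*z(i-1) + sqrt(1-rho^2)*randn;   % draw from (W | z)
    z(i) = rho*w(i)   + sqrt(1-rho^2)*randn;   % draw from (Z | w)
end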
Entropic inference on Gaussians. Example: the likelihood, the entropic prior p(dθ) = exp(-α I(θ : θ′)) h(dθ), and the resulting entropic posterior, written in terms of the sufficient statistics. [The explicit likelihood, posterior, and sufficient-statistic formulas appeared as equation images on the original slide.]
The conditionals: Gaussian | generalized inverse Gaussian. [The explicit conditional densities appeared as equation images on the original slide.]
Gibbs + Metropolis

% initial posterior log-likelihood, one entry per chain
LL = ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
LL1s(1:Nchains,1) = LL;

for t = 1:burnin
    % Gibbs step: draw mu from its (Gaussian) full conditional
    mu = normrnd((v*t1+a1m)./(n*v+a1), 1./(n*v+a1));
    % Metropolis step: update v (beta plays the role of t3 inside do_metropolis)
    v = do_metropolis(v, Nmet, n3, beta, a2);
    % record the posterior log-likelihood of each chain
    LL1s(1:Nchains,t+1) = ...
        ((t1-n2*mu).*mu-t3).*v + (n3-1)*log(v) - a2*((mu-m).^2+1./v);
end

function y = do_metropolis(v, Nmet, n3, t3, a2)
% Nmet Metropolis steps on each of the Nchains components of v,
% with independent Gamma(n3, t3) proposals.
[Nchains, one] = size(v);
x = v; accept = 0; reject = 0;
lx  = log(x);
lfx = (n3-1)*lx - t3*x - a2./x;            % log target at the current states
for t = 1:Nmet
    y   = gamrnd(n3, t3, Nchains, 1);      % propose from a Gamma
    ly  = log(y);
    lfy = (n3-1)*ly - t3*y - a2./y;        % log target at the proposals
    for c = 1:Nchains
        if (lfy(c) > lfx(c)) || (rand(1,1) < exp(lfy(c)-lfx(c)))
            x(c) = y(c); lx(c) = ly(c); lfx(c) = lfy(c);
            accept = accept + 1;
        else
            reject = reject + 1;
        end
    end
end
y = x;   % return the current states after the Nmet Metropolis steps
end
Convergence: Are we there yet? [Figure: log-likelihood trace plots for the chains; they look OK after the second point.]
The Art of Simulation:
- Run several chains
- Start at over-dispersed points
- Monitor the log-likelihood
- Monitor the serial correlations
- Monitor acceptance ratios
- Re-parameterize (to get approximately independent parameters)
- Re-block (Gibbs)
- Collapse (integrate over other parameters)
- Run with troubled parameters fixed at reasonable values
- Monitor R-hat (a small sketch of this diagnostic follows the list)
- Monitor means of score functions
- Monitor coalescence, use connections, become EXACT!
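A sketch of the R-hat (Gelman-Rubin) diagnostic mentioned in the checklist; this is my own illustration (the helper name gelman_rubin and the n-by-m layout are assumptions, not from the slides). Values close to 1 suggest the between-chain and within-chain variability agree.

% Gelman-Rubin R-hat sketch (assumed helper, not from the slides).
% draws is an n-by-m matrix: n iterations (after burn-in), m chains.
function Rhat = gelman_rubin(draws)
    [n, m] = size(draws);
    chain_means = mean(draws, 1);          % 1-by-m vector of chain means
    W = mean(var(draws, 0, 1));            % average within-chain variance
    B = n * var(chain_means, 0);           % between-chain variance
    var_hat = (n-1)/n * W + B/n;           % pooled posterior-variance estimate
    Rhat = sqrt(var_hat / W);              % close to 1 when the chains have mixed
end

For example, gelman_rubin(LL1s') would monitor the log-likelihood traces of the chains produced by the Gibbs + Metropolis code above (after discarding some initial burn-in).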
The future is Geometry
Get Connected! Unnormalized posteriors: q(θ|ω) = Z(ω) p(θ|ω) (e.g. ω = ω(x) = vector of sufficient statistics).

Easy paths: geometric, mixture, scale. A path ω(t) connects q(θ|ω°) at t = 0 to q(θ|ω′) at t = 1, with tangent ∂(θ, ω(t)) = Σ_k v_k(t) ∂_k(θ, ω(t)) along the path. Exact rejection constants are known along the mixture path!

$$ \lambda = \log(Z_1/Z_0) \;\approx\; \frac{1}{N}\sum_j \partial(\theta_j, \omega(t_j)), $$

where the t_j are uniform on [0,1] and θ_j is drawn from p(θ|ω(t_j)). λ is the average tangent direction along the path.

The choice of path is equivalent to the choice of a prior on [0,1]. The best (minimum-variance) prior (path) is generalized Jeffreys! Information geodesics are the best paths on the manifold of unnormalized posteriors.
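For reference, the path-sampling identity behind this estimate of λ (a standard thermodynamic-integration fact, written in the slide's notation, where the tangent ∂ is the t-derivative of log q along the path):

$$
\lambda \;=\; \log\frac{Z_1}{Z_0}
\;=\; \int_0^1 \mathrm{E}_{\theta \sim p(\cdot\mid\omega(t))}\!\left[\frac{d}{dt}\log q(\theta\mid\omega(t))\right] dt
\;\approx\; \frac{1}{N}\sum_{j=1}^{N} \frac{d}{dt}\log q\bigl(\theta_j\mid\omega(t)\bigr)\Big|_{t=t_j},
$$

with the t_j drawn uniformly on [0,1] and θ_j ~ p(·|ω(t_j)). Replacing the uniform density on [0,1] by another prior changes the variance of the estimate, which is the sense in which the choice of path (prior) matters.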
New Exact Math. Most MCs are iterations of random functions: let {f_θ : θ ∈ Θ} be a family of functions, and choose n points θ_1, θ_2, ..., θ_n in Θ independently with some probability measure μ defined on Θ.

Forward iteration: X_0 = x_0, X_1 = f_θ1(x_0), ..., X_n = f_θn(X_{n-1}) = (f_θn ∘ ... ∘ f_θ2 ∘ f_θ1)(x_0). Moves all over S.

Backward iteration: Y_0 = x_0, Y_1 = f_θ1(x_0), ..., Y_n = (f_θ1 ∘ f_θ2 ∘ ... ∘ f_θn)(x_0). Converges to a constant on S.

X_n = Y_n in distribution for all n, but as processes {X_n} ≠ {Y_n}: corresponding frames have the same distribution, but the MOVIES are different.

E.g. let a < 1. Take S (the space of states) to be the real line, Θ = {+, -}, μ(+) = μ(-) = 1/2, and f_+(x) = a x + 1, f_-(x) = a x - 1. The forward chain satisfies X_n = a X_{n-1} + ε_n, so fresh noise keeps driving it around; in the backward iteration the most recent ε_n enters innermost and is damped by a^{n-1}, so Y_n converges.
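A small numerical sketch of this example (my own code, not from the slides): both trajectories use the same random signs θ_m = ±1, so each Y_m has the same distribution as X_m, but the forward trajectory keeps wandering while the backward one settles to a limit.

% Forward vs. backward iteration sketch (assumed example) for the random
% affine maps f+(x) = a*x + 1 and f-(x) = a*x - 1, with a < 1.
a = 0.5; n = 40; x0 = 0;
theta = 2*(rand(1,n) > 0.5) - 1;          % i.i.d. signs +1/-1, prob 1/2 each
X = zeros(1,n); Y = zeros(1,n);
x = x0;
for m = 1:n
    x = a*x + theta(m);                   % forward: X_m = f_{theta_m}(X_{m-1})
    X(m) = x;
    % backward: Y_m = f_{theta_1} o ... o f_{theta_m} applied to x0
    Y(m) = a^m * x0 + sum(a.^(0:m-1) .* theta(1:m));
end
plot(1:n, X, 'b', 1:n, Y, 'r'); legend('forward X_m', 'backward Y_m');
% X keeps moving all over the state space; Y converges to a constant.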
Dead Leaves Simulation (http://www.warwick.ac.uk/statsdept/Staff/WSK/). [Images: forward vs. backward simulation, "looking down" vs. "looking up".]
Convergence of Y_n = (f_θ1 ∘ f_θ2 ∘ ... ∘ f_θn)(x_0) when the functions are contracting on average.
Propp & Wilson (www.dbwilson.com): coupling from the past. We need backward iterations: run the Gibbs sampler from time t = -M up to t = 0, reusing the same random numbers. Monotonicity, s ≤ t ⇒ f_θ(s) ≤ f_θ(t), lets one track only the minimal and maximal states.

The first time to coalescence of forward chains is not distributed as π: in the two-state example [figure: states 0 and 1, with transition probabilities 0.5 (0→0), 0.5 (0→1), and 1 (1→0)], the chains always coalesce at 0 first, BUT π(0) = 2/3, π(1) = 1/3.

[Figure: a perfectly equilibrated 2D Ising state at the critical temperature, T = 529 K.]
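A minimal coupling-from-the-past sketch for this two-state example (my own code; the transition probabilities 0.5 and 1 are read off the slide's diagram, and the coupling through a single uniform per time step is an assumption). Because the random number attached to each past time step is fixed and reused as we restart further back, the value returned at time 0 is an exact draw from π, so roughly 2/3 of the returned samples should be 0.

% Coupling from the past (Propp & Wilson) sketch for the two-state chain:
% from state 0 move to 1 w.p. 1/2, from state 1 move to 0 w.p. 1.
% (Assumed illustration, not the original slide code.)
function s = cftp_two_state()
    U = [];                               % uniforms for times -1, -2, ..., -M
    M = 1;
    while true
        U = [U, rand(1, M - numel(U))];   % extend further into the past, keeping old values
        s0 = 0; s1 = 1;                   % start every possible state at time -M
        for t = M:-1:1                    % step from time -t to time -t+1 using U(t)
            s0 = update(s0, U(t));
            s1 = update(s1, U(t));
        end
        if s0 == s1                       % coalesced by time 0: exact draw from pi
            s = s0;
            return
        end
        M = 2*M;                          % not coalesced: go further back and retry
    end
end

function s = update(s, u)
    % one coupled transition driven by the uniform u
    if s == 1
        s = 0;                            % from 1, always move to 0
    elseif u < 0.5
        s = 1;                            % from 0, move to 1 w.p. 1/2
    end
end

For example, mean(arrayfun(@(k) cftp_two_state(), 1:1e4) == 0) should come out close to π(0) = 2/3.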
Are we there yet? Exactly!
Not Exactly! Yet.