MCMC (Part II) By Marc Sobel
Monte Carlo Exploration • Suppose we want to optimize a complicated distribution f(·). We assume f is known only up to a multiplicative constant of proportionality. Newton-Raphson says that we can move to a point nearer a mode by using the transformation sketched below.
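A standard form of this update, assuming a gradient step of size ε toward a mode of log f (the full Newton-Raphson step would replace the scalar ε²/2 by the inverse of -∇² log f(x_t)):

  x_{t+1} = x_t + \frac{\varepsilon^2}{2}\,\nabla \log f(x_t)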
Langevin Algorithms • Monte Carlo demands that we explore the distribution rather than simply move toward a mode. We therefore introduce a noise term into the update (note that the step size ε is now written as σ). The resulting proposal, given below, can be used as is or combined with a Metropolis-Hastings step.
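In standard form (a sketch, writing Z_t for standard Gaussian noise), the Langevin proposal is:

  x_{t+1} = x_t + \frac{\sigma^2}{2}\,\nabla \log f(x_t) + \sigma Z_t,
  \qquad Z_t \sim \mathcal{N}(0, I)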
Langevin Algorithm with Metropolis Hastings • The move probability is:
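For a current point x and Langevin proposal x*, the standard acceptance probability (a reconstruction, using the Gaussian proposal density q implied by the update above):

  q(y \mid x) = \mathcal{N}\!\big(y;\; x + \tfrac{\sigma^2}{2}\nabla\log f(x),\; \sigma^2 I\big),
  \qquad
  \alpha(x, x^{*}) = \min\!\left\{1,\; \frac{f(x^{*})\, q(x \mid x^{*})}{f(x)\, q(x^{*} \mid x)}\right\}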
Extending the Langevin to a Hybrid Monte Carlo Algorithm • Instead of moving based entirely on the gradient (with noise added on), we add 'kinetic energy' via an auxiliary momentum variable p, and iterate the resulting dynamics.
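A minimal statement of the setup, assuming the usual potential energy E(x) = -log f(x) and Gaussian momenta:

  H(x, p) = E(x) + \tfrac{1}{2}\, p^{\top} p, \qquad p \sim \mathcal{N}(0, I)

Each iteration draws a fresh momentum, follows the (approximately) constant-energy leapfrog dynamics for Tau steps, and accepts the endpoint with probability min{1, e^{-ΔH}}, as in the code below.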
Matlab Code for Hybrid MC: a total of Tau leapfrog steps along the (approximately) constant-energy path

g = gradE(x);                 % gradient of the energy at the initial x (user-supplied)
E = -log(f(x));               % potential energy E(x) = -log f(x)
for i = 1:L
    p = randn(size(x));       % draw momentum p ~ N(0, I)
    H = p'*p/2 + E;           % current value of the Hamiltonian
    gnew = g; xnew = x;
    for tau = 1:Tau           % Tau leapfrog steps
        p = p - epsilon*gnew/2;       % make a half step in p
        xnew = xnew + epsilon*p;      % make a full step in x
        gnew = gradE(xnew);           % update the gradient
        p = p - epsilon*gnew/2;       % make another half step in p
    end
    Enew = -log(f(xnew));     % find the new energy
    Hnew = p'*p/2 + Enew;     % find the new H
    dH = Hnew - H;
    if rand < exp(-dH), Accept = 1; else, Accept = 0; end
    if Accept == 1, x = xnew; g = gnew; E = Enew; end   % keep the accepted state
end
Example • Energy E(x) = -log f(x) = x^2 + a^2 - log(cosh(ax)); kinetic energy k(p) = p^2/2 (the p'*p/2 term in the code above).
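A minimal Matlab sketch of this example, plugging the stated energy into the hybrid MC loop above (the value of a and the tuning constants epsilon, Tau, L are illustrative assumptions, not from the slides):

a = 4;                                        % separation parameter; f has two well-separated modes near ±a/2
f     = @(x) cosh(a*x) .* exp(-x.^2 - a^2);   % unnormalized target density, f(x) = exp(-E(x))
gradE = @(x) 2*x - a*tanh(a*x);               % gradient of the energy E(x) = x^2 + a^2 - log(cosh(a*x))
epsilon = 0.1;  Tau = 20;  L = 5000;  x = 0;  % step size, leapfrog steps, iterations, starting point

Running the loop above with these definitions should produce samples that visit both modes.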
Project • Use Hybrid MC to sample from a multimodal multivariate density. Does it improve simulation?
Monte Carlo Optimization: Feedback, random updates, and maximization • Can Monte Carlo help us search for the optimum value of a function? We've already talked about simulated annealing; there are other methods as well.
Random Updates to get to the optimum • Suppose we return to the problem of finding modes. Let ζ denote a random variable distributed uniformly on the unit sphere; the step sizes αx and βx are determined by numerical-analytic considerations (see Duflo 1998). A search of this kind does not get stuck the way a pure gradient step can; a standard form of the update is sketched below.
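A standard form of this random-direction update for maximizing a function h (a reconstruction, writing α_j, β_j for the step sizes at iteration j):

  \theta_{j+1} = \theta_j + \frac{\alpha_j}{2\beta_j}\,
    \big[\, h(\theta_j + \beta_j \zeta_j) - h(\theta_j - \beta_j \zeta_j) \,\big]\, \zeta_j,
  \qquad \zeta_j \sim \text{Uniform(unit sphere)}

The bracketed finite difference approximates the directional derivative of h along ζ_j, so the move is uphill on average, while the random direction keeps the search from stalling.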
Optimization of a function depending on the data • Minimize the (two-way) KLD between a density q(x) and a Gaussian mixture f = ∑i αi φ(x - θi) using samples. The two-way KLD is given below. We can minimize it by first sampling X1,…,Xn from q, then sampling Y1,…,Yn from s0(x) (assuming the support of s0 contains that of f), and minimizing the resulting Monte Carlo estimate.
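Written out (the standard symmetrized form, with the second term importance-weighted by f/s0):

  \mathrm{KL}_2(q, f) = \int q(x)\,\log\frac{q(x)}{f(x)}\,dx \;+\; \int f(x)\,\log\frac{f(x)}{q(x)}\,dx

  \widehat{\mathrm{KL}}_2 = \frac{1}{n}\sum_{i=1}^{n} \log\frac{q(X_i)}{f(X_i)}
    \;+\; \frac{1}{n}\sum_{j=1}^{n} \frac{f(Y_j)}{s_0(Y_j)}\,\log\frac{f(Y_j)}{q(Y_j)},
  \qquad X_i \sim q,\; Y_j \sim s_0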
Example (two-way) KLD • Monte Carlo rules dictate that we cannot sample from a distribution which depends on the parameters we want to optimize. Hence we importance sample the second KLD term using s0. We also employ an EM-type step involving latent variables Z, sketched below.
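One way to write the EM-type step (a sketch, assuming the usual mixture-membership latent variables Z):

  \gamma_{ji} = \Pr(Z_j = i \mid Y_j)
            = \frac{\alpha_i\,\varphi(Y_j - \theta_i)}{\sum_{l} \alpha_l\,\varphi(Y_j - \theta_l)}

E-step: compute the responsibilities γ_ji under the current parameters; M-step: maximize the objective with each log f(·) replaced by the complete-data surrogate ∑i γ_ji log{αi φ(· - θi)}, holding the importance weights f(Yj)/s0(Yj) fixed at the current parameter values.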
Prior Research • We (Dr. Latecki, Dr. Lakaemper, and I) minimized the one-way KLD between a nonparametric density q and a Gaussian mixture (paper pending). Note, however, that for mixture models which put large weight on regions where the nonparametric density is not well supported, minimizing only the one-way KLD may not give the best possible result.
Project • Use this formulation to minimize the KLD between q (e.g., a nonparametric density based on a data set) and a Gaussian mixture.
General Theorem in Monte Carlo Optimization • One way of finding an optimal value of a function f(θ) defined on a closed, bounded set is as follows. Define a distribution h_λ(θ), given below, for a parameter λ which we let tend to infinity. If we then simulate θ1,…,θn ~ h_λ(θ), the best of the simulated values converges to the optimum.
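A standard way to write this construction (a reconstruction of the displays):

  h_{\lambda}(\theta) \;\propto\; \exp\{\lambda\, f(\theta)\}

  \max_{1 \le i \le n} f(\theta_i) \;\longrightarrow\; \max_{\theta} f(\theta)
  \qquad \text{as } \lambda \to \infty \text{ (and } n \to \infty),

since h_λ concentrates its mass near the maximizers of f as λ grows.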
Monte Carlo Optimization • Observe (X1,…,Xn | θ) ~ L(X|θ). Simulate θ1,…,θn from the prior distribution π(θ) and define the posterior (up to a constant of proportionality) by l(θ|X). It follows that the weighted average given below converges to the MLE. The proof uses a Laplace approximation (see Robert (1993)).
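Applying the previous construction with h_λ(θ) ∝ π(θ) L(X|θ)^λ (a reconstruction of the display; the likelihood is raised to a power λ that tends to infinity):

  \hat\theta_{\lambda} \;=\;
  \frac{\sum_{i=1}^{n} \theta_i\, L(X \mid \theta_i)^{\lambda}}
       {\sum_{i=1}^{n} L(X \mid \theta_i)^{\lambda}}
  \;\longrightarrow\; \hat\theta_{\mathrm{MLE}}
  \qquad \text{as } \lambda \to \infty \text{ (and } n \to \infty).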
Exponential Family Example • Let X ~ exp{λθx - λψ(θ)}, and θ ~ π.
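In this case (a sketch of how the construction plays out; the posterior mean tends to the natural-parameter MLE):

  \pi_{\lambda}(\theta \mid x) \;\propto\; \pi(\theta)\,\exp\{\lambda(\theta x - \psi(\theta))\}

  \mathbb{E}_{\pi_{\lambda}}[\theta \mid x] \;\longrightarrow\; \hat\theta_{\mathrm{MLE}} = (\psi')^{-1}(x)
  \qquad \text{as } \lambda \to \infty,

since the MLE solves ψ'(θ) = x.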
Possible Example • It is known that maximum likelihood estimators for the parameters of a k-component mixture model are hard to compute. If, instead of maximizing the likelihood directly, we treat the mixture as a Bayesian model together with a scale parameter λ and an indifference prior, we can (typically) use Gibbs sampling to sample from this model. Letting λ tend to infinity then lets us construct MLEs.
Project • Implement an algorithm to find the MLE for a simple 3-component mixture model (use Robert (1993)).