MCMC (Part II) By Marc Sobel
Monte Carlo Exploration • Suppose we want to optimize a complicated distribution f(·). We assume f is known only up to a multiplicative constant of proportionality. Newton-Raphson says that we can move to a point nearer a mode by using the transformation sketched below.
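A standard form of this update, assuming a gradient step of size ε toward a mode of log f (the full Newton-Raphson step would replace the scalar ε²/2 by the inverse of -∇² log f(x_t)):

  x_{t+1} = x_t + \frac{\varepsilon^2}{2}\,\nabla \log f(x_t)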
Langevin Algorithms • Monte Carlo demands that we explore the distribution rather than simply move toward a mode. We therefore introduce a noise term into the update (note that the step size ε is now written as σ). The resulting proposal, given below, can be used as is or combined with a Metropolis-Hastings step.
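In standard form (a sketch, writing Z_t for standard Gaussian noise), the Langevin proposal is:

  x_{t+1} = x_t + \frac{\sigma^2}{2}\,\nabla \log f(x_t) + \sigma Z_t,
  \qquad Z_t \sim \mathcal{N}(0, I)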
Langevin Algorithm with Metropolis Hastings • The move probability is:
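For a current point x and Langevin proposal x*, the standard acceptance probability (a reconstruction, using the Gaussian proposal density q implied by the update above):

  q(y \mid x) = \mathcal{N}\!\big(y;\; x + \tfrac{\sigma^2}{2}\nabla\log f(x),\; \sigma^2 I\big),
  \qquad
  \alpha(x, x^{*}) = \min\!\left\{1,\; \frac{f(x^{*})\, q(x \mid x^{*})}{f(x)\, q(x^{*} \mid x)}\right\}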
Extending the Langevin to a Hybrid Monte Carlo Algorithm • Instead of moving based entirely on the gradient (with noise added on), we add 'kinetic energy' via an auxiliary momentum variable p, and iterate the resulting dynamics.
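A minimal statement of the setup, assuming the usual potential energy E(x) = -log f(x) and Gaussian momenta:

  H(x, p) = E(x) + \tfrac{1}{2}\, p^{\top} p, \qquad p \sim \mathcal{N}(0, I)

Each iteration draws a fresh momentum, follows the (approximately) constant-energy leapfrog dynamics for Tau steps, and accepts the endpoint with probability min{1, e^{-ΔH}}, as in the code below.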
Matlab Code for Hybrid MC: a total of Tau leapfrog steps along the (approximately) constant-energy path

g = gradE(x);                 % gradient of the energy at the initial x (user-supplied)
E = -log(f(x));               % potential energy E(x) = -log f(x)
for i = 1:L
    p = randn(size(x));       % draw momentum p ~ N(0, I)
    H = p'*p/2 + E;           % current value of the Hamiltonian
    gnew = g; xnew = x;
    for tau = 1:Tau           % Tau leapfrog steps
        p = p - epsilon*gnew/2;       % make a half step in p
        xnew = xnew + epsilon*p;      % make a full step in x
        gnew = gradE(xnew);           % update the gradient
        p = p - epsilon*gnew/2;       % make another half step in p
    end
    Enew = -log(f(xnew));     % find the new energy
    Hnew = p'*p/2 + Enew;     % find the new H
    dH = Hnew - H;
    if rand < exp(-dH), Accept = 1; else, Accept = 0; end
    if Accept == 1, x = xnew; g = gnew; E = Enew; end   % keep the accepted state
end
Example • Energy E(x) = -log f(x) = x^2 + a^2 - log(cosh(ax)); kinetic energy k(p) = p^2/2 (the p'*p/2 term in the code above).
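A minimal Matlab sketch of this example, plugging the stated energy into the hybrid MC loop above (the value of a and the tuning constants epsilon, Tau, L are illustrative assumptions, not from the slides):

a = 4;                                        % separation parameter; f has two well-separated modes near ±a/2
f     = @(x) cosh(a*x) .* exp(-x.^2 - a^2);   % unnormalized target density, f(x) = exp(-E(x))
gradE = @(x) 2*x - a*tanh(a*x);               % gradient of the energy E(x) = x^2 + a^2 - log(cosh(a*x))
epsilon = 0.1;  Tau = 20;  L = 5000;  x = 0;  % step size, leapfrog steps, iterations, starting point

Running the loop above with these definitions should produce samples that visit both modes.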
Project • Use Hybrid MC to sample from a multimodal multivariate density. Does it improve simulation?
Monte Carlo Optimization: Feedback, random updates, and maximization • Can Monte Carlo help us search for the optimum value of a function? We've already talked about simulated annealing; there are other methods as well.
Random Updates to get to the optimum • Suppose we return to the problem of finding modes. Let ζ denote a random variable distributed uniformly on the unit sphere; the step sizes αx and βx are determined by numerical-analytic considerations (see Duflo 1998). A search of this kind does not get stuck the way a pure gradient step can; a standard form of the update is sketched below.
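A standard form of this random-direction update for maximizing a function h (a reconstruction, writing α_j, β_j for the step sizes at iteration j):

  \theta_{j+1} = \theta_j + \frac{\alpha_j}{2\beta_j}\,
    \big[\, h(\theta_j + \beta_j \zeta_j) - h(\theta_j - \beta_j \zeta_j) \,\big]\, \zeta_j,
  \qquad \zeta_j \sim \text{Uniform(unit sphere)}

The bracketed finite difference approximates the directional derivative of h along ζ_j, so the move is uphill on average, while the random direction keeps the search from stalling.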
Optimization of a function depending on the data • Minimize the (two-way) KLD between a density q(x) and a Gaussian mixture f = ∑i αi φ(x - θi) using samples. The two-way KLD is given below. We can minimize it by first sampling X1,…,Xn from q, then sampling Y1,…,Yn from s0(x) (assuming the support of s0 contains that of f), and minimizing the resulting Monte Carlo estimate.
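Written out (the standard symmetrized form, with the second term importance-weighted by f/s0):

  \mathrm{KL}_2(q, f) = \int q(x)\,\log\frac{q(x)}{f(x)}\,dx \;+\; \int f(x)\,\log\frac{f(x)}{q(x)}\,dx

  \widehat{\mathrm{KL}}_2 = \frac{1}{n}\sum_{i=1}^{n} \log\frac{q(X_i)}{f(X_i)}
    \;+\; \frac{1}{n}\sum_{j=1}^{n} \frac{f(Y_j)}{s_0(Y_j)}\,\log\frac{f(Y_j)}{q(Y_j)},
  \qquad X_i \sim q,\; Y_j \sim s_0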
Example (two-way) KLD • Monte Carlo rules dictate that we cannot sample from a distribution which depends on the parameters we want to optimize. Hence we importance sample the second KLD term using s0. We also employ an EM-type step involving latent variables Z, sketched below.
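One way to write the EM-type step (a sketch, assuming the usual mixture-membership latent variables Z):

  \gamma_{ji} = \Pr(Z_j = i \mid Y_j)
            = \frac{\alpha_i\,\varphi(Y_j - \theta_i)}{\sum_{l} \alpha_l\,\varphi(Y_j - \theta_l)}

E-step: compute the responsibilities γ_ji under the current parameters; M-step: maximize the objective with each log f(·) replaced by the complete-data surrogate ∑i γ_ji log{αi φ(· - θi)}, holding the importance weights f(Yj)/s0(Yj) fixed at the current parameter values.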
Prior Research • We (Dr. Latecki, Dr. Lakaemper, and I) minimized the one-way KLD between a nonparametric density q and a Gaussian mixture (paper pending). Note, however, that for mixture models which put large weight on regions where the nonparametric density is not well supported, minimizing only the one-way KLD may not give the best possible result.
Project • Use this formulation to minimize the KLD between q (e.g., a nonparametric density based on a data set) and a Gaussian mixture.
General Theorem in Monte Carlo Optimization • One way of finding an optimal value of a function f(θ) defined on a closed, bounded set is as follows. Define a distribution h_λ(θ), given below, for a parameter λ which we let tend to infinity. If we then simulate θ1,…,θn ~ h_λ(θ), the best of the simulated values converges to the optimum.
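A standard way to write this construction (a reconstruction of the displays):

  h_{\lambda}(\theta) \;\propto\; \exp\{\lambda\, f(\theta)\}

  \max_{1 \le i \le n} f(\theta_i) \;\longrightarrow\; \max_{\theta} f(\theta)
  \qquad \text{as } \lambda \to \infty \text{ (and } n \to \infty),

since h_λ concentrates its mass near the maximizers of f as λ grows.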
Monte Carlo Optimization • Observe (X1,…,Xn | θ) ~ L(X|θ). Simulate θ1,…,θn from the prior distribution π(θ) and define the posterior (up to a constant of proportionality) by l(θ|X). It follows that the weighted average given below converges to the MLE. The proof uses a Laplace approximation (see Robert (1993)).
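Applying the previous construction with h_λ(θ) ∝ π(θ) L(X|θ)^λ (a reconstruction of the display; the likelihood is raised to a power λ that tends to infinity):

  \hat\theta_{\lambda} \;=\;
  \frac{\sum_{i=1}^{n} \theta_i\, L(X \mid \theta_i)^{\lambda}}
       {\sum_{i=1}^{n} L(X \mid \theta_i)^{\lambda}}
  \;\longrightarrow\; \hat\theta_{\mathrm{MLE}}
  \qquad \text{as } \lambda \to \infty \text{ (and } n \to \infty).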
Exponential Family Example • Let X ~ exp{λθx - λψ(θ)}, and θ ~ π.
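In this case (a sketch of how the construction plays out; the posterior mean tends to the natural-parameter MLE):

  \pi_{\lambda}(\theta \mid x) \;\propto\; \pi(\theta)\,\exp\{\lambda(\theta x - \psi(\theta))\}

  \mathbb{E}_{\pi_{\lambda}}[\theta \mid x] \;\longrightarrow\; \hat\theta_{\mathrm{MLE}} = (\psi')^{-1}(x)
  \qquad \text{as } \lambda \to \infty,

since the MLE solves ψ'(θ) = x.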
Possible Example • It is known that maximum likelihood estimators for the parameters of a k-component mixture model are hard to compute. If, instead of maximizing the likelihood directly, we treat the mixture as a Bayesian model together with a scale parameter λ and an indifference prior, we can (typically) use Gibbs sampling to sample from this model. Letting λ tend to infinity then lets us construct MLEs.
Project • Implement an algorithm to find the MLE for a simple 3-component mixture model (use Robert (1993)).