Multiple-Try Metropolis
Jun Liu, Department of Statistics, Stanford University
Based on joint work with F. Liang and W.H. Wong.
MCMC and Statistics
The Basic Problems of Monte Carlo
• Draw random variable $X \sim \pi(x)$
• Estimate the integral $I = \int f(x)\,\pi(x)\,dx = E_\pi[f(X)]$
Sometimes $\pi$ is known only up to an unknown normalizing constant.
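How to Sample from $\pi(x)$
• The Inversion Method. If $U \sim \mathrm{Unif}(0,1)$, then $X = F^{-1}(U)$ has distribution function $F$.
• The Rejection Method.
  • Generate $x$ from $g(x)$;
  • Draw $u$ from $\mathrm{Unif}(0,1)$;
  • Accept $x$ if $u \le \pi(x) / (c\,g(x))$;
  • The accepted $x$ follows $\pi(x)$.
[Figure: the target $\pi(x)$ lying under the "envelope" distribution $c\,g(x)$, with the point $u \cdot c\,g(x)$ marked above a proposed $x$.]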
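
The rejection steps above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the target (an unnormalized Beta(2,2) shape), the uniform envelope, and the names `rejection_sample`, `target`, and `sample_mean` are all assumptions chosen for the example.

```python
import random

def rejection_sample(target, draw_g, g_pdf, c, rng):
    """Draw one sample from `target`, assuming c * g(x) >= target(x) everywhere."""
    while True:
        x = draw_g(rng)                      # generate x from g
        u = rng.random()                     # draw u from Unif(0, 1)
        if u * c * g_pdf(x) <= target(x):    # accept x if u <= target(x) / (c g(x))
            return x

# Assumed example target: x(1 - x) on [0, 1], known only up to a constant.
def target(x):
    return x * (1.0 - x)                     # maximum value 0.25 at x = 0.5

rng = random.Random(0)
samples = [
    rejection_sample(target, lambda r: r.random(), lambda x: 1.0, 0.25, rng)
    for _ in range(20000)
]
sample_mean = sum(samples) / len(samples)    # should be near 0.5 by symmetry
```

Note that `target` is never normalized: the acceptance test only uses the ratio of the target to the envelope, which is why the method tolerates an unknown normalizing constant.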
High Dimensional Problems?
Ising Model, with the partition function as its normalizing constant.
Metropolis Algorithm:
(a) pick a lattice point, say $a$, at random;
(b) change the current $x_a$ to $1 - x_a$ (so $X^{(t)} \to X^*$);
(c) compute $r = \pi(X^*)/\pi(X^{(t)})$;
(d) make the acceptance/rejection decision.
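
Steps (a) to (d) can be sketched as follows. The slide's Ising formula did not survive, so the model here is an assumption: sites are coded 0/1, converted to spins $s = 2x - 1$, with $\log \pi(X) = \beta \sum_{a \sim b} s_a s_b$ on a periodic square lattice. The names `neighbor_spin_sum` and `metropolis_sweep` are hypothetical.

```python
import math
import random

def neighbor_spin_sum(x, i, j):
    """Sum of the four neighboring spins s = 2x - 1, with periodic boundaries."""
    n = len(x)
    return sum(2 * x[(i + di) % n][(j + dj) % n] - 1
               for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))

def metropolis_sweep(x, beta, rng):
    """Run n*n single-site updates in place; return the number of accepted flips."""
    n = len(x)
    accepted = 0
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)       # (a) pick a lattice point
        s_old = 2 * x[i][j] - 1
        # (b)+(c) proposing x_a -> 1 - x_a flips the spin, so under the assumed
        # model log r = -2 * beta * s_old * (sum of neighboring spins).
        log_r = -2.0 * beta * s_old * neighbor_spin_sum(x, i, j)
        if rng.random() < math.exp(min(0.0, log_r)):    # (d) accept with prob min(1, r)
            x[i][j] = 1 - x[i][j]
            accepted += 1
    return accepted

rng = random.Random(0)
grid = [[rng.randrange(2) for _ in range(8)] for _ in range(8)]
for _ in range(50):
    metropolis_sweep(grid, beta=0.3, rng=rng)
```

Only the four neighbors of the chosen site enter the ratio, so the partition function never has to be computed.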
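General Metropolis-Hastings Recipe
• Start with any $X^{(0)} = x_0$, and a "proposal chain" $T(x,y)$.
• Suppose $X^{(t)} = x_t$. At time $t+1$,
  • Draw $y \sim T(x_t, y)$ (i.e., propose a move for the next step);
  • Compute the Metropolis ratio (or "goodness" ratio)
    $$r = \frac{\pi(y)\,T(y, x_t)}{\pi(x_t)\,T(x_t, y)};$$
  • Acceptance/rejection decision ("thinning down"): let
    $$X^{(t+1)} = \begin{cases} y & \text{with probability } p = \min\{1, r\}, \\ x_t & \text{with probability } 1 - p. \end{cases}$$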
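
A minimal sketch of the recipe, working on the log scale for numerical safety. The target (a standard normal known up to a constant), the Gaussian random-walk proposal, and all function names are assumptions made for illustration.

```python
import math
import random

def metropolis_hastings(log_pi, propose, log_T, x0, n_steps, rng):
    """Generic M-H chain; log_T(a, b) is the log proposal density of moving a -> b."""
    x, chain = x0, []
    for _ in range(n_steps):
        y = propose(x, rng)                                   # draw y ~ T(x_t, .)
        # log of the Metropolis ratio r = pi(y) T(y, x) / (pi(x) T(x, y))
        log_r = (log_pi(y) + log_T(y, x)) - (log_pi(x) + log_T(x, y))
        if rng.random() < math.exp(min(0.0, log_r)):          # accept with prob min(1, r)
            x = y
        chain.append(x)                                       # a rejection repeats x_t
    return chain

STEP = 2.0  # assumed proposal standard deviation

def log_pi(x):
    return -0.5 * x * x                    # unnormalized standard normal

def propose(x, rng):
    return rng.gauss(x, STEP)

def log_T(x, y):
    return -0.5 * ((y - x) / STEP) ** 2    # constant dropped; it cancels in r

chain = metropolis_hastings(log_pi, propose, log_T, 0.0, 20000, random.Random(0))
mh_mean = sum(chain) / len(chain)
mh_var = sum((v - mh_mean) ** 2 for v in chain) / len(chain)
```

For this symmetric proposal the two `log_T` terms cancel; they are kept so the same function also covers asymmetric proposals.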
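Why Does It Work?
• The detailed balance:
$$\pi(x)\,A(x,y) = \pi(x)\,T(x,y)\min\left\{1, \frac{\pi(y)\,T(y,x)}{\pi(x)\,T(x,y)}\right\} = \min\{\pi(x)\,T(x,y),\ \pi(y)\,T(y,x)\} = \pi(y)\,A(y,x),$$
where $A(x,y)$ is the actual transition probability from $x$ to $y$ (for $x \neq y$) and $A(y,x)$ is the transition probability from $y$ to $x$.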
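
Detailed balance can also be checked numerically on a small discrete state space. The four-state target `pi` and the proposal matrix `T` below are made-up numbers, and `mh_transition_matrix` is a hypothetical helper; the sketch only verifies the identity for this one example.

```python
# Assumed toy target and (non-symmetric) proposal matrix; each row of T sums to 1.
pi = [0.1, 0.2, 0.3, 0.4]
T = [
    [0.10, 0.50, 0.20, 0.20],
    [0.30, 0.10, 0.40, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.40, 0.30, 0.20, 0.10],
]

def mh_transition_matrix(pi, T):
    """Actual transition matrix A of the Metropolis-Hastings chain."""
    n = len(pi)
    A = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                r = (pi[y] * T[y][x]) / (pi[x] * T[x][y])
                A[x][y] = T[x][y] * min(1.0, r)     # propose, then accept
        A[x][x] = 1.0 - sum(A[x])                   # rejected mass stays at x
    return A

A = mh_transition_matrix(pi, T)
n = len(pi)
# Largest violation of pi(x) A(x, y) = pi(y) A(y, x) over all pairs.
balance_gap = max(abs(pi[x] * A[x][y] - pi[y] * A[y][x])
                  for x in range(n) for y in range(n))
# One step of the chain started from pi: should return pi again.
pi_next = [sum(pi[x] * A[x][y] for x in range(n)) for y in range(n)]
```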
General Markov Chain Simulation
• Question: how to simulate from a target distribution $\pi(X)$ via a Markov chain?
• Key: find a transition function $A(X,Y)$ so that $f_0 A^n \to \pi$; that is, $\pi$ is an invariant distribution of $A$.
• Different from traditional Markov chain theory.
Generally
If the actual transition probability is
$$A(x,y) = \frac{\delta(x,y)}{\pi(x)},$$
where $\delta(x,y)$ is a symmetric function of $x, y$, then the chain has $\pi(x)$ as its invariant distribution. (I learnt it from Stein.)
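
To see why the symmetric form works, write the symmetric function as $\delta(x,y)$ (a symbol assumed here) and let $T$ be a proposal chain. The lines below sketch the one-line argument and two standard choices of $\delta$, the Metropolis-Hastings rule and Barker's rule; they are offered as an illustration rather than as the content of the original slide.

```latex
\begin{align*}
\pi(x)\,A(x,y) &= \delta(x,y) = \delta(y,x) = \pi(y)\,A(y,x) \qquad (x \neq y), \\
\delta_{\mathrm{MH}}(x,y) &= \min\{\pi(x)\,T(x,y),\ \pi(y)\,T(y,x)\}, \\
\delta_{\mathrm{Barker}}(x,y) &= \frac{\pi(x)\,T(x,y)\,\pi(y)\,T(y,x)}{\pi(x)\,T(x,y) + \pi(y)\,T(y,x)}.
\end{align*}
```

Summing the first line over $x$ gives $\sum_x \pi(x) A(x,y) = \pi(y)$, which is the invariance claim.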
Problems?
• The moves are very "local".
• The chain tends to be trapped in a local mode.
Other Approaches?
• Gibbs sampler/Heat Bath: better or worse?
• Random directional search ("Hit-and-run"): should be better if we can do it.
• Adaptive directional sampling (ADS) (Gilks, Roberts and George, 1994), which uses multiple chains.
[Figure: the points $x_a$ and $x_c$ at iteration $t$.]
Gibbs Sampler/Heat Bath
• Define a "neighborhood" structure $N(x)$
  • can be a line, a subspace, trace of a group, etc.
• Sample from the conditional distribution.
• Conditional move.
[Figure: a chosen direction.]
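
As a concrete instance, take the neighborhood to be a coordinate line. The sketch below runs a Gibbs sampler on an assumed example, a bivariate normal with zero means, unit variances, and correlation `rho`, where each conditional distribution is itself normal. The function name and the example are illustrative assumptions.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_sweeps, rng):
    """Alternate draws from x1 | x2 and x2 | x1 for a standard bivariate normal."""
    x1, x2 = 0.0, 0.0
    cond_sd = math.sqrt(1.0 - rho * rho)        # sd of each conditional distribution
    draws = []
    for _ in range(n_sweeps):
        x1 = rng.gauss(rho * x2, cond_sd)       # move along the x1 coordinate line
        x2 = rng.gauss(rho * x1, cond_sd)       # move along the x2 coordinate line
        draws.append((x1, x2))
    return draws

draws = gibbs_bivariate_normal(0.8, 20000, random.Random(0))
# With zero means and unit variances, the mean product estimates the correlation.
gibbs_corr = sum(a * b for a, b in draws) / len(draws)
```

Every conditional move is accepted, but when `rho` is close to 1 the moves become very short, which is one way to read the slide's "better or worse?" question.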
How to sample along a line?
• What is the correct conditional distribution?
• Random direction $r$: draw $t$ from $f(t) \propto \pi(x + t r)$.
• Directions chosen a priori: the same as above.
• In ADS?
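The Snooker Theorem
• Suppose $x \sim \pi$ and $y$ is any point in the $d$-dimensional space. Let $r = (x-y)/|x-y|$. If $t$ is drawn from
$$f(t) \propto |t|^{d-1}\,\pi(y + t r),$$
then $x' = y + t r$ follows the target distribution $\pi$.
• If $y$ is generated from a distribution, the new point $x'$ is independent of $y$.
[Figure: the line through the current point $x$ and the anchor $y$.]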
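
The theorem can be checked by simulation. The sketch below draws $t$ from a grid approximation to the density proportional to $|t|^{d-1}\pi(y + t r)$; the grid (an approximation, not an exact sampler), the two-dimensional standard normal target, the fixed anchor, and the name `snooker_move` are all assumptions for illustration.

```python
import math
import random

def snooker_move(x, y, log_pi, rng, t_max=10.0, n_grid=201):
    """Move x along the line through anchor y, with t drawn on a grid (assumes x != y)."""
    d = len(x)
    diff = [xi - yi for xi, yi in zip(x, y)]
    norm = math.sqrt(sum(v * v for v in diff))
    r = [v / norm for v in diff]                        # unit direction (x - y)/|x - y|
    step = 2.0 * t_max / (n_grid - 1)
    ts = [-t_max + k * step for k in range(n_grid)]
    log_vals = [log_pi([yi + t * ri for yi, ri in zip(y, r)]) for t in ts]
    top = max(log_vals)                                 # subtract max to avoid underflow
    # Weights proportional to |t|^(d-1) * pi(y + t r) at each grid point.
    weights = [abs(t) ** (d - 1) * math.exp(lv - top) for t, lv in zip(ts, log_vals)]
    t = rng.choices(ts, weights=weights, k=1)[0]
    return [yi + t * ri for yi, ri in zip(y, r)]

def log_pi(z):
    return -0.5 * sum(v * v for v in z)                 # standard normal in d dimensions

rng = random.Random(0)
anchor = [1.0, -0.5]
# Start each trial from x ~ pi, apply one snooker move, and collect the new points.
moved = [snooker_move([rng.gauss(0, 1), rng.gauss(0, 1)], anchor, log_pi, rng)
         for _ in range(2000)]
mean_0 = sum(p[0] for p in moved) / len(moved)
mean_1 = sum(p[1] for p in moved) / len(moved)
mean_sq = sum(p[0] ** 2 + p[1] ** 2 for p in moved) / len(moved)
```

If the move preserves the target, the new points should again look standard normal: both means near 0 and the mean squared length near $d = 2$. Dropping the $|t|^{d-1}$ factor would bias the points toward the anchor.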
Connection with transformation group
• WLOG, we let $y = 0$.
• The move is now $x \to x' = t x$. The set $\{t : t \neq 0\}$ forms a transformation group.
• Liu and Wu (1999) show that if $t$ is drawn from
$$f(t) \propto |t|^{d-1}\,\pi(t x),$$
then the move is invariant with respect to $\pi$.
Another Hurdle
• How to draw from something like $f(t) \propto |t|^{d-1}\,\pi(y + t r)$?
• Adaptive rejection? Approximation? Griddy Gibbs?
• M-H Independence Sampler (Hastings, 1970)
  • need to draw from something that is close enough to $\pi(x)$.
Ideas
• Propose bigger jumps
  • may be rejected too often.
• Proposal with a mix of step sizes.
• Try multiple times and select good one(s) ("bridging effect") (Frenkel & Smit, 1996).
• Is it still a valid MCMC algorithm?
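Multiple-Try Metropolis
The current state is $x$.
• Draw $y_1, \ldots, y_k$ from the proposal $T(x, y)$ (these can be dependent).
• Select $Y = y_j$ with probability $\propto \pi(y_j)\,T(y_j, x)$.
• Draw $x_1^*, \ldots, x_{k-1}^*$ from $T(Y, x)$. Let $x_k^* = x$.
• Accept the proposed $y_j$ with probability
$$p = \min\left\{1,\ \frac{\pi(y_1)\,T(y_1, x) + \cdots + \pi(y_k)\,T(y_k, x)}{\pi(x_1^*)\,T(x_1^*, Y) + \cdots + \pi(x_k^*)\,T(x_k^*, Y)}\right\}.$$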
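
The four steps can be sketched as below, using independent tries. The bimodal one-dimensional target, the Gaussian proposal with standard deviation `SIGMA`, the choice $k = 5$, and every function name are assumptions made for the example.

```python
import math
import random

def mtm_step(x, target, draw_T, T_pdf, k, rng):
    """One Multiple-Try Metropolis step; T_pdf(a, b) is the proposal density a -> b."""
    ys = [draw_T(x, rng) for _ in range(k)]                  # k tries from T(x, .)
    w_y = [target(y) * T_pdf(y, x) for y in ys]              # weights pi(y_j) T(y_j, x)
    total_y = sum(w_y)
    if total_y == 0.0:                                       # degenerate case: stay put
        return x
    Y = rng.choices(ys, weights=w_y, k=1)[0]                 # select Y among the tries
    x_star = [draw_T(Y, rng) for _ in range(k - 1)] + [x]    # reference set, x_k* = x
    total_x = sum(target(z) * T_pdf(z, Y) for z in x_star)
    accept = 1.0 if total_x == 0.0 else min(1.0, total_y / total_x)
    return Y if rng.random() < accept else x

SIGMA = 4.0  # assumed proposal scale, large enough to reach the other mode

def target(x):
    # Unnormalized mixture of two unit-variance Gaussians centered at -3 and 3.
    return math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)

def draw_T(x, rng):
    return rng.gauss(x, SIGMA)

def T_pdf(x, y):
    return math.exp(-0.5 * ((y - x) / SIGMA) ** 2)   # constant dropped; it cancels

rng = random.Random(0)
x, mtm_chain = -3.0, []
for _ in range(20000):
    x = mtm_step(x, target, draw_T, T_pdf, 5, rng)
    mtm_chain.append(x)
frac_right = sum(1 for v in mtm_chain if v > 0) / len(mtm_chain)
```

The fraction of time spent in the right-hand mode should be close to one half if the chain crosses between modes freely. The reference set drawn from $Y$ is what corrects for having selected the best-looking try.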
A Modification
• If $T(x,y)$ is symmetric, we can have a different rejection probability:
Ref: Frenkel and Smit (1996)
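
The slide's formula is not recoverable from this transcript. One form that is known to be valid for symmetric $T$, and that matches the orientational-bias scheme usually attributed to Frenkel and Smit, is sketched below as an assumption: the weight function $\lambda$ and the notation $w$ are introduced here for illustration only.

```latex
\begin{align*}
w(y, x) &= \pi(y)\,T(y, x)\,\lambda(y, x), \qquad \lambda(x, y) = \lambda(y, x) \ge 0, \\
\lambda(x, y) &= \frac{1}{T(x, y)} \ \text{ with } T(x,y) = T(y,x) \quad\Longrightarrow\quad w(y, x) = \pi(y), \\
\Pr(Y = y_j) &\propto \pi(y_j), \qquad
p = \min\left\{1,\ \frac{\pi(y_1) + \cdots + \pi(y_k)}{\pi(x_1^*) + \cdots + \pi(x_k^*)}\right\}.
\end{align*}
```

With this choice neither the selection step nor the acceptance probability needs the proposal density to be evaluated.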
Back to the example
Random Ray Monte Carlo:
• Propose a random direction.
• Pick $y$ from $y_1, \ldots, y_5$.
• Correct for the MTM bias.
[Figure: the current point $x$ and the tries $y_1, \ldots, y_5$ along a random ray.]
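
A sketch of these three steps in two dimensions follows. It assumes tries placed uniformly within distance `sigma` of the current point along the chosen ray, which makes the proposal symmetric so that the weights reduce to the target values. The two-mode target, `sigma`, and the function names are illustrative assumptions.

```python
import math
import random

def random_ray_step(x, target, sigma, k, rng):
    """One random-ray MTM step in 2-d with uniform tries along a random direction."""
    angle = rng.uniform(0.0, 2.0 * math.pi)                 # propose a random direction
    e = (math.cos(angle), math.sin(angle))

    def along(base, dist):
        return (base[0] + dist * e[0], base[1] + dist * e[1])

    ys = [along(x, rng.uniform(-sigma, sigma)) for _ in range(k)]
    w_y = [target(y) for y in ys]                           # symmetric T: weight = pi(y)
    total_y = sum(w_y)
    if total_y == 0.0:                                      # degenerate case: stay put
        return x
    Y = rng.choices(ys, weights=w_y, k=1)[0]                # pick y among the tries
    # Reference points along the same ray from Y, with the current x as the last one.
    x_star = [along(Y, rng.uniform(-sigma, sigma)) for _ in range(k - 1)] + [x]
    total_x = sum(target(z) for z in x_star)
    accept = 1.0 if total_x == 0.0 else min(1.0, total_y / total_x)
    return Y if rng.random() < accept else x                # correct for the MTM bias

def target(z):
    # Unnormalized mixture of two unit-covariance Gaussians at (-2, -2) and (2, 2).
    a = math.exp(-0.5 * ((z[0] + 2.0) ** 2 + (z[1] + 2.0) ** 2))
    b = math.exp(-0.5 * ((z[0] - 2.0) ** 2 + (z[1] - 2.0) ** 2))
    return a + b

rng = random.Random(0)
state, ray_chain = (-2.0, -2.0), []
for _ in range(20000):
    state = random_ray_step(state, target, 6.0, 5, rng)
    ray_chain.append(state)
frac_upper = sum(1 for z in ray_chain if z[0] + z[1] > 0) / len(ray_chain)
```

Because the direction is drawn independently of the current state, the overall kernel is a mixture of valid MTM kernels, one per direction.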
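An Interesting Twist
• One can choose multiple tries semi-deterministically, for example on a randomly placed equally spaced grid.
• Pick $y$ from $y_1, \ldots, y_8$.
• The correction rule is the same as in MTM.
[Figure: the current point $x$, the selected point $y$, and the grid of tries $y_1, \ldots, y_8$.]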
Use Local Optimization in MCMC
• The ADS formulation is powerful, but its direction is too "random."
• How to make use of their framework?
  • Keep a population of samples.
  • Randomly select one, $x_c$, to be updated.
  • Use the rest to determine an "anchor point" $x_a$.
    • Here we can use local optimization techniques.
  • Use MTM to draw a sample along the line, with the help of the Snooker Theorem.
[Figure: distribution contour with the current point $x_c$, the anchor point $x_a$, and a gradient or conjugate gradient direction.]
Numerical Examples
• An easy multimodal problem
A More Difficult Test Example
• Mixture of 2 Gaussians:
• MTM with CG can sample the distribution.
• The Random-Ray also worked well.
• The standard Metropolis cannot get across.
Fitting a Mixture Model
• Likelihood:
• Prior: uniform in all, but with constraints, and each group has at least one data point.
Bayesian Neural Network Training
Nonlinear curve fitting:
• Setting: Data =
• 1-hidden layer feed-forward NN model
• Objective function for optimization:
Liang and Wong (1999) proposed a method that combines the snooker theorem, MTM, exchange MC, and genetic algorithm.
• Activation function: $\tanh(z)$
• Number of hidden units: $M = 2$
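
For orientation, a one-hidden-layer feed-forward network with tanh activation and $M = 2$ hidden units can be sketched as below. The slide's exact model and objective did not survive, so the parameter layout, the sum-of-squares objective, and the names `nn_predict` and `sum_squared_error` are assumptions, not the authors' formulation.

```python
import math

def nn_predict(x, params, n_hidden=2):
    """Assumed model: f(x) = b0 + sum_j beta_j * tanh(a_j + c_j * x).

    Assumed layout of params: [b0, beta_1, a_1, c_1, ..., beta_M, a_M, c_M].
    """
    out = params[0]
    for j in range(n_hidden):
        beta, a, c = params[1 + 3 * j: 4 + 3 * j]
        out += beta * math.tanh(a + c * x)
    return out

def sum_squared_error(data, params, n_hidden=2):
    """Assumed objective: sum of squared residuals over (x, y) pairs."""
    return sum((y - nn_predict(x, params, n_hidden)) ** 2 for x, y in data)

# Two hidden units that cancel each other, leaving only the bias 0.5.
example_params = [0.5, 1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
example_data = [(0.0, 0.5), (1.0, 0.5), (-2.0, 0.5)]
example_loss = sum_squared_error(example_data, example_params)
```

In a Bayesian treatment the negative of such an objective, suitably scaled, would play the role of the log target that a sampler explores. With $M = 2$ this parameter space has only seven dimensions, yet it can already be multimodal because relabeling the hidden units leaves the fit unchanged.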