600 likes | 816 Views
§❺ Metropolis-Hastings sampling and general MCMC approaches for GLMM. Robert J. Tempelman. Genetic linkage example…again. Recall plant genetic linkage analysis problem or Suppose flat constant prior ( p( q ) 1) was used. Then. Suppose posterior density is not recognizable .
E N D
§❺ Metropolis-Hastings sampling and general MCMC approaches for GLMM Robert J. Tempelman
Genetic linkage example…again Recall plant genetic linkage analysis problem or Suppose flat constant prior (p(q)1) was used. Then
Suppose posterior density is not recognizable • Additionally, suppose there is no clear data augmentation strategy • Several solutions: • e.g. adaptive rejection sampling (not discussed here) • One recourse is to use the Metropolis-Hastings algorithm in which one generates from a candidate (or proposal)density function q(q', q'') in generating a MCMC chain of random variatesfrom. • q‘ : where you’re at nowat current MCMC cycle • q'': proposed value for next MCMC cycle
Metropolis Hastings • Say MCMC cycle is currently at value q[t-1] from cycle t-1. • Draw a proposed value q* from candidate density for cycle t. • Accept move from q[t-1] to q[t] = q* with probability: • Otherwise set q[t] = q* Good readable reference? Chib and Greenburg (1995)
How to compute this ratio “safely” • Always use logarithms whenever evaluating ratios!!! • Once you compute this…then backtransform
Back to plant genetics example Recall y1=1997, y2=906, y3=904, y4=32. Let’s use as the candidate generating function (based on likelihood approx.) 1.Determine a starting value (i.e. 0th cycle) q[o] 2.For t = 1, m (number of MCMC cycles) a) Generate q* from q(q[t-1], q*) = N(0.0357,3.6338 x 10-5) b) Generate U from a Uniform(0,1) distribution c) If U<a(q[t-1], q*) then set q[t]=q *, else set q[t]=q[t-1] • Note that this is an independence chains algorithm q(q[t-1], q*) = N(m =0.0357,s2 = 3.6338 x 10-5) q(q[t-1], q*) = q(q*)
Independence chains Metropolis • When candidate does not depend on q[t-1] • i.e. • However, in spite of this “independence” label, there is still serial autocorrelation between the samples. • IML code online. Generate output for 9000 draws after 1000 burn-in samples. Save every 10. q(q[t-1], q*) = q(q*)
Monitoring MH acceptance rates over cycles for genetic linkage example • Average MH acceptance rates (for every 10 cycles) Many acceptance rates close to 1! Is this good? NO Intermediate acceptance ratios (0.25-0.5) are optimal for MH mixing.
How to optimize Metropolis acceptance ratios • Recall q(q[t-1], q*) = N(m,s2) • m = 0.0357, s2=3.6338 x 10-5 • Suggest using q(q[t-1], q*) = N(m,cs2) and modify c (during burn-in) so that MH acceptance rates are intermediate • Increase c….decrease acceptance rates • Decrease c….increase acceptance rates.
“Tuning” the MH-sampler:My strategy • Every 10 MH cycles for first half of burnin, assess the following: • if average acceptance rate > .80, then set c= 1.2c, • if average acceptance rate < .20 then set c= 0.7c, • otherwise let c be. • SAS PROC MCMC has a somewhat different strategy. • Let’s rerun same PROC IML code again but with this modification.
Average acceptance ratio versus cycle(during 400 burn-in cycles) One should finish the tuning process not much later than half-ways through “burnin” c cycle
Monitoring MH acceptance rates over cycles • Average MH acceptance rates (every 10 cycles) post burn-in (16000 cycles)
Random walk Metropolis sampling • More common (especially when proposals based on likelihood function are not plausible) than independence chains Metropolis. • Proposal density is chosen to be symmetric in q* and q[t-1]. • i.e. q(q[t-1], q*) = q(q*, q[t-1]) • Example: generate a random variatedfrom N(0,cs2) and add it to the previous cycle value q[t-1] to generate q*= q[t-1] + d:same as sampling from
Random walk Metropolis (cont’d) • Because of symmetricity of q(q[t-1], q*) in q[t-1] and q*, MH acceptance ratio simplifies: • i.e., because
Back to example. • Start again with s2 = 0.00602 and a starting value for q[t-1] at t=1. • Generate proposed value from acceptwith probability • i.e., generate from N(0,cs2) and add to q[t-1] • Tune c for intermediate acceptance ratesduring burn-in.
What about “canned” software? • WinBugs • AD Model Builder • Various R packages (MCMCglmm) • SAS PROC MCMC • Will demonstrate shortly…functions a bit like PROC NLINMIXED (no class statement) • They all work fine. • But sometimes they don’t recognize conjugacy in priors • i.e., can’t distinguish between conjugate and non-conjugate (Metropolis) sampling. • So often defaults to Metropolis. (PROC MCMC: random walk Metropolis)
Recall old split plot in time example • Recall the “bunny” example from earlier. • We used PROC GLIMMIX and MCMC (SAS PROC IML) to analyze the data. • Our MCMC implementation involved recognizeable FCD • Split plot in time assumption. • Other alternatives? • Marginal versus conditional specifications on CS • AR(1) • Others? • Some FCD are not recognizeable • Metropolis updates necessary. • Let’s use SAS PROC MCMC.
First create the dummy variables using PROC TRANSREG (PROC MCMC does not have a “CLASS” statement)(Dataset called ‘recodedsplit’) Part of X matrix (full-rank) &_trgind
SAS PROC MCMC(“Conditional” specification) data null; call symputx(‘seed',8723); call symputx('nvar',12); run; Where to save the MCMC samples procmcmc data=recodedsplit outpost=ksu.postsplitpropcov=quanew seed = &seed nmc=400000thin=10 monitor = (beta1-beta&nvar sigmaesigmag); array covar[&nvar] intercept &_trgind; array beta[&nvar] ; parmssige1 ; * residual sd; parmssigg1 ; * random efsd; parms (beta1-beta&nvar) 1; prior beta:~normal(0,var=1e6); /* prior beta: ~ general(0); could also do this too */ prior sige ~ general(0,lower=0); /* Gelman prior */ prior sigg ~ general(0,lower=0); /* Gelman prior */ Metropolis implementation strategy Save how often? Total number of samples after burnin NBI = 1000 (default number of burn-in cycles) Fixed effects dummy variables Fixed effects Parms: starting values Priors: b ~ N(0,106) p(se) ~ constant; p(su) ~ constant
SAS PROC MCMC(conditional specification) beginnodata; sigmae= sige*sige; sigmau= sigg*sigg; endnodata; call mult(covar, beta, mu); random u ~ normal (0,var=sigmau) subject=trtrabbit ; model y ~ normal(mu + u,var=sigmae); run;
“Least-squares means”using output from PROC MCMC Cell means Marginal means Compare to Gibbs sampling results from § 85
Posterior densities of s2us2e Bounded above 0…by definition
The Marginal Model Specification (Type = CS) • SAS PROC MIXED CODE title "Marginal Model: Compound Symmetry using PROC MIXED"; procmixed data=ear ; class trt time rabbit; model temp = trt time trt*time /solution; repeated time /subject = rabbit(trt) type=csrcorr; lsmeanstrt*time; run;
Now • To ensure R is p.s.d, • nt: number of repeated measures per rabbit
Need to format data differently data=recodedsplit1
I’ll keep the covariates in a different file too. data=covariates
PROC MCMC data a; run; /* PROC MCMC WITH COMPOUND SYMMETRY ASSUMPTION */ title1 "Bayesian inference on compound symmetry "; procmcmcjointmodeldata=a outpost=ksu.postcspropcov=quanew seed = &seed nmc=400000 thin=10; array covar[1]/nosymbols ; array data[1]/nosymbols; array first1[1]/nosymbols; array last1[1]/nosymbols; array beta[&nvar] ; array mu[&nrec]; array ytemp[&nrep]; array mutemp[&nrep]; array VCV[&nrep,&nrep]; This data step is a little silly but it is required. jointmodeloption implies that each observation contribution to likelihood function is NOT conditionally independent.
begincnst; rc = read_array("recodedsplit1",data,"y"); rc = read_array("recodedsplit1",first1,"first"); rc = read_array("recodedsplit1",last1,"last"); rc = read_array("covariates",covar); endcnst; parmssige.25 ; * residual sd; parmsintrcl.3 ; * intraclass correlation; parms (beta1-beta&nvar) 1;
beginnodata; prior beta:~normal(0,var=1e6); prior sige ~ general(0, lower=0); /* Gelman prior */ prior intrcl ~ general(0,lower=&lbound1,upper=.999); sigmae = sige*sige; sigmag = intrcl*sigmae; call fillmatrix(VCV,sigmag); do i = 1 to &nrep; VCV[i,i] = sigmae; end; call mult(covar,beta,mu); endnodata; ljointpdf = 0; • &lbound1 = -1/3 (lower bound on CS correlation when blocksize = 4)
do irec = 1 to &nrec; if (first1[irec] = 1) then counter=0; counter = counter + 1; ytemp[counter] = data[irec]; mutemp[counter] = mu[irec]; if (last1[irec] = 1) then do; do; ljointpdf = ljointpdf + lpdfmvn(ytemp, mutemp, VCV); end; end; end; model general(ljointpdf); run;
PROC MIXED vs PROC MCMC PROC MIXED PROC MCMC
Posterior marginal densities for s2u and s2e under marginal model Notice how much of the posterior density of s2u is concentrated to the left of 0! Potential “ripple effect” on inferences on K’b? (Stroup and Littell., 2002) relative to conditional spec.?
First order autoregressive model (type = AR(1)) • SAS PROC MIXED CODE title "Marginal Model: AR(1)using PROC MIXED"; procmixed data=ear ; class trt time rabbit; model temp = trt time trt*time /solution; repeated time /subject = rabbit(trt) type= AR(1)rcorr; lsmeanstrt*time; run; CORRECTION!
Specifying VCV for AR(1) • Note • Might be easier to specify: Especially for large Rk(i) Example MCMC code provided online.
Variance Component Inference MCMC PROC MIXED
An example of a “sticky” situation • Consider a Poisson (count data) example. • Simulated data from a split plot design. • 4 whole plots per each of 3 levels of a whole plot factor. • 3 subplots per whole plot -> 3 levels of a subplot factor. • Whole plot variance: s2w = 0.50 • Overdispersion (G-side) variance: • B*wholeplot variance: s2e = 1.00
GLIMMIX code: procglimmix data=splitplot method=laplace; class A B wholeplot subject ; model y = A|B /dist=poisson solution ; random wholeplot(A) B*wholeplot(A); lsmeans A B A*B/e ilink; run;
Inferences on variance components: • PROC GLIMMIX
Using PROC MCMC procmcmc data=recodedsplit outpost=postoutpropcov=quanewseed = 9548nmc=400000 thin=10; array covar[&nvar] intercept &_trgind; array beta[&nvar] ; parmssigmau.5; parmssigmae.5; parms (beta1-beta&nvar) 1; prior beta: ~ normal(0,var=10E6); prior sigmae ~ igamma(shape=.1,scale=.1); prior sigmau ~ igamma(shape=.1,scale=.1); call mult(covar, beta, mu); random u~ normal (0,var=sigmau) subject=plot ; random e~ normal (0,var=sigmae) subject= subject; lambda = exp(mu + u + e); model y ~ poisson(lambda); run;
Some output In the same ball-park as the PROC GLIMMIX solutions/VC estimates…but there is a PROBLEM ->>>>>
beta1 sigmag sigmae beta2