Spatial Bayesian Variable Selection with Application to fMRI
Michael Smith, Econometrics & Business Statistics, U. of Sydney
Ludwig Fahrmeir, Department of Statistics, LMU
Key References • Smith and Fahrmeir (2004), ‘Spatial Bayesian variable selection with application to fMRI’ (under review) • Smith, Putz, Auer and Fahrmeir (2003), NeuroImage • Smith and Smith (2005), ‘Estimation of binary Markov random fields using Markov chain Monte Carlo’, to appear in JCGS • Kohn, Smith and Chan (2001), ‘Nonparametric regression using linear combinations of basis functions’, Statistics and Computing, 11, 313-322
Bayesian Variable Selection in Regression • Bayesian variable selection is now widely used in statistics • Assume there are i=1,…,N separate regressions yi = Xiβi + ei, each located at a site on a lattice • Each has ni observations and the same p independent variables • For each regression, introduce a vector of indicator variables γi with elements γij = 0 iff βij = 0 and γij = 1 iff βij ≠ 0, for j=1,…,p
Proper Prior for Coefficients • Each regression can be restated as yi = Xi(γi)βi(γi) + ei • where βi(γi) are the non-zero elements of the βi vector • A proper prior is employed for these non-zero elements, which results in a g-prior (see the sketch below)
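The prior formula is not reproduced in this extract; a standard g-prior form consistent with the text (the scale constant c is an assumption here, commonly set to a large value such as 100 or ni) is:

βi(γi) | σi², γi ~ N( 0, c σi² [Xi(γi)'Xi(γi)]⁻¹ )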
Joint Model Posterior • The posterior of interest is p(γ | y) ∝ p(y | γ) p(γ), where the marginal likelihood p(y | γ) is available in closed form (see Smith and Kohn ‘96 and the reconstruction below) • Binary MRF priors are placed on p(γ(j)) to spatially smooth the indicators for each regressor j, where γ(j) = (γ1j,…,γNj)' collects the indicators for regressor j across the N regressions
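The missing expressions are not shown in this extract; a reconstruction based on the closed-form marginal likelihood of Smith and Kohn (1996), assuming the g-prior above and independence across regressions (qγi denotes the number of non-zero elements of γi):

p(y | γ) = Πi p(yi | γi), with

p(yi | γi) ∝ (1 + c)^(−qγi/2) [ yi'yi − (c/(1+c)) yi'Xi(γi)(Xi(γi)'Xi(γi))⁻¹Xi(γi)'yi ]^(−ni/2)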
The Ising Prior • One of the most popular binary MRFs is due to Ernst Ising • Let γ = (γ1,…,γN)' be a vector of binary indicators over a lattice • The pdf of γ is given below • Here: • αi are the external field coefficients • i~j indicates site i neighbors site j • θ is the smoothing parameter • ωij are the weights (taken here as the reciprocal of the distance between sites i and j)
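The pdf itself is not reproduced in this extract; a standard parameterisation consistent with the components listed above (equivalent forms replace the indicator function with the product γiγj) is:

p(γ | θ) = C(α, θ)⁻¹ exp{ Σi αi γi + θ Σ{i~j} ωij I(γi = γj) }

where C(α, θ) is the normalising constant and I(·) is the indicator function.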
The Ising Prior • Other binary MRF priors could be used • For example, latent variables with Gaussian MRFs (see Smith and Smith ’05 for a comparison) • However, the Ising model has three strong advantages in the fMRI application: • Through the external field, anatomical prior information can be incorporated • Single-site sampling is much faster than with the latent GMRF based priors • The edge-preservation properties are strong
The Ising Prior • However, the joint distribution p(γ,θ) = p(γ|θ)p(θ) can be tricky • When α = 0, p(γi=1 | θ) = p(γi=1) = ½, so that θ is a smoothing parameter only • However, when α ≠ 0 (which proves important in the fMRI application), p(γi=1 | θ) ≠ p(γi=1), so that θ is both a smoothing parameter and (along with the fixed external field coefficients α) determines the marginal prior probability
Posterior Distribution • The posterior distribution (conditional on θ) is known in closed form up to its normalising constant, as the product of the closed-form marginal likelihood and the Ising priors • However, in general this is not a recognisable density, so inference has to be undertaken via simulation • It is possible to employ a MH step using a binary MRF approximation as a proposal (see Nott & Green) • However, single-site sampling proves very hard to beat!
Sampling Scheme • Can use the Gibbs sampler, generating directly from p(γij|γ\ij,θ,y) • Alternatively, can employ a MH step at each site based on the conditional prior p(γij|γ\ij,θ) (a sketch is given below) • Even under uniform priors this can prove significantly faster (see Kohn et al. ‘01) • This is because it avoids the need to evaluate the likelihood when there is no switch (that is, 0→0 or 1→1) • When informative spatial binary MRF priors are used, the MH step can prove even faster because the proposal is a better approximation to the posterior
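A minimal sketch (not the authors' code) of this single-site MH step in Python; gamma is the vector of indicators for one regressor, cond_prior_prob is the Ising full-conditional probability p(γij = 1 | γ\ij, θ), and log_marg_lik is a placeholder for the closed-form log marginal likelihood log p(y | γ):

import numpy as np

def mh_single_site(gamma, j, cond_prior_prob, log_marg_lik, rng):
    # Propose gamma[j] from its conditional prior. Because the proposal equals
    # the prior full conditional, the prior and proposal terms cancel and the
    # acceptance probability reduces to the marginal-likelihood ratio.
    current = int(gamma[j])
    proposal = int(rng.random() < cond_prior_prob)
    if proposal == current:
        return gamma                      # no switch: accept without evaluating the likelihood
    gamma_new = gamma.copy()              # gamma assumed to be a numpy array
    gamma_new[j] = proposal
    log_alpha = log_marg_lik(gamma_new) - log_marg_lik(gamma)
    if np.log(rng.random()) < log_alpha:  # accept the switch with prob min(1, ratio)
        return gamma_new
    return gamma

Because no likelihood evaluation is needed when the proposal leaves γij unchanged, and because informative Ising full conditionals concentrate proposals on likely values, the step is cheap in exactly the way described above.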
Co-estimation with Smoothing Parameters • Assume a uniform hyperprior • The smoothing parameters can be generated one-at-a-time from their conditional posteriors (sketched below), where Cj(αj,θj) is the normalising constant of the Ising density • Cj is pre-computed using a B-spline numerical approximation, with points evaluated using a Monte Carlo method (see Green and Richardson, ‘02) • This proves extremely accurate • The element θj is then generated using a random-walk MH step
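The conditional is not shown in this extract; assuming the uniform hyperprior above, a reconstruction consistent with the Ising prior is (αj and ωik denote the external field and weights for regressor j, and i~k neighbouring sites):

p(θj | γ(j)) ∝ Cj(αj, θj)⁻¹ exp{ αj'γ(j) + θj Σ{i~k} ωik I(γij = γkj) }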
Monte Carlo Estimates • Two Monte Carlo mixture estimates are of interest (sketched below) • Note that if primary interest is in the first estimate, then evaluation of the conditional posteriors is undertaken analytically at each step of a Gibbs sampler • In this case, this negates the speed improvements of the MH step suggested in Kohn et al. ’01 • If interest is in the second estimate, then the faster sampler is preferable
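The estimates themselves are not shown in this extract; assuming the first is the posterior activation (inclusion) probability and the second the posterior mean coefficient, the mixture estimates over M retained sweeps take the form:

Pr(γij = 1 | y) ≈ (1/M) Σm Pr(γij = 1 | γ\ij[m], θ[m], y)

E(βij | y) ≈ (1/M) Σm E(βij | γ[m], y)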
Introduction to brain mapping using fMRI data • fMRI is a powerful tool for the non-invasive assessment of the functionality of neuronal networks • The crux of the analysis is distinguishing active from inactive voxels on the basis of a functional time series • The data are typically massive, involving a time series of observations at many voxels in the brain • The data are also very noisy, and the issue facing statisticians is balancing false-positive against false-negative classifications • This issue is addressed by exploiting the known high degree of spatial correlation in activation profiles • If exploited effectively, this greatly enhances the activation maps and reduces both false-positive and false-negative classifications
Example of MR time series (strongly active, weakly active, inactive voxels)
Regression Modeling of fMRI data • Define the following variables: • yit: the magnetic resonance time series at voxel i and time t • ait: baseline trend at voxel i and time t • zit: transformed stimulus at voxel i and time t • βi: activation amplitude • Then the regression model below is popular in the literature: yit = ait + zitβi + eit • The baseline trend captures background influences on the patient’s neuronal activity during the experiment • A simple model is a parametric expansion ait = wt'αi, where we use a quadratic polynomial and 4 low-order Fourier terms • The errors are modelled as iid N(0, σi²)
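A minimal sketch (illustrative only; the haemodynamic transformation of the stimulus and the exact Fourier frequencies are assumptions, not taken from the paper) of how the voxel-wise design described above could be assembled:

import numpy as np

def design_matrix(T, stimulus):
    # Columns of w_t: intercept, linear and quadratic trend, 4 low-order Fourier terms.
    t = np.arange(T) / T
    trend = np.column_stack([np.ones(T), t, t**2])
    fourier = np.column_stack([np.sin(2*np.pi*t), np.cos(2*np.pi*t),
                               np.sin(4*np.pi*t), np.cos(4*np.pi*t)])
    # Placeholder transformation of the raw stimulus into z_t (a simple decaying
    # response convolved with the stimulus); the paper's transformation may differ.
    hrf = np.exp(-np.arange(8) / 2.0)
    z = np.convolve(stimulus, hrf)[:T]
    return np.column_stack([trend, fourier, z])

Stacking the rows (wt', zit) over t gives the Xi matrix of the earlier slides, with the activation amplitude βi attached to the final column.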
Anatomically Informed Activation Prior • The prior for the activation indicators (only) can be informed by many sorts of information • Here, we only consider the fact that activation can only occur in areas of grey matter • Let g = (g1,…,gN) denote whether, or not, each voxel is grey matter • We assume p(g) = p(g1)…p(gN), where the p(gi) are known • Then, if p(γi = 1 | gi) = a (0.1 in our empirical experiment), p(γi = 1) = p(gi) a = ci • When θ = 0 we can equate this with the marginal prior from the Ising density to obtain values for the external field: p(γi = 1) = exp(αi)/(1 + exp(αi)) = ci
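Solving the last equation for the external field (a one-line derivation added here; the numbers in the example are illustrative, not from the paper):

αi = log( ci / (1 − ci) )

For example, if p(gi) = 0.9 and a = 0.1, then ci = 0.09 and αi = log(0.09/0.91) ≈ −2.31.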
Two-Step Procedure • However, the drawback of using non-zero external field coefficients in the Ising prior for the activation effect is that θ is no longer simply a smoothing parameter • Therefore, we use a two-step procedure: Step (1) Set α = 0, estimate the model and obtain a point estimate θ* = E(θ|y) Step (2) Refit with an anatomically informed Ising prior, but conditional on the fixed level of smoothing θ* • This overcomes the complex inter-relationship between θ and the marginal probability of activation when α ≠ 0
Posterior Analysis using 3D Neighborhood • We fit the data using a 3D neighborhood (9+9+8 neighbors) • For simplicity here we do not undertake variable selection on the parametric trend terms, just the activation variable • Three priors are employed (see Table 1) • The activation and amplitude maps obtained using prior (i) are given in Fig. 2 • The activation maps using prior (ii) for 8 contiguous slices are given in Fig. 4; they reveal the importance of using anatomical prior information • The activation maps using prior (iii) for the same 8 slices are given in Fig. 6; they reveal the importance of spatial smoothing
Three Ising Priors • Prior (i) corresponds to step (1) of the two-step procedure • Prior (ii) corresponds to step (2) of the two-step procedure • Prior (iii) is for comparison with prior (ii), to demonstrate the impact of spatial smoothing
Some comparisons • An obvious alternative is to simply smooth the activation amplitudes (βi, i=1,…,N) spatially and then classify • Fig. 5 shows the amplitude map obtained using this approach • What happens when variable selection is also undertaken on the trend terms? • To investigate, two estimates were obtained using a 2D neighborhood structure: one with BVS on all terms, and the other with BVS on the activation term only • The resulting activation maps are similar and are given in Fig. 8