Presentation Transcript


  1. Spatial Bayesian Variable Selection with Application to fMRI Michael Smith Econometrics & Business Statistics, U. of Sydney Ludwig Fahrmeir Department of Statistics, LMU

  2. Key References • Smith and Fahrmeir (2004), ‘Spatial Bayesian variable selection with application to fMRI’, under review • Smith, Putz, Auer and Fahrmeir (2003), Neuroimage • Smith and Smith (2005), ‘Estimation of binary Markov Random Fields using Markov chain Monte Carlo’, to appear in JCGS • Kohn, Smith and Chan (2001), ‘Nonparametric regression using linear combinations of basis functions’, Statistics and Computing, 11, 313-322

  3. Bayesian Variable Selection in Regression • Bayesian variable selection is now widely used in statistics • Assume there are i = 1,…,N separate regressions yi = Xiβi + ei, each located at a site on a lattice • Each has ni observations and the same p independent variables • For each regression, introduce a vector of indicator variables γi with elements γij = 0 iff βij = 0, and γij = 1 iff βij ≠ 0, for j = 1,…,p
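As a concrete illustration of the indicator notation above, here is a minimal sketch (not from the slides; all names and dimensions are illustrative) of how γi selects the active columns of Xi:

```python
import numpy as np

# Illustrative only: gamma_i flags which of the p columns of X_i enter regression i.
rng = np.random.default_rng(0)
n_i, p = 100, 5
X_i = rng.normal(size=(n_i, p))
gamma_i = np.array([1, 0, 1, 0, 0])      # gamma_ij = 1  <=>  beta_ij != 0

X_i_gamma = X_i[:, gamma_i == 1]         # design matrix of the selected variables only
print(X_i_gamma.shape)                   # (100, 2): two non-zero coefficients remain
```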

  4. Proper Prior for Coefficients • Each regression can be restated as yi = Xi(γi)βi(γi) + ei, where βi(γi) are the non-zero elements of the βi vector and Xi(γi) the corresponding columns of Xi • A proper prior is employed for the non-zero elements of the coefficient vector, conditional on γi and σi² • This results in a g-prior for βi(γi)
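The prior itself was not captured in this transcript. For reference, a standard g-prior of the kind used in Smith and Kohn (1996) has the form below; the scale constant c is written generically and is an assumption here, not a value taken from the slides.

```latex
% Conjugate prior on the non-zero coefficients of regression i (standard g-prior form;
% the scale constant c is generic and not taken from the slides)
\beta_i(\gamma_i) \mid \gamma_i, \sigma_i^2
  \sim N\!\left( 0,\; c\,\sigma_i^2 \bigl( X_i(\gamma_i)' X_i(\gamma_i) \bigr)^{-1} \right)
```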

  5. Joint Model Posterior • The posterior of interest is the joint posterior of the indicators, p(γ | y), which is proportional to the product of the marginal likelihoods p(yi | γi) and the priors on the indicators • Under the g-prior the marginal likelihoods are available in closed form (see Smith and Kohn ‘96) • Binary MRF priors are placed on the p(γ(j)) to spatially smooth the indicators for each regressor j
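The expressions on the slide were not captured. Under the g-prior above, with the error variances integrated out, the standard closed forms (Smith and Kohn ‘96) are, up to notational differences from the talk:

```latex
% Joint posterior over the indicators and the per-regression marginal likelihood
% (standard forms implied by the g-prior; the slide's own equations were not captured)
p(\gamma \mid y) \;\propto\; \prod_{i=1}^{N} p(y_i \mid \gamma_i) \prod_{j=1}^{p} p\bigl(\gamma^{(j)}\bigr),
\qquad
p(y_i \mid \gamma_i) \;\propto\; (1+c)^{-q_{\gamma_i}/2}\, S(\gamma_i)^{-n_i/2},
\quad\text{where}\quad
q_{\gamma_i} = \textstyle\sum_j \gamma_{ij}, \qquad
S(\gamma_i) = y_i'y_i - \tfrac{c}{1+c}\, y_i' X_i(\gamma_i)\bigl(X_i(\gamma_i)'X_i(\gamma_i)\bigr)^{-1} X_i(\gamma_i)' y_i .
```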

  6. The Ising Prior • One of the most popular binary MRFs is due to Ernst Ising • Let γ = (γ1,…, γN)’ be a vector of binary indicators over a lattice • The pdf of γ is given below; in it: • αi are the external field coefficients • i~j indicates that site i neighbours site j • θ is the smoothing parameter • ωij are the weights (taken here as the reciprocal of the distance between sites i and j)
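The density on the slide was not captured; with the notation defined in the bullets above, the Ising prior with external field and pairwise weights is usually written as follows (a reference form, so the exact parameterisation in the talk may differ slightly):

```latex
% Ising prior over the lattice: external field alpha plus weighted pairwise interactions
% (reference form; the slide's own equation was not captured)
p(\gamma \mid \alpha, \theta)
  = \frac{1}{C(\alpha, \theta)}
    \exp\!\Bigl( \sum_{i=1}^{N} \alpha_i \gamma_i
      \;+\; \theta \sum_{i \sim j} \omega_{ij}\, I(\gamma_i = \gamma_j) \Bigr)
```

Here C(α, θ) is the normalising constant, obtained by summing over all 2^N configurations of γ.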

  7. The Ising Prior • Other binary MRF priors could be used • For example, latent variables with Gaussian MRFs (see Smith and Smith ’05 for a comparison) • However, the Ising model has three strong advantages in the fMRI application: • Through the external field, anatomical prior information can be incorporated • Single-site sampling is much faster than with the latent GMRF based priors • The edge-preservation properties are strong

  8. The Ising Prior • However, the joint distribution p(γ,θ) = p(γ|θ)p(θ) can be tricky to work with • When α = 0, p(γi=1 | θ) = p(γi=1) = ½, so that θ is a smoothing parameter only • However, when α ≠ 0 (which proves important in the fMRI application), p(γi=1 | θ) ≠ p(γi=1), so that θ both acts as a smoothing parameter and (along with the fixed external field α) determines the marginal prior probability

  9. Posterior Distribution • The posterior distribution of the indicators (conditional on θ) can be computed in closed form up to its normalising constant • However, in general it is not a recognisable density, so inference has to be undertaken via simulation • It is possible to employ a MH step using a binary MRF approximation as a proposal (see Nott & Green) • However, single-site sampling proves very hard to beat!

  10. Sampling Scheme • The Gibbs sampler can be used, generating directly from p(γij|γ\ij,θ,y) • Alternatively, a MH step can be employed at each site, with the conditional prior p(γij|γ\ij,θ) as the proposal • Even under uniform priors this can prove significantly faster (see Kohn et al. ‘01) • This is because it avoids evaluating the likelihood when there is no switch (that is, 0→0 or 1→1) • When informative spatial binary MRF priors are used, the MH step can prove even faster because the proposal is a better approximation to the posterior
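A minimal sketch of this single-site MH step, assuming a user-supplied function log_marglik returning log p(yi | γi) and a pre-computed Ising conditional prior probability; both names are illustrative, not from the talk. With the conditional prior as the proposal, the prior terms cancel, so the acceptance probability reduces to the marginal-likelihood ratio and no likelihood evaluation is needed when the proposal does not switch the indicator.

```python
import numpy as np

def mh_update_site(gamma, i, j, log_marglik, prior_prob_one, rng):
    """One MH update of gamma[i, j] (illustrative sketch; names are assumptions).

    gamma          : (N, p) array of 0/1 indicators, updated in place
    log_marglik    : function taking regression i's indicator row, returning log p(y_i | gamma_i)
    prior_prob_one : Ising conditional prior probability p(gamma_ij = 1 | neighbours, theta_j)
    """
    current = gamma[i, j]
    proposal = int(rng.random() < prior_prob_one)   # draw from the conditional prior
    if proposal == current:
        return                                      # 0->0 or 1->1: no likelihood evaluation needed
    # Proposal switches the indicator: the prior terms cancel, so the acceptance
    # probability is the ratio of marginal likelihoods under the two indicator rows.
    old_row = gamma[i].copy()
    new_row = old_row.copy()
    new_row[j] = proposal
    log_accept = log_marglik(new_row) - log_marglik(old_row)
    if np.log(rng.random()) < log_accept:
        gamma[i, j] = proposal
```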

  11. Co-estimation with Smoothing Parameters • Assume a uniform hyperprior for each θj • The smoothing parameters can then be generated one at a time from their full conditionals, which involve Cj(αj,θj), the normalising constant of the Ising density • This constant is pre-computed using a B-spline numerical approximation, with points evaluated using a Monte Carlo method (see Green and Richardson ‘02) • This proves extremely accurate • Each θj is then generated using a random-walk MH step
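A sketch of this update for one smoothing parameter θj, assuming α = 0 and a uniform prior on [0, θmax]; the interpolation of the pre-computed log normalising constant stands in for the B-spline approximation described above, and the step size, support and grid are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_log_normconst(theta_grid, log_C_values):
    """Smooth interpolant of the pre-computed log normalising constant log C_j(theta).

    In the talk this curve is a B-spline fitted to Monte Carlo estimates
    (Green and Richardson '02); any smooth interpolant stands in here."""
    return CubicSpline(theta_grid, log_C_values)

def rw_mh_theta(theta, suff_stat, log_C, step=0.05, theta_max=2.0, rng=None):
    """One random-walk MH update of a smoothing parameter theta_j (illustrative).

    suff_stat : sum over neighbour pairs of w_ij * I(gamma_i == gamma_j) for regressor j
    log_C     : interpolated log normalising constant of the Ising density (alpha = 0)
    """
    rng = rng or np.random.default_rng()
    proposal = theta + step * rng.normal()
    if not 0.0 <= proposal <= theta_max:
        return theta                                  # outside the uniform prior's support
    # log p(theta | gamma^(j)) = theta * suff_stat - log C(theta) + const
    log_accept = (proposal - theta) * suff_stat - (log_C(proposal) - log_C(theta))
    return proposal if np.log(rng.random()) < log_accept else theta
```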

  12. Monte Carlo Estimates • Two Monte Carlo mixture estimates are of interest: the marginal posterior inclusion probabilities p(γij = 1 | y) and the posterior means of the coefficients (see the forms sketched below) • Note that if primary interest is in the first estimate, the conditional posteriors are evaluated analytically at each step of a Gibbs sampler anyway • In this case, this negates the speed improvement of the MH step suggested in Kohn et al. ’01 • If interest is in the second estimate, the faster sampler is preferable
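The estimators shown on the slide were not captured; the standard mixture (Rao-Blackwellised) forms for this model, over M retained MCMC draws, would be written as below. The exact conditioning used in the talk is an assumption here.

```latex
% Mixture estimates over M retained draws (reference forms; the slide's equations were not captured)
\widehat{p}(\gamma_{ij} = 1 \mid y)
  = \frac{1}{M} \sum_{m=1}^{M} p\bigl(\gamma_{ij} = 1 \,\big|\, \gamma^{[m]}_{\setminus ij}, \theta^{[m]}, y\bigr),
\qquad
\widehat{E}(\beta_{ij} \mid y)
  = \frac{1}{M} \sum_{m=1}^{M} E\bigl(\beta_{ij} \,\big|\, \gamma^{[m]}, y\bigr) .
```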

  13. Introduction to brain mapping using fMRI data • fMRI is a powerful tool for the non-invasive assessment of the functionality of neuronal networks • The crux of this analysis is distinguishing active from inactive voxels using a functional time series • These data are typically massive, involving a time series of observations at many voxels in the brain • The data are also very noisy, and the issue facing statisticians is balancing false-positive against false-negative classifications • This issue is addressed by exploiting the known high degree of spatial correlation in activation profiles • If exploited effectively, this greatly enhances the activation maps and reduces both false-positive and false-negative classifications

  14. Example of MR time series (strongly active, weakly active, inactive voxels)

  15. Regression Modeling of fMRI data • Define the following variables: • yit: the magnetic resonance time series at voxel i and time t • ait: the baseline trend at voxel i and time t • zit: the transformed stimulus at voxel i and time t • βi: the activation amplitude • Then a popular regression model in the literature is yit = ait + zit βi + eit • The baseline trend captures background influences on the patient’s neuronal activity during the experiment • A simple model for it is the parametric expansion ait = wt’αi, where we use a quadratic polynomial and 4 low-order Fourier terms • The errors are modelled as iid N(0, σi²)
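A sketch of the regressors described above for one voxel; the Fourier periods and the form of the transformed stimulus are illustrative assumptions, not values from the talk.

```python
import numpy as np

def voxel_design(z, n_fourier=4):
    """Design matrix for one voxel: quadratic trend + low-order Fourier terms + stimulus.

    z : transformed stimulus regressor (e.g. the stimulus convolved with a haemodynamic
        response function) -- its exact form is an assumption here.
    """
    T = len(z)
    t = np.arange(T) / T
    trend = np.column_stack([np.ones(T), t, t**2])        # quadratic baseline a_it = w_t' alpha_i
    freqs = np.arange(1, n_fourier // 2 + 1)              # 4 low-order Fourier terms (2 sin, 2 cos)
    fourier = np.column_stack([f(2 * np.pi * k * t) for k in freqs for f in (np.sin, np.cos)])
    return np.column_stack([trend, fourier, z])           # last column carries beta_i

# Example: y_it = a_it + z_it * beta_i + e_it, fitted by least squares for illustration
rng = np.random.default_rng(1)
z = rng.normal(size=200)
W = voxel_design(z)
y = W @ np.concatenate([rng.normal(size=W.shape[1] - 1), [2.0]]) + rng.normal(size=200)
beta_hat = np.linalg.lstsq(W, y, rcond=None)[0][-1]       # estimate of the activation amplitude
```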

  16. Anatomically Informed Activation Prior • The prior for the activation indicators (only) can be informed by all sorts of information • Here, we consider only the fact that activation can occur solely in areas of grey matter • Let g = (g1,…,gN) denote whether or not each voxel is grey matter • We assume p(g) = p(g1)…p(gN), where the p(gi) are known • Then, if p(γi = 1 | gi = 1) = a (0.1 in our empirical experiment), the marginal activation probability is p(γi = 1) = a p(gi) = ci • When θ = 0 we can equate this with the marginal prior from the Ising density, p(γi = 1) = exp(αi)/(1 + exp(αi)) = ci, to obtain values for the external field
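Solving the marginal identity above for the external field gives αi as the logit of ci. A tiny sketch (a = 0.1 is from the slide; the grey-matter probabilities and the clipping are illustrative):

```python
import numpy as np

a = 0.1                                     # p(gamma_i = 1 | voxel is grey matter), from the slide
p_grey = np.array([0.0, 0.3, 0.9, 1.0])     # illustrative grey-matter probabilities p(g_i)
c = np.clip(a * p_grey, 1e-6, 1 - 1e-6)     # marginal activation probabilities c_i, kept away from 0/1
alpha = np.log(c / (1 - c))                 # external field: exp(alpha_i) / (1 + exp(alpha_i)) = c_i
```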

  17. Two-Step Procedure • However, the drawback of using non-zero external field coefficients in the Ising prior for the activation effect is that θ is no longer simply a smoothing parameter • Therefore, we use a two-step procedure: Step (1) Set α = 0, estimate the model and obtain the point estimate θ* = E(θ|y); Step (2) Refit with the anatomically informed Ising prior, but conditional on the fixed level of smoothing θ* • This overcomes the complex inter-relationship between θ and the marginal probability of activation when α is non-zero

  18. Posterior Analysis using a 3D Neighborhood • We fit the data using a 3D neighborhood (9+9+8 = 26 neighbors) • For simplicity, we do not undertake variable selection on the parametric trend terms here, just on the activation variable • Three priors are employed (see table 1) • The activation and amplitude maps obtained using prior (i) are given in fig. 2 • The activation maps using prior (ii) for 8 contiguous slices are given in fig. 4; they reveal the importance of using anatomical prior information • The activation maps using prior (iii) for the same 8 slices are given in fig. 6; they reveal the importance of spatial smoothing

  19. Three Ising Priors • Prior (i) corresponds to step 1 in the two-step procedure • Prior (ii) corresponds to step 2 in the two-step procedure • Prior (iii) is for comparison with prior (ii), to demonstrate the impact of spatial smoothing

  20. Some comparisons • An obvious alternative is to simply smooth the activation amplitudes (βi, i = 1,…,N) spatially and then classify • Fig. 5 shows the amplitude map using this approach • What happens when variable selection is also undertaken on the trend terms? • To check, two estimates were obtained using only a 2D neighborhood structure: one with BVS on all terms, and the other with BVS on the activation term only • The resulting activation maps are similar and are given in fig. 8
