630 likes | 805 Views
The statistical analysis of fMRI data. Keith Worsley 12 , Chuanhong Liao 1 , John Aston 123 , Jean-Baptiste Poline 4 , Gary Duncan 5 , Vali Petre 2 , Frank Morales 6 , Alan Evans 2 , Tom Nichols 7 , Satoru Hayasaki 8 , Jonathan Taylor 9
E N D
The statistical analysis of fMRI data Keith Worsley12, Chuanhong Liao1, John Aston123, Jean-Baptiste Poline4, Gary Duncan5, Vali Petre2, Frank Morales6, Alan Evans2, Tom Nichols7, Satoru Hayasaki8, Jonathan Taylor9 1Department of Mathematics and Statistics, McGill University, 2Brain Imaging Centre, Montreal Neurological Institute, 3Statistica Sinica, Taipei, 4Neurospin, CEA, Orsay, 5Centre de Recherche en Sciences Neurologiques, Université de Montréal, 6Cuban Neuroscience Centre 7GlaxoSmithKline and FMRIB, Oxford 8Wake Forest University 9Université de Montréal and Stanford
Temporal components (sd, % variance explained) 1 0.68, 46.9% 2 0.29, 8.6% Component 3 0.17, 2.9% 4 0.15, 2.4% 0 20 40 60 80 100 120 140 Frame Spatial components 1 1 0.5 2 Component 0 3 -0.5 4 -1 0 2 4 6 8 10 12 Slice (0 based) Before you start: PCA of time space 1: exclude first frames 2: drift 3: long-range correlation or anatomical effect: remove by converting to % of brain 4: signal?
Bad design: 2 mins rest 2 mins Mozart 2 mins Eminem 2 mins James Brown
Rest Mozart Eminem J. Brown Temporal components (sd, % variance explained) Period: 5.2 16.1 15.6 11.6 seconds 1 0.41, 17% 2 0.31, 9.5% Component 3 0.24, 5.6% 0 50 100 150 200 Frame Spatial components 1 1 0.5 Component 2 0 -0.5 3 -1 0 2 4 6 8 10 12 14 16 18 Slice (0 based)
Alternating hot and warm stimuli separated by rest (9 seconds each). 2 warm warm hot hot 1 0 -1 0 50 100 150 200 250 300 350 Hemodynamic response function: difference of two gamma densities 0.4 0.2 0 -0.2 0 50 Responses = stimuli * HRF, sampled every 3 seconds 2 1 0 -1 0 50 100 150 200 250 300 350 Time, seconds Effect of stimulus on brain response Modeled by convolving the stimulus with the “hemodynamic response function” Stimulus is delayed and dispersed by ~6s
First scan of fMRI data Highly significant effect, T=6.59 1000 hot 890 rest 880 870 warm 500 0 100 200 300 No significant effect, T=-0.74 820 hot 0 rest 800 T statistic for hot - warm effect warm 5 0 100 200 300 Drift 810 0 800 790 -5 0 100 200 300 Time, seconds fMRI data, pain experiment, one slice T = (hot – warm effect) / S.d. ~ t110 if no effect
How fMRI differs from other repeated measures data • Many reps (~200 time points) • Few subjects (~15) • Df within subjects is high, so not worth pooling sd across subjects • Df between subjects low, so use spatial smoothing to boost df • Data sets are huge ~4GB, not easy to use statistics packages such as R
FMRISTAT (Matlab) / BRAINSTAT (Python)statistical analysis strategy • Analyse each voxel separately • Borrow strength from neighbours when needed • Break up analysis into stages • 1st level: analyse each time series separately • 2nd level: combine 1st level results over runs • 3rd level: combine 2nd level results over subjects • Cut corners: do a reasonable analysis in a reasonable time (or else no one will use it!)
1st level: Linear model with AR(p) errors • Data • Yt = fMRI data at time t • xt = (responses,1, t, t2, t3, … )’ to allow for drift • Model • Yt = xt’β + εt • εt = a1εt-1 + … + apεt-p + σFηt, ηt~ N(0,1) i.i.d. • Fit in 2 stages: • 1st pass: fit by least squares, find residuals, estimate AR parameters a1 … ap • 2nd pass: whiten data, re-fit by least squares
Higher levels:Mixed effects model • Data • Ei = effect (contrast in β) from previous level • Si = sd of effect from previous level • zi = (1, treatment, group, gender, …)’ • Model • Ei = zi’γ + SiεiF + σRεiR (Sihigh df, soassumed fixed) • εiF ~ N(0,1) i.i.d. fixed effects error • εiR ~ N(0,1) i.i.d. random effects error • Fit by ReML • Use EM for stability, 10 iterations
Where we use spatial information • 1st level: smooth AR parameters to lower variability and increase “df” • Higher levels: smooth Random / Fixed effects sd ratio to lower variability and increase “df” • Final level: use random field theory to correct for multiple comparisons
1st level: Autocorrelation AR(1) model: εt = a1εt-1 + σFηt • Fit the linear model using least squares • εt = Yt – Yt • â1 = Correlation (εt , εt-1) • Estimating εtchanges their correlation structure slightly, so â1 is slightly biased: • Raw autocorrelation Smoothed 12.4mm Bias corrected â1 ~ -0.05 ~ 0
How much smoothing? • Variability in • â lowers df • Df depends • on contrast • Smoothing â brings df back up: dfâ = dfresidual(2 + 1) 1 1 2 acor(contrast of data)2 dfeffdfresidualdfâ FWHMâ2 3/2 FWHMdata2 = + Hot-warm stimulus Hot stimulus FWHMdata = 8.79 Residual df = 110 Residual df = 110 100 100 Target = 100 df Target = 100 df Contrast of data, acor = 0.79 50 Contrast of data, acor = 0.61 50 dfeff dfeff 0 0 0 10 20 30 0 10 20 30 FWHM = 10.3mm FWHM = 12.4mm FWHMâ FWHMâ
Higher order AR model? Try AR(3): No correlation biases T up ~12% → more false positives â â â 1 2 3 0.3 0.2 AR(1) seems to be adequate 0.1 0 … has little effect on the T statistics: -0.1 AR(1), df=100 AR(2), df=99 AR(3), df=98 5 0 -5
Run 1 Run 2 Run 3 Run 4 2nd level 1 Effect, 0 E i -1 0.2 Sd, S 0.1 i 0 5 T stat, 0 E / S i i -5 2nd level: 4 runs, 3 df for random effects sd … very noisy sd: … and T>15.96 for P<0.05 (corrected): … so no response is detected …
Solution: Spatial smoothing of the sd ratio • Basic idea: increase df by spatial smoothing (local pooling) of the sd. • Can’t smooth the random effects sd directly, - too much anatomical structure. • Instead, • random effects sd • fixed effects sd • which removes the anatomical structure before smoothing. ) sd= smooth fixed effects sd
^ Average Si Random effects sd, 3 df Fixed effects sd, 440 df Mixed effects sd, ~100 df 0.2 0.15 0.1 0.05 0 divide multiply Smoothed sd ratio Random sd / fixed sd 1.5 random effect, sd ratio ~1.3 1 0.5
400 300 200 100 0 0 20 40 Infinity How much smoothing? FWHMratio2 3/2 FWHMdata2 dfrandom = 3, dffixed = 4 110 = 440, FWHMdata = 8mm: dfratio = dfrandom(2 + 1) 1 1 1 dfeffdfratiodffixed = + fixed effects analysis, dfeff = 440 dfeff FWHM = 19mm Target = 100 df random effects analysis, dfeff = 3 FWHMratio
Run 1 Run 2 Run 3 Run 4 2nd level 1 Effect, 0 E i -1 0.2 Sd, S 0.1 i 0 5 T stat, 0 E / S i i -5 Final result: 19mm smoothing, 100 df … less noisy sd: … and T>4.93 for P<0.05 (corrected): … and now we can detect a response!
Final level: Multiple comparisons correction Bonferroni 4.7 4.6 4.5 True T, 10 df 4.4 Random Field Theory 4.3 T, 20 df Gaussianized threshold Discrete Local Maxima (DLM) 4.2 4.1 Gaussian 4 3.9 3.8 3.7 0 1 2 3 4 5 6 7 8 9 10 FWHM of smoothing kernel (voxels) Low FWHM: use Bonferroni In between: use Discrete Local Maxima (DLM) High FWHM: use Random Field Theory
0.12 Gaussian T, 20 df T, 10 df 0.1 Random Field Theory Bonferroni 0.08 0.06 P-value 0.04 Discrete Local Maxima True DLM can ½ P-value when FWHM ~3 voxels 0.02 0 0 1 2 3 4 5 6 7 8 9 10 FWHM of smoothing kernel (voxels) Low FWHM: use Bonferroni In between: use Discrete Local Maxima (DLM) High FWHM: use Random Field Theory
Example: single run, hot-warm Detected by BON and DLM but not by RFT Detected by DLM, but not by BON or RFT
FWHM – the local smoothness of the noise voxel size (1 – correlation)1/2 FWHM = (2 log 2)1/2 (If the noise is modeled as white noise smoothed with a Gaussian kernel, this would be its FWHM) Volume FWHM3 P-values depend on Resels: Resels = Local maximum T = 4.5 Clusters above t = 3.0, search volume resels = 500 0.1 0.1 0.08 0.08 0.06 0.06 P value of cluster P value of local max 0.04 0.04 0.02 0.02 0 0 0 500 1000 0 0.5 1 1.5 2 Resels of search volume Resels of cluster
Non-isotropic data (spatially varying FWHM) FWHM (mm) of scans (110 df) FWHM (mm) of effects (3 df) 20 20 Cluster Resels=1.90 P=0.007 15 15 • fMRI data is smoother in GM than WM • Has little effect on peak P-values • use ‘average’ FWHM inside search region, but • Has a big effect on cluster P-values • smooth regions → big clusters, • rough regions → small clusters, so • Replace cluster volume by cluster resels = volume / FWHM3 10 10 Cluster Resels=0.57 P=0.387 5 5 0 0
0.6 0.4 0.2 0 -0.2 -0.4 -5 0 5 10 15 20 25 t (seconds) Estimating the delay of the response • Delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions: delay basis1 basis2 HRF shift • HRF(t + shift) ~ basis1(t)w1(shift) + basis2(t)w2(shift) • Convolve bases with the stimulus, then add to the linear model
3 2 1 0 -1 -2 -3 -5 0 5 shift (seconds) • Fit linear model, • estimate w1andw2 • Equate w2/w1to estimates, then solve for shift (Hensen et al., 2002) • To reduce bias when the magnitude is small, use • shift / (1 + 1/T2) • where T = w1/ Sd(w1) is the T statistic for the magnitude • Shrinks shift to 0 where there is little evidence for a response. w2 /w1 w1 w2
Shift of the hot stimulus T stat for magnitude T stat for shift Shift (secs) Sd of shift (secs)
Shift of the hot stimulus T stat for magnitude T stat for shift T>4 T~2 Shift (secs) Sd of shift (secs) ~1 sec +/- 0.5 sec
Run 1 Run 2 Run 3 Run 4 MULTISTAT 4 2 Effect, 0 E i -2 -4 2 Sd, 1 S i 0 5 T stat, 0 E / S i i -5 Combining shifts of the hot stimulus (Contours are T stat for magnitude > 4)
Shift of the hot stimulus Shift (secs) T stat for magnitude > 4.93
Contrasts in the data used for effects 2 Hot, Sd = 0.16 Warm, Sd = 0.16 9 sec blocks, 9 sec gaps 1 0 -1 0 50 100 150 200 250 300 350 Hot-warm, Sd = 0.19 Time (secs) 2 Hot, Sd = 0.28 Warm, Sd = 0.43 90 sec blocks, 90 sec gaps Only using data near block transitions 1 0 Ignoring data in the middle of blocks -1 0 50 100 150 200 250 300 350 Hot-warm, Sd = 0.55 Time (secs)
Optimum block design Sd of hot stimulus Sd of hot-warm 0.5 0.5 20 20 0.4 0.4 15 15 Best design 0.3 0.3 Magnitude 10 10 Best design 0.2 0.2 X 5 5 0.1 0.1 0 0 X 0 0 5 10 15 20 5 10 15 20 Gap (secs) (secs) (secs) 1 1 20 20 0.8 0.8 15 15 0.6 0.6 Best design X Delay Best design X 10 10 0.4 0.4 5 5 0.2 0.2 (Not enough signal) (Not enough signal) 0 0 0 5 10 15 20 5 10 15 20 Block (secs)
Optimum event design 0.5 (Not enough signal) uniform . . . . . . . . . ____ magnitudes ……. delays random .. . ... .. . concentrated : 0.4 Sd of effect (secs for delays) 0.3 0.2 12 secs best for magnitudes 0.1 0 7 secs best for delays 5 10 15 20 Average time between events (secs)
+ + How many subjects? • Largest portion of variance comes from the last stage i.e. combining over subjects: sdrun2sdsess2sdsubj2 nrunnsessnsubj nsessnsubj nsubj • If you want to optimize total scanner time, take more subjects. • What you do at early stages doesn’t matter very much!
Features special to FMRISTAT / BRAINSTAT • Bias correction for AR coefficients • Df boosting due to smoothing: • AR coefficients • random/fixed effects variance • P-value adjustment for: • peaks due to small FWHM (DLM) • clusters due to spatially varying FWHM • Delays analysed the same way as magnitudes • Sd of effects before collecting data
Our entry in the Functional Imaging Analysis contest Jonathan Taylor Stanford Keith Worsley McGill
Why a Functional Imaging Analysis Contest (FIAC)? • Competing packages produce slightly different results, which is “correct”? • Simulated data? • Real data, compare analyses • “Contest” session at 2005 Human Brain Map conference • 9 entrants • Results in a special issue of Human Brain Mapping in May, 2006
The main participants • SPM (Statistical Parametric Mapping, 1993), University College, London, “SAS”, (MATLAB) • AFNI (1995), NIH, more display and manipulation, not much stats (C) • FSL (2000), Oxford, the “upstart” (C) • …. • FMRISTAT (2001), McGill, stats only (MATLAB) • BRAINSTAT (2005), Stanford/McGill, Python version of FMRISTAT
FIAC paradigm • 16 subjects • 4 runs per subject • 2 runs: event design • 2 runs: block design • 4 conditions per run • Same sentence, same speaker • Same sentence, different speaker • Different sentence, same speaker • Different sentence, different speaker • 3T, 191 frames, TR=2.5s
Response • Events • Blocks Beginning of block/run
Design matrix for block expt • B1, B2 are basis functions for magnitude and delay:
1st level analysis • Motion and slice time correction (using FSL) • 5 conditions • Smoothing of temporal autocorrelation to control the effective df
Efficiency • Sd of contrasts (lower is better) for a single run, assuming additivity of responses • For the magnitudes, event and block have similar efficiency • For the delays, event is much better.
2nd level analysis • Analyse events and blocks separately • Register contrasts to Talairach (using FSL) • Bad registration on 2 subjects - dropped • Combine 2 runs using fixed FX • Combine remaining 14 subjects using random FX • 3 contrasts × event/block × magnitude/delay = 12 • Threshold using best of Bonferroni, random field theory, and discrete local maxima (new!) 3rd level analysis
Part of slice z = -2 mm