Inference on SPMs: Random Field Theory & Alternatives

Inference on SPMs:Random Field Theory & Alternatives Thomas Nichols, Ph.D. Director, Modelling & GeneticsGlaxoSmithKline Clinical Imaging Centre http://www.fmrib.ox.ac.uk/~nichols Zurich SPM Course Feb 12, 2009

image data parameter estimates designmatrix kernel Thresholding &Random Field Theory • General Linear Model • model fitting • statistic image realignment &motioncorrection smoothing normalisation StatisticalParametric Map anatomicalreference Corrected thresholds & p-values

Outline • Orientation • Assessing Statistic images • The Multiple Testing Problem • Random Field Theory & FWE • Permutation & FWE • False Discovery Rate

t > 0.5 t > 3.5 t > 5.5 Assessing Statistic Images Where’s the signal? High Threshold Med. Threshold Low Threshold Good SpecificityPoor Power(risk of false negatives) Poor Specificity(risk of false positives)Good Power • ...but why threshold?!

Blue-sky inference:What we’d like • Don’t threshold, model the signal! • Signal location? • Estimates and CI’s on(x,y,z) location • Signal magnitude? • CI’s on % change • Spatial extent? • Estimates and CI’s on activation volume • Robust to choice of cluster definition • ...but this requires an explicit spatial model space

Blue-sky inference:What we need • Need an explicit spatial model • No routine spatial modeling methods exist • High-dimensional mixture modeling problem • Activations don’t look like Gaussian blobs • Need realistic shapes, sparse • Some initial work • Hartvig et al., Penny et al., Xu et al. • Not part of mass-univariate framework

Real-life inference:What we get • Signal location • Local maximum – no inference • Center-of-mass – no inference • Sensitive to blob-defining-threshold • Signal magnitude • Local maximum intensity – P-values (& CI’s) • Spatial extent • Cluster volume – P-value, no CI’s • Sensitive to blob-defining-threshold

Voxel-level Inference • Retain voxels above -level threshold u • Gives best spatial specificity • The null hyp. at a single voxel can be rejected u space Significant Voxels No significant Voxels

Cluster-level Inference • Two step-process • Define clusters by arbitrary threshold uclus • Retain clusters larger than -level threshold k uclus space Cluster not significant Cluster significant k k

Cluster-level Inference • Typically better sensitivity • Worse spatial specificity • The null hyp. of entire cluster is rejected • Only means that one or more of voxels in cluster active uclus space Cluster not significant Cluster significant k k

Set-level Inference • Count number of blobs c • Minimum blob size k • Worst spatial specificity • Only can reject global null hypothesis uclus space k k Here c = 1; only 1 cluster larger than k

u  Null Distribution of T t P-val Null Distribution of T Hypothesis Testing • Null Hypothesis H0 • Test statistic T • t observed realization of T •  level • Acceptable false positive rate • Level  = P( T>u | H0) • Threshold u controls false positive rate at level  • P-value • Assessment of t assuming H0 • P( T > t | H0 ) • Prob. of obtaining stat. as largeor larger in a new experiment • P(Data|Null) not P(Null|Data)

t > 2.5 t > 4.5 t > 0.5 t > 1.5 t > 3.5 t > 5.5 t > 6.5 Multiple Testing Problem • Which of 100,000 voxels are sig.? • =0.05  5,000 false positive voxels • Which of (random number, say) 100 clusters significant? • =0.05  5 false positives clusters

MTP Solutions:Measuring False Positives • Familywise Error Rate (FWE) • Familywise Error • Existence of one or more false positives • FWE is probability of familywise error • False Discovery Rate (FDR) • FDR = E(V/R) • R voxels declared active, V falsely so • Realized false discovery rate: V/R

FWE MTP Solutions: Bonferroni • For a statistic image T... • Tiith voxel of statistic image T ...use  = 0/V • 0 FWE level (e.g. 0.05) • V number of voxels • u -level statistic threshold, P(Ti u) =  Conservative under correlation Independent: V tests Some dep.: ? tests Total dep.: 1 test

 FWE MTP Solutions: Controlling FWE w/ Max • FWE & distribution of maximum FWE = P(FWE) = P( i {Tiu} | Ho) = P( maxiTi u | Ho) • 100(1-)%ile of max distn controls FWE FWE = P( maxiTi u | Ho) =  • where u = F-1max (1-) . u

FWE MTP Solutions:Random Field Theory • Euler Characteristic u • Topological Measure • #blobs - #holes • At high thresholds,just counts blobs • FWE = P(Max voxel u | Ho) = P(One or more blobs | Ho) P(u  1 | Ho) E(u| Ho) Threshold Random Field No holes Never more than 1 blob Suprathreshold Sets

FWHM Autocorrelation Function Random Field TheorySmoothness Parameterization • E(u) depends on ||1/2 •  roughness matrix: • Smoothness parameterized as Full Width at Half Maximum • FWHM of Gaussian kernel needed to smooth a whitenoise random field to roughness 

1 2 3 4 5 6 7 8 9 10 1 2 3 4 Random Field TheorySmoothness Parameterization • RESELS • Resolution Elements • 1 RESEL = FWHMx FWHMy FWHMz • RESEL Count R • R = () || = (4log2)3/2 () / ( FWHMx FWHMy FWHMz ) • Volume of search region in units of smoothness • Eg: 10 voxels, 2.5 FWHM 4 RESELS • Beware RESEL misinterpretation • RESEL are not “number of independent ‘things’ in the image” • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res. .

Random Field Intuition • Corrected P-value for voxel value t Pc = P(max T > t) E(t) () ||1/2t2 exp(-t2/2) • Statistic value t increases • Pc decreases (but only for large t) • Search volume increases • Pc increases (more severe MTP) • Smoothness increases (roughness ||1/2 decreases) • Pc decreases (less severe MTP)

Expected Cluster Size E(S) = E(N)/E(L) S cluster size N suprathreshold volume({T > uclus}) L number of clusters E(N) = () P( T > uclus ) E(L)  E(u) Assuming no holes 5mm FWHM 10mm FWHM 15mm FWHM Random Field TheoryCluster Size Tests

RFT Cluster Inference & Stationarity VBM:Image of FWHM Noise Smoothness • Problem w/ VBM • Standard RFT result assumes stationarity, constant smoothness • Assuming stationarity, false positive clusters will be found in extra-smooth regions • VBM noise very non-stationary • Nonstationary cluster inference • Must un-warp nonstationarity • Reported but not implemented • Hayasaka et al, NeuroImage 22:676– 687, 2004 • Now available in SPM toolboxes • http://fmri.wfubmc.edu/cms/software#NS • Gaser’s VBM toolbox (w/ recent update!) Nonstationarynoise… …warped to stationarity

Lattice ImageData  Continuous Random Field Random Field Theory Limitations • Sufficient smoothness • FWHM smoothness 3-4× voxel size (Z) • More like ~10× for low-df T images • Smoothness estimation • Estimate is biased when images not sufficiently smooth • Multivariate normality • Virtually impossible to check • Several layers of approximations • Stationary required for cluster size results

Active ... ... yes Baseline ... ... D UBKDA N XXXXX no Real Data • fMRI Study of Working Memory • 12 subjects, block design Marshuetz et al (2000) • Item Recognition • Active:View five letters, 2s pause, view probe letter, respond • Baseline: View XXXXX, 2s pause, view Y or N, respond • Second Level RFX • Difference image, A-B constructedfor each subject • One sample t test

Real Data:RFT Result • Threshold • S = 110,776 • 2  2  2 voxels5.1  5.8  6.9 mmFWHM • u = 9.870 • Result • 5 voxels above the threshold • 0.0063 minimumFWE-correctedp-value -log10 p-value

5% Parametric Null Distribution 5% Nonparametric Null Distribution Nonparametric Inference • Parametric methods • Assume distribution ofstatistic under nullhypothesis • Needed to find P-values, u • Nonparametric methods • Use data to find distribution of statisticunder null hypothesis • Any statistic!

5% Parametric Null Max Distribution 5% Nonparametric Null Max Distribution Controlling FWE: Permutation Test • Parametric methods • Assume distribution ofmax statistic under nullhypothesis • Nonparametric methods • Use data to find distribution of max statisticunder null hypothesis • Again, any max statistic!

Permutation TestOther Statistics • Collect max distribution • To find threshold that controls FWE • Consider smoothed variance t statistic • To regularize low-df variance estimate

mean difference Permutation TestSmoothed Variance t • Collect max distribution • To find threshold that controls FWE • Consider smoothed variance t statistic t-statistic variance

Permutation TestSmoothed Variance t • Collect max distribution • To find threshold that controls FWE • Consider smoothed variance t statistic SmoothedVariancet-statistic mean difference smoothedvariance

Permutation TestStrengths • Requires only assumption of exchangeability • Under Ho, distribution unperturbed by permutation • Allows us to build permutation distribution • Subjects are exchangeable • Under Ho, each subject’s A/B labels can be flipped • fMRI scans not exchangeable under Ho • Due to temporal autocorrelation

Permutation TestLimitations • Computational Intensity • Analysis repeated for each relabeling • Not so bad on modern hardware • No analysis discussed below took more than 3 hours • Implementation Generality • Each experimental design type needs unique code to generate permutations • Not so bad for population inference with t-tests

Active ... ... yes Baseline ... ... D UBKDA N XXXXX no Permutation TestExample • fMRI Study of Working Memory • 12 subjects, block design Marshuetz et al (2000) • Item Recognition • Active:View five letters, 2s pause, view probe letter, respond • Baseline: View XXXXX, 2s pause, view Y or N, respond • Second Level RFX • Difference image, A-B constructedfor each subject • One sample, smoothed variance t test

Maximum Intensity Projection Thresholded t Permutation DistributionMaximum t Permutation TestExample • Permute! • 212 = 4,096 ways to flip 12 A/B labels • For each, note maximum of t image .

Permutation TestExample • Compare with Bonferroni •  = 0.05/110,776 • Compare with parametric RFT • 110,776 222mm voxels • 5.15.86.9mm FWHM smoothness • 462.9 RESELs

378 sig. vox. Smoothed Variance t Statistic,Nonparametric Threshold Test Level vs. t11 Threshold uRF = 9.87uBonf = 9.805 sig. vox. uPerm = 7.67 58 sig. vox. t11Statistic, Nonparametric Threshold t11Statistic, RF & Bonf. Threshold

Reliability with Small Groups • Consider n=50 group study • Event-related Odd-Ball paradigm, Kiehl, et al. • Analyze all 50 • Analyze with SPM and SnPM, find FWE thresh. • Randomly partition into 5 groups 10 • Analyze each with SPM & SnPM, find FWE thresh • Compare reliability of small groups with full • With and without variance smoothing . Skip

SPM t11: 5 groups of 10 vs all 505% FWE Threshold T>10.93 T>11.04 T>11.01 10 subj 10 subj 10 subj 2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45 T>10.69 T>10.10 T>4.66 10 subj 10 subj all 50 4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

SnPM t: 5 groups of 10 vs. all 505% FWE Threshold T>9.00 Arbitrary thresh of 9.0 T>7.06 T>8.28 T>6.3 10 subj 10 subj 10 subj 2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45 T>6.49 T>6.19 T>4.09 10 subj 10 subj all 50 4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

SnPM SmVar t: 5 groups of 10 vs. all 505% FWE Threshold T>9.00 all 50 Arbitrary thresh of 9.0 T>4.69 T>5.04 T>4.57 10 subj 10 subj 10 subj 2 8 11 15 18 35 41 43 44 50 1 3 20 23 24 27 28 32 34 40 9 13 14 16 19 21 25 29 30 45 T>4.84 T>4.64 10 subj 10 subj 4 5 10 22 31 33 36 39 42 47 6 7 12 17 26 37 38 46 48 49

MTP Solutions:Measuring False Positives • Familywise Error Rate (FWE) • Familywise Error • Existence of one or more false positives • FWE is probability of familywise error • False Discovery Rate (FDR) • FDR = E(V/R) • R voxels declared active, V falsely so • Realized false discovery rate: V/R

False Discovery RateIllustration: Noise Signal Signal+Noise

11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5% 6.7% 10.5% 12.2% 8.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% Control of Per Comparison Rate at 10% Percentage of Null Pixels that are False Positives Control of Familywise Error Rate at 10% FWE Occurrence of Familywise Error Control of False Discovery Rate at 10% Percentage of Activated Pixels that are False Positives

p(i) i/V q Benjamini & HochbergProcedure • Select desired limit q on FDR • Order p-values, p(1)p(2) ...  p(V) • Let r be largest i such that • Reject all hypotheses corresponding top(1), ... , p(r). JRSS-B (1995)57:289-300 1 p(i) p-value i/V q 0 0 1 i/V

Inference on SPMs: Random Field Theory & Alternatives