Massively Univariate Inference for fMRI

Massively Univariate Inference for fMRI Thomas Nichols, Ph.D. Assistant Professor Department of Biostatistics University of Michigan http://www.sph.umich.edu/~nichols ISBI April 15, 2004

Introduction & Overview • Inference in fMRI • Where’s the blob!? • Overview I. Statistics Background II. Assessing Statistic Images III. Thresholding Statistic Images

I. Statistics Background • Hypothesis Testing • Fixed vs Random Effects • Nonparametric/resampling inference I. Statistics Background

u  Null Distribution of T t P-val Null Distribution of T Hypothesis Testing • Null Hypothesis H0 • Test statistic T • t observed realization of T •  level • Acceptable false positive rate • P( T>u | H0) =  • P-value • Assessment of t assuming H0 • P( T > t | H0 ) • Prob. of obtaining stat. as large or larger in a new experiment • P(Data|Null) not P(Null|Data) I. Statistics Background

Random Effects Models • GLM has only one source of randomness • Residual error  • But people are another source of error • Everyone activates somewhat differently… I. Statistics Background

Fixed vs. RandomEffects Sampling distnsof each subject’s effect Subj. 1 • Fixed Effects • Intra-subject variation suggests all these subjects different from zero • Random Effects • Inter-subject variation suggests population not very different from zero Subj. 2 Subj. 3 Subj. 4 Subj. 5 Subj. 6 Populationdistn of effect 0 I. Statistics Background

Random Effects for fMRI • Summary Statistic Approach • Easy • Create contrast images for each subject • Analyze contrast images with one-sample t • Limited • Only allows one scan per subject • Assumes balanced designs and homogeneous meas. error. • Full Mixed Effects Analysis • Hard • Requires iterative fitting • REML to estimate inter- and intra subject variance • SPM2 & FSL implement this, very differently • Very flexible I. Statistics Background

5% Parametric Null Distribution 5% Nonparametric Null Distribution Nonparametric Inference • Parametric methods • Assume distribution ofstatistic under nullhypothesis • Needed to find P-values, u • Nonparametric methods • Use data to find distribution of statisticunder null hypothesis • Any statistic! I. Statistics Background

Nonparametric Inference:Permutation Test • Assumptions • Null Hypothesis Exchangeability • Method • Compute statistic t • Resample data (without replacement), compute t* • {t*} permutation distribution of test statistic • P-value = #{ t* > t } / #{ t* } • Theory • Given data and H0, each t* has equal probability • Still can assume data randomly drawn from population I. Statistics Background

Permutation TestToy Example • Data from V1 voxel in visual stim. experiment A: Active, flashing checkerboard B: Baseline, fixation 6 blocks, ABABAB Just consider block averages... • Null hypothesis Ho • No experimental effect, A & B labels arbitrary • Statistic • Mean difference I. Statistics Background

Permutation TestToy Example • Under Ho • Consider all equivalent relabelings • Compute all possible statistic values • Find 95%ile of permutation distribution I. Statistics Background

Permutation TestToy Example • Under Ho • Consider all equivalent relabelings • Compute all possible statistic values • Find 95%ile of permutation distribution -8 -4 0 4 8 I. Statistics Background

Permutation TestStrengths • Requires only assumption of exchangeability • Under Ho, distribution unperturbed by permutation • Allows us to build permutation distribution • Subjects are exchangeable • Under Ho, each subject’s A/B labels can be flipped • fMRI scans not exchangeable under Ho • Due to temporal autocorrelation I. Statistics Background

Permutation TestLimitations • Computational Intensity • Analysis repeated for each relabeling • Not so bad on modern hardware • No analysis discussed below took more than 3 hours • Implementation Generality • Each experimental design type needs unique code to generate permutations • Not so bad for population inference with t-tests I. Statistics Background

Nonparametric Inference:The Bootstrap • Theoretical differences • Independence instead of exchangeability • Asymtotically valid • For each design, finite sample size properties should be evaluated • Practical differences • Resample residuals with replacement • Residuals, not data, resampled • (1) Estimate model parameterb or statistic t • (2) Create residuals • (3) Resample residuals e* • (4) Add model fit to e*, constitute Bootstrap dataset y* • (5) Estimate b* or t* using y* • (6) Repeat (3)-(5) to build Bootstrap distribution of b* or t* • Many important details • See books: Efron & Tibshirani (1993), Davison & Hinkley (1997) I. Statistics Background

t > 0.5 t > 3.5 t > 5.5 I. Assessing Statistic Images Where’s the signal? High Threshold Med. Threshold Low Threshold Good SpecificityPoor Power(risk of false negatives) Poor Specificity(risk of false positives)Good Power • ...but why threshold?! II. Assessing Statistic Images

Blue-sky inference:What we’d like • Don’t threshold, model! • Signal location? • Estimates and CI’s on (x,y,z) location • Signal intensity? • CI’s on % change • Spatial extent? • Estimates and CI’s on activation volume • Robust to choice of cluster definition • ...but this requires an explicit spatial model II. Assessing Statistic Images

Blue-sky inference...a spatial model? • No routine spatial modeling methods exist • High-dimensional mixture modeling problem • Activations don’t look like Gaussian blobs • Need realistic shapes, sparse representation • One initial attempt: • Hartvig, N. and Jensen, J. (2000). HBM, 11(4):233-248. II. Assessing Statistic Images

Blue-sky inference:What we get • Signal location • Local maximum (no inference) • Center-of-mass (no inference) • Sensitive to blob-defining-threshold • Signal intensity • Local maximum intensity (P-value, CI’s) • Spatial extent • Volume above blob-defining-threshold(P-value, no CI’s) • Sensitive to blob-defining-threshold II. Assessing Statistic Images

Voxel-wise Tests • Retain voxels above -level threshold u • Gives best spatial specificity • The null hyp. at a single voxel can be rejected u space Significant Voxels No significant Voxels II. Assessing Statistic Images

Cluster-wise tests • Two step-process • Define clusters by arbitrary threshold uclus • Retain clusters larger than -level threshold k uclus space Cluster not significant Cluster significant k k II. Assessing Statistic Images

Inference on ImagesCluster-wise tests • Typically better sensitivity • Worse spatial specificity • The null hyp. of entire cluster is rejected • Only means that one or more of voxels in cluster active uclus space Cluster not significant Cluster significant k k II. Assessing Statistic Images

Voxel-Cluster Inference • Joint Inference • Combine both voxel height T & cluster size C • Poline et al used bivariate probability • JCBFM (1994) 14:639-642. • P( maxxCluster T(x) > t , C > c ) • Only for Gaussian images, Gaussian ACF • Bullmore et al use “cluster mass” • IEEE-TMI (1999) 18(1):32-42. • Test statistic = xCluster (T(x)-uclus) dx • No RFT result, uses permutation test II. Assessing Statistic Images

Voxel-Cluster Inference • Not a new “level” of inference • cf, Friston et al (1996), Detecting Activations in PET and fMRI: Levels of Inference and Power. NI 4:223-325 • Joint null is intersection of nulls • Joint null = {Voxel null}  {Cluster null} • If joint null rejected, only know one or other is false • Can’t point to any voxel as active • Joint voxel-cluster inference really just “Intensity-informed cluster-wise inference” II. Assessing Statistic Images

t > 2.5 t > 4.5 t > 0.5 t > 1.5 t > 3.5 t > 5.5 t > 6.5 Multiple Comparisons Problem • Which of 100,000 voxels are sig.? • =0.05  5,000 false positive voxels • Which of (random number, say) 100 clusters significant? • =0.05  5 false positives clusters II. Assessing Statistic Images

MCP Solutions:Measuring False Positives • Familywise Error Rate (FWER) • Familywise Error • Existence of one or more false positives • FWER is probability of familywise error • False Discovery Rate (FDR) • FDR = E(V/R) • R voxels declared active, V falsely so • Realized false discovery rate: V/R II. Assessing Statistic Images

Signal Measuring False PositivesFWER vs FDR Noise Signal+Noise II. Assessing Statistic Images

11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5% 6.7% 10.5% 12.2% 8.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% Control of Per Comparison Rate at 10% Percentage of Null Pixels that are False Positives Control of Familywise Error Rate at 10% FWE Occurrence of Familywise Error Control of False Discovery Rate at 10% Percentage of Activated Pixels that are False Positives II. Assessing Statistic Images

MCP Solutions:Measuring False Positives • Familywise Error Rate (FWER) • Familywise Error • Existence of one or more false positives • FWER is probability of familywise error • False Discovery Rate (FDR) • FDR = E(V/R) • R voxels declared active, V falsely so • Realized false discovery rate: V/R II. Assessing Statistic Images

III. Thresholding Statistic ImagesFWER MCP Solutions • Bonferroni • Maximum Distribution Methods • Random Field Theory • Permutation III. Thresholding Statistic Images

FWE MCP Solutions: Bonferroni • For a statistic image T... • Tiith voxel of statistic image T • ...use  = 0/V • 0 FWER level (e.g. 0.05) • V number of voxels • u -level statistic threshold, P(Ti u) =  • By Bonferroni inequality... FWER = P(FWE) = P( i {Tiu} | H0)i P( Tiu| H0 ) = i = i0 /V = 0 III. Thresholding Statistic Images

FWER MCP Solutions • Bonferroni • Maximum Distribution Methods • Random Field Theory • Permutation III. Thresholding Statistic Images

 FWER MCP Solutions: Controlling FWER w/ Max • FWER & distribution of maximum FWER = P(FWE) = P( i {Tiu} | Ho) = P( maxiTi u | Ho) • 100(1-)%ile of max distn controls FWER FWER = P( maxiTi u | Ho) =  • where u = F-1max (1-) . III. Thresholding Statistic Images u

FWER MCP Solutions:Random Field Theory • Euler Characteristic u • Topological Measure • #blobs - #holes • At high thresholds,just counts blobs • FWER = P(Max voxel u | Ho) = P(One or more blobs | Ho) P(u  1 | Ho) E(u| Ho) Threshold Random Field No holes Never more than 1 blob III. Thresholding Statistic Images Suprathreshold Sets

Only very upper tail approximates1-Fmax(u) RFT Details:Expected Euler Characteristic E(u) () ||1/2 (u 2 -1) exp(-u 2/2) / (2)2 • Search regionR3 • (volume • ||1/2roughness • Assumptions • Multivariate Normal • Stationary* • ACF twice differentiable at 0 • Stationary • Results valid w/out stationary • More accurate when stat. holds III. Thresholding Statistic Images

FWHM Autocorrelation Function Random Field TheorySmoothness Parameterization • E(u) depends on ||1/2 •  roughness matrix: • Smoothness parameterized as Full Width at Half Maximum • FWHM of Gaussian kernel needed to smooth a whitenoise random field to roughness  III. Thresholding Statistic Images

Random Field TheorySmoothness Parameterization • RESELS • Resolution Elements • 1 RESEL = FWHMx FWHMy FWHMz • RESEL Count R • R = () || • Volume of search region in units of smoothness • Eg: 10 voxels, 2.5 FWHM, 4 RESELS • Wrong RESEL interpretation • “Number of independent ‘things’ in the image” • See Nichols & Hayasaka, 2003, Stat. Meth. in Med. Res. . III. Thresholding Statistic Images

Random Field Intuition • Corrected P-value for voxel value t Pc = P(max T > t) E(t) () ||1/2t2 exp(-t2/2) • Statistic value t increases • Pc decreases (but only for large t) • Search volume increases • Pc increases (more severe MCP) • Roughness increases (Smoothness decreases) • Pc increases (more severe MCP) III. Thresholding Statistic Images

RFT Details:Super General Formula • General form for expected Euler characteristic • 2, F, & t fields • restricted search regions •D dimensions • E[u(W)] = SdRd(W)rd (u) Rd (W):d-dimensional Minkowski functional of W – function of dimension, spaceWand smoothness: R0(W) = (W) Euler characteristic of W R1(W) = resel diameter R2(W) = resel surface area R3(W) = resel volume rd (W):d-dimensional EC density of Z(x) – function of dimension and threshold, specific for RF type: E.g. Gaussian RF: r0(u) = 1- (u) r1(u) = (4 ln2)1/2 exp(-u2/2) / (2p) r2(u) = (4 ln2) exp(-u2/2) / (2p)3/2 r3(u) = (4 ln2)3/2 (u2 -1) exp(-u2/2) / (2p)2 r4(u) = (4 ln2)2 (u3 -3u) exp(-u2/2) / (2p)5/2  III. Thresholding Statistic Images

Expected Cluster Size E(S) = E(N)/E(L) S cluster size N suprathreshold volume({T > uclus}) L number of clusters E(N) = () P( T > uclus ) E(L)  E(u) Assuming no holes 5mm FWHM 10mm FWHM 15mm FWHM Random Field TheoryCluster Size Tests III. Thresholding Statistic Images

Random Field TheoryCluster Size Distribution • Gaussian Random Fields (Nosko, 1969) • D: Dimension of RF • t Random Fields (Cao, 1999) • B: Beta distn • U’s: 2’s • c chosen s.t.E(S) = E(N) / E(L) III. Thresholding Statistic Images

Random Field TheoryCluster Size Corrected P-Values • Previous results give uncorrected P-value • Corrected P-value • Bonferroni • Correct for expected number of clusters • Corrected Pc = E(L) Puncorr • Poisson Clumping Heuristic (Adler, 1980) • Corrected Pc = 1 - exp( -E(L) Puncorr ) III. Thresholding Statistic Images

Lattice ImageData  Continuous Random Field Random Field Theory Limitations • Sufficient smoothness • FWHM smoothness 3-4 times voxel size • More like ~10 times for low-df data • Smoothness estimation • Estimate is biased when images not sufficiently smooth • Multivariate normality • Virtually impossible to check • Several layers of approximations III. Thresholding Statistic Images

Active ... ... yes Baseline ... ... D UBKDA N XXXXX no Real Data • fMRI Study of Working Memory • 12 subjects, block design Marshuetz et al (2000) • Item Recognition • Active:View five letters, 2s pause, view probe letter, respond • Baseline: View XXXXX, 2s pause, view Y or N, respond • Second Level RFX • Difference image, A-B constructedfor each subject • One sample t test III. Thresholding Statistic Images

Real Data:RFT Result • Threshold • S = 110,776 • 2  2  2 voxels5.1  5.8  6.9 mmFWHM • u = 9.870 • Result • 5 voxels above the threshold • 0.0063 minimumFWE-correctedp-value -log10 p-value III. Thresholding Statistic Images

FWER-controlling MCP Solutions • Bonferroni • Maximum Distribution Methods • Random Field Theory • Permutation III. Thresholding Statistic Images

5% Parametric Null Max Distribution 5% Nonparametric Null Max Distribution Controlling FWER: Permutation Test • Parametric methods • Assume distribution ofmax statistic under nullhypothesis • Nonparametric methods • Use data to find distribution of max statisticunder null hypothesis • Any max statistic! III. Thresholding Statistic Images

Permutation TestOther Statistics • Nonparametric allows arbitrary statistics • Standard ones are usually best, but... • Consider smoothed variance t statistic • To regularize low-df variance estimate III. Thresholding Statistic Images skip

mean difference Permutation TestSmoothed Variance t • Nonparametric allows arbitrary statistics • Standard ones are usually best, but... • Consider smoothed variance t statistic t-statistic variance III. Thresholding Statistic Images

Permutation TestSmoothed Variance t • Nonparametric allows arbitrary statistics • Standard ones are usually best, but... • Consider smoothed variance t statistic SmoothedVariancet-statistic mean difference smoothedvariance III. Thresholding Statistic Images

Massively Univariate Inference for fMRI