Statistical Inference, Multiple Comparisons and Random Field Theory

Andrew Holmes, SPM short course, May 2002

Overview…
…a voxel by voxel hypothesis testing approach
• reliably identify regions showing a significant experimental effect of interest
• Assessment of statistic images
  • multiple comparisons
  • random field theory
  • smoothness
  • spatial levels of inference & power
  • false discovery rate (later...)
• Generalisability, random effects & population inference
  • inferring to the population
  • group comparisons
• Non-parametric inference (later...)

[Figure: SPM analysis overview. Labels: image data; realignment & motion correction; smoothing (kernel); normalisation (anatomical reference); General Linear Model (design matrix, model fitting, parameter estimates, statistic image); Statistical Parametric Map; random field theory; corrected p-values]

Statistical Parametric Mapping…
[Figure: voxel by voxel modelling of condition 1 and condition 2. Labels: parameter estimate; variance estimate; statistic image or SPM]

Classical hypothesis testing…
• Null hypothesis $H$
  • test statistic
  • null distributions
• Hypothesis test
  • control Type I error: incorrectly reject $H$
  • test level $\alpha$: $\Pr(\text{“reject” } H \mid H) \le \alpha$
  • test size: $\Pr(\text{“reject” } H \mid H)$
• p-value
  • min $\alpha$ at which $H$ rejected
  • $\Pr(T \ge t \mid H)$
  • characterising “surprise”
[Figure: null distributions: t-distribution, 32 df; F-distribution, 10,32 df]

Multiple comparisons…
• Threshold at p ?
  • expect $(100 \times p)\%$ by chance
• Surprise ?
  • extreme voxel values: voxel level inference
  • big suprathreshold clusters: cluster level inference
  • many suprathreshold clusters: set level inference
• Power & localisation
  • sensitivity
  • spatial specificity
[Figure: panels labelled t59 and Gaussian 10mm FWHM (2mm pixels), thresholded at p = 0.05]
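The "expect (100 × p)% by chance" point can be checked with a small simulation. The following is only a sketch under stated assumptions: the image size, the number of simulated images and the function name `null_fraction_above_threshold` are illustrative choices, not taken from the slides. It smooths white noise with a Gaussian kernel of 10mm FWHM on 2mm pixels, rescales to unit variance, and counts the fraction of pixels above the uncorrected p = 0.05 threshold.

```python
import numpy as np
from scipy import ndimage, stats

def null_fraction_above_threshold(shape=(128, 128), fwhm_mm=10.0,
                                  pixel_mm=2.0, p=0.05, seed=0):
    """Fraction of pixels above the uncorrected threshold in pure smooth noise."""
    rng = np.random.default_rng(seed)
    # Convert FWHM (mm) to the Gaussian kernel standard deviation in pixels.
    sigma = fwhm_mm / pixel_mm / np.sqrt(8 * np.log(2))
    field = ndimage.gaussian_filter(rng.standard_normal(shape), sigma, mode="wrap")
    # Smoothing shrinks the variance; rescale exactly using the kernel's energy,
    # obtained by filtering a unit impulse with the same settings.
    impulse = np.zeros(shape)
    impulse[0, 0] = 1.0
    kernel = ndimage.gaussian_filter(impulse, sigma, mode="wrap")
    field /= np.sqrt((kernel ** 2).sum())
    u = stats.norm.isf(p)  # uncorrected threshold for a N(0,1) marginal
    return float((field > u).mean())

# Average over several null images (50 is an arbitrary illustrative choice).
fractions = [null_fraction_above_threshold(seed=s) for s in range(50)]
mean_fraction = float(np.mean(fractions))
print(f"mean fraction of pixels above threshold: {mean_fraction:.3f}")
```

Smoothing changes how the false positives clump together, not how many there are on average, which is why the uncorrected threshold gives no protection at the image level.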

Multiple comparisons terminology…
• Family of hypotheses
  • $H_k$, $k \in \Omega = \{1,\ldots,K\}$
  • $H_\Omega = \bigcap_k H_k$
• Familywise Type I error
  • weak control – omnibus test
    • $\Pr(\text{“reject” } H_\Omega \mid H_\Omega) \le \alpha$
    • “anything, anywhere”?
  • strong control – localising test
    • $\Pr(\text{“reject” } H_W \mid H_W) \le \alpha \quad \forall\, W: W \subseteq \Omega$ & $H_W$
    • “anything, & where”?
• Adjusted p-values
  • test level at which reject $H_k$

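Simple threshold tests…
• Threshold $u$
  • $t_k > u \Rightarrow$ reject $H_k$
  • reject any $H_k$ $\Rightarrow$ reject $H_\Omega$
  • reject $H_\Omega$ if $t_{\max} > u$
• Valid test
  • weak control: $\Pr(T_{\max} > u \mid H_\Omega) \le \alpha$
  • strong control, since $W \subseteq \Omega \Rightarrow \Pr(T^W_{\max} > u \mid H_W) \le \alpha$
• Adjusted p-values
  • $\Pr(T_{\max} > t_k \mid H_\Omega)$
[Figure: thresholds $u$ at p = 0.05, p = 0.0001, p = 0.0000001]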

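The “Bonferroni” correction…
• “The” Bonferroni inequality: Carlo Emilio Bonferroni (1936)
  • For any set of events $A_k$: $\Pr\left(\bigcap_{k=1}^{K} A_k\right) \ge 1 - \sum_{k=1}^{K} \left[1 - \Pr(A_k)\right]$
• Bonferroni correction
  • $A_k$: correctly “accept” $H_k$, i.e. $T_k < u$ & $H_k$
  • Assess $H_k$ at level $\alpha'$, with correction $\alpha' = \alpha / K$
  • threshold $u_\alpha = \Phi^{-1}(1 - \alpha/K)$
  • Adjusted p-values: $\min(1, K p_k)$
• Conservative for correlated tests
  • independent: K tests
  • some dependence: ? tests
  • totally dependent: 1 test
[Figure: panels at 5mm, 10mm, 15mm]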
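As a minimal sketch of the Bonferroni correction for a Gaussian statistic image, the snippet below computes the threshold $u_\alpha = \Phi^{-1}(1 - \alpha/K)$ and the adjusted p-values $\min(1, K p_k)$. The function names and the example value of K are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

def bonferroni_threshold(alpha, n_tests):
    """Threshold u with Pr(Z > u) = alpha / K for a standard normal statistic."""
    # norm.isf(q) is the upper-tail quantile, i.e. Phi^{-1}(1 - q).
    return float(stats.norm.isf(alpha / n_tests))

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(1, K * p_k)."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(1.0, p.size * p)

K = 100_000  # hypothetical number of voxels in the family
u_bonf = bonferroni_threshold(0.05, K)
adjusted = bonferroni_adjust([0.01, 0.2, 0.5])
print(f"Bonferroni threshold for K={K}: u = {u_bonf:.2f}")
print("adjusted p-values:", adjusted)
```

Because the correction depends only on K, it treats the K tests as if they were independent, so it becomes increasingly conservative as the image gets smoother.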

SPM approach: Random fields…
• Consider statistic image as lattice representation of a continuous random field
• Use results from continuous random field theory
[Figure: lattice representation]

Euler characteristic…
• Topological measure $\chi(A_u)$ of excursion set $A_u$
  • $A_u = \{x \in \mathbb{R}^3 : Z(x) > u\}$
  • # components - # “holes”
• Single threshold test
  • large $u$, near $T_{\max}$: Euler char. $\approx$ # local max
• Expected Euler char. $\approx$ p-value
  • $\Pr(Z_{\max} > u) \approx \Pr(\chi(A_u) > 0) \approx E[\chi(A_u)]$
  • single threshold test: $u$ s.t. $E[\chi(A_u)] = \alpha$

Expected Euler characteristic…
$E[\chi(A_u)] \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2 - 1)\,\exp(-u^2/2)\,/\,(2\pi)^2$
• $\Omega$: large search region, $\Omega \subset \mathbb{R}^3$
• $\lambda(\Omega)$: volume
• $|\Lambda|^{1/2}$: smoothness
• $A_u$: excursion set, $A_u = \{x \in \mathbb{R}^3 : Z(x) > u\}$
• $Z(x)$: Gaussian random field, $x \in \mathbb{R}^3$
  + Multivariate Normal finite dimensional distributions
  + continuous
  + strictly stationary
  + marginal N(0,1)
  + continuously differentiable
  + twice differentiable at 0
  + Gaussian ACF (at least near local maxima)

Smoothness, PRF, resels...
• Smoothness $|\Lambda|^{1/2}$
  • $\Lambda$: variance-covariance matrix of partial derivatives (possibly location dependent)
• Point Response Function (PRF), Full Width at Half Maximum (FWHM)
• Gaussian PRF
  • $\Sigma$: kernel var/cov matrix
  • ACF: $2\Sigma$
  • $\Lambda = (2\Sigma)^{-1}$
  • FWHM $f = \sigma\sqrt{8\ln 2}$
  • ignoring covariances: $\Sigma = \frac{1}{8\ln 2}\,\mathrm{diag}(f_x^2, f_y^2, f_z^2)$
  • $|\Lambda|^{1/2} = (4\ln 2)^{3/2} / (f_x f_y f_z)$
• Resolution Element (RESEL)
  • Resel dimensions $(f_x \times f_y \times f_z)$
  • $R_3(\Omega) = \lambda(\Omega) / (f_x f_y f_z)$ if strictly stationary
  • $E[\chi(A_u)] = R_3(\Omega)\,(4\ln 2)^{3/2}\,(u^2 - 1)\,\exp(-u^2/2)\,/\,(2\pi)^2 \approx R_3(\Omega)\,(1 - \Phi(u))$ for high thresholds $u$
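To make the resel formulation concrete, here is a rough sketch that evaluates the three-dimensional expected Euler characteristic $R_3(\Omega)\,(4\ln 2)^{3/2}(u^2-1)\exp(-u^2/2)/(2\pi)^2$ and solves for the single threshold $u$ at which it equals $\alpha$. The search volume, the FWHM values and the function names are hypothetical, and only the highest-order (volume) term is used.

```python
import numpy as np
from scipy import optimize

def resel_count(volume_mm3, fwhm_mm):
    """R3 = volume / (fx * fy * fz), assuming strict stationarity."""
    fx, fy, fz = fwhm_mm
    return volume_mm3 / (fx * fy * fz)

def expected_ec_3d(u, resels):
    """Volume term of the expected Euler characteristic for a Gaussian field."""
    return (resels * (4 * np.log(2)) ** 1.5 * (u ** 2 - 1)
            * np.exp(-u ** 2 / 2) / (2 * np.pi) ** 2)

def ec_threshold(alpha, resels, lo=2.0, hi=10.0):
    """Threshold u with E[EC] = alpha.

    The expression is decreasing in u for u > sqrt(3), so a bracketing
    root finder on [lo, hi] is sufficient for moderately large resel counts.
    """
    return optimize.brentq(lambda u: expected_ec_3d(u, resels) - alpha, lo, hi)

# Hypothetical search region: 1,000,000 mm^3 at 10mm isotropic FWHM.
resels = resel_count(1_000_000.0, (10.0, 10.0, 10.0))
u_rft = ec_threshold(0.05, resels)
print(f"resels = {resels:.0f}, threshold u = {u_rft:.2f}")
```

The threshold depends on the volume only through the resel count, so doubling the FWHM in each direction divides the effective number of comparisons by eight.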

Component fields…
$Y = X\beta + \varepsilon$ (“image regression”)
[Figure: data matrix (scans × voxels) = design matrix × parameters + errors. Estimation gives the parameter estimates $\hat\beta$, the residuals $\hat\varepsilon$ and the estimated variance $\hat\sigma^2$, which together are the estimated component fields]
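The “image regression” can be sketched as one least-squares fit applied to every voxel at once. The code below is an illustrative assumption rather than the SPM implementation: the two-condition design, the simulated data and the function name `fit_glm` are invented for the example.

```python
import numpy as np

def fit_glm(Y, X, c):
    """Voxel-by-voxel GLM fit.

    Y: data matrix (scans x voxels), X: design matrix (scans x parameters),
    c: contrast vector. Returns parameter estimates, residuals, the
    estimated variance image, the t statistic image and the error df.
    """
    beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    dof = Y.shape[0] - rank
    sigma2 = (residuals ** 2).sum(axis=0) / dof          # variance estimate per voxel
    c = np.asarray(c, dtype=float)
    var_scale = c @ np.linalg.pinv(X.T @ X) @ c          # c'(X'X)^- c
    t = (c @ beta) / np.sqrt(sigma2 * var_scale)
    return beta, residuals, sigma2, t, dof

# Hypothetical example: 20 scans per condition, 500 voxels, effect in voxel 0 only.
rng = np.random.default_rng(1)
X = np.zeros((40, 2))
X[:20, 0] = 1.0   # condition 1
X[20:, 1] = 1.0   # condition 2
Y = rng.standard_normal((40, 500))
Y[:20, 0] += 3.0
beta, residuals, sigma2, t, dof = fit_glm(Y, X, [1, -1])
print(f"df = {dof}, t at voxel 0 = {t[0]:.1f}")
```

For this simple design the statistic image is the pooled-variance two-sample t statistic computed independently at each voxel.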

Smoothness estimation…
• Smoothness
  • from standardised residuals
  • empirical derivatives at each voxel
  • Resels per voxel (RPV): an “image” of smoothness
  • correction for estimation of variance field $\sigma^2$
    • function of degrees of freedom
  • covariances often ignored
• Euler Characteristics
  • using discrete methods
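The idea of estimating smoothness from empirical derivatives can be illustrated as follows. This sketch makes simplifying assumptions that differ from the procedure on the slide: it standardises a single simulated field globally rather than voxel by voxel, ignores covariances, and omits the degrees-of-freedom correction. It uses $\Lambda = 4\ln 2 / f^2$ per axis, so $f = \sqrt{4\ln 2 / \Lambda}$; the function name and the simulated field are hypothetical.

```python
import numpy as np
from scipy import ndimage

def estimate_fwhm(field):
    """Estimate FWHM (in voxels) along each axis from finite differences."""
    z = (field - field.mean()) / field.std()   # global standardisation (assumption)
    fwhm = []
    for axis in range(z.ndim):
        d = np.diff(z, axis=axis)              # empirical partial derivative
        lam = np.mean(d ** 2)                  # variance of the derivative
        fwhm.append(np.sqrt(4 * np.log(2) / lam))
    return np.array(fwhm)

# Simulated smooth noise with a known kernel FWHM of 5 voxels.
rng = np.random.default_rng(2)
true_fwhm = 5.0
sigma = true_fwhm / np.sqrt(8 * np.log(2))
field = ndimage.gaussian_filter(rng.standard_normal((64, 64, 64)), sigma, mode="wrap")

fwhm_est = estimate_fwhm(field)
resels_per_voxel = 1.0 / np.prod(fwhm_est)
print("estimated FWHM (voxels):", np.round(fwhm_est, 2))
print(f"resels per voxel: {resels_per_voxel:.4f}")
```

Finite differences slightly underestimate the derivative variance, so the estimated FWHM tends to come out a little above the kernel value; the effect is small when the smoothness is several voxels.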

Unified p-values…
• General form for expected Euler characteristic
  • $\chi^2$, F, & t fields
  • restricted search regions
  • D dimensions
• $E[\chi(W \cap A_u)] = \sum_d R_d(W)\,\rho_d(u)$
• $R_d(W)$: d-dimensional Minkowski functional of $W$, a function of dimension, space $W$ and smoothness:
  • $R_0(W) = \chi(W)$, Euler characteristic of $W$
  • $R_1(W)$ = resel diameter
  • $R_2(W)$ = resel surface area
  • $R_3(W)$ = resel volume
• $\rho_d(u)$: d-dimensional EC density of $Z(x)$, a function of dimension and threshold, specific for RF type. E.g. Gaussian RF (strictly stationary &c…):
  • $\rho_0(u) = 1 - \Phi(u)$
  • $\rho_1(u) = (4\ln 2)^{1/2}\,\exp(-u^2/2)\,/\,(2\pi)$
  • $\rho_2(u) = (4\ln 2)\,u\,\exp(-u^2/2)\,/\,(2\pi)^{3/2}$
  • $\rho_3(u) = (4\ln 2)^{3/2}\,(u^2 - 1)\,\exp(-u^2/2)\,/\,(2\pi)^2$
  • $\rho_4(u) = (4\ln 2)^{2}\,(u^3 - 3u)\,\exp(-u^2/2)\,/\,(2\pi)^{5/2}$
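A small sketch of the unified formula $\sum_d R_d(W)\,\rho_d(u)$ for a Gaussian field in up to three dimensions is given below. The search region is assumed, purely for illustration, to be a box measuring 10 × 10 × 10 resels, for which the resel counts are $R_0 = 1$, $R_1 = a+b+c$, $R_2 = ab+bc+ac$ and $R_3 = abc$; the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def gaussian_ec_densities(u):
    """EC densities rho_0..rho_3 of a stationary Gaussian field at threshold u."""
    e = np.exp(-u ** 2 / 2)
    a = 4 * np.log(2)
    return np.array([
        stats.norm.sf(u),                                  # rho_0 = 1 - Phi(u)
        a ** 0.5 * e / (2 * np.pi),                        # rho_1
        a * u * e / (2 * np.pi) ** 1.5,                    # rho_2
        a ** 1.5 * (u ** 2 - 1) * e / (2 * np.pi) ** 2,    # rho_3
    ])

def expected_ec(u, resel_counts):
    """E[EC] = sum_d R_d * rho_d(u) for resel counts R_0..R_D (D <= 3)."""
    R = np.asarray(resel_counts, dtype=float)
    return float(R @ gaussian_ec_densities(u)[: R.size])

# Hypothetical box-shaped search region of a x b x c resels.
a, b, c = 10.0, 10.0, 10.0
R = [1.0, a + b + c, a * b + b * c + a * c, a * b * c]
u = 4.64
total = expected_ec(u, R)
volume_only = R[3] * gaussian_ec_densities(u)[3]
print(f"E[EC] all terms: {total:.4f}, volume term only: {volume_only:.4f}")
```

The lower-dimensional terms account for the boundary of the search region, so they matter most for small or thin regions where the surface is large relative to the volume.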

Suprathreshold cluster tests…
• Primary threshold $u$
  • examine connected components of excursion set
  • Suprathreshold clusters
  • Reject $H_W$ for clusters of voxels $W$ of size $S > s$
• Localisation (strong control)
  • at cluster level
  • increased power, esp. at high resolutions (fMRI)
• Thresholds, p-values
  • $\Pr(S_{\max} > s \mid H_\Omega) \le \alpha$: Nosko, Friston, (Worsley)
  • Poisson occurrence (Adler)
  • Assume form for $\Pr(S = s \mid S > 0)$
[Figure: panels at 5mm FWHM, 10mm FWHM, 15mm FWHM (2mm² pixels)]
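Finding the suprathreshold clusters is a connected-component labelling problem. The sketch below uses `scipy.ndimage.label` with its default face connectivity, which is an assumption (other connectivity criteria are possible), and returns the size of each cluster above a primary threshold; the function name and the toy image are made up for the example.

```python
import numpy as np
from scipy import ndimage

def suprathreshold_clusters(stat_image, u):
    """Label connected components of the excursion set {stat_image > u}.

    Returns the label image and an array with the size (in voxels)
    of each cluster, in label order.
    """
    labels, n_clusters = ndimage.label(stat_image > u)
    sizes = np.bincount(labels.ravel(), minlength=n_clusters + 1)[1:]
    return labels, sizes

# Toy 2D statistic image with three separate blobs above threshold.
z = np.zeros((5, 6))
z[0, 0:2] = 4.0       # cluster of 2 pixels
z[2:4, 3:5] = 4.0     # cluster of 4 pixels
z[4, 0] = 4.0         # cluster of 1 pixel

labels, sizes = suprathreshold_clusters(z, u=3.09)
print("cluster sizes:", sizes.tolist(), "largest:", int(sizes.max()))
```

A cluster-level test would then compare each size, or the maximum size, against the null distribution of cluster extent at the chosen primary threshold.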

Levels of inference…
[Figure: SPM with clusters labelled n=12, n=82, n=32]
• Parameters: u = 3.09; k = 12 voxels; S = $32^3$ voxels; FWHM = 4.7 voxels; D = 3
• voxel-level
  • $P(c \ge 1 \mid n \ge 0,\ t \ge 4.37) = 0.048$ (corrected)
  • $P(t \ge 4.37) = 1 - \Phi\{4.37\} < 0.001$ (uncorrected)
• cluster-level
  • $P(c \ge 1 \mid n \ge 82,\ t \ge 3.09) = 0.029$ (corrected)
  • $P(n \ge 82 \mid t \ge 3.09) = 0.019$ (uncorrected)
• set-level
  • $P(c \ge 3 \mid n \ge 12,\ u \ge 3.09) = 0.019$
• omnibus
  • $P(c \ge 7 \mid n \ge 0,\ u \ge 3.09) = 0.031$

Summary: Levels of inference & power

SPM results...

Assumptions…
• Model fit & assumptions
  • valid distributional results
• Multivariate normality
  • of component images
• Strict stationarity (pre SPM99)
  • of component images
  • homogeneous spatial structure
• Smoothness
  • smoothness » voxel size
    • lattice approximation
    • smoothness estimation
  • practically FWHM $\ge 3 \times$ VoxDim
  • otherwise
    • conservative (voxel level)
    • lax (spatial extent)
• spatial smoothing? temporal smoothing?

Random effects & variance components
• Fixed effects
  • Are you confident that a new observation from any of subjects 1-3 will be greater than zero?
  • Yes! (using within-subjects variance)
  • infer for these subjects: case study
• Random effects
  • Are you confident that a new observation from a new subject will be greater than zero?
  • No! (using between-subjects variance)
  • infer for any subject: population

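Multi-subject analysis…?
[Figure: estimates for six subjects, $\hat\beta_1, \ldots, \hat\beta_6$, each with a within-subject variance estimate $\hat\sigma^2_\varepsilon$. The estimated mean activation image $\hat\beta_\bullet$ is assessed against $\sigma^2_\varepsilon / nw$, giving an SPM{t} shown at p < 0.001 (uncorrected) and at p < 0.05 (corrected)]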

Two-stage analysis of random effect…
• level-one (within-subject)
  • estimates $\hat\beta_1, \ldots, \hat\beta_6$, each with variance estimate $\hat\sigma^2_\varepsilon$
  • contrast images
• level-two (between-subject)
  • variance $\hat\sigma^2$: an estimate of the mixed-effects model variance $\sigma^2_\beta + \sigma^2_\varepsilon / w$
  • mean $\hat\beta_\bullet$, c.f. $\sigma^2 / n = \sigma^2_\beta / n + \sigma^2_\varepsilon / nw$
  • SPM{t} at p < 0.001 (uncorrected); no voxels significant at p < 0.05 (corrected)
[Figure: timecourses at [ 03, -78, 00 ]]
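The difference between the two variances can be illustrated with a short simulation at a single voxel. Everything here is a hypothetical sketch: the six subjects echo the figure, but the number of observations per subject, the variance components and the variable names are assumptions. The fixed-effects standard error uses only within-subject variance, $\sigma^2_\varepsilon / nw$, whereas the two-stage approach takes one summary value per subject and uses the variance between those values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, w = 6, 40                   # subjects, observations per subject (hypothetical)
sd_between, sd_within = 1.0, 1.0

# Each subject has its own true effect; observations scatter around it.
subject_effects = rng.normal(0.5, sd_between, size=n)
data = subject_effects[:, None] + rng.normal(0.0, sd_within, size=(n, w))

# Fixed effects: pooled within-subject variance, divided by n * w.
within_var = data.var(axis=1, ddof=1).mean()
ffx_se = np.sqrt(within_var / (n * w))
ffx_t = data.mean() / ffx_se

# Random effects, two-stage: level one reduces each subject to a mean,
# level two is a one-sample t-test on those n means.
subject_means = data.mean(axis=1)
rfx_se = np.sqrt(subject_means.var(ddof=1) / n)
rfx_t = subject_means.mean() / rfx_se
rfx_p = stats.ttest_1samp(subject_means, 0.0).pvalue

print(f"fixed effects:  se = {ffx_se:.3f}, t = {ffx_t:.2f} (df = {n * (w - 1)})")
print(f"random effects: se = {rfx_se:.3f}, t = {rfx_t:.2f} (df = {n - 1}), p = {rfx_p:.3f}")
```

The level-two variance of the subject means estimates $\sigma^2_\beta + \sigma^2_\varepsilon / w$, so the second-stage t-test accounts for both variance components without estimating them separately.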

Two stage random effects group comparison vs. two-sample t-test
[Figure: 12 subjects. Level-one (within-subject) contrast images feed the level-two (between-subject) analysis]

Multi-stage multi-level modelling…
[Figure: flow diagram. Labels: level-1 data, model & contrast(s); parameter estimation; estimated contrasts from level-1 fits, level-2 model & level-2 contrasts; parameter estimation; level-2 estimated contrasts and residual variance; inference; level-2 (population) inference]

Hypothesis testing !?
• Why test?
  • reliability $\Rightarrow$ genuine effects $\Rightarrow$ integrity of research (hopefully)
• The fallacy…
  • point null hypothesis (no change)
  • things are never the same! (always some small chance change)
  • given enough observations can always reject null hypothesis !
  • fMRI !? (lots of observations)
• …testing, rather than estimating
  • significant $\ne$ important !?
• …and: “absence of evidence is not evidence of absence” !?

Multiple Comparisons, & Random Field Theory
• Worsley KJ, Marrett S, Neelin P, Evans AC (1992) “A three-dimensional statistical analysis for CBF activation studies in human brain” Journal of Cerebral Blood Flow and Metabolism 12:900-918
• Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC (1995) “A unified statistical approach for determining significant signals in images of cerebral activation” Human Brain Mapping 4:58-73
• Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, Evans AC (1994) “Assessing the Significance of Focal Activations Using their Spatial Extent” Human Brain Mapping 1:214-220
• Cao J (1999) “The size of the connected components of excursion sets of χ², t and F fields” Advances in Applied Probability (in press)
• Worsley KJ, Marrett S, Neelin P, Evans AC (1995) “Searching scale space for activation in PET images” Human Brain Mapping 4:74-90
• Worsley KJ, Poline J-B, Vandal AC, Friston KJ (1995) “Tests for distributed, non-focal brain activations” NeuroImage 2:183-194
• Friston KJ, Holmes AP, Poline J-B, Price CJ, Frith CD (1996) “Detecting Activations in PET and fMRI: Levels of Inference and Power” NeuroImage 4:223-235

index • overview • multiple comparisons • random field theory • random effects • hypothesis testing fallacy
