Sensor & Source Space Statistics Rik Henson (MRC CBU, Cambridge)

With thanks to Jason Taylor, Vladimir Litvak, Guillaume Flandin, James Kilner & Karl Friston

Overview • A mass-univariate statistical approach to localising effects in space/time/frequency (using replications across trials/subjects)…

Overview
• Sensor Space:
  • Random Field Theory (RFT)
  • 2D Time-Freq (within-subject)
  • 3D Scalp-Time (within-subject)
  • 3D Scalp-Time (between-subjects)
• Source Space:
  • 3D contrast images
  • SPM vs SnPM vs PPM (vs FDR)
• Other issues & Future directions
  • Multivariate

1. Random Field Theory (RFT)
RFT is a method for correcting for multiple statistical comparisons in N-dimensional spaces (for parametric statistics, e.g., Z-, T-, F-statistics)…
• When is there an effect in time, e.g., GFP (1D)?
• Where is there an effect in time-frequency space (2D)?
• Where is there an effect in time-sensor space (3D)?
• Where is there an effect in time-source space (4D)?
Worsley et al. (1996). Human Brain Mapping, 4:58-73
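As a rough illustration of the 1D case (e.g., a GFP time course), the sketch below approximates the family-wise corrected p-value for a peak of height z in a smooth Gaussian (Z) field, using the expected Euler characteristic. It is only a sketch under stated assumptions: the function name `rft_p_1d` and its arguments are hypothetical, the field is assumed stationary with known smoothness (FWHM, in samples), and only Z-statistics are covered (T- and F-fields use different EC densities).

```python
import math

def rft_p_1d(z, length, fwhm):
    """Approximate FWE-corrected p-value for peak height z in a 1D Gaussian field.

    length and fwhm are in the same units (e.g., time samples).
    """
    resels = length / fwhm                              # 1D resel count
    rho0 = 0.5 * math.erfc(z / math.sqrt(2.0))          # 0D EC density: 1 - Phi(z)
    rho1 = (math.sqrt(4.0 * math.log(2.0)) / (2.0 * math.pi)
            * math.exp(-0.5 * z * z))                   # 1D EC density for a Z field
    return min(1.0, rho0 + resels * rho1)               # expected EC, capped at 1

# Hypothetical example: 600 samples with a smoothness of 30 samples FWHM
p_corrected = rft_p_1d(3.5, length=600, fwhm=30)
```

With zero search length the expression reduces to the ordinary uncorrected tail probability, which is one way to see what the correction adds.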

2. Single-subject Example
• “Multimodal” Dataset in SPM8 manual (and website)
• Single subject: 128-channel EEG; 275-channel MEG; 3T fMRI (with nulls); 1 mm³ sMRI
• Two sessions
• ~160 face trials and ~160 scrambled trials per session
• (N=12 subjects soon, as in Henson et al., 2009a, b, c)
Chapter 33, SPM8 Manual

2. Where is an effect in time-frequency (2D)?
• Single MEG channel
• Mean over trials of Morlet wavelet projection (i.e., induced + evoked)
• Write as a t x f x 1 image per trial
• SPM, correct on extent / height
[Figure panels: Faces > Scrambled; Faces; Scrambled]
Kilner et al. (2005) Neurosci. Letters; Chapter 33, SPM8 Manual

3. Where is an effect in scalp-time space (3D)?
• 2D sensor positions specified or projected from 3D digitised positions
• Each sample projected to a 32x32 grid using linear interpolation
• Samples tiled to create a 3D volume (axes x, y, t)
• F-test of means of ~150 EEG trials of each type (since polarity not of interest)
• (Note that clusters depend on reference)
Chapter 33, SPM8 Manual

3. Where is an effect in scalp-time space (3D)?
More sophisticated 1st-level design matrices, e.g., to remove trial-by-trial confounds within each subject, and create mean adjusted ERPs for 2nd-level analysis across subjects.
[Figure: within-subject (1st-level) design matrix over each trial, with columns for each trial-type (6) and confounds (4); across-subjects (2nd-level) design matrix]
beta_00* images reflect the mean (adjusted) 3D scalp-time volume for each condition.
Henson et al. (2008) Neuroimage

4. Where is an effect in scalp-time space (3D)?
Mean ERP/ERF images can also be tested between-subjects. Note, however, that for MEG some alignment of sensors may be necessary (e.g., SSS, Taulu et al., 2005).
[Figure panels: without transformation to Device Space; with transformation to Device Space]
Stats over 18 subjects on RMS of 102 planar gradiometers.
Taylor & Henson (2008) Biomag


Where is an effect in source space (3D)?
Source analysis of N=12 subjects; 102 magnetometers; MSP; evoked; RMS; smooth 12mm
1. Estimate evoked/induced energy (RMS) at each dipole for a certain time-frequency contrast (e.g., from sensor stats, e.g., 0-20Hz, 150-200ms), for each condition (e.g., faces & scrambled) and subject
2. Smooth along the 2D surface
3. Write these data into a 3D image in MNI space (if canonical / template mesh used)
4. Smooth by 8-12mm in 3D (to allow for normalisation errors)
[Figure: analysis mask] Note sparseness of MSP inversions…
Henson et al. (2007) Neuroimage
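Step 1 above can be sketched as follows for the purely temporal part of the contrast. This is a minimal illustration, not the SPM implementation: the function name `rms_in_window`, the array layout (dipoles x samples) and the example numbers are assumptions, and any band-pass filtering (e.g., 0-20Hz) is assumed to have been applied already.

```python
import numpy as np

def rms_in_window(J, times, t_start, t_end):
    """RMS source amplitude per dipole over samples with t_start <= time <= t_end.

    J     : array (n_dipoles, n_samples) of source estimates for one condition
    times : array (n_samples,) of sample times in seconds
    """
    mask = (times >= t_start) & (times <= t_end)
    if not mask.any():
        raise ValueError("time window contains no samples")
    return np.sqrt(np.mean(J[:, mask] ** 2, axis=1))

# Toy example: two dipoles, four samples, window 150-200ms
times = np.array([0.10, 0.15, 0.20, 0.30])
J = np.array([[3.0, 3.0, 3.0, 0.0],
              [0.0, 4.0, -4.0, 0.0]])
energy = rms_in_window(J, times, 0.15, 0.20)   # one value per dipole
```

One such vector per condition and subject would then be smoothed on the mesh and written into a 3D image (steps 2-4), which this sketch does not attempt.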

Where is an effect in source space (3D)?
1. Classical SPM approach
Caveats:
• Inverse operator induces long-range error correlations (e.g., similar gain vectors from non-adjacent dipoles with similar orientation), making RFT conservative
• Need a cortical mask, else activity is “smoothed” outside
• Distributions over subjects may not be Gaussian…
[Figure: SPM, p<.05 FWE]

Where is an effect in source space (3D)?
2. Nonparametric, SnPM
• Robust to non-Gaussian distributions
• Less conservative than RFT when dfs<20
Caveats:
• No idea of effect size (e.g., for future experiments)
• Exchangeability difficult for more complex designs
[Figure: SnPM, p<.05 FWE]
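To make the nonparametric idea concrete, here is a minimal sketch of a sign-flipping permutation test with a maximum-statistic correction, for paired differences (e.g., faces minus scrambled) across subjects. It is not the SnPM toolbox itself: the function name `signflip_max_t`, the one-sided test and the simulated data are all assumptions made for illustration. It enumerates every sign flip, which is feasible for small groups (2^12 = 4096 flips for N=12).

```python
import itertools
import numpy as np

def signflip_max_t(diffs):
    """One-sided FWE-corrected p-values from exhaustive sign-flipping.

    diffs : array (n_subjects, n_voxels) of paired condition differences.
    Returns (t_obs, p_fwe), each of length n_voxels.
    """
    n = diffs.shape[0]

    def tstat(x):
        # one-sample t-statistic per voxel
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

    t_obs = tstat(diffs)
    # Null distribution of the maximum t over voxels, one entry per sign flip
    max_null = np.array([
        tstat(diffs * np.array(signs)[:, None]).max()
        for signs in itertools.product((1.0, -1.0), repeat=n)
    ])
    p_fwe = np.array([(max_null >= t).mean() for t in t_obs])
    return t_obs, p_fwe

# Simulated example: 8 subjects, 20 voxels, a large effect in voxel 0 only
rng = np.random.default_rng(0)
diffs = rng.normal(size=(8, 20))
diffs[:, 0] += 10.0
t_obs, p_fwe = signflip_max_t(diffs)
```

Because the unflipped data are included among the permutations, the smallest attainable p-value is 1 / 2^n, which is one reason small groups limit what a permutation test can show.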

Where is an effect in source space (3D)?
3. PPMs
• No need for RFT (no MCP!)
• Threshold on posterior probability of an effect (greater than some size)
• Can show effect size after thresholding…
Caveats:
• Assume normal distributions (e.g., of mean over voxels); sometimes not met for MSP (though usually fine for IID)
[Figure: PPM, p>.95 (γ>1SD); grayscale = effect size]
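The thresholding rule for a PPM can be sketched for a single voxel as below, assuming a Gaussian posterior over the effect. The function name `posterior_prob_exceeds` and the example numbers are hypothetical; how the posterior mean and standard deviation are obtained is outside this sketch.

```python
import math

def posterior_prob_exceeds(mean, sd, gamma):
    """P(effect > gamma) under a Gaussian posterior N(mean, sd**2)."""
    return 0.5 * math.erfc((gamma - mean) / (sd * math.sqrt(2.0)))

# A voxel survives if the posterior probability of an effect larger than
# gamma exceeds 0.95 (gamma = 1 here, in whatever units the effect has)
prob = posterior_prob_exceeds(mean=3.0, sd=1.0, gamma=1.0)
survives = prob > 0.95
```

The effect size (the posterior mean) can then be displayed for the surviving voxels, which is the sense in which a PPM shows effect size after thresholding.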

Where is an effect in source space (3D)?
4. FDR?
• Topological issues…?
[Figure: SPM, p<.05 FWE]

Where is an effect in source space (3D)?
Some further thoughts:
• Since data live in sensor space, why not perform stats there, and just report some mean localisation (e.g., across subjects)? True, but:
  • What if sensor data are not aligned (e.g., MEG) (Taylor & Henson, 2008)?
  • What if you want to fuse modalities (e.g., MEG+EEG) (Henson et al., 2009)?
  • What if you want to use source priors (e.g., fMRI) (Henson et al., submitted)?
• Contrast localisations of conditions, or localise contrast of conditions? “DoL” or “LoD” (Henson et al., 2007, Neuroimage)
  • LoD has higher SNR (though difference only lives in trial-average, i.e., evoked)?
  • But how to test localised energy of a difference (versus baseline)?
  • Construct inverse operator (MAP) from a difference, but then apply that operator to individual conditions (Taylor & Henson, in prep)

Future Directions
• Extend RFT to 2D cortical surfaces (“surfstat”)
• Go multivariate…
  • To localise (linear combinations of) spatial (sensor or source) effects in time, using Hotelling's T² and RFT
  • To detect spatiotemporal patterns in 3D images (MLM / PLS)
Pantazis et al. (2005) NeuroImage; Carbonell et al. (2004) NeuroImage; Duzel et al. (2003) NeuroImage; Kherif et al. (2004) NeuroImage

Multivariate Model (MM) toolbox
Multivariate Linear Model (MLM) across subjects on MEG Scalp-Time volumes (now with 3 conditions: Famous, Novel, Scrambled)
Sensitive (and suggestive of spatiotemporal dynamic networks), but “imprecise”
[Figure: design matrix X; “M170”?]
Kherif et al. (2004) NeuroImage

The End

2. Where is an effect in time-frequency (2D)? Kilner et al. (2005) Neurosci. Letters

2. Parametric Empirical Bayes (PEB) • Weighted Minimum Norm & Bayesian equivalent • EM estimation of hyperparameters (regularisation) • Model evidence and Model Comparison • Spatiotemporal factorisation and Induced Power • Automatic Relevance Detection (hyperpriors) • Multiple Sparse Priors • MEG and EEG fusion (simultaneous inversion)

Weighted Minimum Norm, Regularisation
Linear system to be inverted: $Y = LJ + E$
• Y = Data, n sensors x t (=1) time-samples
• J = Sources, p sources x t time-samples
• L = Forward model, n sensors x p sources
• E = Multivariate Gaussian noise, n x t
• $C_e$ = error covariance over sensors
Since n<p, need to regularise, e.g., “weighted minimum (L2) norm” (WMN), with W = weighting matrix:
• $W = I$: minimum norm
• $W = DD^T$: coherent
• $W = \mathrm{diag}(L^T L)^{-1}$: depth-weighted
• $W_p = (L_p^T C_y^{-1} L_p)^{-1}$: SAM
• W = …
(Tikhonov) λ = regularisation (hyperparameter), e.g., from the “L-curve” method [Figure: $\|Y - LJ\|^2$ against $\|WJ\|^2$]
Phillips et al. (2002) Neuroimage, 17, 287–301
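The estimator itself did not survive in this transcript. Assuming the usual Tikhonov-regularised form, built from the two norms on the L-curve axes and the symbols defined above, the WMN solution can be written as:

```latex
\hat{J} = \arg\min_{J} \left\{ \left\| C_e^{-1/2} (Y - LJ) \right\|^2
          + \lambda \left\| W J \right\|^2 \right\}
        = \left( L^T C_e^{-1} L + \lambda W^T W \right)^{-1} L^T C_e^{-1} Y
```

The first term penalises misfit to the data (whitened by the sensor error covariance), the second penalises the weighted source norm, and λ sets the balance between them.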

Equivalent Bayesian Formulation
Equivalent “Parametric Empirical Bayes” formulation:
• Y = Data, n sensors x t (=1) time-samples
• J = Sources, p sources x t time-samples
• L = Forward model, n sensors x p sources
• C(e) = covariance over sensors
• C(j) = covariance over sources
Posterior is product of likelihood and prior: $p(J|Y) \propto p(Y|J)\,p(J)$
Maximal A Posteriori (MAP) estimate is:
(Contrasting with Tikhonov), with W = weighting matrix:
• $W = I$: minimum norm
• $W = DD^T$: coherent
• $W = \mathrm{diag}(L^T L)^{-1}$: depth-weighted
• $W_p = (L_p^T C_y^{-1} L_p)^{-1}$: SAM
• W = …
Phillips et al. (2005) Neuroimage, 997-1011
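The equivalence can be checked numerically. The sketch below assumes the standard Gaussian forms, which are not legible in this transcript: a Tikhonov/WMN estimate (L'Ce⁻¹L + λW'W)⁻¹L'Ce⁻¹Y and a MAP estimate C(j)L'(L C(j) L' + C(e))⁻¹Y, with the prior source covariance set to C(j) = (λW'W)⁻¹. All matrices are random toy values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 12                                   # fewer sensors than sources
L = rng.normal(size=(n, p))                    # toy forward model
Y = rng.normal(size=(n, 1))                    # toy data, one time-sample
Ce = np.diag(rng.uniform(0.5, 1.5, size=n))    # sensor error covariance
W = np.diag(rng.uniform(0.5, 2.0, size=p))     # weighting matrix
lam = 0.3                                      # regularisation hyperparameter

# Tikhonov / weighted minimum norm (inversion in p x p source space)
Ce_inv = np.linalg.inv(Ce)
J_tik = np.linalg.solve(L.T @ Ce_inv @ L + lam * W.T @ W, L.T @ Ce_inv @ Y)

# MAP estimate with Gaussian prior covariance C(j) = (lam * W'W)^-1
# (inversion in n x n sensor space)
Cj = np.linalg.inv(lam * W.T @ W)
J_map = Cj @ L.T @ np.linalg.solve(L @ Cj @ L.T + Ce, Y)

same = np.allclose(J_tik, J_map)
```

The two expressions agree by the matrix inversion lemma; the MAP form only needs an n x n inverse, which is cheaper when there are many more sources than sensors.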

Covariance Constraints (Priors)
How to parameterise C(e) and C(j)?
• Q = (co)variance components (priors)
• λ = estimated hyperparameters
Example components:
• “IID” constraint on sensors (Q(e) = I(n)) [# sensors x # sensors]
• “IID” constraint on sources (Q(j) = I(p)) [# sources x # sources]
• Sparse priors on sources (Q1(j), Q2(j), …) [each # sources x # sources]
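A minimal sketch of this parameterisation, assuming each covariance is a linear mixture of its components weighted by the hyperparameters (C = Σ λi Qi); the mixing formula is not legible in this transcript, and the helper name `mix_components`, the toy sizes and the hyperparameter values are hypothetical.

```python
import numpy as np

def mix_components(lambdas, components):
    """Return sum_i lambda_i * Q_i for equally sized square matrices Q_i."""
    C = np.zeros_like(components[0], dtype=float)
    for lam, Q in zip(lambdas, components):
        C += lam * Q
    return C

p = 6                                   # toy number of sources
Q_iid = np.eye(p)                       # "IID" component on sources
patch = np.zeros(p)
patch[[0, 1]] = 1.0                     # a sparse "patch" covering sources 0 and 1
Q_patch = np.outer(patch, patch)        # sparse component: the patch covaries

# Hyperparameters would normally be estimated; fixed here for illustration
C_j = mix_components([0.1, 2.0], [Q_iid, Q_patch])
```

Here the patch component adds variance to sources 0 and 1 and makes them covary, while all other sources keep only the small IID variance.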

Expectation-Maximisation (EM)
How to estimate λ? Use the EM algorithm to maximise the (negative) “free energy” (F). (Note estimation in n x n sensor space.)
Once the hyperparameters are estimated (iterated M-steps), get the MAP for the parameters (single E-step).
(Can also estimate the conditional covariance of the parameters, allowing inference.)
Phillips et al. (2005) Neuroimage

Multiple Constraints (Priors)
Multiple constraints: Smooth sources (Qs), plus valid (Qv) or invalid (Qi) focal prior
[Figure: results of 500 simulations for the prior combinations Qs; Qs,Qi; Qs,Qv; Qs,Qi,Qv]
Mattout et al. (2006) Neuroimage, 753-767

Model Evidence
A (generative) model, M, is defined by the set of {Q(e), Q(j), L}.
The “model log-evidence” is bounded by the free energy: $\ln p(Y|M) \geq F$
(F can also be viewed as the difference of an “accuracy” term and a “complexity” term.)
Two models can be compared using the “Bayes factor”: $B_{12} = p(Y|M_1) / p(Y|M_2)$
Also useful when comparing different forward models, i.e., L's (Henson et al., submitted-b)
Friston et al. (2007) Neuroimage, 34, 220-34
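As a simplified illustration of model comparison, the sketch below computes the exact Gaussian log-evidence for two models with fixed (not estimated) covariances and takes their difference as a log Bayes factor. This is an assumption-laden stand-in for the free energy: with hyperparameters held fixed there is no complexity penalty for them, and the function name `gaussian_log_evidence`, the toy leadfield and the "valid"/"invalid" focal priors are all hypothetical.

```python
import numpy as np

def gaussian_log_evidence(Y, L, Cj, Ce):
    """ln p(Y | model) for Y = L J + E, J ~ N(0, Cj), E ~ N(0, Ce),
    treating the t columns (time-samples) of Y as independent."""
    n, t = Y.shape
    Sigma = L @ Cj @ L.T + Ce                       # n x n sensor-space covariance
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.sum(Y * np.linalg.solve(Sigma, Y))    # tr(Y' Sigma^-1 Y)
    return -0.5 * (quad + t * logdet + n * t * np.log(2.0 * np.pi))

rng = np.random.default_rng(1)
n, p, t = 8, 20, 50
L = rng.normal(size=(n, p))                         # toy forward model
Ce = 0.01 * np.eye(n)                               # sensor noise covariance

def focal_prior(idx):
    """Diagonal source covariance with variance 25 on the listed sources."""
    v = np.zeros(p)
    v[idx] = 25.0
    return np.diag(v)

Cj_valid = focal_prior([0, 1, 2])                   # matches the simulated sources
Cj_invalid = focal_prior([10, 11, 12])              # wrong location

# Simulate data from three active sources (sd 5) plus sensor noise (sd 0.1)
J = np.zeros((p, t))
J[:3] = 5.0 * rng.normal(size=(3, t))
Y = L @ J + 0.1 * rng.normal(size=(n, t))

log_bf = (gaussian_log_evidence(Y, L, Cj_valid, Ce)
          - gaussian_log_evidence(Y, L, Cj_invalid, Ce))
```

A positive `log_bf` favours the valid prior; the Bayes factor itself is `exp(log_bf)`.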

Model Comparison (Bayes Factors)
Multiple constraints: Smooth sources (Qs), plus valid (Qv) or invalid (Qi) focal prior
[Figure: Bayes factors for models with Qs, Qv and Qi]
Mattout et al. (2006) Neuroimage, 753-767

Temporal Correlations
To handle temporally-extended solutions, first assume temporal-spatial factorisation, where:
• Ỹ = vectorised data, nt x 1
• C(e) = spatial error covariance over sensors
• V(e) = temporal error covariance over sensors
• C(j) = spatial error covariance over sources
• V(j) = temporal error covariance over sources
In general, temporal correlation of signal (sources) and noise (sensors) will differ, but both can be projected onto a temporal subspace (via S).
• V typically Gaussian autocorrelations…
• S typically an SVD into Nr temporal modes…
It then turns out that EM can simply operate on prewhitened data (covariance), where Y has size n x t.
Friston et al. (2006) Human Brain Mapping, 27:722–735
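One common way to express a temporal-spatial factorisation, assumed here because the equation is missing from this transcript, is that the covariance of the vectorised (column-stacked) noise is a Kronecker product of a temporal covariance V and a spatial covariance C. The sketch below checks that identity numerically on small random matrices; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 3, 4                                   # sensors, time-samples

def random_spd(k):
    """Random symmetric positive-definite k x k matrix."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

C = random_spd(n)                             # spatial covariance (n x n)
V = random_spd(t)                             # temporal covariance (t x t)
Lc = np.linalg.cholesky(C)                    # C = Lc Lc'
Lv = np.linalg.cholesky(V)                    # V = Lv Lv'

def vec(M):
    """Stack the columns of M into one long vector."""
    return M.reshape(-1, order="F")

# Noise with spatial covariance C and temporal covariance V: E = Lc Z Lv',
# where Z has IID standard-normal entries
Z = rng.normal(size=(n, t))
E = Lc @ Z @ Lv.T

# vec(E) = (Lv kron Lc) vec(Z), so cov(vec(E)) = (Lv kron Lc)(Lv kron Lc)'
M = np.kron(Lv, Lc)
vec_identity_holds = np.allclose(vec(E), M @ vec(Z))
factorises = np.allclose(M @ M.T, np.kron(V, C))
```

The practical benefit is that the nt x nt covariance never has to be formed or inverted directly; the spatial and temporal parts can be handled separately.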

Localising Power (e.g., induced) Friston et al. (2006) Human Brain Mapping, 27:722–735

Automatic Relevance Detection (ARD)
When there are many constraints (Q's), pairwise model comparison becomes arduous.
Moreover, when Q's are correlated, F-maximisation can be difficult (e.g., local maxima), and hyperparameters can become negative (improper for covariances).
Note: Even though Qs may be uncorrelated in source space, they can become correlated when projected through L to sensor space (where F is optimised).
[Figure: sensor-level components (Anti-Averaging, Prestim Baseline) and source-level components (Depth-Weighting, Smoothness)]
Henson et al. (2007) Neuroimage, 38, 422-38

Automatic Relevance Detection (ARD)
To overcome the problems of many, possibly correlated, constraints (arduous pairwise comparison, difficult F-maximisation, negative hyperparameters), one can:
1) impose a positivity constraint on the hyperparameters
2) impose (sparse) hyperpriors on the (log-normal) hyperparameters
Uninformative priors are then “turned off” (“ARD”).
(In the complexity term, η and Σλ are the posterior mean and covariance of the hyperparameters.)
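The two constraints can be sketched as follows. The notation is an assumption, since the slide's equations are missing here: the log-hyperparameters α and the prior mean and covariance η₀, Σ₀ are symbols introduced only for this illustration.

```latex
\lambda_i = \exp(\alpha_i) > 0, \qquad
\alpha \sim N(\eta_0, \Sigma_0)
```

Positivity holds by construction, the hyperparameters are log-normally distributed, and a component is effectively switched off when its α_i becomes very negative (λ_i → 0).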


Multiple Sparse Priors (MSP)
So why not use ARD to select from a large number of sparse source priors…!?
[Figure: source covariance components Q(2)1, …, Q(2)j, Q(2)j+1, Q(2)j+2, …, Q(2)N, comprising left patches, right patches and bilateral patches]
Friston et al. (2008) Neuroimage

Multiple Sparse Priors (MSP) No depth bias! Friston et al. (2008) Neuroimage

Fusion of MEG/EEG
Separate error covariance components for each of i=1..M modalities (Ci(e)).
Data and leadfields scaled (with m_i spatial modes).
Remember, EM returns conditional precisions (Σ) of sources (J), which can be used to compare separate vs fused inversions…
Henson et al. (2009b) Neuroimage

Fusion of MEG/EEG
[Figure panels: Magnetometers (MEG); Gradiometers (MEG); Electrodes (EEG); Fused]
Henson et al. (2009b) Neuroimage

Overview • Random Field Theory for Space-Time images • Empirical Bayesian approach to the Inverse Problem • A Canonical Cortical mesh and Group Analyses • [ Dynamic Causal Modelling (DCM) ]

3. Canonical Mesh & Group Analyses • A “canonical” (Inverse-normalised) cortical mesh • Group analyses in 3D • Use of fMRI spatial priors (in MNI space) • Group-based inversions

A “Canonical” Cortical Mesh
Given the difficulty in (automatically) creating accurate cortical meshes from MRIs, how about inverse-normalising a (quality) template mesh in MNI space?
[Figure: spatial normalisation warps the original MRI to the template MRI (in “MNI” space), giving the normalised MRI]
Ashburner & Friston (2005) Neuroimage

A “Canonical” Cortical Mesh
Apply inverse of warps from spatial normalisation of whole MRI to a template cortical mesh…
[Figure, N=1: Individual, “Canonical” and Template meshes]
Mattout et al. (2007) Comp. Intelligence & Neuroscience

A “Canonical” Cortical Mesh
But warps from cortex not appropriate to skull/scalp, so use individually (and easily) defined skull/scalp meshes (“CanInd”: canonical cortex, individual skull, individual scalp)…
Statistical tests of model evidence over N=9 MEG subjects show:
• MSP > MMN
• BEMs > Spheres (for CanInd)
• (7000 > 3000 dipoles)
• (Normal > Free for MSP)
[Figure: Free Energy / 10^4]
Henson et al. (2009a) Neuroimage

