General Linear Model Lúcia Garrido and Marieke Schölvinck ICN
Observed data [figure: BOLD signal intensity over time, preprocessed into the data matrix Y] • Y is a matrix of BOLD signals: each column represents a single voxel sampled at successive time points.
Univariate analysis GLM in two steps: • Carries out an analysis of variance separately at each voxel (univariate) • Computes a t statistic from the results of this analysis, for each voxel
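A minimal Matlab sketch of the mass-univariate idea (toy data and sizes, not SPM's own code): the same design matrix is fitted independently at every voxel, i.e. at every column of Y.
T = 100; V = 50;                 % time points and voxels (made-up sizes)
X = [randn(T,1) ones(T,1)];      % one regressor plus a constant column
Y = randn(T,V);                  % toy BOLD data: one column per voxel
B = X \ Y;                       % least-squares betas, one column of B per voxel
E = Y - X*B;                     % residuals at every voxel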
Example [figure: scatter plot of Y against X] • X can contain values quantifying an experimental variable
Parameters & error Y = βx + c + ε • β: slope of line relating x to y • ‘how much of x is needed to approximate y?’ • ε = residual error • the best estimate of β minimises ε: deviations from line • Assumed to be independently, identically and normally distributed this line is a 'model' of the data slope β = 0.23 Interceptc = 54.5
Multiple Regression • Simple regression: one predictor, y = βx + c + ε • Multiple regression: more than one predictor/regressor/beta, e.g. y = β1x1 + β2x2 + c + ε
Matrix Formulation Y = Xβ + ε • Write out the equation for each observation of the variable Y, from j = 1 to J:
Y1 = X11β1 + … + X1lβl + … + X1LβL + ε1
Yj = Xj1β1 + … + Xjlβl + … + XjLβL + εj
YJ = XJ1β1 + … + XJlβl + … + XJLβL + εJ
• Stacking the Yj, the βl and the εj into column vectors, and the Xjl into a J × L matrix, turns these simultaneous equations into a single matrix equation:
Y = X β + ε
Observed data = Design matrix × Parameters + Residuals/Error
GLM and fMRI Y = Xβ + ε • Observed data: Y is the BOLD signal at various time points at a single voxel • Design matrix: X contains several components which explain the observed data, i.e. the BOLD time series for the voxel • Parameters: β defines the contribution of each component of the design matrix to the value of Y; estimated so as to minimise the error, ε, i.e. by least squares • Error: ε is the difference between the observed data, Y, and that predicted by the model, Xβ
Design Matrix [figure: design matrix with columns x1, x2 and a constant c] • The matrix contains the values of X • Different columns = different predictors
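A minimal Matlab sketch of building such a design matrix (sizes and values are illustrative):
n  = 60;                         % number of scans (illustrative)
x1 = randn(n,1);                 % first predictor
x2 = randn(n,1);                 % second predictor
c  = ones(n,1);                  % constant column modelling the overall mean
X  = [x1 x2 c];                  % one column per predictor, one row per scan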
Parameter estimation e = Y – Ỹ = Y – Xβ S = Σj=1…J ej2 = eTe = (Y – Xβ)T(Y – Xβ) The least squares estimates are the parameter estimates which minimise the residual sum of squares, S • find the derivative and solve ∂S/∂β = 0 • β = (XTX)-1 XTY (if XTX is invertible) Matlab magic: >> B = pinv(X) * Y (not inv(X) * Y: the design matrix is not square, and pinv(X) * Y gives the least squares solution (XTX)-1 XTY)
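A Matlab sketch of the same estimate written out explicitly (simulated data; assumes X has full column rank so that XTX is invertible):
T     = 100;
X     = [randn(T,2) ones(T,1)];          % illustrative design: two regressors + constant
Y     = X*[1.5; -0.7; 10] + randn(T,1);  % simulated data with known betas
beta  = (X'*X) \ (X'*Y);                 % normal equations, i.e. (XTX)-1 XTY
beta2 = pinv(X) * Y;                     % equivalent, and robust to rank deficiency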
Statistical inference • A beta value is estimated for each column in the design matrix • Test whether the slope is significantly different from zero (the null hypothesis) • t statistic = beta / standard error of the slope • Many betas → contrasts (contents of another talk…) • t-tests or F-tests, depending on the nature of the question
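A Matlab sketch of the t statistic for one beta, using the standard error of a contrast c (simulated data; the contrast [1 0 0]' simply picks out the first beta):
T      = 100;
X      = [randn(T,2) ones(T,1)];         % illustrative design
Y      = X*[1.5; -0.7; 10] + randn(T,1); % simulated data
beta   = X \ Y;                          % parameter estimates
e      = Y - X*beta;                     % residuals
df     = T - rank(X);                    % degrees of freedom
sigma2 = (e'*e) / df;                    % residual variance estimate
c      = [1 0 0]';                       % contrast selecting the first beta
t      = (c'*beta) / sqrt(sigma2 * c'*inv(X'*X)*c);  % t value with df degrees of freedom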
Continuous predictors [figure: plot of Y against a continuous X] • X can contain values quantifying an experimental variable
Binary predictors [figure: plot of Y against a binary X] • X can contain values distinguishing experimental conditions
Covariates vs. conditions • Covariates: parametric modulation of an independent variable, e.g. task difficulty from 1 to 6 • Conditions: 'dummy' codes identify the different levels of an experimental factor, e.g. integers 0 or 1 for 'off' or 'on' [figure: boxcar regressor alternating between 'on' and 'off' blocks]
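A small Matlab sketch of the two kinds of column (the values are illustrative): a parametric covariate and a dummy-coded condition.
difficulty = [1 2 3 4 5 6 1 2 3 4 5 6]';     % covariate: task difficulty per trial
on_off     = [1 1 1 1 1 1 0 0 0 0 0 0]';     % condition: 1 = 'on', 0 = 'off'
X          = [difficulty on_off ones(12,1)]; % both enter as design-matrix columns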
Ways to improve your model: modelling haemodynamics • The brain does not just switch on and off! • Reshape (convolve) the regressors with an HRF basis function so that they resemble the haemodynamic response [figure: original boxcar regressor, the HRF, and the convolved regressor]
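A Matlab sketch of the convolution step (spm_hrf is SPM's canonical HRF function and is assumed to be on the Matlab path; block lengths and TR are illustrative):
TR     = 2;                               % repetition time in seconds
boxcar = kron([0 1 0 1 0]', ones(10,1));  % alternating off/on blocks of 10 scans
hrf    = spm_hrf(TR);                     % canonical haemodynamic response function
reg    = conv(boxcar, hrf);               % reshape the boxcar to resemble the HRF
reg    = reg(1:length(boxcar));           % trim back to the original number of scans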
Ways to improve your model: model everything • Important to model all known variables, even if they are not experimentally interesting: e.g. head movement, global activity, block and subject effects • This minimises the residual error variance, giving better statistics • Effects-of-interest are the regressors you are actually interested in [figure: design matrix with columns for the conditions (effects of interest), subject effects, and global activity or movement]
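A Matlab sketch of appending nuisance columns to the design matrix (the motion term here is a random stand-in for the six realignment parameters):
n      = 50;                                  % scans (illustrative)
task   = kron([1 0 1 0 1]', ones(10,1));      % effect-of-interest regressor
motion = randn(n,6);                          % stand-in for realignment parameters
X      = [task motion ones(n,1)];             % interest + nuisance + constant columns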
Summary • The General Linear Model allows you to find the parameters, β, which provide the best fit to your data, Y • The optimal parameter estimates, β, are found by minimising the sum of squared differences between the predicted model and the observed data • The design matrix in SPM contains the information about the factors, X, which may explain the observed data • Once we have obtained the βs at each voxel, we can use them for various statistical tests
Thanks to… Previous MfD talks: Elliot Freeman (2005), Davina Bristow and Beatriz Calvo (2004) http://www.fil.ion.ucl.ac.uk/spm/doc/books/hbf2/pdfs/Ch7.pdf http://www.mrc-cbu.cam.ac.uk/Imaging/Common/spmstats.shtml
Summary Y = Xβ + ε • Observed data: SPM uses a mass-univariate approach, i.e. each voxel is treated as a separate column vector of data; Y is the BOLD signal at various time points at a single voxel • Design matrix: several components which explain the observed data, i.e. the BOLD time series for the voxel – timing information: onset vectors, Omj, and duration vectors, Dmj – HRF: hm describes the shape of the expected BOLD response over time – other regressors, e.g. realignment parameters • Parameters: define the contribution of each component of the design matrix to the value of Y; estimated so as to minimise the error, ε, i.e. by least squares • Error: difference between the observed data, Y, and that predicted by the model, Xβ; not assumed to be spherical in fMRI