The General Linear Model

A Basic Introduction
Roger Tait (rt337@cam.ac.uk)

Overview
• What is imaging data?
• How is data pre-processed?
• Hypothesis testing
• GLM: simple linear regression
• Analysis software
• How to process results

What is imaging data?

Data
A stack of numbers
[Figure: structural and functional (fMRI) images]

Multiple Data

Reorientation
[Figure: native, reoriented and MNI152 standard-space images]

Basic pre-processing (fMRI)
[Figure: pre-processing files omprage.nii, obrain.nii, omrest.nii, nomrest.nii, wnomrest.nii, worest.nii]

Basic pre-processing (structural)
[Figure: pre-processing files omprage.nii, gmomprage.nii, wgmomprage.nii]

How does standard space data help?

Hypothesis testing
Statistical inference is commonly done with a test statistic ($t$, $F$, $\chi^2$, …) whose distribution under H0 has been derived mathematically (the parametric null distribution). For example:
$$t = \frac{\hat{\beta}_1 - \hat{\beta}_0}{\mathrm{SE}\left(\hat{\beta}_1 - \hat{\beta}_0\right)}$$
[Figure: parametric null distribution of t, with a 5% threshold]
NB: this assumes that the errors are independent and normally distributed.
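A minimal sketch of the t statistic above, assuming a two-group design with cell-means coding, so that $\hat{\beta}_0$ and $\hat{\beta}_1$ are simply the two group means (this coding, the function name `two_group_t` and the data are illustrative assumptions, not taken from the slides). The error variance is estimated from the residuals with n − 2 degrees of freedom, in line with the independence and normality assumption in the NB.

```python
import math


def two_group_t(group0, group1):
    """t = (b1_hat - b0_hat) / SE(b1_hat - b0_hat) for a two-group design.

    Assumes cell-means coding (b0_hat, b1_hat are the group means) and
    independent, normally distributed errors with a common variance.
    """
    n0, n1 = len(group0), len(group1)
    b0_hat = sum(group0) / n0
    b1_hat = sum(group1) / n1
    # Residual sum of squares: each observation minus its own group's fitted mean
    rss = sum((y - b0_hat) ** 2 for y in group0) + sum((y - b1_hat) ** 2 for y in group1)
    dof = n0 + n1 - 2                  # observations minus 2 estimated parameters
    sigma2_hat = rss / dof             # estimated error variance
    se = math.sqrt(sigma2_hat * (1 / n0 + 1 / n1))
    return (b1_hat - b0_hat) / se, dof


# Hypothetical data for two groups of three subjects
t, dof = two_group_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.3f} with {dof} degrees of freedom")
```

Under H0 this statistic follows a t distribution with `dof` degrees of freedom. If SciPy is available, a two-sided p-value is `2 * scipy.stats.t.sf(abs(t), dof)`, which can then be compared against the 5% level.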

Introducing the GLM: $Y = X\beta + \varepsilon$
• Encapsulates: t-tests (paired, unpaired), F-tests, ANOVA (one-way, two-way, main effects, factorial), MANOVA, ANCOVA, MANCOVA, simple regression, linear regression, multiple regression, multivariate regression, …
• DATA = MODEL + ERROR
• DATA = KNOWN * UNKNOWN + ERROR

GLM definition: $Y = X\beta + \varepsilon$, where
• $Y$ is a matrix of observed measurements
• $X$ is a matrix of known explanatory variables, usually called the design matrix
• $\beta$ is a matrix of parameters to be estimated
• $\varepsilon$ is a matrix of errors (noise)
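To make the matrix form concrete, here is a minimal pure-Python sketch that estimates $\beta$ by ordinary least squares, $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, and returns the residuals $\hat{\varepsilon} = Y - X\hat{\beta}$. The helper names and the 4-observation data set are hypothetical; Y may have several columns, each estimated with the same X.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    # (A B)[i][j] = sum_k A[i][k] * B[k][j]
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("X'X is singular: the design matrix is rank deficient")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def fit_glm(X, Y):
    """Least-squares beta_hat = (X'X)^-1 X'Y and residuals e_hat = Y - X beta_hat."""
    Xt = transpose(X)
    beta_hat = solve(matmul(Xt, X), matmul(Xt, Y))
    fitted = matmul(X, beta_hat)
    residuals = [[y - f for y, f in zip(yr, fr)] for yr, fr in zip(Y, fitted)]
    return beta_hat, residuals


# Hypothetical data: 4 observations; X = [intercept, one regressor]
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [[1.0], [3.2], [4.9], [7.1]]
beta_hat, residuals = fit_glm(X, Y)
print("beta_hat:", beta_hat)
print("residuals:", residuals)
```

In practice a numerical library would normally be used instead (for example `numpy.linalg.lstsq`), because forming and solving $X^\top X$ directly can be less numerically stable.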

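GLM: Simple Linear Regression
$Y = \beta_0 + X_1\beta_1 + \varepsilon$
• $\beta_0$: the Y-axis intercept
• $\beta_1$: the gradient (slope) of the line
• $Y$: the observed data (black circles in the plot of Y against X)
• $\varepsilon$: the difference between the predicted Y and the observed Y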

GLM: Simple Linear Regression
$Y = \hat{\beta}_0 + X_1\hat{\beta}_1 + \hat{\varepsilon}$
• The line is fitted by choosing $\hat{\beta}_0$ and $\hat{\beta}_1$ so that the sum of the squared estimated errors, $\sum_i \hat{\varepsilon}_i^2$, is as small as possible.
• This is called the Method of Least Squares.
• $\sum_i \hat{\varepsilon}_i^2$ is called the Residual Sum of Squares (RSS).
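For a single regressor the least-squares criterion has a closed-form solution: $\hat{\beta}_1 = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. The sketch below computes these and the RSS for a small hypothetical data set; the function names are illustrative.

```python
def least_squares_line(x, y):
    """Closed-form least-squares estimates for Y = b0 + X1*b1 + e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def rss(x, y, b0, b1):
    """Residual Sum of Squares: sum of (observed Y - predicted Y)^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(b0, b1, rss(x, y, b0, b1))
```

For this data the estimates are $\hat{\beta}_0 = 2.2$ and $\hat{\beta}_1 = 0.6$ with RSS = 2.4 (up to floating-point rounding); moving either estimate away from these values increases the RSS.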

GLM example
DATA = KNOWN * UNKNOWN + ERROR
[Figure: model terms mean reaction time, GENDER and AGE]
$Y = \beta_0 + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + X_4\beta_4 + \varepsilon$

Dummy Variables
• Continuous variables
  • Measurements on a continuous scale (age, mRT), e.g. (-4.01, -0.47, 6.35, -7.06, -7.69, -14.24)
• Dummy variables
  • Code for group membership (disease, gender), e.g. controls = 0, patients = 1; females = 1, males = -1
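A sketch of how these codes become columns of the design matrix X for a model like the GLM example. The assignment of $X_1$ to $X_4$ to group, gender, age and mean reaction time, the dictionary and function names, and the subject data are all hypothetical; only the 0/1 and +1/-1 codings come from the slide.

```python
# Codings from the slide; the dictionary names are hypothetical
GROUP_CODE = {"control": 0, "patient": 1}   # dummy (0/1) coding
SEX_CODE = {"female": 1, "male": -1}        # +1/-1 coding


def design_row(group, sex, age, mrt):
    """One row of X: [intercept, group, sex, age, mean reaction time]."""
    return [1.0, GROUP_CODE[group], SEX_CODE[sex], float(age), float(mrt)]


# Hypothetical subjects: (group, sex, age in years, mean reaction time in ms)
subjects = [
    ("control", "female", 34, 512.0),
    ("control", "male", 29, 488.5),
    ("patient", "female", 41, 601.2),
    ("patient", "male", 38, 575.9),
]
X = [design_row(*s) for s in subjects]
for row in X:
    print(row)
```

The first column of ones carries the intercept $\beta_0$; each remaining column is multiplied by its own $\beta$.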

Usage
• Hypothesis tests with the GLM can be multivariate, or a set of independent univariate tests
• In multivariate tests, the columns of Y are tested together
• In univariate tests, the columns of Y are tested independently (multiple univariate tests with the same design matrix)
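In the univariate case every column of Y (for imaging data, typically one column per voxel) is fitted separately with the same design matrix. The minimal sketch below assumes a design of an intercept plus one hypothetical regressor and two voxels; a multivariate test (e.g. MANOVA) would instead test the columns jointly, which this sketch does not do.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one response column."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1


def mass_univariate(x, Y):
    """Fit the same design (intercept + x) to every column of Y independently.

    Y has one row per subject and one column per voxel.
    """
    columns = list(zip(*Y))            # one tuple of values per voxel
    return [fit_simple(x, col) for col in columns]


x = [0, 1, 2, 3]                       # hypothetical regressor
Y = [[1, 3],                           # subject 1: voxel 0, voxel 1
     [3, 2],
     [5, 1],
     [7, 0]]
for voxel, (b0, b1) in enumerate(mass_univariate(x, Y)):
    print(f"voxel {voxel}: b0 = {b0:.2f}, b1 = {b1:.2f}")
# voxel 0: b0 = 1.00, b1 = 2.00
# voxel 1: b0 = 3.00, b1 = -1.00
```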

fMRI model specification
[Figure: silent naming task; the model; BOLD signal]

Actual retrieved data

fMRI analysis with FSL

Structural analysis with CamBA
[Figure: CamBA analysis set-up with the variables sex, weight and group]

Structural analysis output

Where are my clusters?
[Figure: results map with two large clusters annotated]

Where is the cluster I am interested in?
[Figure: position the mouse cursor over a cluster and its location information is shown]

How do my clusters help me?

Statistical Testing
• Convert the cluster into a binary mask
• Overlay the mask on each subject's data
• Extract the voxel intensities within the mask
• Run further statistical analyses on the extracted values to get more information from your data
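A minimal sketch of the first three steps, assuming the cluster map and the subject images are already in the same standard space and have been flattened into equal-length lists of voxel values (real NIfTI images could be loaded with, for example, `nibabel.load(path).get_fdata()`). The cluster labels, subject names, intensities and function names are hypothetical.

```python
def cluster_mask(labels, cluster_id):
    """Binary mask: True for voxels whose cluster label equals cluster_id."""
    return [label == cluster_id for label in labels]


def mean_in_mask(volume, mask):
    """Mean voxel intensity of one subject's volume inside the mask."""
    values = [v for v, inside in zip(volume, mask) if inside]
    if not values:
        raise ValueError("mask contains no voxels")
    return sum(values) / len(values)


# Hypothetical flattened volumes (6 voxels); 0 = background, 1 and 2 = clusters
labels = [0, 1, 1, 2, 0, 1]
subjects = {
    "sub01": [0.1, 0.8, 0.6, 0.3, 0.2, 0.7],
    "sub02": [0.0, 0.5, 0.4, 0.9, 0.1, 0.6],
}
mask = cluster_mask(labels, 1)
cluster_means = {s: mean_in_mask(vol, mask) for s, vol in subjects.items()}
print(cluster_means)
```

The resulting one-value-per-subject summary is what the follow-up analyses (correlations, t-tests, regressions) operate on.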

Correlation with behaviour
[Figure: cluster values plotted against the behavioural measure HIT1]
• Cluster Pos_002: close to significance, but p > 0.05
• Cluster Pos_001: does not significantly correlate with behaviour
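The correlation between per-subject cluster values and a behavioural score can be tested with Pearson's r, using $t = r\sqrt{(n-2)/(1-r^2)}$ with n − 2 degrees of freedom. A sketch with hypothetical numbers (not the data from the slide); the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


cluster_mean = [1, 2, 3, 4, 5]        # hypothetical per-subject cluster values
behaviour = [2, 4, 5, 4, 5]           # hypothetical behavioural scores
r = pearson_r(cluster_mean, behaviour)
t = r_to_t(r, len(behaviour))
print(f"r = {r:.3f}, t = {t:.3f}, df = {len(behaviour) - 2}")
# r = 0.775, t = 2.121, df = 3
```

This t is identical to the t for the slope in a simple regression of behaviour on cluster value. If SciPy is available, `scipy.stats.pearsonr` returns r together with its p-value.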

Other Analyses
• Is a mean different from 0? One-sample t-test
• Difference between means: two-sample t-test
• Linear relationship between 2 variables: simple regression
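All three tests are special cases of the GLM that differ only in the design matrix. The sketch below computes the one-sample t-test as the intercept-only model $Y = \beta_0 + \varepsilon$ and writes out a design matrix for each case; the data and variable names are hypothetical.

```python
import math
import statistics


def one_sample_t(y):
    """One-sample t-test of H0: mean = 0, i.e. the GLM Y = b0 + e with X = ones."""
    n = len(y)
    b0_hat = statistics.fmean(y)              # least-squares b0 is the sample mean
    se = statistics.stdev(y) / math.sqrt(n)   # stdev divides by n - 1, the residual dof
    return b0_hat / se, n - 1


# The three tests differ only in X (hypothetical 4-subject example):
X_one_sample = [[1], [1], [1], [1]]                       # intercept only
X_two_sample = [[1, 0], [1, 0], [1, 1], [1, 1]]           # intercept + group dummy (0/1)
X_regression = [[1, 2.5], [1, 3.1], [1, 4.0], [1, 5.2]]   # intercept + continuous predictor

t, dof = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"t = {t:.3f}, df = {dof}")
```

With the two-sample design, the t statistic for the dummy coefficient is the usual pooled-variance two-sample t; with the regression design, it is the t for the slope.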

What else can I do to find out more about my data?

Other types of analyses
• Factorial designs
  • Permit analysis of data from multiple time points
  • Show:
    • Main effect of Factor 1 (time)
    • Main effect of Factor 2 (group)
    • Interaction between Factor 1 and Factor 2
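For two factors, a sketch assuming a balanced 2 × 2 design with effect coding (-1/+1) and the interaction entered as the product of the two coded columns. The cell means are hypothetical, and the sketch fits cell means only, ignoring the within-subject (repeated-measures) structure that real multi-time-point data would have.

```python
# Hypothetical balanced 2 x 2 design, effect coded: time (pre = -1, post = +1),
# group (control = -1, patient = +1); the interaction column is their product.
cells = [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]
X = [[1, t, g, t * g] for t, g in cells]
y = [10.0, 12.0, 11.0, 17.0]   # hypothetical cell means, same order as `cells`


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


columns = list(zip(*X))
# In a balanced design these columns are mutually orthogonal, so each beta can be
# estimated on its own as (x_j . y) / (x_j . x_j) without inverting X'X.
beta = [dot(col, y) / dot(col, col) for col in columns]
names = ["mean", "main effect of time", "main effect of group", "time x group interaction"]
for name, b in zip(names, beta):
    print(f"{name}: {b:.2f}")
# mean: 12.50
# main effect of time: 2.00
# main effect of group: 1.50
# time x group interaction: 1.00
```

With ±1 coding each main-effect beta is half the difference between the two levels: here the post mean minus the pre mean is 4, giving 2 for time.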

Useful software packages
• CamBA – Cambridge: http://www-bmu.psychiatry.cam.ac.uk/software/
• FSL Randomise – Oxford: http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Randomise
• SPM8 – UCL: http://www.fil.ion.ucl.ac.uk/spm/software/spm8/

In summary
• The GLM allows us to describe a wide variety of research outcomes by specifying the equation that best summarizes the data for a study. If the model is misspecified, the estimated coefficients (the beta values) are likely to be biased (i.e. wrong), and the resulting equation will not describe the data accurately.
• In complex situations (e.g. cognitive fMRI paradigms), model specification can be a serious and difficult problem.
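A minimal sketch of the misspecification point, using hypothetical noise-free data: when a regressor that is correlated with another one is left out of the model, the remaining beta absorbs part of its effect and is biased. The data and function names are invented for illustration.

```python
def centred_sums(a, b):
    """Sum of cross-products of a and b about their means."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def fit_one(x, y):
    """Slope of y on x (model with intercept)."""
    return centred_sums(x, y) / centred_sums(x, x)


def fit_two(x1, x2, y):
    """Slopes of y on x1 and x2 (model with intercept), via the 2 x 2 normal equations."""
    s11, s22, s12 = centred_sums(x1, x1), centred_sums(x2, x2), centred_sums(x1, x2)
    s1y, s2y = centred_sums(x1, y), centred_sums(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical data generated as y = 1 + 2*x1 + 3*x2, with x2 correlated with x1
x1 = [0, 1, 2, 3, 4]
x2 = [0, 1, 1, 2, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b1_full, b2_full = fit_two(x1, x2, y)
b1_omitted = fit_one(x1, y)
print(f"correct model: b1 = {b1_full:.2f}, b2 = {b2_full:.2f}")   # b1 = 2.00, b2 = 3.00
print(f"x2 omitted:    b1 = {b1_omitted:.2f}")                      # b1 = 4.10
```

The correctly specified model recovers the generating slopes, while the model without x2 returns a slope for x1 roughly twice its true value, even though there is no noise at all.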

Any questions?
