Introduction to classifiers for multivariate decoding of fMRI data
Evelyn Eger
MMN 15/12/08
Two directions of inference
1) Forward modelling: psychological variable → data (assessed by a p-value)
2) Decoding: data → psychological variable (assessed by prediction accuracy)
Two directions of inference
• Inverse inference (decoding) is of special interest, e.g., for brain–computer interfaces, automated diagnosis, etc.
• In other cases the two are in principle interchangeable; both demonstrate a statistical dependency between experimental variable and data
• In many paradigms applying decoding to fMRI, the direction of inference is not central to the interpretation (e.g., Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007 for reviews)
• Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications
Univariate versus multivariate
• Univariate analysis: effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
• Special case: "mass-univariate" analysis in brain imaging, where effects are tested in a large number of voxels treated as independent (illustrated in the sketch below)
• Multivariate analysis: effects are analysed for multiple dependent variables, e.g., Hotelling's T-squared test, Wilks' lambda, MANOVA
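A minimal illustration of the mass-univariate approach in Python (the toy data and array shapes are our assumptions, not from the slides): one two-sample t-test per voxel, with each voxel treated independently.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Toy stand-in for fMRI data: 20 images per condition, 1000 voxels
    X1 = rng.normal(size=(20, 1000))   # condition 1
    X2 = rng.normal(size=(20, 1000))   # condition 2

    # Mass-univariate analysis: one independent t-test per voxel
    # (ttest_ind is vectorised over columns, i.e. over voxels)
    t_per_voxel, p_per_voxel = stats.ttest_ind(X1, X2)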
Why go multivariate in brain imaging
[Figure: voxel responses to stimulus conditions 1 and 2, adapted from Haynes et al., 2006]
• Discrimination can be improved with higher dimensions
• Significance of individual voxels is not required
Linear classification (in 2D space)
A set of points xi with labels yi ∈ {1, −1} is separated by a hyperplane w^T x + b = 0 (weight vector w, offset b) such that yi(w^T xi + b) ≥ 1. For N dimensions (here 2, with axes voxel 1 and voxel 2), the separating hyperplane has dimension N − 1.
Linear classification (in 2D space)
• New data are projected onto the previously learned hyperplane
• Assignment to the classes yi ∈ {1, −1} yields the prediction accuracy
• Which hyperplane to choose?
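Whatever hyperplane is chosen, the prediction step is the same; a minimal NumPy sketch (the function names are ours):

    import numpy as np

    def predict(X, w, b):
        """Assign each row of X (one pattern per row) to class +1 or -1,
        according to which side of the hyperplane w^T x + b = 0 it falls on."""
        return np.sign(X @ w + b)

    def accuracy(X_test, y_test, w, b):
        """Prediction accuracy: fraction of correctly labelled test patterns."""
        return np.mean(predict(X_test, w, b) == y_test)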
Difference between means
w ∝ m2 − m1 (the class means m1 and m2), corresponding to a classifier based on Euclidean distance / correlation
Examples: difference between means
• used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001)
• also used in other recent studies on object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
[Figure from Haxby et al., 2001]
Difference between means
w ∝ m2 − m1, corresponding to a classifier based on Euclidean distance / correlation, not taking variances/covariances into account
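A minimal NumPy sketch of this mean-difference classifier (ours; the convention here is that the class with mean m2 carries label +1, and predict() from the earlier sketch does the assignment):

    import numpy as np

    def train_mean_difference(X_pos, X_neg):
        """Weight vector points from the mean of the negative class (m1)
        towards the mean of the positive class (m2); the offset b places
        the boundary halfway between the two means.  Variances and
        covariances are deliberately ignored."""
        m1, m2 = X_neg.mean(axis=0), X_pos.mean(axis=0)
        w = m2 - m1
        b = -w @ (m1 + m2) / 2
        return w, b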
Fisher's linear discriminant
w ∝ S^-1(m2 − m1), where S is the covariance matrix; the corresponding distance measure is the Mahalanobis distance
Examples: Fisher's linear discriminant
• Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
• Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
[Figures from the Haynes & Rees, 2006 review and from Kriegeskorte et al., 2007]
Fisher's linear discriminant
w ∝ S^-1(m2 − m1), where S is the covariance matrix; distance measure: Mahalanobis distance
Curse of dimensionality: S is not invertible when the dimensionality exceeds the number of data points
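A NumPy sketch of Fisher's linear discriminant (ours; the optional ridge term on the diagonal is one common workaround for a singular S and goes beyond what the slide covers):

    import numpy as np

    def train_fisher(X_pos, X_neg, ridge=0.0):
        """w = S^-1 (m2 - m1), with S the pooled within-class covariance.
        When the number of voxels exceeds the number of patterns, S is
        singular; a small ridge on the diagonal is one common remedy."""
        m1, m2 = X_neg.mean(axis=0), X_pos.mean(axis=0)
        Xc = np.vstack([X_neg - m1, X_pos - m2])   # within-class residuals
        S = Xc.T @ Xc / (len(Xc) - 2)              # pooled covariance estimate
        S = S + ridge * np.eye(S.shape[0])         # optional regularisation
        w = np.linalg.solve(S, m2 - m1)
        b = -w @ (m1 + m2) / 2
        return w, b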
Support vector machines
"Hard-margin" classifier: w is a weighted linear combination of the support vectors, obtained by minimising ||w||^2/2 subject to yi(w^T xi + b) ≥ 1, i = 1 : N
Support vector machines
"Soft-margin" classifier: w is a weighted linear combination of the support vectors, obtained by minimising ||w||^2/2 + C Σ ξi subject to yi(w^T xi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1 : N
C is a regularisation parameter (trade-off between largest margin and fewest misclassifications)
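In practice a soft-margin linear SVM is a few lines in, e.g., scikit-learn (our choice for this sketch, not one of the toolboxes listed later; the toy data stand in for voxel patterns):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Toy voxel patterns: 20 per class, 50 voxels, small mean shift between classes
    X = rng.normal(size=(40, 50)) + np.repeat([[0.3], [-0.3]], 20, axis=0)
    y = np.repeat([1, -1], 20)

    clf = SVC(kernel='linear', C=1.0)  # C: largest margin vs fewest misclassifications
    clf.fit(X, y)
    print(clf.score(X, y))  # training accuracy; real performance needs held-out data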
Examples: SVM
• Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
[Figure from Kamitani & Tong, 2005]
Support vector machines
Non-linear classifiers:
• use non-linear kernel functions
• carry a potential for overfitting, especially when few training examples are available
• are hardly used in fMRI
Comparison of classifier performance
[Figure from Cox & Savoy, 2003]
Analysis work flow
1) ROI definition
2) Data extraction (condition 1, condition 2, ...)
3) Training of the pattern classifier
4) Test: object discrimination (same size), size generalisation (1 step)
ROI definition – voxel selection
• Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: an LOC localiser session, all stimuli vs. baseline, etc.)
• If a further voxel selection is performed based on the contrast of interest, it has to use training data only to avoid bias (see the sketch after this list)
• Other criteria for voxel selection (e.g., "reproducibility" of voxelwise responses to different conditions in separate sessions, Grill-Spector et al., 2006, Nat Neurosci) can also be biased
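A sketch of unbiased voxel selection (ours): the selection statistic is computed on the training data only, and the resulting voxel indices are then applied unchanged to the test data.

    import numpy as np
    from scipy import stats

    def select_voxels(X_train, y_train, k):
        """Rank voxels by a two-sample t-test computed on the TRAINING
        data only, and keep the k most discriminative ones."""
        t, _ = stats.ttest_ind(X_train[y_train == 1], X_train[y_train == -1])
        return np.argsort(-np.abs(t))[:k]

    # idx = select_voxels(X_train, y_train, k=100)
    # then train on X_train[:, idx] and test on X_test[:, idx]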
Data extraction
Which data to use for classification?
• No general rule; different studies have used beta images or raw EPI images
• Ideally, as many images as possible for optimal classification performance
• In typical neuroimaging studies there is a trade-off between the number of images and their individual signal-to-noise ratio
• Fewer, but less noisy, images are sometimes preferable (when using SVM)
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data:
• Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
• Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Leave-one-out crossvalidation
[Diagram: patterns for condition 1 and condition 2 from blocks 1 : N; all but one pattern per condition train the SVM pattern classifier, the left-out patterns are used for test; leaving each out in turn gives N-fold cross-validation]
Crossvalidation (training – test)
• Importantly, "leave-one-out" should mean leaving one image of each condition out (all of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM); see the sketch below
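A generic sketch of leave-one-session-out cross-validation (ours; train_fn and predict_fn are placeholders for any of the classifiers above):

    import numpy as np

    def leave_one_session_out(X, y, sessions, train_fn, predict_fn):
        """Hold out all patterns of one session (one image of each
        condition) at a time, so that training and test data stay
        independent and class frequencies in the training set stay balanced."""
        accuracies = []
        for s in np.unique(sessions):
            train, test = sessions != s, sessions == s
            model = train_fn(X[train], y[train])
            accuracies.append(np.mean(predict_fn(model, X[test]) == y[test]))
        return np.mean(accuracies)   # mean accuracy over the N folds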
Implementations
General SVM implementations exist in different languages:
• Matlab: SVM toolbox (University of Southampton, UK), http://www.isis.ecs.soton.ac.uk/resources/svminfo
• Matlab: SVM toolbox (TU Graz, Austria), http://ida.first.fraunhofer.de/~anton/software.html
• C: SVM-light, http://svmlight.joachims.org
• Python or R
Specifically for fMRI data: the Multi-Voxel Pattern Analysis (MVPA) toolbox developed at Princeton University (beta version: Matlab, Python), http://www.csbmb.princeton.edu/mvpa
Appendix: Distance measures
Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as:
• Euclidean distance: Drs^2 = (xr − xs)(xr − xs)'
• Standardised Euclidean distance: Drs^2 = (xr − xs) D^-1 (xr − xs)', where D is the diagonal matrix whose diagonal elements are the variances of each variable over the m objects
• Mahalanobis distance: Drs^2 = (xr − xs) S^-1 (xr − xs)', where S is the sample covariance matrix
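The same three definitions as a NumPy sketch (ours), for two row vectors xr and xs:

    import numpy as np

    def euclidean2(xr, xs):
        d = xr - xs
        return d @ d                        # (xr - xs)(xr - xs)'

    def std_euclidean2(xr, xs, variances):
        d = xr - xs
        return d @ (d / variances)          # D^-1 is diag(1 / variances)

    def mahalanobis2(xr, xs, S):
        d = xr - xs
        return d @ np.linalg.solve(S, d)    # (xr - xs) S^-1 (xr - xs)'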