Introduction to classifiers for multivariate decoding of fMRI data
Evelyn Eger
MMN 15/12/08
Two directions of inference
1) Forward modelling: psychological variable → data (assessed by a p-value)
2) Decoding: data → psychological variable (assessed by prediction accuracy)
Two directions of inference
• Inverse inference (decoding) is of special interest, e.g., for brain–computer interfaces, automated diagnosis, etc.
• In other cases the two are in principle interchangeable; both demonstrate a statistical dependency between experimental variable and data
• In many paradigms applying decoding to fMRI, the direction of inference is not central to the interpretation (e.g., Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007 for reviews)
• Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications
Univariate versus multivariate
• Univariate analysis: effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
• Special case: "mass-univariate" analysis in brain imaging, where effects are tested in a large number of voxels treated as independent (illustrated in the sketch below)
• Multivariate analysis: effects are analysed for multiple dependent variables, e.g., Hotelling's T-squared test, Wilks' lambda, MANOVA
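A minimal illustration of the mass-univariate approach in Python (the toy data and array shapes are our assumptions, not from the slides): one two-sample t-test per voxel, with each voxel treated independently.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Toy stand-in for fMRI data: 20 images per condition, 1000 voxels
    X1 = rng.normal(size=(20, 1000))   # condition 1
    X2 = rng.normal(size=(20, 1000))   # condition 2

    # Mass-univariate analysis: one independent t-test per voxel
    # (ttest_ind is vectorised over columns, i.e. over voxels)
    t_per_voxel, p_per_voxel = stats.ttest_ind(X1, X2)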
Why go multivariate in brain imaging
[Figure: voxel responses to stimulus conditions 1 and 2, adapted from Haynes et al., 2006]
• Discrimination can be improved with higher dimensions
• Significance of individual voxels is not required
Linear classification (in 2D space)
A set of points xi with labels yi ∈ {1, −1} is separated by a hyperplane w^T x + b = 0 (weight vector w, offset b) such that yi(w^T xi + b) ≥ 1. For N dimensions (here 2, with axes voxel 1 and voxel 2), the separating hyperplane has dimension N − 1.
Linear classification (in 2D space)
• New data are projected onto the previously learned hyperplane
• Assignment to the classes yi ∈ {1, −1} yields the prediction accuracy
• Which hyperplane to choose?
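Whatever hyperplane is chosen, the prediction step is the same; a minimal NumPy sketch (the function names are ours):

    import numpy as np

    def predict(X, w, b):
        """Assign each row of X (one pattern per row) to class +1 or -1,
        according to which side of the hyperplane w^T x + b = 0 it falls on."""
        return np.sign(X @ w + b)

    def accuracy(X_test, y_test, w, b):
        """Prediction accuracy: fraction of correctly labelled test patterns."""
        return np.mean(predict(X_test, w, b) == y_test)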
Difference between means
w ∝ m2 − m1 (the class means m1 and m2), corresponding to a classifier based on Euclidean distance / correlation
Examples: difference between means
• used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001)
• also used in other recent studies on object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
[Figure from Haxby et al., 2001]
Difference between means
w ∝ m2 − m1, corresponding to a classifier based on Euclidean distance / correlation, not taking variances/covariances into account
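A minimal NumPy sketch of this mean-difference classifier (ours; the convention here is that the class with mean m2 carries label +1, and predict() from the earlier sketch does the assignment):

    import numpy as np

    def train_mean_difference(X_pos, X_neg):
        """Weight vector points from the mean of the negative class (m1)
        towards the mean of the positive class (m2); the offset b places
        the boundary halfway between the two means.  Variances and
        covariances are deliberately ignored."""
        m1, m2 = X_neg.mean(axis=0), X_pos.mean(axis=0)
        w = m2 - m1
        b = -w @ (m1 + m2) / 2
        return w, b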
Fisher's linear discriminant
w ∝ S^-1(m2 − m1), where S is the covariance matrix; the corresponding distance measure is the Mahalanobis distance
Examples: Fisher's linear discriminant
• Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
• Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
[Figures from the Haynes & Rees, 2006 review and from Kriegeskorte et al., 2007]
Fisher's linear discriminant
w ∝ S^-1(m2 − m1), where S is the covariance matrix; distance measure: Mahalanobis distance
Curse of dimensionality: S is not invertible when the dimensionality exceeds the number of data points
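A NumPy sketch of Fisher's linear discriminant (ours; the optional ridge term on the diagonal is one common workaround for a singular S and goes beyond what the slide covers):

    import numpy as np

    def train_fisher(X_pos, X_neg, ridge=0.0):
        """w = S^-1 (m2 - m1), with S the pooled within-class covariance.
        When the number of voxels exceeds the number of patterns, S is
        singular; a small ridge on the diagonal is one common remedy."""
        m1, m2 = X_neg.mean(axis=0), X_pos.mean(axis=0)
        Xc = np.vstack([X_neg - m1, X_pos - m2])   # within-class residuals
        S = Xc.T @ Xc / (len(Xc) - 2)              # pooled covariance estimate
        S = S + ridge * np.eye(S.shape[0])         # optional regularisation
        w = np.linalg.solve(S, m2 - m1)
        b = -w @ (m1 + m2) / 2
        return w, b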
Support vector machines
"Hard-margin" classifier: w is a weighted linear combination of the support vectors, obtained by minimising ||w||^2/2 subject to yi(w^T xi + b) ≥ 1, i = 1 : N
Support vector machines
"Soft-margin" classifier: w is a weighted linear combination of the support vectors, obtained by minimising ||w||^2/2 + C Σ ξi subject to yi(w^T xi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1 : N
C is a regularisation parameter (trade-off between largest margin and fewest misclassifications)
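In practice a soft-margin linear SVM is a few lines in, e.g., scikit-learn (our choice for this sketch, not one of the toolboxes listed later; the toy data stand in for voxel patterns):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Toy voxel patterns: 20 per class, 50 voxels, small mean shift between classes
    X = rng.normal(size=(40, 50)) + np.repeat([[0.3], [-0.3]], 20, axis=0)
    y = np.repeat([1, -1], 20)

    clf = SVC(kernel='linear', C=1.0)  # C: largest margin vs fewest misclassifications
    clf.fit(X, y)
    print(clf.score(X, y))  # training accuracy; real performance needs held-out data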
Examples: SVM
• Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
[Figure from Kamitani & Tong, 2005]
Support vector machines
Non-linear classifiers:
• use non-linear kernel functions
• carry a potential for overfitting, especially when few training examples are available
• are hardly used in fMRI
Comparison of classifier performance
[Figure from Cox & Savoy, 2003]
Analysis work flow
1) ROI definition
2) Data extraction (condition 1, condition 2, ...)
3) Training of the pattern classifier
4) Test: object discrimination (same size), size generalisation (1 step)
ROI definition – voxel selection
• Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: an LOC localiser session, all stimuli vs. baseline, etc.)
• If a further voxel selection is performed based on the contrast of interest, it has to use training data only to avoid bias (see the sketch after this list)
• Other criteria for voxel selection (e.g., "reproducibility" of voxelwise responses to different conditions in separate sessions, Grill-Spector et al., 2006, Nat Neurosci) can also be biased
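A sketch of unbiased voxel selection (ours): the selection statistic is computed on the training data only, and the resulting voxel indices are then applied unchanged to the test data.

    import numpy as np
    from scipy import stats

    def select_voxels(X_train, y_train, k):
        """Rank voxels by a two-sample t-test computed on the TRAINING
        data only, and keep the k most discriminative ones."""
        t, _ = stats.ttest_ind(X_train[y_train == 1], X_train[y_train == -1])
        return np.argsort(-np.abs(t))[:k]

    # idx = select_voxels(X_train, y_train, k=100)
    # then train on X_train[:, idx] and test on X_test[:, idx]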
Data extraction
Which data to use for classification?
• No general rule; different studies have used beta images or raw EPI images
• Ideally, as many images as possible for optimal classification performance
• In typical neuroimaging studies there is a trade-off between the number of images and their individual signal-to-noise ratio
• Fewer, but less noisy, images are sometimes preferable (when using SVM)
Crossvalidation (training – test)
Classifier performance always has to be tested on independent data:
• Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
• Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
Leave-one-out crossvalidation
[Diagram: patterns for condition 1 and condition 2 from blocks 1 : N; all but one pattern per condition train the SVM pattern classifier, the left-out patterns are used for test; leaving each out in turn gives N-fold cross-validation]
Crossvalidation (training – test)
• Importantly, "leave-one-out" should mean leaving one image of each condition out (all of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM); see the sketch below
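A generic sketch of leave-one-session-out cross-validation (ours; train_fn and predict_fn are placeholders for any of the classifiers above):

    import numpy as np

    def leave_one_session_out(X, y, sessions, train_fn, predict_fn):
        """Hold out all patterns of one session (one image of each
        condition) at a time, so that training and test data stay
        independent and class frequencies in the training set stay balanced."""
        accuracies = []
        for s in np.unique(sessions):
            train, test = sessions != s, sessions == s
            model = train_fn(X[train], y[train])
            accuracies.append(np.mean(predict_fn(model, X[test]) == y[test]))
        return np.mean(accuracies)   # mean accuracy over the N folds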
Implementations
General SVM implementations exist in different languages:
• Matlab: SVM toolbox (University of Southampton, UK), http://www.isis.ecs.soton.ac.uk/resources/svminfo
• Matlab: SVM toolbox (TU Graz, Austria), http://ida.first.fraunhofer.de/~anton/software.html
• C: SVM-light, http://svmlight.joachims.org
• Python or R
Specifically for fMRI data: the Multi-Voxel Pattern Analysis (MVPA) toolbox developed at Princeton University (beta version: Matlab, Python), http://www.csbmb.princeton.edu/mvpa
Appendix: Distance measures
Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as:
• Euclidean distance: Drs^2 = (xr − xs)(xr − xs)'
• Standardised Euclidean distance: Drs^2 = (xr − xs) D^-1 (xr − xs)', where D is the diagonal matrix whose diagonal elements are the variances of each variable over the m objects
• Mahalanobis distance: Drs^2 = (xr − xs) S^-1 (xr − xs)', where S is the sample covariance matrix
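The same three definitions as a NumPy sketch (ours), for two row vectors xr and xs:

    import numpy as np

    def euclidean2(xr, xs):
        d = xr - xs
        return d @ d                        # (xr - xs)(xr - xs)'

    def std_euclidean2(xr, xs, variances):
        d = xr - xs
        return d @ (d / variances)          # D^-1 is diag(1 / variances)

    def mahalanobis2(xr, xs, S):
        d = xr - xs
        return d @ np.linalg.solve(S, d)    # (xr - xs) S^-1 (xr - xs)'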