Introduction to classifiers for multivariate decoding of fMRI data

Presentation Transcript


  1. Introduction to classifiers for multivariate decoding of fMRI data Evelyn Eger MMN 15/12/08

  2. Two directions of inference
  1) Forward modelling: from the psychological variable to the data (assessed with a p-value)
  2) Decoding: from the data to the psychological variable (assessed with prediction accuracy)

  3. Two directions of inference
  • Inverse inference (decoding) is of special interest, e.g., for brain-computer interfaces, automated diagnosis, etc.
  • In other cases the two are in principle interchangeable; both demonstrate a statistical dependency between experimental variable and data
  • In many paradigms applying decoding to fMRI, the direction of inference is not central to the interpretation (e.g., Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007 for reviews)
  • Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications

  4. Univariate versus multivariate
  • Univariate analysis: effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
  • Special case: "mass-univariate" analysis in brain imaging, where effects are tested in a large number of voxels treated as independent
  • Multivariate analysis: effects are analysed for multiple dependent variables, e.g., Hotelling's T-square test, Wilks' Lambda, MANOVA

  5. Why go multivariate in brain imaging
  [Figure: responses to stimulus conditions 1 and 2; adapted from Haynes et al., 2006]
  • Discrimination can be improved with higher dimensions
  • Significance of individual voxels is not required

  6. Linear classification (in 2D space)
  A set of points xi with labels yi ∈ {1, -1} is separated by a hyperplane defined by the decision function y = wᵀx + b, so that yi(wᵀxi + b) ≥ 1.
  For N dimensions, the hyperplane has N-1 dimensions.
  [Figure: two classes of points in a space spanned by voxel 1 and voxel 2, separated by a hyperplane with normal vector w and offset b]
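  For illustration, a minimal sketch (Python/NumPy) of such a linear decision rule; the weight vector w and offset b here are made-up example values, not weights learned from real data:

    import numpy as np

    w = np.array([1.0, -0.5])    # hypothetical hyperplane normal (one weight per voxel)
    b = 0.2                      # hypothetical offset

    def predict(x):
        """Return +1 or -1 depending on which side of the hyperplane x falls."""
        return 1 if w @ x + b > 0 else -1

    print(predict(np.array([0.8, 0.1])))    # pattern on the positive side -> 1
    print(predict(np.array([-0.3, 0.9])))   # pattern on the negative side -> -1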

  7. Linear classification (in 2D space)
  • New data are projected onto the previously learned hyperplane
  • Assignment to the classes yi ∈ {1, -1} yields the prediction accuracy
  • Which hyperplane to choose?
  [Figure: axes voxel 1 vs. voxel 2]

  8. Difference between means
  w ∝ m2 - m1
  Corresponding to a classifier based on Euclidean distance / correlation.
  [Figure: two class means m1 and m2 with weight vector w]
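  A minimal sketch of this mean-difference classifier, using random placeholder data in place of real voxel patterns; it projects a test pattern onto w = m2 - m1 and thresholds halfway between the class means:

    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal(0.0, 1.0, size=(20, 50))    # 20 training patterns of class 1
    X2 = rng.normal(0.5, 1.0, size=(20, 50))    # 20 training patterns of class 2

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    w = m2 - m1                                 # weight vector = difference of class means
    b = -w @ (m1 + m2) / 2                      # threshold halfway between the means

    x_test = rng.normal(0.5, 1.0, size=50)      # a new pattern to classify
    print(2 if w @ x_test + b > 0 else 1)       # assigns to the class with the closer mean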

  9. Examples: difference between means
  • Used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001)
  • And in other recent studies on object representation, e.g., position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
  [Figure from Haxby et al., 2001]

  10. Difference between means
  w ∝ m2 - m1
  Corresponding to a classifier based on Euclidean distance / correlation, not taking into account variances/covariances.
  [Figure: two class means m1 and m2 with weight vector w]

  11. Fisher's linear discriminant
  w ∝ S⁻¹(m2 - m1), where S is the covariance matrix
  Distance measure: Mahalanobis distance
  [Figure: two class means m1 and m2]

  12. Examples: Fisher's linear discriminant
  • Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
  • Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
  [Figures from the Haynes & Rees, 2006 review and from Kriegeskorte et al., 2007]

  13. Fisher's linear discriminant
  w ∝ S⁻¹(m2 - m1), where S is the covariance matrix
  Distance measure: Mahalanobis distance
  Curse of dimensionality: S is not invertible when the dimensionality exceeds the number of data points.
  [Figure: two class means m1 and m2]
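  A minimal sketch of Fisher's discriminant, w ∝ S⁻¹(m2 - m1), on random placeholder data; the pseudo-inverse is used here as one common workaround when S is singular (a shrinkage estimate of S would be another option), which is not prescribed by the slides:

    import numpy as np

    rng = np.random.default_rng(1)
    X1 = rng.normal(0.0, 1.0, size=(30, 100))    # class 1: 30 patterns x 100 voxels
    X2 = rng.normal(0.3, 1.0, size=(30, 100))    # class 2: 30 patterns x 100 voxels

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Covariance of the demeaned data from both classes (voxels outnumber
    # patterns here, so this matrix is singular)
    S = np.cov(np.vstack([X1 - m1, X2 - m2]).T)
    w = np.linalg.pinv(S) @ (m2 - m1)            # pseudo-inverse instead of S⁻¹
    b = -w @ (m1 + m2) / 2                       # boundary halfway between the means

    x_test = rng.normal(0.3, 1.0, size=100)      # a new pattern to classify
    print(2 if w @ x_test + b > 0 else 1)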

  14. Support vector machines
  w: weighted linear combination of support vectors, obtained by minimising ||w||²/2 subject to yi(wᵀxi + b) ≥ 1, i = 1 : N
  "Hard-margin" classifier
  [Figure: separating hyperplane with the support vectors marked]

  15. Support vector machines
  w: weighted linear combination of support vectors, obtained by minimising ||w||²/2 + C∑ξi subject to yi(wᵀxi + b) ≥ 1 - ξi, ξi ≥ 0, i = 1 : N
  C: regularisation parameter (trade-off between the largest margin and the fewest misclassifications)
  "Soft-margin" classifier
  [Figure: separating hyperplane with support vectors and slack variables ξ marked]
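  A minimal sketch of a linear soft-margin SVM using scikit-learn (one possible implementation; the toolboxes listed later are alternatives), again with random placeholder data standing in for voxel patterns:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal(0.0, 1.0, size=(20, 50)),     # 20 patterns of class -1
                   rng.normal(0.7, 1.0, size=(20, 50))])    # 20 patterns of class +1
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel='linear', C=1.0)    # C: trade-off margin width vs. misclassifications
    clf.fit(X, y)
    print(clf.support_)                  # indices of the support vectors
    print(clf.score(X, y))               # training accuracy; test on held-out data in practice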

  16. Examples: SVM
  • Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
  [Figure from Kamitani & Tong, 2005]

  17. Support vector machines: non-linear classifier
  • Use of non-linear kernel functions
  • Potential for overfitting, especially when few training examples are available
  • Hardly used in fMRI
  [Figure: non-linear decision boundary with support vectors marked]
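  For completeness, a minimal sketch of a non-linear (RBF-kernel) SVM in scikit-learn on toy data; as the slide notes, this is rarely used for fMRI and can overfit with few training examples:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    X = rng.normal(size=(40, 50))                              # 40 toy patterns x 50 features
    y = (np.sum(X[:, :2] ** 2, axis=1) > 2.0).astype(int)      # labels that are not linearly separable

    clf_rbf = SVC(kernel='rbf', gamma='scale', C=1.0).fit(X, y)
    print(clf_rbf.score(X, y))    # training accuracy; with so few examples this can overfit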

  18. Comparison of classifier performance
  [Figure from Cox & Savoy, 2003]

  19. Analysis work flow
  1) ROI definition
  2) Data extraction (condition 1, condition 2, ...)
  3) Training of the pattern classifier
  4) Test: object discrimination (same size), size generalisation (1 step)

  20. Analysis work flow [diagram repeated: 1) ROI definition; 2) Data extraction (condition 1, condition 2, ...); 3) Training of the pattern classifier; 4) Test: object discrimination (same size), size generalisation (1 step)]

  21. ROI definition – voxel selection
  • Regions of interest have to be defined by an orthogonal contrast (e.g., in an object-exemplar discrimination experiment: an LOC localiser session, all stimuli vs. baseline, etc.)
  • If a further voxel selection is performed based on the contrast of interest, it has to be done on training data only to avoid bias (see the sketch below)
  • Other criteria for voxel selection (e.g., "reproducibility" of the voxelwise response to different conditions across separate sessions; Grill-Spector et al., 2006, Nat Neurosci) can also be biased
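  A minimal sketch of unbiased voxel selection, assuming scikit-learn's SelectKBest and a linear SVM combined in a pipeline so that the selection is refit on the training portion of every cross-validation fold; the data are random placeholders:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)
    X = rng.normal(size=(60, 500))     # 60 patterns x 500 voxels (placeholders)
    y = np.repeat([0, 1], 30)          # two conditions

    # Putting selection and classifier in one pipeline means the voxel
    # selection is fitted on training data only, never on the test fold
    pipe = make_pipeline(SelectKBest(f_classif, k=100), SVC(kernel='linear'))
    print(cross_val_score(pipe, X, y, cv=5).mean())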

  22. Analysis work flow [diagram repeated: 1) ROI definition; 2) Data extraction (condition 1, condition 2, ...); 3) Training of the pattern classifier; 4) Test: object discrimination (same size), size generalisation (1 step)]

  23. Data extraction
  Which data to use for classification?
  • No general rule; different studies have used beta images or raw EPI images
  • Ideally, as many images as possible for optimal classification performance
  • In typical neuroimaging studies there is a trade-off between the number of images and their individual signal-to-noise ratio
  • Fewer, but less noisy, images are sometimes preferable (when using SVM)
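  A minimal sketch of extracting ROI voxel values from beta images with nibabel; the file names and the ROI mask here are hypothetical and would need to be adapted to the actual study layout:

    import numpy as np
    import nibabel as nib

    # Hypothetical file names; substitute the study's ROI mask and beta (or EPI) images
    mask = nib.load('roi_mask.nii').get_fdata() > 0
    beta_files = ['beta_0001.nii', 'beta_0002.nii']

    # One row per image, one column per voxel inside the ROI
    patterns = np.array([nib.load(f).get_fdata()[mask] for f in beta_files])
    print(patterns.shape)    # (n_images, n_voxels_in_roi)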

  24. Analysis work flow [diagram repeated: 1) ROI definition; 2) Data extraction (condition 1, condition 2, ...); 3) Training of the pattern classifier; 4) Test: object discrimination (same size), size generalisation (1 step)]

  25. Crossvalidation (training – test)
  Classifier performance always has to be tested on independent data.
  • Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
  • Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test

  26. Leave-one-out crossvalidation
  [Diagram: blocks 1:N for condition 1 and condition 2; the SVM pattern classifier is trained on all but one pattern per condition and tested on the left-out patterns, repeated in N-fold cross-validation]

  27. Leave-one-out crossvalidation
  [Diagram repeated: training on all but one pattern per condition, testing on the left-out patterns, in N-fold cross-validation]

  28. Crossvalidation (training – test)
  Classifier performance always has to be tested on independent data.
  • Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
  • Leave-one-out crossvalidation (common with other classifiers): e.g., all but one session for training, the remaining session for test
  • Importantly, "leave-one-out" should mean leaving one image of each condition out (all images of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM); see the sketch below
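  A minimal sketch of leave-one-session-out cross-validation, assuming scikit-learn's LeaveOneGroupOut so that all images of one session (one per condition) are held out together; the data are random placeholders:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(5)
    n_sessions, n_voxels = 8, 200
    X = rng.normal(size=(n_sessions * 2, n_voxels))     # one pattern per condition per session
    y = np.tile([0, 1], n_sessions)                     # condition labels
    sessions = np.repeat(np.arange(n_sessions), 2)      # session label for every pattern

    # Each fold holds out all patterns of one session (one per condition)
    scores = cross_val_score(SVC(kernel='linear'), X, y,
                             cv=LeaveOneGroupOut(), groups=sessions)
    print(scores.mean())    # chance level is 0.5 for two balanced conditions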

  29. Implementations
  General SVM implementations exist in different languages:
  • Matlab: SVM toolbox (University of Southampton, UK), http://www.isis.ecs.soton.ac.uk/resources/svminfo
  • Matlab: SVM toolbox (TU Graz, Austria), http://ida.first.fraunhofer.de/~anton/software.html
  • C: SVM-light, http://svmlight.joachims.org
  • Python or R
  Multi-Voxel Pattern Analysis (MVPA) toolbox for fMRI data, developed at Princeton University (beta version; Matlab, Python): http://www.csbmb.princeton.edu/mvpa

  30. Appendix: Distance measures
  Given an m-by-n data matrix X, treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as:
  • Euclidean distance: Drs² = (xr - xs)(xr - xs)'
  • Standardised Euclidean distance: Drs² = (xr - xs)D⁻¹(xr - xs)', where D is the diagonal matrix whose diagonal elements are the variances of the variable Xi over the m objects
  • Mahalanobis distance: Drs² = (xr - xs)S⁻¹(xr - xs)', where S is the sample covariance matrix
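  A minimal sketch of these three distance measures using NumPy/SciPy on two hypothetical row vectors xr and xs drawn from a random data matrix X; note that SciPy returns the distances themselves rather than the squared distances defined above:

    import numpy as np
    from scipy.spatial.distance import euclidean, seuclidean, mahalanobis

    rng = np.random.default_rng(6)
    X = rng.normal(size=(20, 5))        # m = 20 objects, n = 5 variables
    xr, xs = X[0], X[1]

    d_euc = euclidean(xr, xs)
    d_seuc = seuclidean(xr, xs, X.var(axis=0, ddof=1))           # variances of each variable
    d_mah = mahalanobis(xr, xs, np.linalg.inv(np.cov(X.T)))      # inverse sample covariance
    print(d_euc, d_seuc, d_mah)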
