Functional Brain Signal Processing: EEG & fMRI Lesson 2

M.Tech. (CS), Semester III, Course B50 Functional Brain Signal Processing: EEG & fMRI, Lesson 2 Kaushik Majumdar, Indian Statistical Institute, Bangalore Center, kmajumdar@isibang.ac.in

EEG Processing • Preprocessing • Pattern recognition

EEG Artifacts (Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview)

Eye Blink Artifact: Electrooculogram (EOG) (Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview)

Matrix Representation of Multi-Channel EEG • M is an m × n matrix whose m rows represent the m EEG channels and whose n columns represent n time points. • EEG processing often amounts to finding a matrix W such that WM is the processed signal.
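As one illustration of a processing matrix W, the sketch below writes common average re-referencing as a matrix product WM. The choice of this operation, the synthetic data and the sizes m = 4, n = 1000 are assumptions made for the example; they are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 1000                       # 4 channels, 1000 time points (made-up sizes)
M = rng.standard_normal((m, n))      # synthetic stand-in for multi-channel EEG
M += 5.0 * np.sin(np.linspace(0, 20, n))   # a signal common to every channel

# Common average reference as a matrix: subtract the channel mean at each time point.
W = np.eye(m) - np.ones((m, m)) / m
processed = W @ M                    # the processed signal WM, still m x n

# After re-referencing, the channels sum to (numerically) zero at every time point.
print(np.abs(processed.sum(axis=0)).max())
```

Any linear spatial operation on the channels (re-referencing, projecting out a component, unmixing) can be written in this WM form, which is why PCA and ICA in the following slides are both phrased as a search for a suitable W.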

EOG Identification by Principal Component Analysis (PCA) (Majumdar, under preparation, 2013)

PCA Algorithm (cont.)

PCA Algorithm (cont.) PCA: Rotation and Stretching (or Contracting)
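A minimal sketch of PCA on a channels × time matrix, assuming the usual eigen-decomposition of the channel covariance matrix. The synthetic "blink" waveform, the channel gains and the decision to discard the largest-variance component are hypothetical choices for illustration, not the procedure of the lecture.

```python
import numpy as np

def pca(M):
    """PCA of an m x n matrix M (rows = channels, columns = time points)."""
    Mc = M - M.mean(axis=1, keepdims=True)     # remove each channel's mean
    cov = Mc @ Mc.T / (Mc.shape[1] - 1)        # m x m channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs.T @ Mc                # rotate data onto the principal axes
    return eigvals, eigvecs, components, Mc

rng = np.random.default_rng(1)
m, n = 4, 2000
blink = np.zeros(n)
blink[500:560] = 50.0 * np.hanning(60)         # one large, blink-like deflection
gains = np.array([1.0, 0.8, 0.3, 0.1])         # hypothetical spread over channels
M = rng.standard_normal((m, n)) + np.outer(gains, blink)

eigvals, eigvecs, components, Mc = pca(M)

# Zero the largest-variance component and rotate back to channel space.
kept = components.copy()
kept[0] = 0.0
cleaned = eigvecs @ kept

print("eigenvalues:", np.round(eigvals, 2))
print("peak before/after on channel 0:",
      np.abs(Mc[0]).max(), np.abs(cleaned[0]).max())
```

The rotation is the multiplication by the transposed eigenvector matrix; the eigenvalues give the stretching or contracting along each rotated axis. In this toy case the blink dominates the variance, so it lands in the first component; with real EEG that assumption has to be checked.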

Performance of PCA in EOG Removal (Wallstrom et al., Int. J. Psychophysiol., 53: 105-119, 2004)

Independent Component Analysis (ICA) • In PCA the data components are assumed to be mutually orthogonal, which is too restrictive. [Figure: original data sets and their PCA components]

ICA (cont.) • PCA gives poor results when the covariance matrix has eigenvalues close to each other, because the corresponding principal directions are then poorly determined.

ICA as Blind Source Separation (BSS) Four musicians (sources S1, S2, S3, S4) are playing in a room. From outside, the music can be heard only through four microphones (1, 2, 3, 4); no one can be seen. How can the music heard outside be decomposed into its four sources?

Mathematical Formulation $x = As + n$, where A is the mixing matrix, x is the sensor vector, s is the source vector and n is noise, which is to be eliminated by filtering.

Mathematical Formulation (cont.) Given the sensor vector $x$, find a matrix $W$ such that $\hat{s} = Wx$ estimates the source vector $s$. Any technique for estimating $W$ is called an ICA technique, or a BSS technique in general.

ICA Algorithm: FastICA (Hyvarinen and Oja, Neural Networks, 13: 411-430, 2000) Whitening: • Normalization (make the mean zero). • Make the variance one, i.e., $E\{xx^T\} = I$, where E is expectation, x is the vector of signals and I is the identity matrix.

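FastICA (cont.) Let $E\{xx^T\} = BDB^T$, where B is an orthogonal matrix and D is a diagonal matrix (the eigenvalue decomposition of $E\{xx^T\}$). Then $\tilde{x} = BD^{-1/2}B^T x$ will satisfy $E\{\tilde{x}\tilde{x}^T\} = I$. Whitening complete.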
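The whitening step can be sketched as follows, assuming the eigenvalue-decomposition construction described by Hyvarinen and Oja (2000). The Laplace-distributed sources, the mixing matrix A and the sample size are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
s = rng.laplace(size=(2, n))               # two synthetic non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])     # hypothetical mixing matrix
x = A @ s                                  # observed (mixed) signals

x = x - x.mean(axis=1, keepdims=True)      # step 1: make the mean zero
cov = x @ x.T / n                          # sample estimate of E{x x^T}
d, B = np.linalg.eigh(cov)                 # cov = B diag(d) B^T, B orthogonal
V = B @ np.diag(d ** -0.5) @ B.T           # whitening matrix B D^(-1/2) B^T
x_white = V @ x

cov_white = x_white @ x_white.T / n        # should be the identity matrix
print(np.round(cov_white, 6))
```

After this step the remaining unknown is only an orthogonal matrix, which is what the FastICA iteration searches for.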

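Non-Gaussianity • ICA is appropriate only when the probability distribution of the data set is non-Gaussian. • A Gaussian distribution with mean $\mu$ and standard deviation $\sigma$ is of the form $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.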

Entropy of Gaussian Variable • A Gaussian variable has the largest entropy among all random variables with equal variance (for a proof see Cover & Thomas, Elements of Information Theory). Here we give an intuitive argument.
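As a small numerical check of this statement, the sketch below compares the differential entropies of three distributions with the same variance, using their standard closed-form expressions. The choice of the Laplace and uniform distributions as comparison cases is an assumption made for the example.

```python
import math

sigma = 1.0  # common standard deviation of all three distributions

# Differential entropies in nats, from the standard closed forms.
h_gauss = 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)
h_laplace = 1 + math.log(2 * sigma / math.sqrt(2))   # Laplace, scale sigma/sqrt(2)
h_uniform = math.log(math.sqrt(12) * sigma)          # uniform, width sqrt(12)*sigma

for name, h in [("Gaussian", h_gauss), ("Laplace", h_laplace), ("uniform", h_uniform)]:
    print(f"{name:8s} {h:.4f}")
```

The Gaussian value is the largest of the three; this is a check on two examples, not a proof.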

Entropy of a Random Variable X [Figure: more information vs. less (zero) information]

Gaussian Random Variable Has Highest Entropy: Intuitive Proof • By the Central Limit Theorem (CLT), the mean of a class of random variables (the class being characterized by a common variance) follows a normal distribution as the number of members in the class tends to infinity (i.e., becomes very large). • Infinitely many observations hold an infinite, or maximum, amount of information.

Intuitive Proof (cont.) • Therefore a random variable with a normal distribution has the highest information content. • So it has the highest entropy. If each variable in a class of random variables takes only finitely many values with nonzero probability, the one with the uniform distribution has the highest entropy.
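The finite case can be checked directly with the Shannon entropy $H(X) = -\sum_i p_i \log_2 p_i$. The three example distributions over four outcomes below are made up for illustration.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p (terms with p_i = 0 contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

uniform = [0.25, 0.25, 0.25, 0.25]   # every outcome equally likely
skewed = [0.70, 0.10, 0.10, 0.10]    # one outcome dominates
certain = [1.00, 0.00, 0.00, 0.00]   # outcome known in advance

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
h_certain = shannon_entropy(certain)
print(h_uniform, h_skewed, h_certain)
```

The uniform distribution reaches the maximum of log2(4) = 2 bits, while the distribution whose outcome is certain carries zero information.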

Non-Gaussianity as Negentropy $J(y) = H(y_{gauss}) - H(y)$, where H is entropy, J is negentropy and $y_{gauss}$ is a Gaussian variable with the same variance as y. J is to be maximized. When J is maximum, y is reduced to a single component. This can be shown by comparing the kurtosis of a component with that of a sum of components that includes it (see Hyvarinen & Oja, 2000, p. 7).
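The kurtosis argument can be illustrated numerically. The sketch below assumes two independent, uniformly distributed sources (a made-up choice) and compares the excess kurtosis of one source with that of their sum.

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis; it is zero for a Gaussian variable."""
    y = (y - y.mean()) / y.std()
    return float((y ** 4).mean() - 3.0)

rng = np.random.default_rng(3)
n = 200_000
s1 = rng.uniform(-1, 1, n)           # first synthetic source
s2 = rng.uniform(-1, 1, n)           # second, independent source

k_single = excess_kurtosis(s1)       # a single component
k_sum = excess_kurtosis(s1 + s2)     # a mixture of two components
k_gauss = excess_kurtosis(rng.standard_normal(n))

print(k_single, k_sum, k_gauss)
```

The mixture has excess kurtosis closer to zero, i.e., it is closer to Gaussian than the single component. Maximizing non-Gaussianity of $y$ therefore pushes it away from mixtures and toward a single source.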

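Steps of FastICA after Whitening (following Hyvarinen and Oja, 2000) 1. Choose an initial (e.g., random) weight vector w. 2. Let $w^+ = E\{x\,g(w^T x)\} - E\{g'(w^T x)\}\,w$. 3. Let $w = w^+ / \|w^+\|$. 4. If not converged, go back to step 2. g is in the form of either of the two: $g_1(u) = \tanh(a_1 u)$ or $g_2(u) = u \exp(-u^2/2)$.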
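A minimal sketch of FastICA on whitened data, assuming the one-unit fixed-point update of Hyvarinen and Oja (2000) with g(u) = tanh(u) and Gram-Schmidt deflation to extract several components. The function names, the synthetic sources, the mixing matrix and the stopping rule are hypothetical choices for this example; this is not the EEGLAB implementation.

```python
import numpy as np

def whiten(X):
    """Center the rows of X (channels x samples) and whiten so the covariance is I."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, B = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return B @ np.diag(d ** -0.5) @ B.T @ Xc

def fastica_deflation(Z, n_iter=200, tol=1e-8, seed=0):
    """Estimate an orthogonal unmixing matrix W for whitened data Z, one row at a time."""
    rng = np.random.default_rng(seed)
    m = Z.shape[0]
    W = np.zeros((m, m))
    for k in range(m):
        w = rng.standard_normal(m)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            u = w @ Z                              # current projection w^T z
            g = np.tanh(u)
            g_prime = 1.0 - g ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # fixed-point update
            w_new -= W[:k].T @ (W[:k] @ w_new)     # deflate: remove earlier rows
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol         # same direction up to sign
            w = w_new
            if converged:
                break
        W[k] = w
    return W

rng = np.random.default_rng(4)
n = 5000
S = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])  # synthetic sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])                            # hypothetical mixing
X = A @ S

Z = whiten(X)
W = fastica_deflation(Z)
Y = W @ Z                                          # estimated components

# Absolute correlation between each estimate (rows) and each true source (columns).
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(np.round(corr, 3))
```

Each estimated component should correlate strongly with exactly one source; the order and sign of the components are arbitrary, which is an inherent ambiguity of ICA.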

Exercise • ICA has been implemented in EEGLAB (the runica function implements Infomax ICA; FastICA can be used from EEGLAB if the FastICA toolbox is installed). Remove artifacts from sample EEG data using an ICA implementation in EEGLAB.

Concept of Independence in PCA and ICA • In PCA independence means orthogonality, i.e., the pairwise dot product is zero. • In ICA independence is statistical independence. Let x, y be random variables, p(x) the probability distribution function of x and p(x,y) the joint probability distribution function of (x,y). If p(x,y) = p(x).p(y) holds, we say that x and y are statistically independent.

Independence (cont.) • If nonzero vectors v1 and v2 are orthogonal, they are linearly independent. Suppose a1v1 + a2v2 = 0; taking the dot product with v1 gives a1v1.v1 + a2v2.v1 = 0, so a1 = 0. Similarly a2 = 0. • If v1 = cv2 then both of them must have the same probability distribution, or p(v1,v2) = p(v1) = p(v2). If v1 and v2 are linearly independent, p(v1,v2) = p(v1).p(v2) may or may not hold. • If p(v1,v2) = p(v1).p(v2) holds, then v1 and v2 are linearly independent.
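The gap between orthogonality and statistical independence can be seen in a small example. The sketch below assumes x takes equally likely values on a symmetric grid and sets y = x² (a made-up pair): the centered signals are orthogonal, yet the joint probability of two events does not factor into the product of their probabilities.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 1001)     # equally likely values, symmetric about 0
y = x ** 2                           # y is completely determined by x

# Orthogonality: the dot product of the centered signals is (numerically) zero.
xc = x - x.mean()
yc = y - y.mean()
dot = float(xc @ yc)

# Statistical dependence: P(a and b) differs from P(a) * P(b).
a = x > 0.5
b = y > 0.25
p_joint = float(np.mean(a & b))
p_product = float(a.mean() * b.mean())

print(dot, p_joint, p_product)
```

Here the joint probability is about twice the product, so x and y are orthogonal (uncorrelated) but not statistically independent. This is the sense in which the independence required by ICA is stronger than the orthogonality used by PCA.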

Conditions for ICA Applicability • Sources are statistically independent. • Propagation delays in the mixing medium are negligible. Sources are time varying, and delays in the mixing medium may affect sources at different locations differently, thereby corrupting their temporal structures. • Number of sources = number of sensors.

References • Benbadis and Rielo, EEG artifacts, eMedicine, available online at http://emedicine.medscape.com/article/1140247-overview, 2008. • Hyvarinen and Oja, Independent component analysis: algorithms and applications, Neural Networks, vol. 13, p. 411-430, 2000. • Majumdar, A Brief Survey of Quantitative EEG Analysis, Chapter 2.

THANK YOU This lecture is available at http://www.isibang.ac.in/~kaushik
