This survey provides an overview of Independent Component Analysis (ICA), including definitions, identifiability conditions, relations to other methods, and applications. It also discusses contrast functions, algorithms, and future research directions.
Survey on ICA
Technical Report, Aapo Hyvärinen, 1999. http://www.icsi.berkeley.edu/~jagota/NCS
Outline
• 2nd-order methods
  • PCA / factor analysis
• Higher-order methods
  • Projection pursuit / blind deconvolution
• ICA
  • Definitions
  • Criteria for identifiability
  • Relations to other methods
• Applications
• Contrast functions
• Algorithms
General model
x = As + n
• x: observations
• A: mixing matrix
• n: noise
• s: latent variables (factors, independent components)
Find a transformation s = f(x)
• Consider only the linear transformation s = Wx
Principal component analysis
• Find direction(s) w in which the variance of wᵀx is maximized
• Equivalent to finding the eigenvectors of C = E(xxᵀ) corresponding to the k largest eigenvalues (sketch below)
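A minimal numpy sketch of this eigendecomposition view of PCA (function and variable names are illustrative, not from the report):

```python
import numpy as np

def pca_directions(X, k):
    """Return the k leading principal directions of data X (samples x dims)."""
    Xc = X - X.mean(axis=0)               # center so C = E(xx^T) is a covariance
    C = Xc.T @ Xc / len(Xc)               # sample estimate of C = E(xx^T)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh since C is symmetric
    order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
    return eigvecs[:, order[:k]], eigvals[order[:k]]
```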
Factor analysis
• Closely related to PCA
• x = As + n
• Method of principal factors:
  • Assumes knowledge of the covariance matrix of the noise, E(nnᵀ)
  • PCA on C = E(xxᵀ) − E(nnᵀ)
• Factors are not defined uniquely, but only up to a rotation
Higher-order methods
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
• All require the assumption that the data are not Gaussian
Projection pursuit
• Find a direction w such that wᵀx has an 'interesting' distribution
• It has been argued that the interesting directions are those in which the distribution of wᵀx is least Gaussian
Differential entropy
H(y) = −∫ f(y) log f(y) dy
• Among densities of fixed variance, maximized when f is a Gaussian density
• Minimize H(wᵀx) to find projection pursuit directions (y = wᵀx)
• Difficult in practice: the density of wᵀx must be estimated (see the sketch below)
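As a rough illustration of entropy-based projection pursuit (a naive random-direction scan of my own, not a method from the report; the histogram entropy estimator is deliberately crude):

```python
import numpy as np

def hist_entropy(y, bins=50):
    """Crude plug-in estimate of the differential entropy of samples y."""
    p, edges = np.histogram(y, bins=bins, density=True)
    w = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * w[mask])

def least_gaussian_direction(X, n_candidates=500, seed=0):
    """Scan random unit vectors w and keep the one minimizing H(w^T x)."""
    rng = np.random.default_rng(seed)
    best_w, best_h = None, np.inf
    for _ in range(n_candidates):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        y = X @ w
        y = (y - y.mean()) / y.std()   # fix the variance so entropies compare
        h = hist_entropy(y)
        if h < best_h:
            best_w, best_h = w, h
    return best_w
```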
Blind deconvolution
• Observe a filtered version of s(t): x(t) = s(t) * g(t)
• Find a filter h(t) such that s(t) = h(t) * x(t)
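The blind problem must estimate h(t) without knowing g(t); purely to illustrate the filtering relation, here is a non-blind sketch that inverts a known g in the frequency domain (the regularization constant eps is an ad-hoc addition):

```python
import numpy as np

def inverse_filter(x, g, eps=1e-3):
    """Recover s from x = s * g by dividing spectra (g known, i.e. not blind)."""
    n = len(x)
    G = np.fft.rfft(g, n)                     # zero-pad g to the length of x
    X = np.fft.rfft(x, n)
    H = np.conj(G) / (np.abs(G) ** 2 + eps)   # regularized 1/G (Wiener-like)
    return np.fft.irfft(X * H, n)
```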
Example: blind deconvolution
• Seismic processing: "statistical deconvolution"
Blind deconvolution (3)
[Figure: the filter g(t) and the source signal s(t), each plotted against time t]
ICA definitions
Definition 1 (General definition). ICA of a random vector x consists of finding a linear transformation s = Wx so that the components si are as independent as possible, in the sense of maximizing some function F(s1, ..., sm) that measures independence.
ICA definitions (2)
Definition 2 (Noisy ICA). ICA of a random vector x consists of estimating the following model for the data: x = As + n, where the latent variables si are assumed independent.
Definition 3 (Noise-free ICA). x = As
Statistical independence
• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: f(y1, y2) = f1(y1) f2(y2)
• Uncorrelated: E(y1 y2) = E(y1) E(y2)
• Independence implies uncorrelatedness, but not conversely (see the check below)
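A quick numerical check of this distinction, with an example pair chosen for illustration: y2 = y1² is uncorrelated with a symmetric y1 yet fully dependent on it:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.standard_normal(100_000)
y2 = y1 ** 2                       # a deterministic function of y1: dependent

# Correlation is ~0 because E(y1*y2) = E(y1^3) = 0 for a symmetric density ...
print(np.corrcoef(y1, y2)[0, 1])   # close to 0: uncorrelated
# ... yet independence fails: E(y1^2 * y2) = E(y1^4) != E(y1^2) * E(y2)
print(np.mean(y1**2 * y2), np.mean(y1**2) * np.mean(y2))  # roughly 3 vs 1
```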
Identifiability of the ICA model
• All independent components, except possibly one, must be non-Gaussian
• The number of observed mixtures must be at least the number of independent components: m ≥ n
• The matrix A must be of full column rank
• Note: with m < n, A may still be identifiable
Relations to other methods
• Redundancy reduction
• In the noise-free case, finds 'interesting' projections
  • A special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA
Applications of ICA
• Blind source separation
  • The cocktail-party problem (example below)
• Feature extraction
• Blind deconvolution
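For blind source separation in practice, scikit-learn's FastICA gives a compact cocktail-party-style sketch; the sources and mixing matrix below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                # source 2: square wave
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5], [0.5, 2.0]])     # illustrative mixing matrix
X = S @ A.T                                # observed mixtures, x = As

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)               # recovered sources (up to scale/order)
```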
Objective (contrast) functions
• ICA method = objective function + optimization algorithm
• Multi-unit contrast functions: find all independent components
• One-unit contrast functions: find one independent component (at a time)
Mutual information
I(y1, ..., ym) = Σi H(yi) − H(y)
• Mutual information is zero if and only if the yi are independent
• Difficult to estimate; approximations exist
Mutual information (2)
• Alternative definition: the Kullback-Leibler divergence between the joint density f(y) and the product of its marginal densities Πi fi(yi)
Mutual information (3)
[Figure: Venn diagram of the entropies H(X) and H(Y), with the conditional entropies H(X|Y) and H(Y|X) and their overlap I(X,Y)]
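A minimal sketch of estimating I(X,Y) = H(X) + H(Y) − H(X,Y) from samples via a 2-D histogram (the bin count is arbitrary, and such plug-in estimates are biased):

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(X,Y) in nats from paired samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                        # joint probabilities
    px = pxy.sum(axis=1)                    # marginal of X
    py = pxy.sum(axis=0)                    # marginal of Y
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))
```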
Non-linear PCA
• Introduce a non-linear function g(·) into the PCA criterion; the non-linearity brings higher-order statistics into the solution
One-unit contrast functions
• Find one vector w so that wᵀx equals one of the independent components si
• Related to projection pursuit
• Prior knowledge of the number of independent components is not needed
Negentropy
J(y) = H(y_gauss) − H(y)
• The differential entropy of a Gaussian variable with the same variance as y, minus the differential entropy of y; always non-negative, since the Gaussian maximizes entropy for a given variance
• If the yi are uncorrelated, the mutual information can be expressed as I(y1, ..., ym) = C − Σi J(yi), where C is a constant not depending on W
• J(y) can be approximated by higher-order cumulants, but the estimates are sensitive to outliers (see the sketch below)
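One classical cumulant-based approximation for standardized y is J(y) ≈ (1/12)E{y³}² + (1/48)kurt(y)²; a numpy sketch (the third and fourth powers are what make it outlier-sensitive):

```python
import numpy as np

def negentropy_cumulant(y):
    """Cumulant-based approximation of negentropy for standardized y."""
    y = (y - y.mean()) / y.std()         # enforce zero mean, unit variance
    skew_term = np.mean(y**3) ** 2 / 12.0
    kurt = np.mean(y**4) - 3.0           # excess kurtosis (0 for a Gaussian)
    return skew_term + kurt**2 / 48.0
```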
Algorithms
• Have x = As, want to find s = Wx
• Preprocessing:
  • Centering of x
  • Sphering (whitening) of x: find a transformation v = Qx such that E(vvᵀ) = I
  • Found via PCA / SVD (sketch below)
  • Sphering alone does not solve the problem
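A minimal numpy sketch of the whitening step via eigendecomposition (the eps guard against near-zero eigenvalues is my addition):

```python
import numpy as np

def whiten(X, eps=1e-10):
    """Return v = Qx with E(vv^T) = I, plus the whitening matrix Q."""
    Xc = X - X.mean(axis=0)                         # centering
    C = Xc.T @ Xc / len(Xc)                         # covariance estimate
    d, E = np.linalg.eigh(C)                        # C = E diag(d) E^T
    Q = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T   # Q = C^{-1/2}
    return Xc @ Q.T, Q
```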
Algorithms (2)
• Jutten-Hérault
  • Cancels non-linear cross-correlations
  • The non-diagonal terms of W are updated according to ΔWij ∝ g1(yi) g2(yj), for i ≠ j, where g1 and g2 are odd non-linearities
  • The yi are computed iteratively from y = (I + W)⁻¹x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, etc. (one-unit sketch below)
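A sketch of the FastICA one-unit fixed-point update on whitened data, assuming the tanh non-linearity (a common choice; the tolerance and iteration cap are arbitrary):

```python
import numpy as np

def fastica_one_unit(V, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA on whitened data V (samples x dims)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(V.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = V @ w
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = V.T @ g / len(V) - g_prime.mean() * w   # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:             # converged (up to sign)
            return w_new
        w = w_new
    return w
```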
Summary
• Definitions of ICA
• Conditions for identifiability of the model
• Relations to other methods
• Contrast functions
  • One-unit / multi-unit
  • Mutual information / negentropy
• Applications of ICA
• Algorithms
Future research
• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations when x is a stochastic process
  • Time delays / echoes in the cocktail-party problem
• Non-linear ICA