This survey provides an overview of Independent Component Analysis (ICA), including definitions, identifiability conditions, relations to other methods, and applications. It also discusses contrast functions, algorithms, and future research directions.
Survey on ICA
Technical Report, Aapo Hyvärinen, 1999. http://www.icsi.berkeley.edu/~jagota/NCS
Outline
• 2nd-order methods
  • PCA / factor analysis
• Higher-order methods
  • Projection pursuit / blind deconvolution
• ICA
  • Definitions
  • Criteria for identifiability
  • Relations to other methods
• Applications
• Contrast functions
• Algorithms
General model
x = As + n
• x: observations
• A: mixing matrix
• n: noise
• s: latent variables (factors, independent components)
Find a transformation s = f(x)
• Consider only the linear transformation s = Wx
Principal component analysis
• Find direction(s) w in which the variance of wᵀx is maximized
• Equivalent to finding the eigenvectors of C = E(xxᵀ) corresponding to the k largest eigenvalues (sketch below)
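A minimal numpy sketch of this eigendecomposition view of PCA (function and variable names are illustrative, not from the report):

```python
import numpy as np

def pca_directions(X, k):
    """Return the k leading principal directions of data X (samples x dims)."""
    Xc = X - X.mean(axis=0)               # center so C = E(xx^T) is a covariance
    C = Xc.T @ Xc / len(Xc)               # sample estimate of C = E(xx^T)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh since C is symmetric
    order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
    return eigvecs[:, order[:k]], eigvals[order[:k]]
```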
Factor analysis
• Closely related to PCA
• x = As + n
• Method of principal factors:
  • Assumes knowledge of the covariance matrix of the noise, E(nnᵀ)
  • PCA on C = E(xxᵀ) − E(nnᵀ)
• Factors are not defined uniquely, but only up to a rotation
Higher-order methods
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
• All require the assumption that the data are not Gaussian
Projection pursuit
• Find a direction w such that wᵀx has an 'interesting' distribution
• It has been argued that the interesting directions are those in which the distribution of wᵀx is least Gaussian
Differential entropy
H(y) = −∫ f(y) log f(y) dy
• Among densities of fixed variance, maximized when f is a Gaussian density
• Minimize H(wᵀx) to find projection pursuit directions (y = wᵀx)
• Difficult in practice: the density of wᵀx must be estimated (see the sketch below)
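As a rough illustration of entropy-based projection pursuit (a naive random-direction scan of my own, not a method from the report; the histogram entropy estimator is deliberately crude):

```python
import numpy as np

def hist_entropy(y, bins=50):
    """Crude plug-in estimate of the differential entropy of samples y."""
    p, edges = np.histogram(y, bins=bins, density=True)
    w = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * w[mask])

def least_gaussian_direction(X, n_candidates=500, seed=0):
    """Scan random unit vectors w and keep the one minimizing H(w^T x)."""
    rng = np.random.default_rng(seed)
    best_w, best_h = None, np.inf
    for _ in range(n_candidates):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        y = X @ w
        y = (y - y.mean()) / y.std()   # fix the variance so entropies compare
        h = hist_entropy(y)
        if h < best_h:
            best_w, best_h = w, h
    return best_w
```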
Blind deconvolution
• Observe a filtered version of s(t): x(t) = s(t) * g(t)
• Find a filter h(t) such that s(t) = h(t) * x(t)
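The blind problem must estimate h(t) without knowing g(t); purely to illustrate the filtering relation, here is a non-blind sketch that inverts a known g in the frequency domain (the regularization constant eps is an ad-hoc addition):

```python
import numpy as np

def inverse_filter(x, g, eps=1e-3):
    """Recover s from x = s * g by dividing spectra (g known, i.e. not blind)."""
    n = len(x)
    G = np.fft.rfft(g, n)                     # zero-pad g to the length of x
    X = np.fft.rfft(x, n)
    H = np.conj(G) / (np.abs(G) ** 2 + eps)   # regularized 1/G (Wiener-like)
    return np.fft.irfft(X * H, n)
```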
Example: blind deconvolution
• Seismic processing: "statistical deconvolution"
Blind deconvolution (3)
[Figure: the filter g(t) and the source signal s(t), each plotted against time t]
ICA definitions
Definition 1 (General definition). ICA of a random vector x consists of finding a linear transformation s = Wx so that the components si are as independent as possible, in the sense of maximizing some function F(s1, ..., sm) that measures independence.
ICA definitions (2)
Definition 2 (Noisy ICA). ICA of a random vector x consists of estimating the following model for the data: x = As + n, where the latent variables si are assumed independent.
Definition 3 (Noise-free ICA). x = As
Statistical independence
• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: f(y1, y2) = f1(y1) f2(y2)
• Uncorrelated: E(y1 y2) = E(y1) E(y2)
• Independence implies uncorrelatedness, but not conversely (see the check below)
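A quick numerical check of this distinction, with an example pair chosen for illustration: y2 = y1² is uncorrelated with a symmetric y1 yet fully dependent on it:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.standard_normal(100_000)
y2 = y1 ** 2                       # a deterministic function of y1: dependent

# Correlation is ~0 because E(y1*y2) = E(y1^3) = 0 for a symmetric density ...
print(np.corrcoef(y1, y2)[0, 1])   # close to 0: uncorrelated
# ... yet independence fails: E(y1^2 * y2) = E(y1^4) != E(y1^2) * E(y2)
print(np.mean(y1**2 * y2), np.mean(y1**2) * np.mean(y2))  # roughly 3 vs 1
```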
Identifiability of the ICA model
• All independent components, except possibly one, must be non-Gaussian
• The number of observed mixtures must be at least the number of independent components: m ≥ n
• The matrix A must be of full column rank
• Note: with m < n, A may still be identifiable
Relations to other methods
• Redundancy reduction
• In the noise-free case, finds 'interesting' projections
  • A special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA
Applications of ICA
• Blind source separation
  • The cocktail-party problem (example below)
• Feature extraction
• Blind deconvolution
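For blind source separation in practice, scikit-learn's FastICA gives a compact cocktail-party-style sketch; the sources and mixing matrix below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                         # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))                # source 2: square wave
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5], [0.5, 2.0]])     # illustrative mixing matrix
X = S @ A.T                                # observed mixtures, x = As

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)               # recovered sources (up to scale/order)
```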
Objective (contrast) functions
• ICA method = objective function + optimization algorithm
• Multi-unit contrast functions: find all independent components
• One-unit contrast functions: find one independent component (at a time)
Mutual information
I(y1, ..., ym) = Σi H(yi) − H(y)
• Mutual information is zero if and only if the yi are independent
• Difficult to estimate; approximations exist
Mutual information (2)
• Alternative definition: the Kullback-Leibler divergence between the joint density f(y) and the product of its marginal densities Πi fi(yi)
Mutual information (3)
[Figure: Venn diagram of the entropies H(X) and H(Y), with the conditional entropies H(X|Y) and H(Y|X) and their overlap I(X,Y)]
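A minimal sketch of estimating I(X,Y) = H(X) + H(Y) − H(X,Y) from samples via a 2-D histogram (the bin count is arbitrary, and such plug-in estimates are biased):

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(X,Y) in nats from paired samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                        # joint probabilities
    px = pxy.sum(axis=1)                    # marginal of X
    py = pxy.sum(axis=0)                    # marginal of Y
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / np.outer(px, py)[mask]))
```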
Non-linear PCA
• Introduce a non-linear function g(·) into the PCA criterion; the non-linearity brings higher-order statistics into the solution
One-unit contrast functions
• Find one vector w so that wᵀx equals one of the independent components si
• Related to projection pursuit
• Prior knowledge of the number of independent components is not needed
Negentropy
J(y) = H(y_gauss) − H(y)
• The differential entropy of a Gaussian variable with the same variance as y, minus the differential entropy of y; always non-negative, since the Gaussian maximizes entropy for a given variance
• If the yi are uncorrelated, the mutual information can be expressed as I(y1, ..., ym) = C − Σi J(yi), where C is a constant not depending on W
• J(y) can be approximated by higher-order cumulants, but the estimates are sensitive to outliers (see the sketch below)
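One classical cumulant-based approximation for standardized y is J(y) ≈ (1/12)E{y³}² + (1/48)kurt(y)²; a numpy sketch (the third and fourth powers are what make it outlier-sensitive):

```python
import numpy as np

def negentropy_cumulant(y):
    """Cumulant-based approximation of negentropy for standardized y."""
    y = (y - y.mean()) / y.std()         # enforce zero mean, unit variance
    skew_term = np.mean(y**3) ** 2 / 12.0
    kurt = np.mean(y**4) - 3.0           # excess kurtosis (0 for a Gaussian)
    return skew_term + kurt**2 / 48.0
```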
Algorithms
• Have x = As, want to find s = Wx
• Preprocessing:
  • Centering of x
  • Sphering (whitening) of x: find a transformation v = Qx such that E(vvᵀ) = I
  • Found via PCA / SVD (sketch below)
  • Sphering alone does not solve the problem
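A minimal numpy sketch of the whitening step via eigendecomposition (the eps guard against near-zero eigenvalues is my addition):

```python
import numpy as np

def whiten(X, eps=1e-10):
    """Return v = Qx with E(vv^T) = I, plus the whitening matrix Q."""
    Xc = X - X.mean(axis=0)                         # centering
    C = Xc.T @ Xc / len(Xc)                         # covariance estimate
    d, E = np.linalg.eigh(C)                        # C = E diag(d) E^T
    Q = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T   # Q = C^{-1/2}
    return Xc @ Q.T, Q
```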
Algorithms (2)
• Jutten-Hérault
  • Cancels non-linear cross-correlations
  • The non-diagonal terms of W are updated according to ΔWij ∝ g1(yi) g2(yj), for i ≠ j, where g1 and g2 are odd non-linearities
  • The yi are computed iteratively from y = (I + W)⁻¹x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, etc. (one-unit sketch below)
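A sketch of the FastICA one-unit fixed-point update on whitened data, assuming the tanh non-linearity (a common choice; the tolerance and iteration cap are arbitrary):

```python
import numpy as np

def fastica_one_unit(V, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA on whitened data V (samples x dims)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(V.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        y = V @ w
        g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w_new = V.T @ g / len(V) - g_prime.mean() * w   # fixed-point update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:             # converged (up to sign)
            return w_new
        w = w_new
    return w
```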
Summary
• Definitions of ICA
• Conditions for identifiability of the model
• Relations to other methods
• Contrast functions
  • One-unit / multi-unit
  • Mutual information / negentropy
• Applications of ICA
• Algorithms
Future research
• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations when x is a stochastic process
  • Time delays / echoes in the cocktail-party problem
• Non-linear ICA