Survey on ICA
Technical Report, Aapo Hyvärinen, 1999. http://ww.icsi.berkeley.edu/~jagota/NCS
Outline
• 2nd-order methods
  • PCA / factor analysis
• Higher-order methods
  • projection pursuit / blind deconvolution
• ICA
  • definitions
  • criteria for identifiability
  • relations to other methods
• Applications
• Contrast functions
• Algorithms
General model
x = As + n
where x are the observations, A is the mixing matrix, n is the noise, and s are the latent variables (factors, independent components).
Find a transformation s = f(x). Here only linear transformations are considered: s = Wx.
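As a concrete illustration of the model, the following minimal numpy sketch generates observations x = As + n from two hypothetical uniform sources; the mixing matrix, noise level, and sample count are arbitrary illustrative choices, not values from the report.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000

# Two independent, non-Gaussian (uniform) sources with zero mean, unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n_samples))

# Mixing matrix A and additive noise n: x = A s + n
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
noise = 0.05 * rng.standard_normal((2, n_samples))
x = A @ s + noise

# In the noise-free case the unmixing matrix is W = A^{-1}, so that s = W x
W = np.linalg.inv(A)
print(np.allclose(W @ (A @ s), s))  # True
```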
Principal component analysis
• Find the direction(s) w in which the variance of w^T x is maximized
• Equivalent to finding the eigenvectors of C = E(xx^T) corresponding to the k largest eigenvalues
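A minimal sketch of this eigendecomposition view, assuming the data are stored as a (dimensions × samples) numpy array:

```python
import numpy as np

def pca_directions(x, k):
    """Return the k leading eigenvectors (and eigenvalues) of C = E(xx^T)."""
    x = x - x.mean(axis=1, keepdims=True)   # center the data
    C = (x @ x.T) / x.shape[1]              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # pick the k largest
    return eigvecs[:, order], eigvals[order]
```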
Factor analysis
• Closely related to PCA
• Model: x = As + n
• Method of principal factors:
  • assumes the covariance matrix of the noise, E(nn^T), is known
  • PCA on C = E(xx^T) − E(nn^T)
• Factors are not defined uniquely, but only up to a rotation
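The method of principal factors admits a similar sketch; the noise covariance passed in is assumed known, and scaling the eigenvectors by the square roots of the eigenvalues is one common loading convention, not necessarily the report's:

```python
import numpy as np

def principal_factors(x, noise_cov, k):
    """Method of principal factors: PCA on C = E(xx^T) - E(nn^T)."""
    x = x - x.mean(axis=1, keepdims=True)
    C = (x @ x.T) / x.shape[1] - noise_cov       # noise-corrected covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:k]
    # Loadings: eigenvectors scaled by sqrt of (non-negative) eigenvalues
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
```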
Higher-order methods
• Projection pursuit
• Redundancy reduction
• Blind deconvolution
• All require the assumption that the data are not Gaussian
Projection pursuit
• Find a direction w such that w^T x has an 'interesting' distribution
• It has been argued that the interesting directions are those whose distributions are the least Gaussian
Differential entropy
• H(y) = −∫ f(y) log f(y) dy, where f is the density of y
• Among all variables of equal variance, H is maximized when f is a Gaussian density
• Minimize H(w^T x) to find projection pursuit directions (y = w^T x)
• Difficult in practice: requires estimating the density of w^T x (a simpler proxy is sketched below)
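Because estimating the density of w^T x is hard, a common workaround is to maximize a simpler non-Gaussianity measure such as the absolute kurtosis. Here is a toy gradient-ascent sketch along those lines; the step size, iteration count, and use of kurtosis are our illustrative choices, and the data are assumed centered and whitened:

```python
import numpy as np

def kurtosis(y):
    """Excess kurtosis of a (roughly) zero-mean, unit-variance sample."""
    return np.mean(y**4) - 3.0

def projection_pursuit(x, n_iter=200, lr=0.1, seed=0):
    """Find w maximizing |kurt(w^T x)| on whitened data x (dims x samples)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ x
        # Gradient of E[(w^T x)^4] is 4 E[(w^T x)^3 x]; for whitened x the
        # remaining term of the kurtosis gradient is radial and is removed
        # by the renormalization below
        grad = 4.0 * (x * y**3).mean(axis=1)
        w += lr * np.sign(kurtosis(y)) * grad   # ascend |kurtosis|
        w /= np.linalg.norm(w)
    return w
```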
Blind deconvolution
• Observe a filtered version of s(t): x(t) = s(t) ∗ g(t)
• Find a filter h(t) such that s(t) = h(t) ∗ x(t)
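Only the inversion step is straightforward to show in code; the 'blind' part, estimating g (or h directly) from the statistics of x alone, is the hard part. A hypothetical sketch of a regularized FFT inverse filter for the case where g is known:

```python
import numpy as np

def inverse_filter(x, g, eps=1e-3):
    """Recover s from x = s * g when the filter g is known (non-blind step).

    eps regularizes frequencies where the filter response is near zero.
    """
    n = len(x)
    G = np.fft.rfft(g, n)                        # filter frequency response
    X = np.fft.rfft(x, n)
    S = X * np.conj(G) / (np.abs(G)**2 + eps)    # regularized inverse
    return np.fft.irfft(S, n)
```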
Example: blind deconvolution in seismics ("statistical deconvolution")
Blind deconvolution (3)
[Figure: the filter g(t) and the source signal s(t) plotted against time t]
ICA definitions
Definition 1 (general definition). ICA of a random vector x consists of finding a linear transformation, s = Wx, so that the components s_i are as independent as possible, in the sense of maximizing some function F(s_1, ..., s_m) that measures independence.
ICA definitions (2)
Definition 2 (noisy ICA). ICA of a random vector x consists of estimating the following model for the data: x = As + n, where the latent variables s_i are assumed independent.
Definition 3 (noise-free ICA). x = As
Statistical independence
• ICA requires statistical independence
• Distinguish between statistically independent and uncorrelated variables
• Statistically independent: the joint density factorizes, f(x, y) = f(x) f(y)
• Uncorrelated: E(xy) − E(x)E(y) = 0, a strictly weaker condition
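The classic counterexample: for a symmetric, zero-mean variable x, the pair (x, x^2) is uncorrelated yet completely dependent. A quick numerical check (the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x**2                       # fully determined by x, hence dependent

# Covariance E(xy) - E(x)E(y) = E(x^3) = 0 for a symmetric distribution
print(np.cov(x, y)[0, 1])      # ~0: uncorrelated, yet not independent
```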
Identifiability of the ICA model
• At most one of the independent components may be Gaussian
• The number of observed mixtures must be at least as large as the number of independent components, m ≥ n
• The matrix A must be of full column rank
• Note: with m < n, A may still be identifiable
Relations to other methods
• Redundancy reduction
• Noise-free case: finding 'interesting' projections, a special case of projection pursuit
• Blind deconvolution
• Factor analysis for non-Gaussian data
• Related to non-linear PCA
Applications of ICA
• Blind source separation (e.g., the cocktail-party problem)
• Feature extraction
• Blind deconvolution
Objective (contrast) functions
ICA method = objective function + optimization algorithm
• Multi-unit contrast functions: find all independent components
• One-unit contrast functions: find one independent component (at a time)
Mutual information
• I(y_1, ..., y_m) = Σ_i H(y_i) − H(y)
• Mutual information is zero if and only if the y_i are independent
• Difficult to estimate in practice, but approximations exist
Mutual information (2)
• Alternative definition: I(X, Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X, Y)
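For discrete variables the definition can be evaluated directly from a joint probability table, which makes the "zero iff independent" property easy to verify; a small sketch with an example table of our own choosing:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X, Y) = sum over x, y of p(x, y) * log[p(x, y) / (p(x) p(y))]."""
    px = p_xy.sum(axis=1, keepdims=True)   # marginal of X
    py = p_xy.sum(axis=0, keepdims=True)   # marginal of Y
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask]))

# An independent joint distribution: p(x, y) = p(x) p(y)  ->  I = 0
p = np.outer([0.3, 0.7], [0.5, 0.5])
print(mutual_information(p))   # 0.0
```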
Mutual information (3)
[Figure: Venn diagram relating H(X), H(Y), the conditional entropies H(X|Y) and H(Y|X), and their overlap I(X, Y)]
Non-linear PCA
• Add a non-linearity g(·) to the formula for PCA; the non-linearity introduces higher-order statistics into the computation
One-unit contrast functions
• Find one vector w so that w^T x equals one of the independent components, s_i
• Related to projection pursuit
• Prior knowledge of the number of independent components is not needed
Negentropy
• J(y) = H(y_gauss) − H(y): the difference between the differential entropy of a Gaussian variable with the same variance as y, and that of y
• If the y_i are uncorrelated, the mutual information can be expressed as I(y_1, ..., y_m) = J(y) − Σ_i J(y_i)
• J(y) can be approximated by higher-order cumulants (see below), but the estimates are sensitive to outliers
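One commonly used cumulant approximation is J(y) ≈ E(y^3)^2/12 + kurt(y)^2/48, valid for y standardized to zero mean and unit variance; a minimal sketch:

```python
import numpy as np

def negentropy_approx(y):
    """Approximate J(y) by squared skewness and kurtosis cumulants.

    The third- and fourth-power terms make this estimate outlier-sensitive.
    """
    y = (y - y.mean()) / y.std()       # standardize
    skew = np.mean(y**3)
    kurt = np.mean(y**4) - 3.0         # excess kurtosis
    return skew**2 / 12.0 + kurt**2 / 48.0
```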
Algorithms
• Have x = As; want to find s = Wx
• Preprocessing:
  • centering of x
  • sphering (whitening) of x: find a transformation v = Qx such that E(vv^T) = I
  • found via PCA / SVD (a sketch follows below)
• Sphering alone does not solve the problem
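A minimal whitening sketch via the eigendecomposition C = EDE^T of the covariance, which gives Q = D^{−1/2} E^T:

```python
import numpy as np

def whiten(x):
    """Center and sphere x: return v = Q x with E(vv^T) = I, plus Q."""
    x = x - x.mean(axis=1, keepdims=True)
    C = (x @ x.T) / x.shape[1]
    d, E = np.linalg.eigh(C)                 # C = E diag(d) E^T
    Q = np.diag(1.0 / np.sqrt(d)) @ E.T      # Q = D^{-1/2} E^T
    return Q @ x, Q
```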
Algorithms (2)
• Jutten-Hérault: cancel non-linear cross-correlations
  • the non-diagonal terms of W are updated according to ΔW_ij ∝ g_1(y_i) g_2(y_j), i ≠ j, with g_1 and g_2 odd non-linear functions
  • the y_i are updated iteratively as y = (I + W)^{−1} x
• Non-linear decorrelation
• Non-linear PCA
• FastICA, etc. (a one-unit sketch follows below)
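As an illustration, here is a one-unit FastICA-style fixed-point iteration with the tanh non-linearity, operating on whitened data; a minimal sketch (the convergence tolerance and choice of g are typical defaults, not prescribed by the survey, and no deflation step for further components is included):

```python
import numpy as np

def fastica_one_unit(v, n_iter=100, seed=0):
    """Estimate one independent component from whitened v (dims x samples).

    Fixed-point update: w <- E[v g(w^T v)] - E[g'(w^T v)] w, then renormalize.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(v.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ v
        g = np.tanh(y)
        g_prime = 1.0 - g**2
        w_new = (v * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < 1e-9:   # converged (up to sign)
            return w_new
        w = w_new
    return w
```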
Summary
• Definitions of ICA
• Conditions for identifiability of the model
• Relations to other methods
• Contrast functions
  • one-unit / multi-unit
  • mutual information / negentropy
• Applications of ICA
• Algorithms
Future research
• Noisy ICA
• Tailor-made methods for certain applications
• Use of time correlations when x is a stochastic process
  • time delays/echoes in the cocktail-party problem
• Non-linear ICA