Independent Component Analysis Presenter: 虞台文
Content • What is ICA? • Nongaussianity Measurement — Kurtosis • ICA By Maximization of Nongaussianity • Gradient and FastICA Algorithms Using Kurtosis • Measuring Nongaussianity by Negentropy • FastICA Using Negentropy
Independent Component Analysis What is ICA?
Motivation • Example: three people are speaking simultaneously in a room that has three microphones. • Denote the microphone signals by x1(t), x2(t), and x3(t). • They are mixtures of sources s1(t), s2(t), and s3(t). • The goal is to estimate the original speech signals using only the recorded signals. • This is called the cocktail-party problem.
The Cocktail-Party Problem [Figures: the original speech signals and the mixed speech signals]
The Cocktail-Party Problem [Figures: the original speech signals and the estimated sources]
The Problem • The observations are linear mixtures of the sources: xi(t) = ai1 s1(t) + ai2 s2(t) + ai3 s3(t). • Find the sources s1(t), s2(t), and s3(t), and the mixing coefficients aij, from the observed signals x1(t), x2(t), and x3(t) alone. • It turns out that the problem can be solved just by assuming that the sources si(t) are nongaussian and statistically independent.
Applications • Cocktail party problem: separation of voices or music or sounds • Sensor array processing, e.g. radar • Biomedical signal processing with multiple sensors: EEG, ECG, MEG, fMRI • Telecommunications: e.g. multiuser detection in CDMA • Financial and other time series • Noise removal from signals and images • Feature extraction for images and signals • Brain modelling
Basic ICA Model x = As • s = (s1, …, sn)^T: latent variables, the independent components (not observable). • x = (x1, …, xn)^T: the mixed signals (observable). • A: the unknown mixing matrix.
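As a concrete illustration of the model x = As, here is a minimal NumPy sketch (the source distributions, seed, and dimensions are illustrative choices, not from the slides) that draws three nongaussian, independent sources and mixes them with a random square matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 10_000

# Three nongaussian, statistically independent sources s1(t), s2(t), s3(t).
s = np.vstack([
    rng.laplace(size=T),              # supergaussian
    rng.uniform(-1, 1, size=T),       # subgaussian
    np.sign(rng.standard_normal(T)),  # binary, also subgaussian
])

A = rng.standard_normal((n, n))  # unknown square mixing matrix
x = A @ s                        # observed mixtures x1(t), x2(t), x3(t)
```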
The Basic Assumptions • The independent components are assumed statistically independent. • The independent components must have nongaussian distributions. • For simplicity, we assume that the unknown mixing matrix A is square.
Assumption I: Statistical Independence • Basically, random variables y1, y2, …, yn are said to be independent if information on the value of yi does not give any information on the value of yj for i ≠ j. • Mathematically, the joint pdf is factorizable in the following way: p(y1, y2, …, yn) = p1(y1) p2(y2)…pn(yn) • Note that uncorrelatedness does not necessarily imply independence.
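A quick numerical illustration of that last point (a hypothetical example, not from the slides): y1 standard normal and y2 = y1^2 − 1 are uncorrelated, since E[y1 y2] = E[y1^3] = 0, yet y2 is a deterministic function of y1 and hence anything but independent of it:

```python
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.standard_normal(100_000)
y2 = y1**2 - 1  # zero mean, and E[y1 * y2] = E[y1^3] = 0

print(np.corrcoef(y1, y2)[0, 1])  # ~0: uncorrelated
# Yet y2 is completely determined by y1, so they are not independent.
```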
Assumption II: Nongaussian Distributions • Note that in the basic model we do not have to know what the nongaussian distributions of the ICs look like.
Assumption III: Mixing Matrix Is Square • In other words, the number of independent components is equal to the number of observed mixtures. • This simplifies our discussion in the first stage. • However, in the basic ICA model this is not a restriction, as long as the number of observations xi is at least as large as the number of sources sj.
Ambiguities of ICA • We cannot determine the variances (energies) of the ICs. As a convention, we center x so that E[x] = 0 and fix each si to unit variance; the sign of si remains undetermined. • We cannot determine the order of the ICs: x = As = (AP^−1)(Ps) for any permutation matrix P, so Ps is an equally valid set of ICs.
Illustration of ICA [Figure: the joint distribution of two sources before and after mixing]
Whitening Is Only Half of ICA z = Vx, where the whitening matrix V is chosen so that E[zz^T] = I.
Whitening Is Only Half of ICA • Uncorrelatedness is related to independence, but is weaker than independence. • By whitening, we have E[zz^T] = I. This, however, does not imply that the zi are independent; we may still have p(z1, z2, …, zn) ≠ p1(z1) p2(z2)…pn(zn).
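A minimal whitening sketch, assuming centered data x of shape (n, T) and using the eigendecomposition of the sample covariance (the function name and interface are illustrative):

```python
import numpy as np

def whiten(x):
    cov = np.cov(x)               # sample covariance E[x x^T]
    d, E = np.linalg.eigh(cov)    # cov = E diag(d) E^T
    V = np.diag(d ** -0.5) @ E.T  # whitening matrix V = D^{-1/2} E^T
    z = V @ x                     # whitened data: E[z z^T] = I
    return z, V

# Usage on the mixtures from the earlier sketch (center first):
# z, V = whiten(x - x.mean(axis=1, keepdims=True))
```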
Independent Component Analysis The central limit theorem tells us that a sum of independent random variables is 'more' Gaussian than the original variables. Therefore, nongaussianity is an important criterion for ICA, and 'degaussianization' is the central theme in ICA.
Independent Component Analysis Nongaussianity Measurement — Kurtosis
Moments • The jth moment: αj = E[X^j] • Mean: μ = α1 = E[X] • The jth central moment: μj = E[(X − μ)^j] • Variance: σ^2 = μ2 = E[(X − μ)^2] • Skewness: μ3 = E[(X − μ)^3]
Moment Generating Function • The moment generating function MX(t) of a random variable X is defined by MX(t) = E[e^{tX}]. • X ~ N(μ, σ^2): MX(t) = exp(μt + σ^2 t^2/2) • Z ~ N(0, 1): MZ(t) = exp(t^2/2)
Standard Normal Distribution N(0, 1) • All odd moments are zero: E[Z^{2k+1}] = 0. • The even moments are E[Z^{2k}] = (2k)!/(2^k k!); in particular E[Z^2] = 1 and E[Z^4] = 3.
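These moments can be read off the MGF by expanding it as a power series and matching coefficients, a standard derivation sketched here:

```latex
M_Z(t) = e^{t^2/2}
       = \sum_{k=0}^{\infty} \frac{t^{2k}}{2^k\,k!}
       = \sum_{j=0}^{\infty} \frac{E[Z^j]}{j!}\,t^j
\;\Longrightarrow\;
E[Z^{2k}] = \frac{(2k)!}{2^k\,k!}, \qquad E[Z^{2k+1}] = 0 .
```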
Kurtosis • Kurtosis of a zero-mean random variable X is defined by kurt(X) = E[X^4] − 3(E[X^2])^2. • Normalized kurtosis: κ(X) = E[X^4]/(E[X^2])^2 − 3. • For a Gaussian, E[X^4] = 3(E[X^2])^2, so both quantities are zero.
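A straightforward sample estimator of the normalized kurtosis (an illustrative helper; it centers the data first, since the definition assumes zero mean):

```python
import numpy as np

def kurtosis(x):
    """Sample estimate of E[X^4]/E[X^2]^2 - 3 for a 1-D array."""
    x = x - x.mean()  # center: the definition assumes zero mean
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
```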
Gaussianity [Figure: a supergaussian density (peaked at zero, heavy tails), a Gaussian density, and a subgaussian density (flat)]
Kurtosis for Supergaussian • Consider the Laplacian distribution: p(x) = (λ/2) e^{−λ|x|}. • Then E[X^2] = 2/λ^2 and E[X^4] = 24/λ^4, so kurt(X) = 24/λ^4 − 3(2/λ^2)^2 = 12/λ^4 > 0.
Kurtosis for Subgaussian • Consider the uniform distribution: p(x) = 1/(2a) for |x| ≤ a, and 0 otherwise. • Then E[X^2] = a^2/3 and E[X^4] = a^4/5, so kurt(X) = a^4/5 − 3(a^2/3)^2 = −2a^4/15 < 0.
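These signs are easy to check numerically with the kurtosis helper sketched above (the normalized kurtosis should approach +3 for the Laplacian, −1.2 for the uniform, and 0 for the Gaussian):

```python
import numpy as np

# Same estimator as in the earlier sketch.
def kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(2)
T = 200_000
print(kurtosis(rng.laplace(size=T)))         # ~ +3.0 (supergaussian)
print(kurtosis(rng.uniform(-1, 1, size=T)))  # ~ -1.2 (subgaussian)
print(kurtosis(rng.standard_normal(T)))      # ~  0.0 (Gaussian)
```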
Nongaussianity Measurement By Kurtosis • Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in ICA and related fields. • Computationally, kurtosis can be estimated simply by using the 4th moment of the sample data (if the variance is kept constant).
Properties of Kurtosis • Let X1 and X2 be two independent random variables, both with zero mean. Then kurt(X1 + X2) = kurt(X1) + kurt(X2) and kurt(αX1) = α^4 kurt(X1).
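The additivity property follows by expanding the fourth moment of the sum; the odd cross terms vanish because X1 and X2 are independent with zero means (a standard derivation, sketched here):

```latex
E[(X_1+X_2)^4] = E[X_1^4] + 6\,E[X_1^2]\,E[X_2^2] + E[X_2^4],
\qquad
E[(X_1+X_2)^2] = E[X_1^2] + E[X_2^2],
```
so

```latex
\operatorname{kurt}(X_1+X_2)
 = E[(X_1+X_2)^4] - 3\bigl(E[(X_1+X_2)^2]\bigr)^2
 = \operatorname{kurt}(X_1) + \operatorname{kurt}(X_2).
```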
Independent Component Analysis ICA By Maximization of Nongaussianity
Restate the Problem • Ultimate goal: x = As, where the ICs s are zero mean with unit variance (latent; unknown), A is the mixing matrix (unknown), and x is zero mean (observable). • How do we recover the ICs from x alone?
Simplification • Ultimate goal: recover s from x = As. For simplicity, we assume the sources are i.i.d. • To estimate one independent component, consider a linear combination of the observations: y = b^T x = b^T A s = q^T s. • If b is properly identified, q^T = b^T A contains only one nonzero entry, with value one. This implies that b is one row of A^−1.
Nongaussian Is Independent • We cannot solve for b directly, since A is unknown. • By the central limit theorem, y = q^T s is more Gaussian than the individual sources unless q has a single nonzero entry, i.e., unless y equals one independent component. • We will therefore take the b that maximizes the nongaussianity of b^T x.
Nongaussian Is Independent [Figure: joint distribution of the sources (s1, s2) and its deformation under mixing]
Nongaussian Is Independent [Figure: the mixed data after whitening]
Nongaussian Is Independent [Figure] A sum of independent components becomes more Gaussian.
Nongaussian Is Independent [Figure: after whitening, only a rotation of the axes (y1, y2) remains to be found]
Nongaussian Is Independent [Figure: estimated density p(yi) of a one-dimensional projection yi]
Nongaussian Is Independent Consider y = b^T x to get one independent component.
Nongaussian Is Independent Project the whitened data onto a unit vector w, y = w^T z, to get an independent component.
Nongaussian Is Independent • Use kurtosis as the nongaussianity measure: for whitened data, y = w^T z = q^T s. • We require that ||w|| = 1, which for whitened data implies ||q|| = 1. • The search space is therefore the unit sphere ||q|| = 1 (coordinates q1, q2 in the two-dimensional illustration).
Independent Component Analysis Gradient and FastICA Algorithms Using Kurtosis
Criterion for ICA Using Kurtosis maximize |kurt(w^T z)| subject to ||w|| = 1
Gradient Algorithm • Maximize |kurt(w^T z)| subject to ||w|| = 1. • The gradient is ∂|kurt(w^T z)|/∂w = 4 sign(kurt(w^T z)) [E{z (w^T z)^3} − 3w||w||^2]. • On the constraint set ||w|| = 1, the second term only changes the norm of w, not its direction, so it is unrelated to the ascent: update Δw ∝ sign(kurt(w^T z)) E{z (w^T z)^3}, then renormalize w ← w/||w||.
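A minimal sketch of this gradient ascent on whitened data z of shape (n, T); the step size, iteration count, and function name are illustrative choices:

```python
import numpy as np

def ica_gradient(z, mu=0.1, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z                         # current estimate y = w^T z
        kurt = np.mean(y**4) - 3.0        # kurt(y); E[y^2] = 1 for whitened z
        grad = np.mean(z * y**3, axis=1)  # E{ z (w^T z)^3 }
        w += mu * np.sign(kurt) * grad    # ascend |kurt(w^T z)|
        w /= np.linalg.norm(w)            # project back onto ||w|| = 1
    return w
```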
FastICA Algorithm • Maximize |kurt(w^T z)| subject to ||w|| = 1. • At a stable point, the gradient must point in the direction of w: E{z (w^T z)^3} − 3w ∝ w. • Using fixed-point iteration: w ← E{z (w^T z)^3} − 3w, followed by w ← w/||w||; the sign of w is not important. This is FastICA.
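A sketch of the one-unit fixed-point iteration, again assuming whitened data z of shape (n, T); the convergence test compares w_new against ±w, since the sign is not important:

```python
import numpy as np

def fastica_kurtosis(z, n_iter=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        w_new = np.mean(z * y**3, axis=1) - 3.0 * w  # E{z (w^T z)^3} - 3w
        w_new /= np.linalg.norm(w_new)
        if min(np.linalg.norm(w_new - w),
               np.linalg.norm(w_new + w)) < tol:     # converged up to sign
            return w_new
        w = w_new
    return w
```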
Independent Component Analysis Measuring Nongaussianity by Negentropy
Critique of Kurtosis • Kurtosis can be very sensitive to outliers. • Kurtosis may depend on only a few observations in the tails of the distribution. • Not a robust measure of nongaussianity.
Negentropy • Differential entropy: H(y) = −∫ p(y) log p(y) dy. • Negentropy: J(y) = H(ygauss) − H(y), where ygauss is a Gaussian random variable with the same covariance as y. • Negentropy ≥ 0; it is zero only when the random variable is Gaussian distributed. • It is invariant under invertible linear transformations.
Approximation of Negentropy (I) • For a zero-mean, unit-variance random variable y: J(y) ≈ (1/12) E[y^3]^2 + (1/48) kurt(y)^2, i.e., a combination of the (squared) skewness and kurtosis. • This approximation is of little practical use, because it inherits the sensitivity of kurtosis to outliers.
Approximation of Negentropy (II) • Choose two nonpolynomial functions G1(x) and G2(x) such that G1 is odd and measures asymmetry, while G2 is even and measures the dichotomy of bimodality vs. peak at zero. • Then J(y) ≈ k1 (E[G1(y)])^2 + k2 (E[G2(y)] − E[G2(ν)])^2, where ν ~ N(0, 1) and k1, k2 > 0 are constants. • The first term is zero if the underlying density is symmetric. Usually, only the second term is used: J(y) ∝ (E[G(y)] − E[G(ν)])^2.
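Putting this to work, here is a hedged sketch of one-unit FastICA driven by the negentropy contrast, with the common choice G(y) = log cosh(y), so g = tanh and g′ = 1 − tanh^2. The fixed-point update w ← E{z g(w^T z)} − E{g′(w^T z)} w is the standard FastICA form for a general G; the data z is assumed whitened, of shape (n, T):

```python
import numpy as np

def fastica_negentropy(z, n_iter=100, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)        # g(y) for G(y) = log cosh(y)
        g_prime = 1.0 - g**2  # g'(y)
        w_new = np.mean(z * g, axis=1) - np.mean(g_prime) * w
        w_new /= np.linalg.norm(w_new)
        if min(np.linalg.norm(w_new - w),
               np.linalg.norm(w_new + w)) < tol:  # converged up to sign
            return w_new
        w = w_new
    return w
```

Unlike the kurtosis-based update, the tanh nonlinearity grows slowly with |y|, which makes this contrast considerably more robust to outliers.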