An Introduction to Independent Component Analysis (ICA)
吳育德
Institute of Radiological Sciences, National Yang-Ming University
Integrated Brain Function Laboratory, Taipei Veterans General Hospital
The Principle of ICA: a cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
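A minimal sketch of this mixing model in NumPy (the mixing coefficients a_ij and the three source signals are made-up, illustrative values):

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)                         # time axis, 1000 samples

# three independent source signals s1, s2, s3 (illustrative choices)
s = np.vstack([np.sin(2 * np.pi * 5 * t),           # s1: sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t)),  # s2: square wave
               rng.uniform(-1, 1, t.size)])         # s3: uniform noise

# mixing matrix A = [a_ij]; each microphone records a different mixture
A = np.array([[0.8, 0.3, 0.5],
              [0.4, 0.9, 0.2],
              [0.6, 0.5, 0.7]])

x = A @ s                                           # x_i(t) = sum_j a_ij * s_j(t)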
Independent Component Analysis
Reference: A. Hyvärinen, J. Karhunen, and E. Oja (2001). Independent Component Analysis. John Wiley & Sons.
Central limit theorem
• The distribution of a sum of independent random variables tends toward a Gaussian distribution.
• Observed signal = m1 IC1 + m2 IC2 + … + mn ICn
• Each IC is non-Gaussian, but their weighted sum (the observed signal) tends toward a Gaussian.
Central Limit Theorem
Consider the partial sum x_k = z_1 + z_2 + … + z_k of a sequence {z_i} of independent and identically distributed random variables z_i.
Since the mean and variance of x_k can grow without bound as k → ∞, consider instead of x_k the standardized variable
y_k = (x_k − m_k) / σ_k, where m_k and σ_k² are the mean and variance of x_k.
The distribution of y_k converges to a Gaussian distribution with zero mean and unit variance as k → ∞.
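A small numerical illustration, assuming the z_i are uniform on (−1, 1): the standardized partial sums y_k have approximately zero mean, unit variance, and near-zero excess kurtosis, as a Gaussian would.

import numpy as np

rng = np.random.default_rng(0)
k = 1000                                    # number of i.i.d. terms in each partial sum
n = 10000                                   # number of realizations of y_k

z = rng.uniform(-1, 1, size=(n, k))         # z_i ~ Uniform(-1, 1): mean 0, variance 1/3
x_k = z.sum(axis=1)                         # partial sums x_k
y_k = x_k / np.sqrt(k / 3.0)                # standardized: zero mean, unit variance

print(y_k.mean(), y_k.var())                          # ~0 and ~1
print(np.mean(y_k**4) - 3 * np.mean(y_k**2)**2)       # excess kurtosis ~0 (Gaussian-like)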
How to estimate the ICA model
• Principle for estimating the model of ICA: maximization of non-Gaussianity.
Measures of non-Gaussianity
• Kurtosis: kurt(x) = E{(x − μ)^4} − 3[E{(x − μ)^2}]^2
• Super-Gaussian: kurtosis > 0
• Gaussian: kurtosis = 0
• Sub-Gaussian: kurtosis < 0
Properties (for independent x1, x2 and a scalar α):
kurt(x1 + x2) = kurt(x1) + kurt(x2)
kurt(α x1) = α^4 kurt(x1)
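A sketch of the sample kurtosis and its sign for super-Gaussian, Gaussian, and sub-Gaussian variables (the Laplacian, normal, and uniform examples are illustrative choices):

import numpy as np

def kurtosis(x):
    """Sample excess kurtosis: E{(x - mu)^4} - 3 [E{(x - mu)^2}]^2."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

rng = np.random.default_rng(0)
print(kurtosis(rng.laplace(size=100000)))    # super-Gaussian: kurtosis > 0
print(kurtosis(rng.normal(size=100000)))     # Gaussian: kurtosis ~ 0
print(kurtosis(rng.uniform(-1, 1, 100000)))  # sub-Gaussian: kurtosis < 0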
Whitening process
• Assume the measurement x = As is zero mean and E{ss^T} = I.
• Let D and E be the eigenvalue and eigenvector matrices of the covariance matrix of x, i.e. E{xx^T} = E D E^T.
• Then V = D^(-1/2) E^T is a whitening matrix: for z = Vx,
E{zz^T} = V E{xx^T} V^T = D^(-1/2) E^T (E D E^T) E D^(-1/2) = I.
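A sketch of this whitening step, assuming x is a zero-mean data matrix with one row per channel (the helper name whiten is mine):

import numpy as np

def whiten(x):
    """Whiten zero-mean data x (channels x samples) so that E{z z^T} = I."""
    cov = np.cov(x)                          # sample covariance E{x x^T}
    d, E = np.linalg.eigh(cov)               # eigenvalues (vector d) and eigenvectors E
    V = np.diag(d ** -0.5) @ E.T             # whitening matrix V = D^(-1/2) E^T
    z = V @ x
    return z, V

# quick check: the covariance of z is (close to) the identity
rng = np.random.default_rng(0)
x = np.linalg.cholesky([[4.0, 1.5], [1.5, 1.0]]) @ rng.normal(size=(2, 5000))
z, V = whiten(x)
print(np.cov(z))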
Importance of whitening
For the whitened data z, find a vector w such that the linear combination y = w^T z has maximum non-Gaussianity under the constraint E{(w^T z)^2} = 1. Since z is white, this is equivalent to maximizing |kurt(w^T z)| under the simpler constraint ||w|| = 1.
Constrained optimization
max F(w) subject to ||w||^2 = 1
L(w, λ) = F(w) + λ(||w||^2 − 1) = F(w) + λ(w^T w − 1)
∂L(w, λ)/∂w = 0
⇒ ∂F(w)/∂w + 2λw = 0
⇒ ∂F(w)/∂w = −2λw
At the stable point, the gradient of F(w) must point in the direction of w, i.e. be equal to w multiplied by a scalar.
Gradient of kurtosis
F(w) = kurt(w^T z) = E{(w^T z)^4} − 3[E{(w^T z)^2}]^2
∂F(w)/∂w = ∂/∂w [ E{(w^T z)^4} − 3(w^T w)^2 ]   (since E{(w^T z)^2} = w^T w for whitened z)
Using the sample estimate E{y} ≈ (1/T) Σ_{t=1}^T y(t):
= (4/T) Σ_{t=1}^T z(t)[w^T z(t)]^3 − 3·2·2(w^T w)w
= 4 [ E{z (w^T z)^3} − 3 ||w||^2 w ]
Keeping the sign of the kurtosis,
∂|F(w)|/∂w = 4 sign(kurt(w^T z)) [ E{z (w^T z)^3} − 3 ||w||^2 w ]
Fixed-point algorithm using kurtosis
Gradient step: w_{k+1} = w_k + ∂F(w_k)/∂w_k
At the stable point ∂F(w_k)/∂w_k = −2λw_k, so adding the gradient to w_k does not change its direction: w_{k+1} = w_k − 2λw_k = (1 − 2λ)w_k.
Therefore, update w directly in the direction of the gradient and renormalize:
w ← E{z (w^T z)^3} − 3||w||^2 w   ( = E{z (w^T z)^3} − 3w for ||w|| = 1 )
w ← w / ||w||
Convergence: |<w_{k+1}, w_k>| = 1, since w_k and w_{k+1} are unit vectors.
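A sketch of a single kurtosis-based fixed-point update for whitened data z (channels x samples) and a unit-norm w (the helper name is mine):

import numpy as np

def kurtosis_update(w, z):
    """One fixed-point step using kurtosis: w <- E{z (w^T z)^3} - 3 w, then renormalize."""
    y = w @ z                                     # projections w^T z(t)
    w_new = (z * y**3).mean(axis=1) - 3 * w       # E{z (w^T z)^3} - 3 ||w||^2 w, with ||w|| = 1
    return w_new / np.linalg.norm(w_new)          # keep w on the unit sphere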
Fixed-point algorithm using kurtosis: one-by-one estimation
1. Center the data.
2. Whiten the data.
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for wp, e.g. randomly.
5. Fixed-point iteration: let wp ← E{z (wp^T z)^3} − 3wp.
6. Do deflation decorrelation: wp ← wp − Σ_{j=1}^{p−1} (wp^T wj) wj (a sketch of this step follows below).
7. Let wp ← wp / ||wp||.
8. If wp has not converged (|<wp(k+1), wp(k)>| ≠ 1), go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.
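A sketch of the deflation (Gram-Schmidt-like) decorrelation of step 6, assuming W_prev holds the previously estimated unit vectors w_1, …, w_{p−1}:

import numpy as np

def deflate(w, W_prev):
    """Deflation decorrelation: w <- w - sum_j (w^T w_j) w_j, then renormalize."""
    for w_j in W_prev:
        w = w - (w @ w_j) * w_j
    return w / np.linalg.norm(w)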
Fixed-point algorithm using negentropy
The kurtosis is very sensitive to outliers, which may be erroneous or irrelevant observations.
Example: for a zero-mean, unit-variance random variable, kurt(x) = E{x^4} − 3. If a sample of size 1000 contains a single value equal to 10, the estimated kurtosis is at least 10^4/1000 − 3 = 7.
We therefore need a more robust measure of non-Gaussianity: an approximation of negentropy.
Fixed-point algorithm using negentropy
Entropy: H(y) = −∫ p(y) log p(y) dy
Negentropy: J(y) = H(y_gauss) − H(y), where y_gauss is a Gaussian variable with the same variance as y.
Negentropy itself is hard to compute, so in practice an approximation of negentropy is used.
Fixed-point algorithm using negentropy
Maximize J(y):
w ← E{z g(w^T z)} − E{g′(w^T z)} w
w ← w / ||w||
Convergence: |<w_{k+1}, w_k>| = 1
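A sketch of this update with the common choice g(u) = tanh(u), g′(u) = 1 − tanh²(u) (any of the standard nonlinearities could be substituted; the helper name is mine):

import numpy as np

def negentropy_update(w, z):
    """w <- E{z g(w^T z)} - E{g'(w^T z)} w with g = tanh, then renormalize."""
    y = w @ z
    g = np.tanh(y)
    g_prime = 1.0 - np.tanh(y) ** 2
    w_new = (z * g).mean(axis=1) - g_prime.mean() * w
    return w_new / np.linalg.norm(w_new)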
Fixed-point algorithm using negentropy: one-by-one estimation
1. Center the data.
2. Whiten the data.
3. Choose m, the number of ICs to estimate. Set counter p ← 1.
4. Choose an initial guess of unit norm for wp, e.g. randomly.
5. Fixed-point iteration: let wp ← E{z g(wp^T z)} − E{g′(wp^T z)} wp.
6. Do deflation decorrelation: wp ← wp − Σ_{j=1}^{p−1} (wp^T wj) wj.
7. Let wp ← wp / ||wp||.
8. If wp has not converged (|<wp(k+1), wp(k)>| ≠ 1; see the check sketched below), go back to step 5.
9. Set p ← p + 1. If p ≤ m, go back to step 4.
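A sketch of the convergence check in step 8: the old and new unit vectors should point in the same (or exactly opposite) direction; the tolerance is an arbitrary choice.

import numpy as np

def has_converged(w_new, w_old, tol=1e-8):
    """True when |<w_new, w_old>| is numerically equal to 1 for unit vectors."""
    return abs(abs(w_new @ w_old) - 1.0) < tol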
Implementations
• Create two uniform sources
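A minimal NumPy sketch of this step (the sample size and seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
n = 10000
# two independent uniform sources, scaled to zero mean and unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))
print(s.mean(axis=1), s.var(axis=1))   # ~[0, 0] and ~[1, 1]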
Implementations
• Two mixed observed signals
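A sketch of the mixing step, applying a made-up 2x2 mixing matrix A to the two uniform sources:

import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))  # the two uniform sources
A = np.array([[1.0, 0.5],
              [0.8, 1.0]])                                  # illustrative mixing matrix
x = A @ s                                                   # x1(t), x2(t): mixed observations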
Implementations
• Centering
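A minimal sketch of the centering step, removing the mean of each observed channel (the helper name is mine):

import numpy as np

def center(x):
    """Subtract the mean of each row (channel) so the data are zero mean."""
    return x - x.mean(axis=1, keepdims=True)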
Implementations
• Whitening
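A self-contained sketch of whitening the centered mixtures via the eigen-decomposition of their covariance; the final print should be (approximately) the identity matrix:

import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))
x = np.array([[1.0, 0.5], [0.8, 1.0]]) @ s
x -= x.mean(axis=1, keepdims=True)            # centered mixtures

d, E = np.linalg.eigh(np.cov(x))              # eigenvalues d and eigenvectors E of E{x x^T}
V = np.diag(d ** -0.5) @ E.T                  # whitening matrix V = D^(-1/2) E^T
z = V @ x
print(np.cov(z))                              # ~ identity matrix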
Implementations
• Fixed-point iteration using kurtosis
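A self-contained sketch of the full kurtosis-based estimation of the two components, combining the previous steps (the mixing matrix, seed, and iteration limit are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))  # two unit-variance uniform sources
x = np.array([[1.0, 0.5], [0.8, 1.0]]) @ s                 # mixed observations
x -= x.mean(axis=1, keepdims=True)                         # centering
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x                           # whitening

W = []
for p in range(2):                                         # estimate the ICs one by one
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(100):
        w_old = w
        y = w @ z
        w = (z * y ** 3).mean(axis=1) - 3 * w              # kurtosis fixed-point step
        for w_j in W:                                      # deflation decorrelation
            w = w - (w @ w_j) * w_j
        w /= np.linalg.norm(w)
        if abs(abs(w @ w_old) - 1.0) < 1e-10:              # convergence check
            break
    W.append(w)

y_est = np.vstack(W) @ z                                   # recovered independent components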
Implementations
• Fixed-point iteration using negentropy
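A self-contained sketch of the same pipeline with the negentropy-based (tanh) update; only the fixed-point step differs from the kurtosis version above:

import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))  # two unit-variance uniform sources
x = np.array([[1.0, 0.5], [0.8, 1.0]]) @ s                 # mixing
x -= x.mean(axis=1, keepdims=True)                         # centering
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x                           # whitening

W = []
for p in range(2):
    w = rng.normal(size=2)
    w /= np.linalg.norm(w)
    for _ in range(100):
        w_old = w
        y = w @ z
        g, gp = np.tanh(y), 1.0 - np.tanh(y) ** 2
        w = (z * g).mean(axis=1) - gp.mean() * w           # negentropy fixed-point step
        for w_j in W:                                      # deflation decorrelation
            w = w - (w @ w_j) * w_j
        w /= np.linalg.norm(w)
        if abs(abs(w @ w_old) - 1.0) < 1e-10:              # convergence check
            break
    W.append(w)

y_est = np.vstack(W) @ z                                   # recovered independent components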
Fixed-point algorithm using negentropy
Entropy: a Gaussian variable has the largest entropy among all random variables of equal variance.
Negentropy: J(y) = 0 for a Gaussian y and J(y) > 0 for a non-Gaussian y.
Negentropy is the optimal estimator of non-Gaussianity, but it is computationally difficult since it requires an estimate of the pdf. Hence, an approximation of negentropy is used.
Fixed-point algorithm using negentropy
High-order cumulant approximation: replace the polynomial functions by nonpolynomial functions Gi, e.g. G1 odd and G2 even. Since it is quite common that random variables have approximately symmetric distributions, the odd term can be dropped, leaving an approximation of the form J(y) ∝ [E{G(y)} − E{G(ν)}]^2, where ν is a standardized Gaussian variable.
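A sketch of common nonpolynomial choices of G and the corresponding g = G′ and g′ used in the fixed-point update (these are standard FastICA-style nonlinearities; the helper names are mine):

import numpy as np

def g_logcosh(u, a=1.0):
    """G(u) = (1/a) log cosh(a u); returns g(u) = tanh(a u) and g'(u) = a (1 - tanh^2(a u))."""
    t = np.tanh(a * u)
    return t, a * (1.0 - t ** 2)

def g_gauss(u):
    """G(u) = -exp(-u^2/2); returns g(u) = u exp(-u^2/2) and g'(u) = (1 - u^2) exp(-u^2/2)."""
    e = np.exp(-u ** 2 / 2.0)
    return u * e, (1.0 - u ** 2) * e

def g_kurtosis(u):
    """G(u) = u^4 / 4 (kurtosis-based contrast); returns g(u) = u^3 and g'(u) = 3 u^2."""
    return u ** 3, 3.0 * u ** 2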
Fixed-point algorithm using negentropy
According to the Lagrange multiplier condition, at a maximum of E{G(w^T z)} under the constraint ||w|| = 1 the gradient must point in the direction of w:
E{z g(w^T z)} − βw = 0.
Finding the solution by an approximative Newton method
The real Newton method is fast: it needs only a small number of steps to converge. However, it requires a matrix inversion at every step, which is a large computational load.
Special properties of the ICA problem allow an approximative Newton method: it needs no matrix inversion, yet converges in roughly the same number of steps as the real Newton method.
A plain fixed-point iteration (as used with kurtosis) does not have good convergence properties here, because the nonpolynomial moments do not have the same nice algebraic properties.
According to the Lagrange multiplier condition, the gradient must point in the direction of w. Solve the resulting equation F(w) = E{z g(w^T z)} − βw = 0 by Newton's method:
JF(w) Δw = −F(w),  Δw = [JF(w)]^(-1) [−F(w)]
The Jacobian can be approximated and diagonalized (since E{zz^T} = I):
JF(w) ≈ E{zz^T} E{g′(w^T z)} − βI = [E{g′(w^T z)} − β] I
so the Newton step is
w ← w − [E{z g(w^T z)} − βw] / [E{g′(w^T z)} − β].
Multiplying by β − E{g′(w^T z)} and simplifying (the scale of w is removed by the normalization) gives
w ← E{z g(w^T z)} − E{g′(w^T z)} w
w ← w / ||w||
Fixed-point algorithm using negentropy
w ← E{z g(w^T z)} − E{g′(w^T z)} w
w ← w / ||w||
Convergence: |<w_{k+1}, w_k>| = 1 (the absolute value is used because the ICs are defined only up to a multiplicative sign).