Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing. Valero Laparra, Jesús Malo, Gustavo Camps
INDEX • What? • Why? • How? • Conclusions • Toolbox
What? • Estimate multidimensional probability densities • How the N-D data is distributed in the N-D space • What to pay attention to! • What is important in our data
Why? • GENERIC OPTIMAL SOLUTIONS
How? • PDF estimation from samples always assumes a model. • HISTOGRAM: estimation without assuming a functional model
How? • X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]
How? • Problem: estimating the number of bins • A common rule of thumb: Nbins = √Nsamples
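An illustrative 1-D example of this rule (a minimal sketch, not part of the toolbox):

% Sketch: 1-D histogram pdf estimate with Nbins = sqrt(Nsamples)
X = randn(1, 400);                        % 400 samples from a 1-D Gaussian
Nbins = round(sqrt(numel(X)));            % rule of thumb: 20 bins
[counts, centers] = hist(X, Nbins);       % histogram counts at bin centers
binWidth = centers(2) - centers(1);
pdfEst = counts / (numel(X) * binWidth);  % normalize so the pdf integrates to 1
bar(centers, pdfEst);                     % plot the estimated density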
How? • Problem: “the curse of dimensionality” • Total number of bins: Nb_total = Nb^Nd (Nb bins per dimension, Nd dimensions) • If a reasonable 1-D histogram needs about Ns = Nb^2 samples, applying the same rule to the total number of bins gives Ns = (Nb^Nd)^2 = Nb^(2·Nd)
e.g., assuming a minimum of Nb = 11 bins per dimension, we need Ns = 11^(2·Nd) samples (8 bytes each):
Nd = 1: Ns = 121 (968 bytes)
Nd = 2: Ns = 14,641 (117,128 bytes)
Nd = 3: Ns = 1,771,561 (14,172,488 bytes)
Nd = 4: Ns = 214,358,881 (1,714,871,048 bytes)
Nd = 5: Ns ≈ 25,937,000,000 (out of memory!)
Nd = 6: Ns ≈ 3,138,400,000,000 (out of memory!)
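A quick check of these figures (a minimal sketch; the 8 bytes per sample assumes double precision):

% Sketch: samples and memory needed for an Nd-dimensional histogram
Nb = 11;                          % bins per dimension
for Nd = 1:6
    Ns = Nb^(2*Nd);               % samples needed: Nb^(2*Nd)
    fprintf('Nd = %d: Ns = %.6g samples, %.6g bytes\n', Nd, Ns, Ns*8);
end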
How? • How do we go from P(x) to a Gaussian P(y)? Find a transform T such that y = T(x) is Gaussian.
How? • Answer: GPCA • (MATLAB, MATLAB, what a wonderful world!)
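As a rough illustration of the idea (a minimal sketch, not the toolbox implementation; it assumes each GPCA iteration is a PCA rotation followed by marginal Gaussianization through the empirical CDF):

% Sketch of one Gaussianization iteration: rotate to principal
% components, then Gaussianize each marginal via its empirical CDF.
function datOut = gpca_iteration_sketch(dat)    % dat: [Ndim x Nsamples]
    [V, ~] = eig(cov(dat'));                    % PCA rotation from the covariance
    datRot = V' * dat;                          % decorrelated components
    [Nd, Ns] = size(datRot);
    datOut = zeros(Nd, Ns);
    for d = 1:Nd
        [~, idx] = sort(datRot(d, :));          % ranks of the d-th marginal
        u = zeros(1, Ns);
        u(idx) = ((1:Ns) - 0.5) / Ns;           % empirical CDF values in (0,1)
        datOut(d, :) = sqrt(2) * erfinv(2*u - 1);  % map to a standard Gaussian marginal
    end
end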
How? • Theoretical convergence proof, based on the negentropy:
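For reference, the standard definition of negentropy (a fact about the quantity itself, not a reconstruction of the original slide):

J(X) = H(X_G) - H(X)

where X_G is a Gaussian with the same mean and covariance as X. J(X) >= 0, with equality if and only if X is Gaussian, so showing that each GPCA iteration does not increase J(X) establishes convergence towards the Gaussian.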
How? OPEN ISSUE
How? • The Gaussian is the unique distribution whose marginal distributions are both Gaussian and independent • Stop criterion: I(Xn) ≈ 0 • NOTE: the stop criterion itself measures mutual information
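Here I(X) is the multi-information (the multivariate generalization of mutual information); its standard definition makes the criterion concrete:

I(X) = sum_{i=1}^{N} H(X_i) - H(X) >= 0

It vanishes exactly when the components are independent, and independence plus Gaussian marginals implies a jointly Gaussian X, which justifies the stop criterion.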
How? • GPCA inverse • NOTE: an easy inverse is what enables synthesis
How? GPCA Jacobian
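The Jacobian is what turns the Gaussianizing transform into a density estimate, via the standard change-of-variables formula (T denotes the full GPCA transform):

p_x(x) = N(T(x); 0, I) · |det ∇T(x)|

i.e., the pdf at x is the standard Gaussian density evaluated at the transformed point, scaled by the Jacobian determinant of the transform.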
CONCLUSIONS • The optimal solution of many problems requires knowledge of the data pdf. • GPCA obtains a transform that converts any pdf into a Gaussian pdf. • It has an easy inverse. • It has an easy Jacobian. • This transform can be used to compute the pdf of any data.
GPCA toolbox (Matlab) • Beta version, with a wiki page and 3 examples: • PDF estimation • Mutual information measures • Synthesis
Basic toolbox • [datT Trans] = GPCA(dat, Nit, Perc) • dat = data matrix of size [N dimensions x N samples]; e.g., 100 samples from a 2-D Gaussian: dat = [2 x 100] • Nit = number of iterations • Perc = percentage by which the pdf range is extended
Basic toolbox • [datT Trans] = auto_GPCA(dat) • [datT] = apply_GPCA(dat, Trans) • [dat] = inv_GPCA(datT, Trans) • [Px pT detJ JJ] = GPCA_probability(x0, Trans)
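A usage sketch chaining these calls (the signatures are the ones listed above; the data shape follows the [N dimensions x N samples] convention, and the round-trip check is an assumption about the documented inverse):

% Sketch: basic round trip through the toolbox functions
dat = randn(2, 100);              % toy data: 100 samples of 2-D data [Ndim x Nsamples]
[datT, Trans] = auto_GPCA(dat);   % Gaussianize dat, storing the transform in Trans
datT2 = apply_GPCA(dat, Trans);   % re-apply the stored transform to (new) data
datBack = inv_GPCA(datT, Trans);  % invert: datBack should recover dat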
Estimating PDF/manifold • [datT Trans] = auto_GPCA(dat) • [Px pT detJ JJ] = GPCA_probability(XX, Trans);
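A sketch of evaluating the estimated pdf on a regular grid; that XX holds the evaluation points as columns (same [N dimensions x N samples] convention) is an assumption, as is reshaping Px for display:

% Sketch: evaluate the GPCA density estimate on a regular 2-D grid
dat = randn(2, 1000);                        % training data [2 x Nsamples]
[datT, Trans] = auto_GPCA(dat);              % learn the Gaussianizing transform
[gx, gy] = meshgrid(-3:0.1:3, -3:0.1:3);     % grid covering the data range
XX = [gx(:)'; gy(:)'];                       % grid points as columns [2 x Npoints]
[Px, pT, detJ, JJ] = GPCA_probability(XX, Trans);  % pdf value at each grid point
imagesc(reshape(Px, size(gx)));              % visualize the estimated density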
Estimating PDF/manifold • PROBLEMS • It does not always converge to a Gaussian • pdfs with clusters are harder to handle • The Jacobian estimate is highly point-dependent • The derivative (in the Jacobian estimation) is much more irregular than the integral • The pdf has to be estimated for each point separately
Measuring Mutual Information • [datT Trans] = auto_GPCA(dat) • MI = abs(min(cumsum(cat(1,Trans.I))));
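A usage sketch; that Trans.I stores the per-iteration multi-information reduction is an assumption read off the expression above:

% Sketch: mutual information between the components of dat
dat = randn(2, 1000);
dat(2,:) = dat(1,:) + 0.1*randn(1, 1000);  % make the two components dependent
[datT, Trans] = auto_GPCA(dat);            % Gaussianize
dI = cat(1, Trans.I);                      % stack the per-iteration I values
MI = abs(min(cumsum(dI)));                 % accumulated redundancy between components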
Measuring Mutual Information • PROBLEMS • Entropy estimators are not perfectly reliable • More iterations means more accumulated error • The more complicated the pdf, the larger the error
Synthesizing data • [datT Trans] = auto_GPCA(dat) • [dat2] = inv_GPCA(randn(Dim,Nsamples), Trans) • [Diagram: transforms T1, T2 and their inverses Inv T1, Inv T2]
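A synthesis sketch under the conventions above (Dim and Nsamples are the data dimensionality and the number of new samples to draw):

% Sketch: synthesize new samples with the learned GPCA transform
dat = randn(2, 1000);                          % training data [Dim x Nsamples]
[datT, Trans] = auto_GPCA(dat);                % learn the Gaussianizing transform
Dim = size(dat, 1); Nsamples = 500;            % how many new samples to draw
dat2 = inv_GPCA(randn(Dim, Nsamples), Trans);  % Gaussian noise -> synthesized data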
Synthesizing data • PROBLEMS • It does not always converge to a Gaussian • Small variations in the variance of the random data produce very different results • There is no information about the features of the data in the transformed domain