Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing. Valero Laparra, Jesús Malo, Gustavo Camps
INDEX • What? • Why? • How? • Conclusions • Toolbox
What? • Estimate multidimensional probability densities • How the N-D data is distributed in the N-D space • What to pay attention to! • What is important in our data
Why? • GENERIC OPTIMAL SOLUTIONS
How? • PDF estimation from samples always assumes a model. • HISTOGRAM: estimation without assuming a functional model
How? • X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]
How? • Problem: estimating the number of bins • A common rule of thumb: Nbins = √Nsamples
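An illustrative 1-D example of this rule (a minimal sketch, not part of the toolbox):

% Sketch: 1-D histogram pdf estimate with Nbins = sqrt(Nsamples)
X = randn(1, 400);                        % 400 samples from a 1-D Gaussian
Nbins = round(sqrt(numel(X)));            % rule of thumb: 20 bins
[counts, centers] = hist(X, Nbins);       % histogram counts at bin centers
binWidth = centers(2) - centers(1);
pdfEst = counts / (numel(X) * binWidth);  % normalize so the pdf integrates to 1
bar(centers, pdfEst);                     % plot the estimated density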
How? • Problem: “the curse of dimensionality” • Total number of bins: Nb_total = Nb^Nd (Nb bins per dimension, Nd dimensions) • If a reasonable 1-D histogram needs about Ns = Nb^2 samples, applying the same rule to the total number of bins gives Ns = (Nb^Nd)^2 = Nb^(2·Nd)
e.g., assuming a minimum of Nb = 11 bins per dimension, we need Ns = 11^(2·Nd) samples (8 bytes each):
Nd = 1: Ns = 121 (968 bytes)
Nd = 2: Ns = 14,641 (117,128 bytes)
Nd = 3: Ns = 1,771,561 (14,172,488 bytes)
Nd = 4: Ns = 214,358,881 (1,714,871,048 bytes)
Nd = 5: Ns ≈ 25,937,000,000 (out of memory!)
Nd = 6: Ns ≈ 3,138,400,000,000 (out of memory!)
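A quick check of these figures (a minimal sketch; the 8 bytes per sample assumes double precision):

% Sketch: samples and memory needed for an Nd-dimensional histogram
Nb = 11;                          % bins per dimension
for Nd = 1:6
    Ns = Nb^(2*Nd);               % samples needed: Nb^(2*Nd)
    fprintf('Nd = %d: Ns = %.6g samples, %.6g bytes\n', Nd, Ns, Ns*8);
end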
How? • How do we go from P(x) to a Gaussian P(y)? Find a transform T such that y = T(x) is Gaussian.
How? • Answer: GPCA • (MATLAB, MATLAB, what a wonderful world!)
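As a rough illustration of the idea (a minimal sketch, not the toolbox implementation; it assumes each GPCA iteration is a PCA rotation followed by marginal Gaussianization through the empirical CDF):

% Sketch of one Gaussianization iteration: rotate to principal
% components, then Gaussianize each marginal via its empirical CDF.
function datOut = gpca_iteration_sketch(dat)    % dat: [Ndim x Nsamples]
    [V, ~] = eig(cov(dat'));                    % PCA rotation from the covariance
    datRot = V' * dat;                          % decorrelated components
    [Nd, Ns] = size(datRot);
    datOut = zeros(Nd, Ns);
    for d = 1:Nd
        [~, idx] = sort(datRot(d, :));          % ranks of the d-th marginal
        u = zeros(1, Ns);
        u(idx) = ((1:Ns) - 0.5) / Ns;           % empirical CDF values in (0,1)
        datOut(d, :) = sqrt(2) * erfinv(2*u - 1);  % map to a standard Gaussian marginal
    end
end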
How? • Theoretical convergence proof, based on the negentropy:
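For reference, the standard definition of negentropy (a fact about the quantity itself, not a reconstruction of the original slide):

J(X) = H(X_G) - H(X)

where X_G is a Gaussian with the same mean and covariance as X. J(X) >= 0, with equality if and only if X is Gaussian, so showing that each GPCA iteration does not increase J(X) establishes convergence towards the Gaussian.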
How? OPEN ISSUE
How? • The Gaussian is the unique distribution whose marginal distributions are both Gaussian and independent • Stop criterion: I(Xn) ≈ 0 • NOTE: the stop criterion itself measures mutual information
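Here I(X) is the multi-information (the multivariate generalization of mutual information); its standard definition makes the criterion concrete:

I(X) = sum_{i=1}^{N} H(X_i) - H(X) >= 0

It vanishes exactly when the components are independent, and independence plus Gaussian marginals implies a jointly Gaussian X, which justifies the stop criterion.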
How? • GPCA inverse • NOTE: an easy inverse is what enables synthesis
How? GPCA Jacobian
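The Jacobian is what turns the Gaussianizing transform into a density estimate, via the standard change-of-variables formula (T denotes the full GPCA transform):

p_x(x) = N(T(x); 0, I) · |det ∇T(x)|

i.e., the pdf at x is the standard Gaussian density evaluated at the transformed point, scaled by the Jacobian determinant of the transform.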
CONCLUSIONS • The optimal solution of many problems requires knowledge of the data pdf. • GPCA obtains a transform that converts any pdf into a Gaussian pdf. • It has an easy inverse. • It has an easy Jacobian. • This transform can be used to compute the pdf of any data.
GPCA toolbox (Matlab) • Beta version, with a wiki page and 3 examples: • PDF estimation • Mutual information measures • Synthesis
Basic toolbox • [datT Trans] = GPCA(dat, Nit, Perc) • dat = data matrix of size [N dimensions x N samples]; e.g., 100 samples from a 2-D Gaussian: dat = [2 x 100] • Nit = number of iterations • Perc = percentage by which the pdf range is extended
Basic toolbox • [datT Trans] = auto_GPCA(dat) • [datT] = apply_GPCA(dat, Trans) • [dat] = inv_GPCA(datT, Trans) • [Px pT detJ JJ] = GPCA_probability(x0, Trans)
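A usage sketch chaining these calls (the signatures are the ones listed above; the data shape follows the [N dimensions x N samples] convention, and the round-trip check is an assumption about the documented inverse):

% Sketch: basic round trip through the toolbox functions
dat = randn(2, 100);              % toy data: 100 samples of 2-D data [Ndim x Nsamples]
[datT, Trans] = auto_GPCA(dat);   % Gaussianize dat, storing the transform in Trans
datT2 = apply_GPCA(dat, Trans);   % re-apply the stored transform to (new) data
datBack = inv_GPCA(datT, Trans);  % invert: datBack should recover dat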
Estimating PDF/manifold • [datT Trans] = auto_GPCA(dat) • [Px pT detJ JJ] = GPCA_probability(XX, Trans);
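A sketch of evaluating the estimated pdf on a regular grid; that XX holds the evaluation points as columns (same [N dimensions x N samples] convention) is an assumption, as is reshaping Px for display:

% Sketch: evaluate the GPCA density estimate on a regular 2-D grid
dat = randn(2, 1000);                        % training data [2 x Nsamples]
[datT, Trans] = auto_GPCA(dat);              % learn the Gaussianizing transform
[gx, gy] = meshgrid(-3:0.1:3, -3:0.1:3);     % grid covering the data range
XX = [gx(:)'; gy(:)'];                       % grid points as columns [2 x Npoints]
[Px, pT, detJ, JJ] = GPCA_probability(XX, Trans);  % pdf value at each grid point
imagesc(reshape(Px, size(gx)));              % visualize the estimated density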
Estimating PDF/manifold • PROBLEMS • It does not always converge to a Gaussian • pdfs with clusters are harder to handle • The Jacobian estimate is highly point-dependent • The derivative (in the Jacobian estimation) is much more irregular than the integral • The pdf has to be estimated for each point separately
Measuring Mutual Information • [datT Trans] = auto_GPCA(dat) • MI = abs(min(cumsum(cat(1,Trans.I))));
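A usage sketch; that Trans.I stores the per-iteration multi-information reduction is an assumption read off the expression above:

% Sketch: mutual information between the components of dat
dat = randn(2, 1000);
dat(2,:) = dat(1,:) + 0.1*randn(1, 1000);  % make the two components dependent
[datT, Trans] = auto_GPCA(dat);            % Gaussianize
dI = cat(1, Trans.I);                      % stack the per-iteration I values
MI = abs(min(cumsum(dI)));                 % accumulated redundancy between components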
Measuring Mutual Information • PROBLEMS • Entropy estimators are not perfectly reliable • More iterations means more accumulated error • The more complicated the pdf, the larger the error
Synthesizing data • [datT Trans] = auto_GPCA(dat) • [dat2] = inv_GPCA(randn(Dim,Nsamples), Trans) • [Diagram: transforms T1, T2 and their inverses Inv T1, Inv T2]
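A synthesis sketch under the conventions above (Dim and Nsamples are the data dimensionality and the number of new samples to draw):

% Sketch: synthesize new samples with the learned GPCA transform
dat = randn(2, 1000);                          % training data [Dim x Nsamples]
[datT, Trans] = auto_GPCA(dat);                % learn the Gaussianizing transform
Dim = size(dat, 1); Nsamples = 500;            % how many new samples to draw
dat2 = inv_GPCA(randn(Dim, Nsamples), Trans);  % Gaussian noise -> synthesized data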
Synthesizing data • PROBLEMS • It does not always converge to a Gaussian • Small variations in the variance of the random data produce very different results • There is no information about the features of the data in the transformed domain