
An NLA Look at σmin Universality (& the Stoch Diff Operator)




Presentation Transcript


  1. Alan Edelman and Po-Ru Loh, MIT Applied Mathematics, Random Matrices, October 10, 2010. An NLA Look at σmin Universality (& the Stoch Diff Operator)

  2. Outline • History of σmin universality • Proof idea of Tao-Vu (in NLA language) • What the proof doesn't prove • Do stochastic differential operators say more?

  3. A bit of history • E ('89): Explicit formula (for finite iid Gaussian n x n matrices) for distribution of σn • Analytic techniques: Integration of a joint density function, Tricomi functions, Kummer's differential equation, etc. • Striking convergence to limiting distribution even in non-Gaussian case • Parallel Matlab in the MIT computer room • “Please don’t turn off” (no way to save work in the background on those old Sun workstations)

  4. Empirical smallest singular value distribution: n x n Gaussian entries

  5. Empirical smallest singular value distribution: n x n ±1 entries

  6. Extending to non-Gaussians: How? • Central limit theorem is a mathematical statement and a “way of life” • Formally: a (series of) theorems – with assumptions (e.g., iid) – and if the assumptions are not met, the theorems don't apply • Way of life: “When a bunch of random variables are mixed up enough, they behave as if Gaussian” • Example from our discussions: Does the square root of a sum of squares of (almost) iid random variables go to χn? Probably an application of CLT but not precisely CLT without some tinkering (what does “go to” mean when n is changing?)

  7. Outline • History of σmin universality • Proof idea of Tao-Vu (in NLA language) • What the proof doesn't prove • Do stochastic differential operators say more?

  8. Tao-Vu ('09) “the rigorous proof”! • Basic idea (NLA reformulation)... Consider a 2x2 block QR decomposition of M, splitting off the last s columns (M1 is n x (n-s), M2 is n x s):

    M = (M1 M2) = QR = (Q1 Q2) [ R11  R12 ]
                               [  0   R22 ]

with R11 of size (n-s) x (n-s) and R22 of size s x s; note Q2ᵀM2 = R22. • 1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn! (sketched below) • 2. R22 (viewed as the product Q2ᵀM2) is roughly s x s Gaussian
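
A minimal MATLAB sketch of the two claims, assuming the √(s/n) scaling above; only built-in qr and svd are used, and the sizes n and s are arbitrary choices made here:

    % Estimate sigma_n of a random ±1 matrix from the bottom-right corner of R
    n = 1000; s = 100;
    M = sign(randn(n));                  % n x n matrix of ±1 entries
    [Q,R] = qr(M);
    R22 = R(n-s+1:n, n-s+1:n);           % bottom-right s x s block of R
    est   = sqrt(s/n) * min(svd(R22));   % scaled corner estimate
    truth = min(svd(M));                 % true smallest singular value
    [est truth est/truth]                % ratio should hover near 1

On typical runs the ratio lands within roughly 10% of 1, in the spirit of the experiments on the later slides.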

  9. Basic idea part 1: σs·√(s/n) ≈ σn • The smallest singular value of M is the reciprocal of the largest singular value of M⁻¹ • Singular values of R22 are exactly the inverse singular values of an s-row subsample of M⁻¹ (checked below) • The largest singular value of an approximately low-rank matrix reliably shows up in a random sample (Vempala et al.; Rokhlin, Tygert et al.) • Referred to as “property testing” in theoretical CS terminology
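
The second bullet is an exact identity, not an approximation, when the subsample is taken to be the last s rows; a quick MATLAB check (a sketch, with arbitrary sizes):

    % Singular values of R22 vs. reciprocal singular values of the
    % last s rows of M^-1: these should match to rounding error
    n = 50; s = 5;
    M = randn(n);
    [Q,R] = qr(M);
    R22  = R(n-s+1:n, n-s+1:n);
    tail = M \ eye(n);                   % M^-1 ...
    tail = tail(n-s+1:n, :);             % ... keep its last s rows
    [sort(svd(R22)) sort(1./svd(tail))]  % two matching columns

Why it works: M⁻¹ = R⁻¹Qᵀ, and the last s rows of R⁻¹ are (0 R22⁻¹), so the last s rows of M⁻¹ equal R22⁻¹Q2ᵀ, which has exactly the singular values of R22⁻¹.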

  10. Basic idea part 2: R22 ≈ Gaussian • Recall R22 = Q2ᵀM2 • Note that Q1 is determined by M1 and thus independent of M2 • Q2 can be any orthogonal completion of Q1 • Thus, multiplying by Q2ᵀ “randomly stirs up” entries of the (independent) n x s matrix M2 • Any “rough edges” of M2 should be smoothed away in the s x s result R22 (illustrated below)
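
A MATLAB illustration of the stirring effect; the moment checks (variance 1, kurtosis 3 for a standard Gaussian) and the trial counts are choices made here, not the talk's, and kurtosis requires the Statistics Toolbox:

    % Entries of R22 = Q2'*M2 for a ±1 matrix: do they look N(0,1)?
    n = 400; s = 40; trials = 100;
    e = zeros(trials*s*s, 1);
    for t = 1:trials
        M = sign(randn(n));
        [Q,R] = qr(M(:, 1:n-s));         % full QR of M1 fixes Q1 ...
        Q2 = Q(:, n-s+1:n);              % ... and supplies a completion Q2
        E  = Q2' * M(:, n-s+1:n);        % R22 (up to an orthogonal factor)
        e((t-1)*s*s+1 : t*s*s) = E(:);
    end
    [mean(e) var(e) kurtosis(e)]         % near [0 1 3] if near-Gaussian

Each entry is a unit vector dotted with independent ±1's, so by the Berry-Esseen reasoning of slide 29 it should be close to N(0,1).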

  11. Basic idea (recap) • 1. The smallest singular value of R22, scaled by √(s/n), is a good estimate for σn! • 2. R22 (viewed as the product Q2ᵀM2) ≈ s x s Gaussian • We feel comfortable (from our CLT “way of life”) that part 2 works well • How well does part 1 work?

  12. Outline • History of σmin universality • Proof idea of Tao-Vu (in NLA language) • What the proof doesn't prove • Do stochastic differential operators say more?

  13. How good is the s x s estimator? Fluctuates within about ±10% of the truth (at least for this matrix)

  14. How good is the s x s estimator? A few more tries...

  15. More s x s estimator experiments: Gaussian entries, ±1 entries. A lot more tries... again, accurate to, say, 10%

  16. More s x s estimator experiments: n = 100 vs. n = 200. A lot more tries, now comparing matrix sizes, s = 10% to 50% of n... n = 200 is a bit better

  17. How good is the s x s estimator? • On one hand, surprisingly good, especially when not expecting any such result • “Did you know you can get the smallest singular value to within 10% just by looking at a corner of the QR?” • On the other hand, still clearly an approximation: n would need to be huge in order to reach human-indistinguishable agreement

  18. Bounds from the proof • “C is a sufficiently large const (10^4 suffices)” • Implied constants in O(...) depend on E|ξ|^C • For ξ = Gaussian, this is 9999!! (a double factorial, not just excitement) • s = n^(500/C) • To get s = 10, n ≈ 10^20? • Various tail bounds go as n^(-1/C) • To get a 1% chance of failure, n^(-1/C) = 0.01 forces n = 100^C ≈ 10^20000??

  19. Reality check: n x n Gaussian entries, revisited

  20. Reality check: n x n ±1 entries, revisited

  21. What the proof doesn't prove • The s x s estimator is pretty nifty... • ... but the truth is far stronger than what the approximation can tell us

  22. Outline • History of σmin universality • Proof idea of Tao-Vu (in NLA language) • What the proof doesn't prove • Do stochastic differential operators say more?

  23. Can another approach get us closer to the truth? • Recall the standard numerical SVD algorithm starting with Householder bidiagonalization • In the case of Gaussian random matrices, each Householder step puts a χ-distributed entry on the bidiagonal and leaves the remaining subrectangle Gaussian (see the sketch below) • At each stage, all χ's and Gaussians in the entries are independent of each other (due to isotropy of multivariate Gaussians)
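
A MATLAB sketch of this fact (my illustration, not the talk's code): the χ bidiagonal model (diagonal χn, ..., χ1; superdiagonal χn-1, ..., χ1) should reproduce the σmin distribution of a full Gaussian matrix exactly, at every n. chi2rnd is from the Statistics Toolbox.

    % sigma_min: full Gaussian matrix vs. the chi bidiagonal model
    n = 100; trials = 2000;
    a = zeros(trials,1); b = zeros(trials,1);
    for t = 1:trials
        a(t) = min(svd(randn(n)));            % full n x n Gaussian
        d = sqrt(chi2rnd(n:-1:1));            % diagonal: chi_n, ..., chi_1
        e = sqrt(chi2rnd(n-1:-1:1));          % superdiagonal: chi_{n-1}, ..., chi_1
        b(t) = min(svd(diag(d) + diag(e,1))); % bidiagonal model
    end
    [mean(a) std(a); mean(b) std(b)]          % the two rows should agree

Because bidiagonalization is an orthogonal equivalence, the agreement here is in distribution for every finite n, not merely in the large-n limit.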

  24. Bidiagonalization process for n x n Gaussian matrix

  25. A stochastic operator connection • E ('03) argued that the bidiagonal of χ's can be viewed as a discretization of a stochastic Bessel operator: −√x d/dx + “noise”/√2 • As n grows, the discretization becomes smoother, and the (scaled) singular value distributions of the matrices ought to converge to those of the operator

  26. A stochastic operator connection

  27. How close are we if we use k x k χ's in the bottom, rest Gaussian? A cleaned-up version of the slide's MATLAB (bidsvd is the talk's helper, presumably returning the singular values of the bidiagonal matrix with diagonal xx and superdiagonal yy):

    n = 200; t = 1000000; k = 1;           % rerun for k = 0, 1, 2, ..., inf
    x = sqrt(n:-1:1);                      % means of the diagonal χ's
    y = sqrt(n-1:-1:1);                    % means of the superdiagonal χ's
    endx = n-k+1:n; endy = n-k+1:n-1;      % indices of the bottom k x k corner
    dofx = k:-1:1; dofy = (k-1):-1:1;      % χ degrees of freedom in the corner
    v = zeros(t,1);
    for i = 1:t
        xx = x + randn(1,n)/sqrt(2);       % Gaussian stand-ins for the χ's
        yy = y + randn(1,n-1)/sqrt(2);
        xx(endx) = sqrt(chi2rnd(dofx));    % exact χ's in the corner
        yy(endy) = sqrt(chi2rnd(dofy));
        v(i) = min(bidsvd(xx,yy));
    end
    v = v*sqrt(n);                         % scale for comparison across n

[Figure: empirical σmin densities for n = 100 and n = 200 with k = 0, 1, ..., 7..10, inf, plus an area-of-detail inset; 1 million trials in each experiment.] (Probably as n→inf, there is still a little upswing for finite k?)

  28. A stochastic operator connection • Ramírez and Rider ('09) produced a proof • In further work with Virág, they have applied the SDO machinery to obtain similar convergence results for the largest eigenvalues of beta ensembles, etc.

  29. Extending to non-Gaussians: How? • The bidiagonalization mechanism shouldn't care too much about the difference... • Each Householder spin “stirs up” the entries of the remaining subrectangle, making them “more Gaussian” (according to Berry-Esseen, qᵀx is close to Gaussian as long as the entries of q are evenly distributed; see the sketch below) • Almost-Gaussians combine into (almost-independent) almost-χ's • The original n² entries compress to 2n−1
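
A small MATLAB check of the Berry-Esseen bullet (a sketch; the random unit vector q and the ±1 entries for x are choices made here):

    % q'*x for a spread-out unit vector q and ±1 entries x: near-Gaussian?
    n = 1000; trials = 20000;
    q = randn(n,1); q = q/norm(q);       % generic unit vector, entries O(1/sqrt(n))
    z = zeros(trials,1);
    for t = 1:trials
        z(t) = q' * sign(randn(n,1));    % one "stirred up" entry
    end
    [mean(z) var(z) kurtosis(z)]         % near [0 1 3] if near-Gaussian

If q were instead a standard basis vector, qᵀx would always be exactly ±1; that is the “evenly distributed” caveat in the bullet.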

  30. SDO mechanism • Old intuition: non-Gaussian n x n matrices act like Gaussian n x n matrices (which we understand) • New view: non-Gaussian and Gaussian n x n matrices are both discretizations of the same object • Non-random discretizations have graininess: step size, where to take finite differences, etc. • SDO discretizations have issues like almost-independence... but can these be overcome?

  31. Some grand questions • Could an SDO approach circumvent property testing (sampling the bottom-right s x s) and thereby get closer to the truth? • Does the mathematics of today have enough technology for this? (If not, can someone invent the new technology we need?)
