
Learning Near-Isometric Linear Embeddings


Presentation Transcript


  1. Learning Near-Isometric Linear Embeddings Chinmay Hegde (MIT), Aswin Sankaranarayanan (CMU), Wotao Yin (UCLA), Edward Snowden (Ex-NSA), Richard Baraniuk (Rice University)

  2. NSA PRISM 4972 Gbps Source: Wikipedia.org

  3. NSA PRISM 4972 Gbps Source: Wikipedia.org

  4. NSA PRISM Source: Wikipedia.org

  5. NSA PRISM Source: Wikipedia.org

  6. NSA PRISM DIMENSIONALITY REDUCTION Source: Wikipedia.org

  7. Large Scale Datasets

  8. Intrinsic Dimensionality • Why? Geometry, that’s why • Exploit to perform more efficient analysis and processing of large-scale data • Intrinsic dimension << Extrinsic dimension!

  9. Dimensionality Reduction Goal: Create a (linear) mapping from R^N to R^M with M < N that preserves the key geometric properties of the data (e.g., the configuration of the data points)
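
One concrete reading of “preserves the key geometric properties” is the near-isometry criterion that the RIP-style approach on the later slides targets. In generic notation (the symbols Φ and δ below are my labels, not necessarily the ones on the original slides), the linear map Φ : R^N -> R^M should satisfy, for every pair of training points x, y,

    (1 - \delta)\,\|x - y\|_2^2 \;\le\; \|\Phi x - \Phi y\|_2^2 \;\le\; (1 + \delta)\,\|x - y\|_2^2,

where δ ≥ 0 is a small distortion (isometry) constant.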

  10. Dimensionality Reduction • Given a training set of signals, find the “best” linear embedding that preserves its geometry

  11. Dimensionality Reduction • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 1: PCA via SVD of the training signals • finds the average best-fitting subspace in the least-squares sense • an average error metric can distort point-cloud geometry
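
A minimal sketch of Approach 1, assuming the training signals are stacked as the columns of an N x Q NumPy array (the function name and shapes are illustrative, not taken from the talk):

    import numpy as np

    def pca_embedding(X, M):
        """X: N x Q matrix of training signals; returns an M x N map onto the top-M principal subspace."""
        Xc = X - X.mean(axis=1, keepdims=True)            # center the training signals
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :M].T                                 # y = Phi @ x maps R^N to R^M

Because the subspace is fit in an average (least-squares) sense, individual pairwise distances can still be distorted badly, which is exactly the point-cloud distortion issue noted above.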

  12. Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP

  13. Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP • but not the Restricted Itinerary Property [Maduro, Snowden ’13]

  14. Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to the training data

  15. Near-Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to the training data • but exact isometry can be too much to ask

  16. Near-Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to the training data • but exact isometry can be too much to ask

  17. Why Near-Isometry? • Sensing • guarantees existence of a recovery algorithm • Machine learning applications • kernel matrix depends only on pairwise distances • Approximate nearest neighbors for classification • efficient dimensionality reduction

  18. Existence of Near Isometries • Johnson-Lindenstrauss Lemma • Given a set of Q points, there exists a Lipschitz map that achieves near-isometry (with a small distortion constant) provided M is large enough • Random matrices with iid subGaussian entries work • cf. so-called “compressive sensing” [J-L, 84] [Frankl and Maehara, 88] [Indyk and Motwani, 99] [Achlioptas, 01] [Dasgupta and Gupta, 02]
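
For reference, the standard form of the lemma (the distortion symbol δ and the absolute constant C are generic notation, not copied from the slide): for any set of Q points in R^N and any δ in (0, 1), a linear near-isometry onto R^M with distortion δ exists provided

    M \;\ge\; C\,\delta^{-2}\,\log Q,

and a suitably scaled random matrix with iid subGaussian entries achieves this with high probability.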

  19. L1 Energy http://dealbook.nytimes.com/2013/06/28/oligarchs-assemble-team-for-oil-deals/?_r=0 L1 Energy

  20. Existence of Near Isometries • Johnson-Lindenstrauss Lemma • Given a set of Q points, there exists a Lipschitz map that achieves near-isometry (with a small distortion constant) provided M is large enough • Random matrices with iid subGaussian entries work • cf. so-called “compressive sensing” • Existence of a solution! • but the constants are poor • oblivious to data structure [J-L, 84] [Frankl and Maehara, 88] [Indyk and Motwani, 99] [Achlioptas, 01] [Dasgupta and Gupta, 02]
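
A quick sketch of this data-oblivious baseline, which the rest of the talk tries to beat: draw a random Gaussian Φ and measure the worst-case distortion over the (unit-norm) secants. All names here are illustrative.

    import numpy as np

    def random_embedding(N, M, seed=0):
        # iid Gaussian entries, scaled so that E||Phi x||^2 = ||x||^2
        rng = np.random.default_rng(seed)
        return rng.normal(scale=1.0 / np.sqrt(M), size=(M, N))

    def max_secant_distortion(Phi, V):
        """V: Q x N array of unit-norm secants; returns max_k | ||Phi v_k||^2 - 1 |."""
        proj = V @ Phi.T                                    # Q x M projected secants
        return np.max(np.abs(np.sum(proj**2, axis=1) - 1.0))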

  21. Near-Isometric Embedding • Q. Can we beat random projections? • A. … • on the one hand: lower bounds for JL [Alon ’03]

  22. Near-Isometric Embedding • Q. Can we beat random projections? • A. … • on the one hand: lower bounds for JL [Alon ’03] • on the other hand: carefully constructed linear projections can often do better • Our quest: an optimization-based approach for learning “good” linear embeddings

  23. Normalized Secants • Normalized pairwise vectors [Whitney; Kirby; Wakin, B ’09] • Goal is to approximately preserve the lengths of the projected secants • Obviously, a projection that collapses a secant direction to (near) zero is a bad idea

  24. Normalized Secants • Normalized pairwise vectors • Goal is to approximately preserve the lengths of the projected secants • Note: the total number of secants is large: Q(Q-1)/2 for Q data points
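
A small sketch of how the normalized secants could be formed from a training set (the helper name is mine):

    import numpy as np
    from itertools import combinations

    def normalized_secants(X):
        """X: Q x N training signals; returns the Q*(Q-1)/2 unit-norm pairwise difference vectors."""
        secants = []
        for i, j in combinations(range(X.shape[0]), 2):
            d = X[i] - X[j]
            norm = np.linalg.norm(d)
            if norm > 0:                                    # skip duplicate points
                secants.append(d / norm)
        return np.asarray(secants)

For the MNIST experiment later in the talk (Q = 60000 images) this already amounts to more than 1.8 billion secants, which is what motivates the column-generation variant.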

  25. “Good” Linear Embedding Design • Given: normalized secants • Seek: the “shortest” matrix such that the length of every secant is approximately preserved • Erratum alert: we will use Q to denote both the number of data points and the number of secants

  26. “Good” Linear Embedding Design • Given: normalized secants • Seek: the “shortest” matrix such that the length of every secant is approximately preserved

  27. “Good” Linear Embedding Design • Given: normalized secants • Seek: the “shortest” matrix such that the length of every secant is approximately preserved

  28. Lifting Trick • Convert quadratic constraints in the embedding matrix into linear constraints in the lifted matrix P • After designing P, obtain the embedding via a matrix square root
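
In symbols, using P for the lifted variable as on the later slides (the embedding symbol Φ and the distortion bound δ are my labels): the quadratic constraints

    \bigl|\,\|\Phi v_k\|_2^2 - 1\,\bigr| \;\le\; \delta, \qquad k = 1, \dots, Q,

become, after substituting P = \Phi^T \Phi,

    \bigl|\, v_k^T P\, v_k - 1 \,\bigr| \;\le\; \delta,

which are linear in P. Once a low-rank P ⪰ 0 has been designed, an M x N embedding is read off from a thin matrix square root (e.g., an eigendecomposition of P).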

  29. Relaxation • Convert quadratic constraints in the embedding matrix into linear constraints in the lifted matrix P • Relax rank minimization to nuclear-norm minimization

  30. NuMax • Semi-Definite Program (SDP) • Nuclear norm minimization with Max-norm constraints (NuMax) • Solvable by standard interior-point techniques • Rank of the solution is determined by the distortion parameter
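
A minimal sketch of the SDP in CVXPY, assuming the unit-norm secants are the rows of a Q x N array V and delta is the distortion budget (names are mine; note that for a PSD variable the nuclear norm is just the trace):

    import numpy as np
    import cvxpy as cp

    def numax_sdp(V, delta):
        N = V.shape[1]
        P = cp.Variable((N, N), PSD=True)
        # |v_k^T P v_k - 1| <= delta, written as trace(P v_k v_k^T)
        constraints = [cp.abs(cp.sum(cp.multiply(P, np.outer(v, v))) - 1) <= delta for v in V]
        cp.Problem(cp.Minimize(cp.trace(P)), constraints).solve()
        # recover an embedding from the (numerically) low-rank solution
        w, U = np.linalg.eigh(P.value)
        keep = w > 1e-6 * max(w.max(), 1e-12)
        return (U[:, keep] * np.sqrt(w[keep])).T            # M x N, with M = effective rank

This only makes sense for modest N and Q; scaling is what the ADMM and column-generation slides address next.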

  31. Practical Considerations • In practice N is large and Q is very large! • Computational cost per iteration scales with both N and Q

  32. Solving NuMax • Alternating Direction Method of Multipliers (ADMM) • solve for P using spectral thresholding • solve for L using least-squares • solve for q using “clipping” • Computational/memory cost per iteration still scales poorly with N and Q
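
Two of the per-iteration pieces are standard proximal operators; a generic sketch (not the authors' code) of the spectral-thresholding and clipping steps:

    import numpy as np

    def spectral_threshold(A, tau):
        """Singular-value soft-thresholding: the prox of tau * (nuclear norm) at A."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt          # shrink singular values toward 0

    def clip(q, delta):
        """Project onto the box [-delta, delta] (the max-norm constraint)."""
        return np.clip(q, -delta, delta)

The remaining update for L is an ordinary least-squares solve, as listed above.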

  33. Accelerating NuMax • Poor scaling with N and Q • least squares involves matrices with Q^2 rows • SVD of an N x N matrix • Observation 1 • intermediate estimates of P are low-rank • use a low-rank representation to reduce memory and accelerate computations • use incremental SVD for faster computations

  34. Accelerating NuMax • Observation 2 • by the KKT conditions (complementary slackness), only the constraints that are satisfied with equality determine the solution (“active constraints”) • Analogy: recall support vector machines (SVMs), where the solution is determined only by the support vectors: the training points whose constraints are active

  35. NuMax-CG • Observation 2 • by the KKT conditions (complementary slackness), only the constraints that are satisfied with equality determine the solution (“active constraints”) • Hence, given a feasible solution P*, only the secants v_k whose constraint |v_k^T P* v_k – 1| is active (attains the bound) determine the value of P* • Key: the number of “support secants” << the total number of secants • so we only need to track the support secants • “column generation” approach to solving NuMax
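
A schematic of the column-generation outer loop, assuming some inner routine solve(V_active, delta) (e.g., the SDP or ADMM sketches above) that returns the lifted matrix P for a subset of secants; this paraphrases the idea and is not the authors' implementation:

    import numpy as np

    def numax_cg(V, delta, solve, n_init=100, batch=100, tol=1e-6, seed=0):
        """Solve on a small active set of secants, add the worst violated ones, and repeat."""
        rng = np.random.default_rng(seed)
        active = set(rng.choice(len(V), size=min(n_init, len(V)), replace=False).tolist())
        while True:
            P = solve(V[sorted(active)], delta)                      # reduced problem
            dist = np.abs(np.einsum('qn,nm,qm->q', V, P, V) - 1.0)   # |v^T P v - 1| for all secants
            violated = np.where(dist > delta + tol)[0]
            if violated.size == 0:
                return P, active                                     # every constraint satisfied
            worst = violated[np.argsort(dist[violated])[::-1][:batch]]
            active |= set(worst.tolist())                            # grow the active set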

  36. Computation Time Can solve for datasets with Q=100k points in N=1000 dimensions in a few hours

  37. Squares – Near Isometry • Images of translating blurred squares live on a K=2 dimensional smooth manifold in N=256 dimensional space • Project a collection of these images into M-dimensional space while preserving structure (as measured by the isometry constant) N=16x16=256
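
A rough sketch of how such a training set could be generated: a blurred square translated over a 2-D grid of positions (the square size and blur width below are illustrative choices, not taken from the talk):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def translated_squares(n=16, side=5, sigma=1.0):
        """Blurred side x side squares at every integer translation: a K=2 manifold in R^(n*n)."""
        images = []
        for top in range(n - side + 1):
            for left in range(n - side + 1):
                img = np.zeros((n, n))
                img[top:top + side, left:left + side] = 1.0
                images.append(gaussian_filter(img, sigma).ravel())   # N = n*n = 256 per image
        return np.asarray(images)                                    # one row per training image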

  38. Squares – Near Isometry • M=40 linear measurements are enough to ensure an isometry constant of 0.01 N=16x16=256

  39–42. Squares – Near Isometry (result figures)

  43. Squares – CS Recovery • Signal recovery in AWGN N=16x16=256

  44. MNIST (8) – Near Isometry N=20x20=400; M=14 basis functions achieve an isometry constant of 0.05

  45. MNIST (8) – Near Isometry N=20x20=400

  46. MNIST – NN Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images • Nearest neighbor (NN) classifier • Test on 10000 images • Misclassification rate of the NN classifier: 3.63%
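
The nearest-neighbor baseline (and its embedded counterpart) can be reproduced roughly as follows with scikit-learn; the data loading and the embedding Phi are assumed to come from elsewhere:

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def nn_error(X_train, y_train, X_test, y_test, Phi=None):
        """1-NN misclassification rate, optionally after applying a learned M x N embedding Phi."""
        if Phi is not None:
            X_train, X_test = X_train @ Phi.T, X_test @ Phi.T        # project to M dimensions
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
        return 1.0 - clf.score(X_test, y_test)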

  47. MNIST – Naïve NuMax Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images, so >1.8 billion secants! • NuMax-CG took 3-4 hours to process • Misclassification rate of the NN classifier: 3.63% • NuMax provides the best NN-classification rates

  48. Task Adaptivity • Prune the secants according to the task at hand • If goal is signal reconstruction, then preserve all secants • If goal is signal classification, then preserve inter-class secants differently from intra-class secants • Can preferentially weight the training set vectors according to their importance (connections with boosting)
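
One naive way to implement the pruning idea for classification is to keep only the inter-class secants; the snippet below is an illustration of that choice, not the weighting scheme used in the talk:

    import numpy as np
    from itertools import combinations

    def interclass_secants(X, y):
        """Unit-norm secants only between training points with different labels."""
        secants = [(X[i] - X[j]) / np.linalg.norm(X[i] - X[j])
                   for i, j in combinations(range(len(X)), 2) if y[i] != y[j]]
        return np.asarray(secants)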
