
Learning Near-Isometric Linear Embeddings

Chinmay Hegde (MIT), Aswin Sankaranarayanan (CMU), Wotao Yin (UCLA), Richard Baraniuk (Rice University)

Presentation Transcript


  1. Learning Near-Isometric Linear Embeddings • Chinmay Hegde, MIT • Aswin Sankaranarayanan, CMU • Wotao Yin, UCLA • Richard Baraniuk, Rice University

  2. Challenge 1: too much data

  3. Large Scale Datasets

  4. Case in Point: DARPA ARGUS-IS • 1.8 Gigapixel image sensor

  5. Case in Point: DARPA ARGUS-IS • 1.8 Gpixel image sensor • video-rate output: 444 Gbits/s • comm data rate: 274 Mbits/s • a factor of 1600x, way out of reach of existing compression technology • Reconnaissance without conscience • too much data to transmit to a ground station • too much data to make effective real-time decisions

  6. Challenge 2: data too expensive

  7. Case in Point: MR Imaging • Measurements very expensive • $1-3 million per machine • 30 minutes per scan

  8. Case in Point: IR Imaging

  9. DIMENSIONALITY REDUCTION

  10. Intrinsic Dimensionality • Why? Geometry, that’s why • Exploit it to perform more efficient analysis and processing of large-scale data • Intrinsic dimension << extrinsic dimension!

  11. Linear Dimensionality Reduction [figure: a signal mapped linearly to a set of measurements]

  12. Linear Dimensionality Reduction • Goal: create a (linear) mapping from $\mathbb{R}^N$ to $\mathbb{R}^M$ with $M < N$ that preserves the key geometric properties of the data, e.g., the configuration of the data points

  13. Dimensionality Reduction • Given a training set of signals, find the “best” linear embedding that preserves its geometry

  14. Dimensionality Reduction • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 1: Principal Component Analysis (PCA) via SVD of the training signals • finds the “average” best-fitting subspace in the least-squares sense • but an average error metric can distort the point-cloud geometry
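
A minimal NumPy sketch of Approach 1, assuming the Q training signals are stored as the rows of a Q x N array. The function name `pca_embedding` is hypothetical, and centering the data first is a common PCA convention that the slide does not specify.

```python
import numpy as np

def pca_embedding(X, M):
    """M x N projection onto the top-M principal directions of X (Q x N).

    The rows are the leading right singular vectors of the mean-centered
    data, i.e. an orthonormal basis of the best-fitting M-dimensional
    subspace in the least-squares sense.
    """
    Xc = X - X.mean(axis=0)                       # center the point cloud
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:M]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                # 200 hypothetical signals in R^50
Psi_pca = pca_embedding(X, 10)
Y = X @ Psi_pca.T                                 # embedded signals, shape (200, 10)
```

Because the SVD minimizes the average squared error, individual secants can still be shrunk severely; nothing in this objective bounds the worst case.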

  15. Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by the Restricted Isometry Property (RIP) and the Whitney Embedding Theorem

  16. Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to training data

  17. Near-Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to training data • but exact isometry can be too much to ask

  18. Near-Isometric Embedding • Given a training set of signals, find the “best” linear embedding that preserves its geometry • Approach 2: Inspired by RIP and Whitney • design the embedding to preserve inter-point distances (secants) • more faithful to training data • but exact isometry can be too much to ask

  19. Why Near-Isometry? • Sensing • guarantees existence of a recovery algorithm • Machine learning applications • kernel matrix depends only on pairwise distances • Approximate nearest neighbors for classification • efficient dimensionality reduction

  20. Existence of Near Isometries • Johnson-Lindenstrauss Lemma • Given a set of $Q$ points, there exists a Lipschitz map that achieves near-isometry (with constant $\delta$) provided $M = O(\delta^{-2} \log Q)$ • Random matrices with i.i.d. sub-Gaussian entries work • compressive sensing, locality-sensitive hashing, database monitoring, cryptography • Existence of solution! • but constants are poor • oblivious to data structure [J-L, 84] [Frankl and Maehara, 88] [Indyk and Motwani, 99] [Achlioptas, 01] [Dasgupta and Gupta, 02]
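
A sketch of the data-oblivious baseline, assuming i.i.d. Gaussian entries (one sub-Gaussian choice) scaled by $1/\sqrt{M}$. The constant `C = 8` in the hypothetical helper `jl_dimension` is an illustrative assumption, not a value from the slides; it shows how $M$ grows like $\log Q / \delta^2$ no matter how the data are structured.

```python
import math
import numpy as np

def jl_dimension(Q, delta, C=8.0):
    # Hypothetical rule of thumb M = ceil(C * ln(Q) / delta^2); C is assumed.
    return math.ceil(C * math.log(Q) / delta ** 2)

def random_embedding(M, N, rng):
    # i.i.d. Gaussian entries scaled by 1/sqrt(M), so E||Psi x||^2 = ||x||^2.
    return rng.standard_normal((M, N)) / math.sqrt(M)

def max_pairwise_distortion(X, Psi):
    # Largest | ||Psi(x_i - x_j)||^2 / ||x_i - x_j||^2 - 1 | over all pairs i < j.
    i, j = np.triu_indices(X.shape[0], k=1)
    D = X[i] - X[j]
    ratio = np.sum((D @ Psi.T) ** 2, axis=1) / np.sum(D ** 2, axis=1)
    return float(np.max(np.abs(ratio - 1.0)))

rng = np.random.default_rng(0)
Q, N, delta = 100, 1000, 0.5
M = jl_dimension(Q, delta)               # 148 for these values
Psi_rand = random_embedding(M, N, rng)
X = rng.standard_normal((Q, N))          # hypothetical data set
print(M, max_pairwise_distortion(X, Psi_rand))
```

The guarantee is probabilistic: a given draw satisfies the bound only with high probability, and the matrix never looks at the data.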

  21. Designed Embeddings • Unfortunately, random projections are data-oblivious (by definition) • Q: Can we beat random projections? • Our quest: a new approach for designing linear embeddings for specific datasets

  22. [math alert]

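  23. Designing Embeddings • Normalized secants $v_{ij} = (x_i - x_j)/\|x_i - x_j\|_2$ [Whitney; Kirby; Wakin, B ’09] • Goal: approximately preserve the length of each $v_{ij}$ under the embedding • Obviously, projecting along the direction of $v_{ij}$ is a bad idea, since it maps $x_i$ and $x_j$ to the same point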

  24. Designing Embeddings • Normalized secants $v_{ij} = (x_i - x_j)/\|x_i - x_j\|_2$ • Goal: approximately preserve the length of each $v_{ij}$ • Note: the total number of secants is large: $S = \binom{Q}{2} = Q(Q-1)/2$
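
A small sketch of building the normalized secant set, assuming the Q signals are the rows of a Q x N array and are pairwise distinct (so no secant has zero length). The helper names are hypothetical.

```python
import numpy as np

def normalized_secants(X):
    """All unit-norm secants (x_i - x_j)/||x_i - x_j||_2, i < j, of the rows of X.

    Assumes distinct rows. Returns an S x N array with S = Q(Q-1)/2.
    """
    i, j = np.triu_indices(X.shape[0], k=1)    # every index pair with i < j
    D = X[i] - X[j]
    return D / np.linalg.norm(D, axis=1, keepdims=True)

def num_secants(Q):
    return Q * (Q - 1) // 2

rng = np.random.default_rng(0)
V = normalized_secants(rng.standard_normal((20, 5)))   # 190 secants in R^5
print(num_secants(60000))                               # 1799970000, i.e. ~1.8 billion
```

At MNIST scale (Q = 60000, N = 400) storing every secant in float64 would take roughly 5.8 TB, so materializing the full set is out of the question.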

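  25. “Good” Linear Embedding Design • Given: normalized secants $v_{ij}$ • Seek: the “shortest” matrix $\Psi \in \mathbb{R}^{M \times N}$ (fewest rows $M$) such that $\left| \|\Psi v_{ij}\|_2^2 - 1 \right| \le \delta$ for all $i < j$ • Think of $\delta$ as the knob that controls the “maximum distortion” that you are willing to tolerate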
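
The design criterion can be checked directly for any candidate matrix. A sketch, assuming the distortion of a unit-norm secant $v$ is measured as $\|\Psi v\|_2^2 - 1$ and the secants are stored as rows of `V`; the function names are hypothetical.

```python
import numpy as np

def secant_distortions(Psi, V):
    """||Psi v||_2^2 - 1 for every unit-norm secant v stored as a row of V."""
    return np.sum((V @ Psi.T) ** 2, axis=1) - 1.0

def max_distortion(Psi, V):
    """Worst-case distortion over the secant set; Psi is acceptable if this <= delta."""
    return float(np.max(np.abs(secant_distortions(Psi, V))))

rng = np.random.default_rng(1)
V = rng.standard_normal((50, 8))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # 50 hypothetical unit secants in R^8

# The identity is an exact isometry, so its distortion is zero up to rounding.
print(max_distortion(np.eye(8), V))
# Halving every vector shrinks each squared length to 1/4 (distortion about 0.75).
print(max_distortion(0.5 * np.eye(8), V))
```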

  26. “Good” Linear Embedding Design • Given: (normalized) secants $v_{ij}$ • Seek: the “shortest” matrix $\Psi$ such that $\left| \|\Psi v_{ij}\|_2^2 - 1 \right| \le \delta$ for all secants

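  27. Lifting Trick • Convert quadratic constraints in $\Psi$ into linear constraints in $P = \Psi^\top \Psi$: $\left| v_{ij}^\top P\, v_{ij} - 1 \right| \le \delta$, with $P \succeq 0$ and $\operatorname{rank}(P) = M$ • Given $P$, obtain $\Psi$ via a matrix square root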
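
A sketch of the "matrix square root" step, assuming $P$ is symmetric positive semidefinite and writing $\Psi$ for the embedding matrix with $P = \Psi^\top \Psi$. It keeps only the eigen-directions with nonzero eigenvalues, so the recovered $\Psi$ has $M = \operatorname{rank}(P)$ rows; the full square root $P^{1/2}$ would instead be N x N. The helper name and tolerance are assumptions.

```python
import numpy as np

def embedding_from_P(P, tol=1e-9):
    """Factor a symmetric PSD matrix P as Psi^T Psi with Psi of size M x N.

    Uses the eigendecomposition P = U diag(w) U^T and keeps eigenvalues above
    a relative tolerance, so M is the numerical rank of P.
    """
    w, U = np.linalg.eigh(P)
    keep = w > tol * max(w.max(), 1.0)
    return (U[:, keep] * np.sqrt(w[keep])).T

rng = np.random.default_rng(2)
Psi_true = rng.standard_normal((3, 8))     # hypothetical 3 x 8 embedding
P = Psi_true.T @ Psi_true                  # lifted variable, rank 3
Psi = embedding_from_P(P)
v = rng.standard_normal(8)
# The lifted constraint is linear in P and equals the squared embedded length.
print(Psi.shape, v @ P @ v, np.sum((Psi @ v) ** 2))
```

The recovered $\Psi$ matches `Psi_true` only up to an orthogonal factor on the left, which leaves every secant length, and therefore every constraint, unchanged.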

  28. Relaxation • Relax rank minimization problem to nuclear norm minimization problem

  29. NuMax • Nuclear norm minimization with Max-norm constraints (NuMax) • Semi-Definite Program (SDP) • solvable by standard interior-point methods • Rank of solution is determined by $\delta$
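
Written out under assumed notation ($\Psi$ the M x N embedding matrix, $v_{ij}$ the normalized secants, $P = \Psi^\top \Psi$), the design problem and its convex relaxation plausibly read as follows. This is a reconstruction from the surrounding slides, not a transcription of them.

$$
\min_{P \succeq 0} \ \operatorname{rank}(P) \quad \text{s.t.} \quad \left| v_{ij}^\top P\, v_{ij} - 1 \right| \le \delta \quad \forall\, i < j
$$

$$
\min_{P \succeq 0} \ \|P\|_* \quad \text{s.t.} \quad \left\| \mathcal{A}(P) - \mathbf{1}_S \right\|_\infty \le \delta, \qquad \mathcal{A}(P)_{ij} = v_{ij}^\top P\, v_{ij}
$$

For $P \succeq 0$ the nuclear norm equals $\operatorname{tr}(P)$, so the objective is linear, and each absolute-value constraint splits into two linear inequalities; together with $P \succeq 0$ this makes the relaxation an SDP. The $\ell_\infty$ form of the constraints is the "max-norm" in the name.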

  30. Accelerating NuMax • Poor scaling with N and S • least squares involves matrices with S rows • SVD of an N x N matrix • Several avenues to accelerate: • Alternating Direction Method of Multipliers (ADMM) • exploit the fact that intermediate estimates of P are low-rank • exploit the fact that only a few secants define the optimal embedding (“column generation”)
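
Column generation needs a pricing step that finds which secants the current estimate violates. A sketch of that step only, assuming `P` is the current N x N iterate and the secants are rows of `V`; the helper name and the worst-first ordering are assumptions, and the NuMax/ADMM solve on the active set is not shown.

```python
import numpy as np

def violated_secants(P, V, delta, max_new=None):
    """Indices of secants with |v^T P v - 1| > delta, worst violation first.

    Hypothetical pricing step: an outer loop would add these secants to the
    active set, re-solve, and repeat until the returned list is empty.
    """
    quad = np.einsum('sn,nm,sm->s', V, P, V)   # v_s^T P v_s for every secant
    excess = np.abs(quad - 1.0) - delta         # > 0 means the constraint fails
    idx = np.flatnonzero(excess > 0)
    idx = idx[np.argsort(-excess[idx])]         # most violated first
    return idx if max_new is None else idx[:max_new]

V = np.eye(3)                                   # three toy unit secants
P = np.diag([1.0, 1.5, 0.2])                    # toy current iterate
print(violated_secants(P, V, delta=0.1))        # secant 2 (|0.2-1|) before secant 1
```

Only the secants that actually bind need to enter the expensive solve, which is what makes very large secant sets tractable.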

  31. Accelerated NuMax • Can solve for datasets with Q = 100k points in N = 1000 dimensions in a few hours

  32. [/math alert]

  33. App: Linear Compression • Images of translating blurred squares live on a K = 2 dimensional smooth “surface” (manifold) in N = 16x16 = 256 dimensional space • Project a collection of 1000 such images into M-dimensional space while preserving structure (as measured by the distortion constant $\delta$)
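
A hypothetical generator for a dataset like the one on this slide, assuming a square of half-width 3 pixels, exact Gaussian blur with $\sigma = 1$ pixel, and centers drawn uniformly so the square stays inside the 16 x 16 frame; none of these parameters come from the slides. Each image depends smoothly on the two center coordinates, which is what places the data on a K = 2 dimensional manifold.

```python
import math
import numpy as np

# Elementwise error function from the standard library (NumPy has no erf).
_erf = np.vectorize(math.erf)

def blurred_box_1d(coords, center, half_width, sigma):
    """Gaussian-blurred indicator of [center - half_width, center + half_width]."""
    s = sigma * math.sqrt(2.0)
    return 0.5 * (_erf((coords - center + half_width) / s)
                  - _erf((coords - center - half_width) / s))

def translating_squares(Q, side=16, half_width=3.0, sigma=1.0, seed=0):
    """Return Q flattened side x side images and their (cx, cy) centers."""
    rng = np.random.default_rng(seed)
    coords = np.arange(side) + 0.5                      # pixel-center coordinates
    centers = rng.uniform(half_width, side - half_width, size=(Q, 2))
    images = np.empty((Q, side * side))
    for q, (cx, cy) in enumerate(centers):
        # Separable blur: image[row, col] = profile_y[row] * profile_x[col]
        img = np.outer(blurred_box_1d(coords, cy, half_width, sigma),
                       blurred_box_1d(coords, cx, half_width, sigma))
        images[q] = img.ravel()
    return images, centers

X, centers = translating_squares(1000)     # X has shape (1000, 256), N = 256
```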

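  34. Rows of the “optimal” embedding matrix [figure: measurements and signal, N = 16x16 = 256]

  35. Rows of the “optimal” embedding matrix

  36. Rows of the “optimal” embedding matrix

  37. Rows of the “optimal” embedding matrix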

  38. App: Linear Compression • M = 40 linear measurements are enough to ensure an isometry constant of $\delta = 0.01$

  39. Secant Distortion • Distribution of secant distortions for the translating squares dataset • Embedding dimension M = 30 • Input distortion to NuMax is $\delta = 0.03$ • Unlike PCA and random projections, NuMax yields distortions sharply concentrated at $\delta$

  40. Secant Distortion • Translating squares dataset • N = 16x16 = 256 • M = 30 • $\delta = 0.03$ • [figure: histograms of normalized secant distortions for random, PCA, and NuMax embeddings]

  41. MNIST (8) – Near Isometry • N = 20x20 = 400 • M = 14 basis functions achieve $\delta = 0.05$

  42. MNIST (8) – Near Isometry • N = 20x20 = 400

  43. App: Image Retrieval • Goal: preserve the neighborhood structure of a set of images • N = 512, Q = 4000 • M = 45 suffices to preserve 80% of neighborhoods • LabelMe Image Dataset
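
The slide does not define what it means to "preserve" a neighborhood. One plausible metric, assumed here, is the average overlap between each point's k nearest neighbors before and after embedding; the helper names and the default k = 10 are hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors (Euclidean) of each row of X, excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(D, np.inf)                     # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def neighborhood_preservation(X, Y, k=10):
    """Mean fraction of each point's k-NN in X (original) that are also k-NN in Y (embedded)."""
    A, B = knn_indices(X, k), knn_indices(Y, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(A, B)]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))                  # hypothetical image features
Psi = rng.standard_normal((10, 50)) / np.sqrt(10)   # any candidate 10 x 50 embedding
print(neighborhood_preservation(X, X @ Psi.T))
```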

  44. App: Classification • MNIST digits dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images • Nearest neighbor (NN) classifier • Test on 10000 images • Mis-classification rate of NN classifier using original dataset: 3.63%
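
A sketch of the evaluation protocol described on this slide: a 1-nearest-neighbor classifier, optionally applied after a linear embedding $\Psi$ (an M x N matrix; notation assumed here). The helper name is hypothetical.

```python
import numpy as np

def nn_error_rate(train_X, train_y, test_X, test_y, Psi=None):
    """Misclassification rate of a 1-NN classifier, optionally after embedding with Psi."""
    if Psi is not None:
        train_X, test_X = train_X @ Psi.T, test_X @ Psi.T
    # Squared Euclidean distances, shape (n_test, n_train).
    D = (np.sum(test_X ** 2, axis=1)[:, None]
         + np.sum(train_X ** 2, axis=1)[None, :]
         - 2.0 * test_X @ train_X.T)
    pred = train_y[np.argmin(D, axis=1)]
    return float(np.mean(pred != test_y))

# Toy example: two well-separated clusters, labels 0 and 1.
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.0, 0.5], [10.0, 10.5]])
test_y = np.array([0, 1])
print(nn_error_rate(train_X, train_y, test_X, test_y))   # 0.0
```

At MNIST scale the full 10000 x 60000 distance matrix is about 4.8 GB in float64, so in practice the test set would be processed in batches.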

  45. App: Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images, so S = 1.8 billion secants! • NuMax-CG took 3 hours to process • Mis-classification rate of NN classifier: 3.63% • NuMax provides the best NN-classification rates!

  46. NuMax and Task Adaptivity • Prune the secants according to the task at hand • If goal is reconstruction / retrieval, then preserve all secants • If goal is signal classification, then preserve inter-class secants differently from intra-class secants • This preferential weighting approach is akin to “boosting”

  47. Optimized Classification • Inter-class secants are not shrunk • Intra-class secants are not expanded • This simple modification improves NN classification rates while using even fewer measurements
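
One reading of the modified constraints, assumed here: with squared-norm distortion and tolerance $\delta$, inter-class secants must satisfy $\|\Psi v\|_2^2 \ge 1 - \delta$ and intra-class secants $\|\Psi v\|_2^2 \le 1 + \delta$. The exact formulation in the original work may differ, and the function name is hypothetical.

```python
import numpy as np

def classification_constraints_ok(Psi, V, same_class, delta):
    """Check one-sided constraints on unit secants (rows of V).

    same_class[s] is True if secant s joins two points of the same class.
    Inter-class secants may not shrink by more than delta; intra-class
    secants may not expand by more than delta.
    """
    sq = np.sum((V @ Psi.T) ** 2, axis=1)            # ||Psi v||^2 per secant
    inter_ok = np.all(sq[~same_class] >= 1.0 - delta)
    intra_ok = np.all(sq[same_class] <= 1.0 + delta)
    return bool(inter_ok and intra_ok)

V = np.eye(2)                                         # toy unit secants
same_class = np.array([True, False])                  # secant 0 intra, secant 1 inter
# Shrinking the intra-class secant and stretching the inter-class one is allowed.
print(classification_constraints_ok(np.diag([0.5, 2.0]), V, same_class, delta=0.1))  # True
```

Each constraint is one-sided, so an embedding may pull same-class points together and push different-class points apart, both of which are favorable for a nearest-neighbor classifier.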

  48. Optimized Classification • MNIST dataset • N = 20x20 = 400-dim images • 10 classes: digits 0-9 • Q = 60000 training images, so >1.8 billion secants! • NuMax-CG took 3-4 hours to process • Significant reduction in number of measurements (M) • Significant improvement in classification rate
