A Nonlinear Approach to Dimension Reduction

A Nonlinear Approach to Dimension Reduction. Lee-Ad Gottlieb, Weizmann Institute of Science. Joint work with Robert Krauthgamer.


Presentation Transcript


  1. A Nonlinear Approach to Dimension Reduction • Lee-Ad Gottlieb, Weizmann Institute of Science • Joint work with Robert Krauthgamer

  2. Data As High-Dimensional Vectors • Data is often represented by vectors in R^m • For images, color or intensity • For documents, word frequency • A typical goal – Nearest Neighbor Search: • Preprocess the data so that, given a query vector, the closest vector in the data set can be found quickly. • Common in various data analysis tasks – classification, learning, clustering.

  3. Curse of Dimensionality • Cost of many useful operations is exponential in dimension • First noted by Bellman (Bel-61) in the context of PDFs • Nearest Neighbor Search (Cla-94) • Dimension reduction: • Represent high-dimensional data in a low-dimensional space • Specifically: Map given vectors into a low-dimensional space, while preserving most of the data’s “structure” • Trade off accuracy for computational efficiency

  4. The JL Lemma • Theorem (Johnson-Lindenstrauss, 1984): • For every n-point Euclidean set X, with dimension d, there is a linear map X → Y (Euclidean Y) with • Interpoint distortion 1±ε • Dimension of Y: k = O(ε⁻² log n) • Can be realized by a trivial linear transformation • Multiply the d × n point matrix by a k × d matrix of random entries from {-1,0,1} [Ach-01] • A nearly matching lower bound was given by [Alon-03] • Applications in a host of problems in computational geometry • But can we do better?
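The random-matrix realization mentioned on the JL slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the sizes `d`, `k`, `n`, and the Gaussian test data are all assumptions, and the entry probabilities (1/6, 2/3, 1/6) with the √3 scaling are the usual choice for a {-1,0,1} projection matrix rather than something stated on the slide.

```python
import math
import random


def achlioptas_matrix(k, d, rng):
    """k x d matrix with entries sqrt(3) * {-1, 0, +1}.

    Assumed distribution: -1 and +1 with probability 1/6 each, 0 with
    probability 2/3, so every entry has mean 0 and variance 1.
    """
    scale = math.sqrt(3.0)
    return [[scale * rng.choice((-1, 0, 0, 0, 0, 1)) for _ in range(d)]
            for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional point x to k dimensions: (1/sqrt(k)) * R x."""
    k = len(matrix)
    return [sum(r * c for r, c in zip(row, x)) / math.sqrt(k)
            for row in matrix]


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(0)
d, k, n = 500, 200, 10  # illustrative sizes only
points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
R = achlioptas_matrix(k, d, rng)
images = [project(R, p) for p in points]

# Ratio of projected distance to original distance for every pair.
ratios = [dist(images[i], images[j]) / dist(points[i], points[j])
          for i in range(n) for j in range(i + 1, n)]
print(min(ratios), max(ratios))
```

With these sizes the printed ratios should stay close to 1, which is the 1±ε behaviour the lemma describes. The same matrix `R` is applied to every point, so the map is linear.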

  5. Doubling Dimension • Definition: Ball B(x,r) = all points within distance r from x. • The doubling constant (of a metric M) is the minimum value λ>0 such that every ball can be covered by λ balls of half the radius • First used by [Ass-83], algorithmically by [Cla-97]. • The doubling dimension is dim(M) = log λ(M) [GKL-03] • Applications: • Approximate nearest neighbor search [KL-04,CG-06] • Distance oracles [HM-06] • Spanners [GR-08a,GR-08b] • Embeddings [ABN-08,BRS-07] • (Figure: here λ ≤ 7.)
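To make the covering definition concrete, here is a small sketch that greedily covers one ball with balls of half the radius. It is only an illustration under stated assumptions: the helper name `greedy_half_radius_cover` is hypothetical, the centers are restricted to data points, and a greedy cover gives an upper bound for that one ball, not the exact doubling constant of the metric.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def greedy_half_radius_cover(points, center, r):
    """Greedily cover B(center, r) by balls of radius r/2.

    Centers are picked from the data points inside the ball. The number of
    returned centers upper-bounds how many half-radius balls this particular
    ball needs; it is not guaranteed to be the minimum.
    """
    ball = [p for p in points if dist(p, center) <= r]
    centers = []
    for p in ball:
        # p becomes a new center only if no existing center already covers it.
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return centers


# Sixteen points on a line; cover the ball of radius 4 around the point 8.
line = [(float(i),) for i in range(16)]
cover = greedy_half_radius_cover(line, (8.0,), 4.0)
print(len(cover), math.log2(len(cover)))
```

On this one-dimensional example the greedy cover uses three half-radius balls, so the estimate log₂ 3 is a small constant, as expected for points on a line.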

  6. The JL Lemma • Theorem (Johnson-Lindenstrauss, 1984): • For every n-point Euclidean set X, with dimension d, there is a linear map X → Y with • Interpoint distortion 1±ε • Dimension of Y: O(ε⁻² log n) • An almost matching lower bound was given by [Alon-03] • This lower bound considered n roughly equidistant points • So it had dim(X) = log n • So in fact the lower bound is Ω(ε⁻² dim(X))

  7. A stronger version of JL? • Open questions: • Can the JL log n lower bound be strengthened to apply to spaces with low doubling dimension? (dim(X) << log n) • Does there exist a JL-like embedding into O(dim(X)) dimensions? [LP-01,GKL-03] • Even constant distortion would be interesting • A linear transformation cannot attain this result [IN-07] • Here, we present a partial resolution to these questions: • Two embeddings that use Õ(dim²(X)) dimensions • Result I: (1±ε) embedding for a single scale, interpoint distances close to some r. • Result II: (1±ε) global embedding into the snowflake metric, where every interpoint distance s is replaced by s^½

  8. Result I – Embedding for Single Scale • Theorem 1 [GK-09]: • Fix scale r>0 and range 0<δ<1. • Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ(log(1/δ)·(dim X)²), such that 1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X 2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [δr, r] 3. Boundedness: ||f(x)|| ≤ r for all x ∈ X • We’ll illustrate the proof for constant range and distortion.

  9. Result I: The construction • We begin by considering the entire point set. Take for example scale r = 20 and range δ = ½ • Assume minimum interpoint distance 1

  10. Step 1: Net extraction • From the point set, we extract a net • For example, a 4-net • Net properties: • Covering • Packing • A consequence of the packing property is that a ball of radius s contains O(s^dim(X)) points • (Figure labels: covering radius 4; packing distance 4.)
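The covering and packing properties of a net can be checked on a toy point set. The sketch below uses a simple greedy pass, which is one common way to build a net and is an assumption here; the slides do not say how the net is extracted. The grid data and the name `greedy_net` are illustrative only.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def greedy_net(points, radius):
    """Greedy net extraction.

    A point joins the net only if it is at distance >= radius from every net
    point chosen so far. This gives:
      packing:  any two net points are at distance >= radius
      covering: every input point is within distance < radius of a net point
    """
    net = []
    for p in points:
        if all(dist(p, q) >= radius for q in net):
            net.append(p)
    return net


# A 20 x 20 integer grid (minimum interpoint distance 1) and a 4-net of it.
points = [(float(x), float(y)) for x in range(20) for y in range(20)]
net = greedy_net(points, 4.0)
print(len(points), len(net))
```

Because every rejected point was rejected for being close to an existing net point, covering follows directly from the construction, and packing follows from the acceptance test.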

  11. Step 1: Net extraction • We want a good embedding for just the net points • From here on, our embedding will ignore non-net points • Why is this valid?

  12. Step 1: Net extraction • Kirszbraun theorem (Lipschitz extension, 1934): • Given an embedding f : X → Y, X ⊂ S (Euclidean space) • there exists an extension f′ : S → Y • The restriction of f′ to X is equal to f • f′ is contractive for S \ X • Therefore, a good embedding just for the net points suffices • Smaller net radius → less distortion for the non-net points

  13. Step 2: Padded decomposition • Decompose the space into probabilistic padded clusters

  14. Step 2: Padded decomposition • Decompose the space into probabilistic padded clusters • Cluster properties for a given random partition [GKL03,ABN08]: • Diameter: bounded by 20 dim(X) • Size: By the doubling property, bounded by (20 dim(X))^dim(X) • Padding: A point is 20-padded with probability 1-c, say 9/10 • Support: O(dim(X)) partitions • (Figure labels: padded; diameter ≤ 20 dim(X).)

  15. Step 3: JL on individual clusters • For each partition, consider each individual cluster

  16. Step 3: JL on individual clusters • For each partition, consider each individual cluster • Reduce dimension using the JL Lemma • Constant distortion • Target dimension: • logarithmic in size: O(log[(20 dim(X))^dim(X)]) = Õ(dim(X)) • Then translate some point to the origin

  17. The story so far… • To review • Step 1: Extract net points • Step 2: Build family of partitions • Step 3: For each partition, apply JL to each cluster, and translate a cluster point to the origin • Embedding guarantees for a single partition • Intracluster distance: Constant distortion • Intercluster distance: • Min distance: 0 • Max distance: 20 dim(X) • Not good enough • Let’s backtrack…

  18. The story so far… • To review • Step 1: Extract net points • Step 2: Build family of partitions • Step 3: For each partition, apply Gaussian transform to each cluster • Step 4: For each partition, apply JL to each cluster, and translate a cluster point to the origin • Embedding guarantees for a single partition • Intracluster distance: Constant distortion • Intercluster distance: • Min distance: 0 • Max distance: 20 dim(X) • Not good enough • Let’s backtrack…

  19. Step 3: Gaussian transform • For each partition, apply the Gaussian transform to distances within each cluster (Schoenberg’s theorem, 1938) • f(t) = (1 − e^(−t²))^(1/2) • Threshold at s: f_s(t) = s·(1 − e^(−t²/s²))^(1/2) • Properties for s = 20: • Threshold: Cluster diameter is at most 20 (instead of 20 dim(X)) • Distortion: Small distortion of distances in relevant range • Transform can increase dimension… but JL is the next step
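The two properties claimed for the thresholded transform (bounded by s, small distortion in the relevant range) can be checked numerically on the distance function itself. This sketch only evaluates the scalar function f_s from the slide; it does not construct the embedding that realizes these distances, and the function name `gaussian_threshold` is an assumption.

```python
import math


def gaussian_threshold(t, s):
    """f_s(t) = s * sqrt(1 - exp(-t^2 / s^2)), the thresholded transform."""
    return s * math.sqrt(1.0 - math.exp(-(t * t) / (s * s)))


s = 20.0
for t in (1.0, 5.0, 10.0, 20.0, 100.0, 1000.0):
    ft = gaussian_threshold(t, s)
    # ft / t stays close to 1 for t well below s and drops once t exceeds s.
    print(t, round(ft, 3), round(ft / t, 3))
```

For t ≤ s the ratio f_s(t)/t lies between (1 − 1/e)^(1/2) ≈ 0.795 and 1, because (1 − e^(−u))/u is decreasing in u and equals 1 − 1/e at u = 1. For large t the value saturates at s, which is what caps the cluster diameter at 20.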

  20. Step 4: JL on individual cluster • Steps 3 & 4: • New embedding guarantees • Intracluster: Constant distortion • Intercluster: • Min distance: 0 • Max distance: 20 (instead of 20 dim(X)) • Caveat: Also smooth the edges • (Figure: Gaussian → smaller diameter; JL → smaller dimension.)

  21. Step 5: Glue partitions • We have an embedding for a single partition • For padded points, the guarantees are perfect • For non-padded points, the guarantees are weak • “Glue” together embeddings for all dim(X) partitions • Concatenate images (and scale down) • Non-padded case occurs 1/10 of the time, so it gets “averaged away” • Final dimension for non-net points: • Number of partitions: O(dim(X)) • Dimension of each embedding: Õ(dim(X)) • Total: Õ(dim²(X)) • Example: f1(x) = (1,7,2), f2(x) = (5,2,3), f3(x) = (4,8,5), so F(x) = f1(x) ⊕ f2(x) ⊕ f3(x) = (1,7,2,5,2,3,4,8,5)
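The gluing step and the "averaged away" remark can be illustrated with a short sketch. The slide does not give the scale-down factor; dividing by the square root of the number of partitions is an assumption made here so that gluing 1-Lipschitz maps stays 1-Lipschitz. The helper names and the toy one-dimensional images are hypothetical.

```python
import math


def concatenate(images):
    """F(x) = f1(x) (+) f2(x) (+) ... as one long coordinate list."""
    return [c for img in images for c in img]


def glue(images):
    """Concatenate and scale down by sqrt(m), m = number of partitions.

    The 1/sqrt(m) factor is an assumed normalization, not from the slides.
    """
    m = len(images)
    return [c / math.sqrt(m) for c in concatenate(images)]


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# The example from the slide: three 3-dimensional images of one point x.
print(concatenate([(1, 7, 2), (5, 2, 3), (4, 8, 5)]))

# Toy model of averaging: x and y are at true distance t. In 9 of 10
# partitions the pair is padded and its distance is preserved; in the
# remaining partition the two points collapse to the same image.
t = 3.0
images_x = [(0.0,)] * 10
images_y = [(t,)] * 9 + [(0.0,)]
glued = dist(glue(images_x), glue(images_y))
print(glued / t)
```

In this toy model the glued distance is t·(9/10)^(1/2), so a single bad partition costs only a constant factor close to 1.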

  22. Step 6: Kirszbraun extension theorem • Kirszbraun’s theorem extends the embedding to non-net points without increasing dimension • (Figure: embedding; embedding + Kirszbraun extension.)

  23. Result I – Review • Steps: • Net extraction • Padded Decomposition • Gaussian Transform • JL • Glue partitions • Extension theorem • Theorem 1 [GK-09]: • Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that 1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X 2. Bi-Lipschitz at scale r: ||f(x)-f(y)|| ≥ Ω(||x-y||) whenever ||x-y|| ∈ [δr, r] 3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

  24. Result I – Extension • Steps: • Net extraction: ε-nets • Padded Decomposition: larger padding, prob. guarantees • Gaussian Transform • JL: already (1±ε) • Glue partitions: higher percentage of padded points • Extension theorem • Theorem 1 [GK-09]: • Every finite X ⊂ ℓ₂ admits an embedding f: X → ℓ₂^k for k = Õ((dim X)²), such that 1. Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| for all x,y ∈ X 2. Gaussian at scale r: ||f(x)-f(y)|| ≥ (1±ε)G(||x-y||) whenever ||x-y|| ∈ [δr, r] 3. Boundedness: ||f(x)|| ≤ r for all x ∈ X

  25. Result II – Snowflake Embedding • Theorem 2 [GK-09]: • For 0<ε<1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴(dim X)²) with distortion (1±ε) to the snowflake: s → s^½ • We’ll illustrate the construction for constant distortion. • The constant distortion construction is due to [Assouad-83] (for non-Euclidean metrics) • In the paper, we implement the same construction with (1±ε) distortion

  26. Snowflake embedding • Basic idea. • Fix points x,y ∈ X, and suppose ||x-y|| ~ s • Now consider many single scale embeddings • r = 16s • r = 8s • r = 4s • r = 2s • r = s • r = s/2 • r = s/4 • r = s/8 • r = s/16 • Properties of each single scale embedding: • Lipschitz: ||f(x)-f(y)|| ≤ ||x-y|| • Gaussian: ||f(x)-f(y)|| ≥ (1±ε)G(||x-y||) • Boundedness: ||f(x)|| ≤ r

  27. Snowflake embedding • Now scale down each embedding by r^½ (snowflake) • r = 16s: s → s^½/4 • r = 8s: s → s^½/8^½ • r = 4s: s → s^½/2 • r = 2s: s → s^½/2^½ • r = s: s → s^½ • r = s/2: s/2 → s^½/2^½ • r = s/4: s/4 → s^½/2 • r = s/8: s/8 → s^½/8^½ • r = s/16: s/16 → s^½/4

  28. Snowflake embedding • Join levels by concatenation and addition of coordinates • r = 16s: s → s^½/4 • r = 8s: s → s^½/8^½ • r = 4s: s → s^½/2 • r = 2s: s → s^½/2^½ • r = s: s → s^½ • r = s/2: s/2 → s^½/2^½ • r = s/4: s/4 → s^½/2 • r = s/8: s/8 → s^½/8^½ • r = s/16: s/16 → s^½/4
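The per-level values in the table can be reproduced with a short calculation. This is an idealized sketch under stated assumptions: a level at scale r ≥ s is taken to preserve the distance s exactly, a level at scale r < s is taken to contribute exactly r (the boundedness cap used in the table), and the levels are joined by plain concatenation. The addition of coordinates mentioned on the slide is not modeled, and the function names are hypothetical.

```python
import math


def level_contribution(s, r):
    """Distance contributed by the scale-r embedding after dividing by r^(1/2).

    Idealized model: the level sees min(s, r) before scaling.
    """
    return min(s, r) / math.sqrt(r)


def snowflake_distance(s, exponents=range(-4, 5)):
    """Norm of the concatenation over scales r = 2^j * s, j = -4..4."""
    total_sq = sum(level_contribution(s, s * 2.0 ** j) ** 2 for j in exponents)
    return math.sqrt(total_sq)


for s in (1.0, 9.0, 100.0):
    # The ratio to s^(1/2) is the same constant for every s.
    print(s, snowflake_distance(s) / math.sqrt(s))
```

Under this model the squared contributions form two geometric series around r = s, summing to 2.875·s over the nine levels shown, so the combined distance is a fixed constant times s^½ regardless of s. That is the constant-distortion snowflake behaviour the slides describe.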

  29. Result II – Review • Steps: • Take collection of single scale embeddings • Scale embedding r by r^½ • Join embeddings by concatenation and addition • By taking more refined scales (jump by 1±ε instead of 2), can achieve (1±ε) distortion to the snowflake • Theorem 2 [GK-09]: • For 0<ε<1, every finite subset X ⊂ ℓ₂ admits an embedding F: X → ℓ₂^k for k = Õ(ε⁻⁴(dim X)²) with distortion (1±ε) to the snowflake: s → s^½

  30. Conclusion • Gave two (1±ε) distortion low-dimension embeddings for doubling spaces • Single scale • Snowflake • This framework can be extended to L1 and L∞ • Dimension reduction: Can’t use JL • Extension: Can’t use Kirszbraun • Threshold: Can’t use the Gaussian • Thank you!
