
Dimension Reduction in the L1 norm



  1. Dimension Reduction in the L1 norm Moses Charikar, Princeton University Joint work with Amit Sahai, Princeton University

  2. Low Distortion Embeddings • Given metric spaces (M1,d1) & (M2,d2), an embedding f: M1 → M2 has distortion γ if ∀ x,y ∈ M1: d1(x,y)/γ ≤ d2(f(x),f(y)) ≤ d1(x,y) • Our Focus: “Dimension Reduction” – make the target space be of “low” dimension, while maintaining small distortion

  3. Motivation • Overall idea: move data from “high complexity” spaces to “low complexity” spaces. • Used in many contexts to develop efficient algorithms: • Approximate Nearest Neighbor search [IM, KOR, Ind] • Clustering of high-dimensional point sets [Das, OR] • Streaming computation [AMS, Ind] • … (see Indyk’s survey ’01) • Dimensionality reduction in “standard” normed spaces is a beautiful mathematical problem.

  4. Johnson-Lindenstrauss [JL84] • Fundamental result in dimensionality reduction for the L2 norm: • n points in Euclidean space (L2 norm) can be mapped down to O((log n)/ε²) dimensions with distortion at most 1+ε. • Moreover, the embedding enjoys two interesting properties: • Linear mapping • Oblivious – choice of linear mapping does not depend on the point set • Quite simple [JL84, FM88, IM98, DG99, Ach01]: even a random +1/-1 matrix works… • Many applications…
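The “random +1/-1 matrix” claim is easy to check numerically. A minimal sketch (the parameters n, D, k and the random point set are my own illustrative choices, not from the talk):

```python
import numpy as np

# Hedged sketch of JL dimension reduction with a random +1/-1 matrix
# (in the spirit of [Ach01]); all sizes here are illustrative.
rng = np.random.default_rng(0)

n, D, k = 20, 50, 512           # points, original dim, target dim
X = rng.normal(size=(n, D))     # arbitrary point set in R^D

# Oblivious linear map: random +/-1 entries, scaled by 1/sqrt(k).
A = rng.choice([-1.0, 1.0], size=(k, D)) / np.sqrt(k)
Y = X @ A.T                     # embed all points at once

ratios = []
for i in range(n):
    for j in range(i + 1, n):
        d_orig = np.linalg.norm(X[i] - X[j])
        d_emb = np.linalg.norm(Y[i] - Y[j])
        ratios.append(d_emb / d_orig)

# With k = Theta((log n)/eps^2), all pairwise ratios concentrate near 1.
print(min(ratios), max(ratios))
```

With k = 512 the empirical ratios stay within a few percent of 1, matching the 1+ε guarantee.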

  5. Dimension Reduction for L1? • [Indyk, FOCS ’01 tutorial] “Is there an analog of JL lemma for other norms, especially L1? This would give a powerful technique for designing approximation algorithms for L1 norms …” • [Linial, ICM ’02] “We know much less about metric embeddings into L1, and the attempts to understand them give rise to many intriguing open problems … What is the smallest k=k(n,ε) so that every n-point metric in L1 can be embedded into L1^k with distortion < 1+ε? We know very little at the moment, namely Ω(log n) ≤ k ≤ O(n log n) for constant ε > 0. The lower bound is trivial …”

  6. Dimension Reduction for L1? • Basic Question: Does there exist an analogue of the Johnson-Lindenstrauss Lemma for L1? • Little is known: • “Generic” dimension reduction via Bourgain’s theorem: • [Bourgain] any metric embeds in L2 with distortion O(log n) • Embed L2 into L1 by projecting onto random unit vectors • O(log n) distortion with O(log n) dimensions • Limit: ∃ n points in L1 s.t. any embedding into L2 incurs distortion Ω(√(log n)) [Matousek]
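The second step of the generic route can be sketched as follows: projecting onto random unit vectors turns L2 distances into (uniformly scaled) L1 distances, since E|⟨v,x⟩| is proportional to ||x||₂ for a random unit vector v. All parameters below are my illustrative assumptions:

```python
import numpy as np

# Sketch: embed L2 points into L1 by projecting onto random unit vectors.
rng = np.random.default_rng(1)

n, D, k = 15, 10, 5000
X = rng.normal(size=(n, D))     # points in Euclidean space

V = rng.normal(size=(k, D))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # k random unit vectors
Y = X @ V.T                                      # coordinates in L1^k

ratios = []
for i in range(n):
    for j in range(i + 1, n):
        d2 = np.linalg.norm(X[i] - X[j])         # original L2 distance
        d1 = np.abs(Y[i] - Y[j]).sum() / k       # averaged L1 distance
        ratios.append(d1 / d2)

# Distortion is scale-invariant: max ratio / min ratio, close to 1.
print(max(ratios) / min(ratios))
```

Distortion is the ratio of the largest to the smallest contraction factor, so the common scaling constant E|⟨v,u⟩| cancels and never needs to be computed.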

  7. Dimension Reduction for L1? (cont.) • Dimension reduction result for L1 of a different flavor due to [Indyk00]: • ∃ embedding into (log 1/δ)^O(1/ε) dimensions s.t.: • distances don’t decrease by more than (1+ε) w.p. 1-δ • distances don’t increase w.p. ε (distances can increase arbitrarily w.p. 1-ε) • Dimension reduction result for L1 of yet a different flavor due to [KOR00]: • For the Hamming cube: low-dimensional embedding for distinguishing between 2 distance thresholds

  8. Our Results • Lower bound on linear embeddings for L1: there exists a set of O(n) points in n-dimensional L1 space such that any linear embedding into d dimensions incurs distortion Ω(√(n/d)). • Lower bound holds for non-oblivious embeddings. • Tight for linear embeddings up to a log factor. • Any embedding achieving Bourgain’s Theorem (distortion O(log n)) must be non-linear. • Idea: analysis of small stretch embeddings • Tool: proof technique for lower bounds on minimum distortion of low dimensional embeddings

  9. Our Results (cont.) • General classes of L1-embeddable metrics admitting low-dimensional, small-distortion embeddings: • For metrics on trees and K2,3-free graphs, we embed into O(log² n) dimensions, with distortion 1+ε and constant, respectively. • For L1 metrics that are “circular-decomposable,” we embed into O(log² n) dimensions, with distortion 3+ε. • Tool: dimension reduction for “small-support” metrics.

  10. Lower Bound for Linear Embeddings • Essence of construction: • 2n+1 points in n dimensions: • First, origin O = (0,0,0, …, 0) • Standard basis vectors P1 … Pn: (1,0,0, …, 0), (0,1,0, …, 0), …, (0,0,0, …, 1) • n random +1/-1 vectors Q1, …, Qn: (1,-1,-1, …, 1), … • Let f be a linear embedding into d dimensions: • Consider it as a sequence of linear embeddings onto the line: φ1, …, φd, each a linear map onto the line

  11. Lower Bound (Intuition) • Recall: origin O; standard basis vectors P1 … Pn; n random +1/-1 vectors Q1, …, Qn; φ1, …, φd each a linear map onto the line • By linearity: φ(Qi) = ±φ(P1) ± φ(P2) ± φ(P3) ± … ± φ(Pn) • But by non-expansion, |φj(Pi)| ≤ 1 for all i,j • So expect |φj(Qi)| ≈ √n for each j • Expect d(O,Qi) ≈ d·√n in the d-dimensional projection (vs. true distance n) • Distortion lower bound of √n/d • Note the argument is weak because |φj(Pi)| can’t be close to 1 for all j – this would contradict non-expansion of d(O,Pi)
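This intuition can be checked empirically. The sketch below (my own experiment, not the talk’s proof) applies one random L1-non-expansive linear map, obtained by normalizing columns, to the point set above; the pairs (O,Qi) collapse, forcing distortion well beyond √(n/d):

```python
import numpy as np

# Empirical check: a random column-normalized linear map into d dims
# preserves the P_i but crushes the random +/-1 points Q_i.
rng = np.random.default_rng(2)

n, d = 128, 8
A = rng.normal(size=(d, n))
A /= np.abs(A).sum(axis=0)      # each column gets L1 norm 1,
# so ||A x||_1 <= sum_i |x_i| * ||col_i||_1 = ||x||_1: non-expansive.

P = np.eye(n)                               # standard basis vectors
Q = rng.choice([-1.0, 1.0], size=(64, n))   # random +1/-1 vectors

# Embedded/original ratios for the pairs (O, P_i) and (O, Q_i).
p_ratios = np.abs(A @ P.T).sum(axis=0) / 1.0    # d(O, P_i) = 1
q_ratios = np.abs(A @ Q.T).sum(axis=0) / n      # d(O, Q_i) = n

distortion = p_ratios.max() / q_ratios.min()    # lower bound on distortion
print(distortion, np.sqrt(n / d))
```

Restricting attention to these pairs only gives a lower bound on the true distortion, which is all the argument needs.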

  12. Lower Bound (Intuition, cont.) • To improve the bound, must better exploit the tension between: • Keeping d(O,Pi) small • Making d(O,Qi) large • Idea: use the ratio (Average d(O,Qi)) / (Average d(O,Pi)) to bound distortion. • Formalize using LP duality.

  13. Lower Bound: The LP • Distances in L1 are computed by adding up distances in each of the d dimensions. • Pretend each dimension is scaled up by a factor of d, so that distance is computed by averaging over dimensions. • Before scaling, non-expansion implied distance never increased in any dimension. After scaling, it implies distance never increases by more than a factor d in any dimension. (*) (small stretch embedding) • Express a relaxation of the problem of embedding into d dimensions with minimum distortion as an LP: • The LP is allowed to average over embeddings of type (*) with arbitrary weights in [0,1].

  14. Lower Bound: The LP (cont.)

  15. Lower Bound: The Dual LP • Take the dual: • Can rewrite this as…

  16. Lower Bound: Dual LP • Any feasible solution to the dual gives a lower bound on distortion. • [Figure annotations: edges that tend to expand vs. edges that tend to contract; linear vs. small stretch] • Note: one family of dual slack variables corresponds to the “non-expansion” constraints, the other to the “low distortion” constraints

  17. Lower Bound: Feasible Sol’n • Must give a point set (the distances duv) and a feasible dual solution (z and the slack variables) • Point set is as before, with a slight refinement: • Origin O • Standard basis vectors P1 … Pn • +1/-1 vectors Q1, …, Qm s.t. the coordinates of a randomly chosen Qi form a pairwise-independent distribution. Note that m = O(n) points suffice.
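One classical way to realize such a pairwise-independent family (an illustrative choice of mine; the talk does not specify a construction) indexes coordinates by nonempty subsets S ⊆ {0,…,t-1} and vectors by seeds r ∈ {0,1}^t, with entry (-1)^⟨r,S⟩. This gives m = 2^t vectors in n = 2^t - 1 coordinates, i.e. m = O(n):

```python
from itertools import product

# Pairwise-independent +/-1 family via GF(2) inner products.
t = 4
seeds = list(product([0, 1], repeat=t))     # m = 2^t = 16 vectors
subsets = list(range(1, 2 ** t))            # n = 2^t - 1 = 15 coordinates

def sign(r, s):
    # (-1) raised to the GF(2) inner product of seed r and subset mask s.
    return -1 if sum(r[i] for i in range(t) if s >> i & 1) % 2 else 1

Q = [[sign(r, s) for s in subsets] for r in seeds]

# For a uniformly random vector, any two distinct coordinates are
# uncorrelated, and uncorrelated uniform +/-1 pairs are independent.
for a in range(len(subsets)):
    for b in range(a + 1, len(subsets)):
        assert sum(q[a] * q[b] for q in Q) == 0
print("pairwise independent:", len(Q), "vectors in", len(subsets), "dims")
```

The correlation sum vanishes because Q[S]·Q[T] = Q[S xor T] and every nonzero mask takes values +1 and -1 equally often over the seeds.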

  18. Lower Bound: Feasible Sol’n (cont.) • Now we claim that the following is a feasible solution: • Calculations show this would yield a lower bound on distortion of exactly √(n/d)

  19. Lower Bound: The Dual LP again • [Figure annotations: sum of distances to the Pi; sum of distances to the Qi]

  20. Lower Bound: Feasible Sol’n (cont.) • Consider a linear map φ = [φ1, φ2, …, φn] onto the line. • Note that since φ(O) = 0 and φ(Pi) = φi, we have |φi| ≤ d (small stretch) • Compute sums/averages: Σi |φ(Pi)| = Σi |φi| (obvious) • Avgi [φ(Qi)²] = Σj φj², where the last equality is by pairwise independence

  21. Lower Bound: Finish • Plugging in, we find that the constraint follows from the AM-GM inequality. QED

  22. Dimension Reduction for L1? • The lower bound shows linear maps can’t provide dimension reduction for L1 in general… • But are there cases where they are still useful? • We identify one important case: metrics where the coordinates of each point have small support, i.e. very few coordinates are non-zero. • (Note: in our bad point set, the Q’s had very large support.)

  23. Dimension Reduction for Small-Support Embeddings • Suppose we have any number of points embedded in d-dimensional L1 space, such that: • Each point has at most k non-zero coordinates. • Then we can embed these points into O((k/ε)² log d) dimensions with distortion 1+ε • Key ingredient: combinatorial designs. • Linear embeddings: ||f(p)-f(q)||1 = ||f(p-q)||1. • p-q has at most 2k non-zero coordinates.
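The flavor of the construction can be seen with random coordinate-hashing standing in for the combinatorial designs (my simplification for illustration; the designs make the guarantee deterministic). Bucket b of the new embedding is the sum of the original coordinates hashed to b, a linear map that never expands L1 distances and preserves them exactly whenever no two non-zeros of p-q collide:

```python
import numpy as np

# Random hashing stand-in for the design-based small-support embedding.
rng = np.random.default_rng(3)

d, k, B, n_pts = 10_000, 3, 4096, 20
h = rng.integers(0, B, size=d)          # hash each coordinate to a bucket

def embed(x):
    y = np.zeros(B)
    np.add.at(y, h, x)                  # y[b] = sum of x[i] with h[i] == b
    return y

# Points with at most k non-zero coordinates each.
X = np.zeros((n_pts, d))
for row in X:
    row[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)

Y = np.array([embed(x) for x in X])

for i in range(n_pts):
    for j in range(i + 1, n_pts):
        d_orig = np.abs(X[i] - X[j]).sum()
        d_emb = np.abs(Y[i] - Y[j]).sum()
        # Bucketing can only cancel, never expand, L1 mass.
        assert d_emb <= d_orig + 1e-9
print("non-expansive linear map from", d, "into", B, "dimensions")
```

Non-expansion holds unconditionally by the triangle inequality within each bucket; the design/choice of subsets is what controls the loss from collisions.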

  24. Small-Support Embeddings… • [Figure: the difference vector p-q in d dimensions; add subsets of coordinates and scale to obtain the new low-dimensional vector]

  25. Small-Support Embeddings… • [Figure: p-q, original vs. new coordinates]

  26. Application: Tree Metrics (here, improved result due to Gupta, pers. comm. ’02) • [Figure: a tree metric decomposed as a sum of spiders]

  27. Tree Metrics: Spider Decomp. • Can show that every tree decomposes into O(log n) spiders (straightforward induction). • Each spider has a trivial embedding in L1, with support size = 1. • Thus, each spider embeds into O((log n)/ε²) dimensions. • Yields an O((log² n)/ε²)-dimensional embedding for tree metrics with distortion 1+ε
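The “trivial embedding with support size = 1” for a spider (paths glued at a center) can be written out directly: a point at distance t along leg j maps to t·e_j. A toy verification, with leg lengths of my choosing:

```python
# A spider embeds isometrically in L1 with one non-zero coordinate
# per point.
legs = [3, 5, 2]                        # three legs of these lengths
points = [(j, t) for j, L in enumerate(legs) for t in range(L + 1)]

def tree_dist(p, q):
    (j1, t1), (j2, t2) = p, q
    # Same leg: walk along it; different legs: go through the center.
    return abs(t1 - t2) if j1 == j2 else t1 + t2

def embed(p):
    j, t = p
    v = [0] * len(legs)
    v[j] = t                            # support size 1
    return v

for p in points:
    for q in points:
        l1 = sum(abs(a - b) for a, b in zip(embed(p), embed(q)))
        assert l1 == tree_dist(p, q)    # exact isometry
print("spider embeds isometrically with support 1")
```

Summing O(log n) such spider embeddings, each reduced by the small-support theorem, gives the dimension bound on the slide.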

  28. Embeddings into Trees? • Since tree metrics can be embedded into L1 with low distortion, does this mean any embedding into a distribution of tree metrics yields a low-dimensional L1 embedding? • “Proof:” Sample a few trees from the distribution, then use the tree embeddings just described. • Problem: high-variance distances cause too many samples to be needed. • Example: the cycle

  29. Embeddings into Trees?… • Embed Cycle into Trees (Lines) by chopping edges:


  35. Embeddings into Trees?… • Embed cycle into trees (lines) by chopping edges: • Problem: chopped edges lead to huge variance in distances. Need to sample Ω(n) trees. • In particular, the max stretch in each embedding is huge.
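The stretch problem is visible in one small computation (illustrative, with the chopped edge placed between points n-1 and 0):

```python
# Chop one edge of an n-cycle: the two endpoints of the cut edge are
# adjacent on the cycle but end up n-1 apart on the resulting line.
n = 100
cycle = lambda i, j: min(abs(i - j), n - abs(i - j))   # cycle metric
line = lambda i, j: abs(i - j)                         # after the cut

worst = max(line(i, j) / cycle(i, j)
            for i in range(n) for j in range(n) if i != j)
print(worst)    # stretch n-1 = 99, attained by the pair (0, n-1)
```

A random cut makes any fixed pair's expected stretch small, but the worst pair in each sampled tree is stretched by a factor of n-1, which is why Ω(n) samples are needed.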

  36. Embeddings into Trees?… • Need alternative embeddings into trees which induce small variance in distances (limit stretch). • Example: “flattening” a cycle: • [Figure: the cycle A-B-C-D-E and its flattening onto a line] • Just 2 “flattenings” can give a constant-distortion approximation to the cycle metric
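The two-flattening claim can be tested numerically. In the experiment below (my own choice of flattening points, not taken from the talk), each point x of an n-cycle maps to the pair of distances (d_C(x,0), d_C(x,n/4)); each coordinate is a non-expansive line (tree) embedding:

```python
# Two flattenings of the n-cycle, at antipodally-placed points 0 and n/4.
n = 64
d_c = lambda x, y: min(abs(x - y), n - abs(x - y))   # cycle metric
flat = lambda x: (d_c(x, 0), d_c(x, n // 4))         # two flattenings

ratios = []
for x in range(n):
    for y in range(x + 1, n):
        emb = sum(abs(a - b) for a, b in zip(flat(x), flat(y)))
        ratios.append(emb / d_c(x, y))

distortion = max(ratios) / min(ratios)
print(distortion)   # a constant, independent of n
```

Each flattening alone collapses pairs symmetric about its base point, but the two base points are offset so that no pair is collapsed by both, keeping every ratio within a constant band.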

  37. Outerplanar and K2,3-free Graphs • Outerplanar graphs (more generally, K2,3-free graphs) are basically concatenations of cycles. • [GNRS99] gave a constant-distortion embedding of such graphs into (high-variance) distributions of trees • We use “flattening” to adapt the construction to yield a constant-distortion embedding into just 2 trees (stretch-limited embeddings) • This yields an O(log² n)-dimensional embedding in L1 with constant distortion.

  38. Conclusions • Basic question still open: a JL Lemma for L1? • We show: any such embedding must be non-linear. • We also introduce a collection of techniques for building low-dimension, small-distortion embeddings into L1

  39. Future Directions • Is there an oblivious dimension reduction technique for L1? • Oblivious embeddings with better guarantees than linear embeddings? • Does there exist a set of n points in L1 such that any embedding into polylog dimensions incurs distortion more than 1+ε? • Do trees need Ω(log n) dimensions for distortion 1+ε? • Can circular decomposable metrics and metrics on K2,3-free graphs be embedded in polylog dimensions with distortion 1+ε? • Is it always possible to embed n points in L1 in polylog dimensions with constant distortion? • Can we do this for metrics on series-parallel graphs?

  40. Recent Developments • Is it always possible to embed n points in L1 in polylog dimensions with constant distortion? NO! • Can we do this for metrics on series-parallel graphs? • [Brinkman, Charikar ’03] There exists a set of n points in L1 such that any embedding with distortion δ requires n^Ω(1/δ²) dimensions; any embedding with distortion 1+ε requires n^(1/2-O(ε log(1/ε))) dimensions
