Dimension Reduction in the L1 norm
Moses Charikar, Princeton University
Joint work with Amit Sahai, Princeton University
Low Distortion Embeddings • Given metric spaces (M1,d1) & (M2,d2), an embedding f: M1 → M2 has distortion γ if for all x,y ∈ M1: d1(x,y)/γ ≤ d2(f(x),f(y)) ≤ d1(x,y) • Our Focus: "Dimension Reduction" – make the target space be of "low" dimension, while maintaining small distortion
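To make the definition concrete, here is a small sketch (illustrative, not from the talk) that computes the distortion of an embedding between two finite metric spaces as the ratio of the largest to the smallest factor by which any pairwise distance is scaled; the 4-cycle example and all helper names are assumptions:

```python
from itertools import combinations

def distortion(points, f, d1, d2):
    """Ratio of the largest to the smallest factor by which the
    embedding f scales any pairwise distance."""
    ratios = [d2(f(x), f(y)) / d1(x, y) for x, y in combinations(points, 2)]
    return max(ratios) / min(ratios)

# Example: the 4-point cycle metric mapped to the unit square under L1.
pts = [0, 1, 2, 3]
cycle = lambda x, y: min(abs(x - y), 4 - abs(x - y))
corners = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
l1 = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
print(distortion(pts, corners.get, cycle, l1))  # 1.0 — C4 embeds isometrically
```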
Motivation • Overall Idea: Move data from “High Complexity” Spaces to “Low Complexity” Spaces. • Used in many contexts to develop efficient algorithms: • Approximate Nearest Neighbor search [IM, KOR, Ind] • Clustering of high-dimensional point sets [Das, OR] • Streaming Computation [AMS, Ind] • … (See Indyk Survey ’01) • Dimensionality reduction in “standard” normed spaces: beautiful mathematical problem.
Johnson-Lindenstrauss [JL84] • Fundamental result in dimensionality reduction for the L2 norm: • n points in Euclidean space (L2 norm) can be mapped down to O((log n)/ε²) dimensions with distortion at most 1+ε. • Moreover, the embedding enjoys two interesting properties: • Linear mapping • Oblivious – choice of linear mapping does not depend on the point set • Quite simple [JL84, FM88, IM98, DG99, Ach01]: even a random +1/-1 matrix works… • Many applications…
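The "even a random +1/-1 matrix works" remark can be sketched as follows; the dimensions are chosen for a stable empirical check rather than from the JL bound, and are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, orig_dim, k = 50, 1000, 500  # k is illustrative; JL needs O(log n / eps^2)

X = rng.normal(size=(n, orig_dim))
R = rng.choice([-1.0, 1.0], size=(orig_dim, k)) / np.sqrt(k)  # random sign matrix
Y = X @ R  # oblivious linear map: R never looks at the point set

for i in range(n):
    for j in range(i + 1, n):
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        assert 0.8 < ratio < 1.2  # every pairwise L2 distance is preserved
```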
Dimension Reduction for L1? • [Indyk, FOCS '01 tutorial] "Is there an analog of the JL lemma for other norms, especially L1? This would give a powerful technique for designing approximation algorithms for L1 norms …" • [Linial, ICM '02] "We know much less about metric embeddings into L1, and the attempts to understand them give rise to many intriguing open problems … What is the smallest k = k(n,ε) so that every n-point metric in L1 can be embedded into L1^k with distortion < 1+ε? We know very little at the moment, namely Ω(log n) ≤ k ≤ O(n log n) for constant ε > 0. The lower bound is trivial …"
Dimension Reduction for L1? • Basic Question: Does there exist an analogue of the Johnson-Lindenstrauss Lemma for L1? • Little is known: • "Generic" Dimension Reduction via Bourgain's theorem: • [Bourgain] any metric embeds in L2 with distortion O(log n) • Embed L2 into L1 by projecting onto random unit vectors • O(log n) distortion with O(log n) dimensions • Limit: there are n points in L1 s.t. any embedding into L2 incurs distortion Ω((log n)^1/2) [Matousek]
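The projection step of this generic route — turning L2 distances into L1 distances over random directions — can be sketched numerically. The scaling below uses the Gaussian identity E|⟨g,z⟩| = √(2/π)·||z||₂; the parameter choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, k = 30, 200, 2000  # k exaggerated so the empirical check is tight

X = rng.normal(size=(n, dim))
G = rng.normal(size=(dim, k))
# For Gaussian g, E|<g, z>| = sqrt(2/pi) * ||z||_2, so after this scaling the
# L1 distance between images approximates the L2 distance between sources:
Y = (X @ G) / (k * np.sqrt(2 / np.pi))

l1 = np.abs(Y[0] - Y[1]).sum()
l2 = np.linalg.norm(X[0] - X[1])
assert 0.9 < l1 / l2 < 1.1
```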
Dimension Reduction for L1? (cont.) • A dimension reduction result for L1 of a different flavor due to [Indyk00]: • embedding into (log(1/δ))^O(1/ε) dimensions s.t.: • distances don't decrease by more than (1+ε) w.p. 1−δ • distances don't increase w.p. ε (distances can increase arbitrarily w.p. 1−ε) • A dimension reduction result for L1 of yet a different flavor due to [KOR00]: • For the Hamming cube: a low-dimensional embedding for distinguishing between 2 distance thresholds
Our Results • Lower Bound on Linear Embeddings for L1: There exists a set of O(n) points in n-dimensional L1 space such that any linear embedding into d dimensions incurs distortion Ω((n/d)^1/2). • Lower bound holds for non-oblivious embeddings. • Tight for linear embeddings up to a log factor. • Any embedding achieving Bourgain's Theorem (distortion O(log n)) must be non-linear. • Idea: Analysis of small stretch embeddings • Tool: Proof technique for lower bounds on minimum distortion of low-dimensional embeddings
Our Results (cont.) • General classes of L1-embeddable metrics admitting low-dimensional, small-distortion embeddings: • For metrics on trees and K2,3-free graphs, we embed into O(log² n) dimensions, with distortion 1+ε and constant distortion, respectively. • For L1 metrics that are "circular-decomposable," we embed into O(log² n) dimensions, with distortion 3+ε. • Tool: Dimension Reduction for "Small-Support" Metrics.
Lower Bound for Linear Embeddings • Essence of Construction: • 2n+1 points in n dimensions: • First, the origin O = (0,0,0, …, 0) • Standard basis vectors P1 … Pn = (1,0,0, …, 0), (0,1,0, …, 0), …, (0,0,0, …, 1) • n random +1/-1 vectors Q1, …, Qn, e.g. (1,-1,-1, …, 1), … • Let f be a linear embedding into d dimensions: • Consider it as a sequence of linear embeddings onto the line: • φ1, …, φd: each a linear map onto the line
Lower Bound (Intuition) • Origin O; standard basis vectors P1 … Pn; n random +1/-1 vectors Q1, …, Qn; φ1, …, φd: each a linear map onto the line • By linearity: φ(Qi) = ±φ(P1) ± φ(P2) ± φ(P3) … ± φ(Pn) • But by non-expansion, |φj(Pi)| ≤ 1 for all i,j • So expect |φj(Qi)| ≈ n^1/2 for each j • Expect d(O,Qi) ≈ d·n^1/2 in the d-dimensional projection • Distortion lower bound of n^1/2/d • Note the argument is weak because |φj(Pi)| can't be close to 1 for all j – this would contradict non-expansion of d(O,Pi)
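The intuition above can be checked numerically. The sketch below (illustrative, not the paper's proof) builds the point set, applies a random sign matrix scaled so every column has unit L1 norm (hence non-expansive, with d(O,Pi) preserved exactly), and observes that the images of the Qi collapse:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 256, 4

# Point set: origin O (implicit), basis vectors P_i, random +1/-1 vectors Q_i.
Q = rng.choice([-1.0, 1.0], size=(n, n))

# Random linear map into d dimensions; +1/-1 entries scaled by 1/d so every
# column has L1 norm 1: the map is non-expansive and ||f(P_i)||_1 = 1 = d(O, P_i).
A = rng.choice([-1.0, 1.0], size=(d, n)) / d

# d(O, Q_i) = n in the original space, but the image norm collapses to ~sqrt(n):
img = np.abs(Q @ A.T).sum(axis=1)
lower_bound = n / img.min()  # distortion is at least this, since the P_i map at ratio 1
print(lower_bound > np.sqrt(n) / 8)  # True
```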
Lower Bound (Intuition, cont.) • Origin O; standard basis vectors P1 … Pn; n random +1/-1 vectors Q1, …, Qn; φ1, …, φd: each a linear map onto the line • To improve the bound, must better exploit the tension between: • Keeping d(O,Pi) small • Making d(O,Qi) large • Idea: Use the ratio (Average d(O,Qi)) / (Average d(O,Pi)) to bound distortion. • Formalize using LP duality.
Lower Bound: The LP • Distances in L1 are computed by adding up distances in each of the d dimensions. • Pretend each dimension is scaled up by a factor of d, so that distance is computed by averaging over dimensions. • Before scaling, non-expansion implied distance never increased in any dimension. After, it implies distance never increases by more than a factor of d in any dimension. (*) (small stretch embedding) • Express a relaxation of the problem of embedding in d dimensions with minimum distortion as an LP: • The LP is allowed to average over embeddings of type (*) with arbitrary weights in [0,1].
Lower Bound: The Dual LP • Take the dual: [LP shown as a figure in the original slides] • Can rewrite this as…
Lower Bound: The Dual LP • Any feasible solution to the dual gives a lower bound on distortion. • Note: one set of slack variables corresponds to "non-expansion" (edges that tend to expand); the other corresponds to "low distortion" (edges that tend to contract).
Lower Bound: Feasible Sol'n • Must give a point set (duv) and a feasible dual solution • Point set is as before, with a slight refinement: • Origin O • Standard basis vectors: P1 … Pn • +1/-1 vectors: Q1, …, Qm s.t. the coordinates of a randomly chosen Qi form a pairwise-independent distribution. Note that m = O(n) points suffice.
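One standard way to get m = O(n) such vectors — an illustrative construction, not necessarily the one in the paper — is via rows of a Sylvester-Hadamard matrix, Q_a[i] = (−1)^⟨a,i⟩ with the inner product taken over GF(2). For a uniformly random choice of vector, any two distinct nonzero coordinate indices are exactly pairwise independent:

```python
t = 4
n = 2 ** t  # number of coordinates, indexed by t-bit strings

def inner(a, i):
    # inner product of a and i viewed as vectors over GF(2)
    return bin(a & i).count("1") % 2

# Q_a[i] = (-1)^<a, i>: the rows of the 16x16 Sylvester-Hadamard matrix.
vectors = [[(-1) ** inner(a, i) for i in range(n)] for a in range(n)]

# Pick a vector uniformly at random: for any two distinct nonzero coordinate
# indices i, j, the sign pair (Q[i], Q[j]) is uniform over {-1,+1}^2.
for i, j in [(1, 2), (3, 7), (5, 12)]:
    counts = {}
    for v in vectors:
        counts[(v[i], v[j])] = counts.get((v[i], v[j]), 0) + 1
    assert sorted(counts.values()) == [n // 4] * 4
```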
Lower Bound: Feasible Soln… • Now we claim that the following is a feasible solution: [values shown as a figure in the original slides] • Calculations show this would yield a lower bound on distortion of exactly (n/d)^1/2
Lower Bound: The Dual LP again • [Dual LP figure: the objective compares the sum of distances to the Pi with the sum of distances to the Qi]
Lower Bound: Feasible Soln… • Consider a linear map φ = [λ1, λ2, …, λn] onto the line. • Since φ(O) = 0 and φ(Pi) = λi, small stretch gives: |λi| ≤ d • Compute Sums/Averages: |φ(Pi)| = |λi| (obvious); E[φ(Qi)²] = Σi λi², where the last equality is by pairwise independence
Lower Bound: Finish • Plugging in, we find that the constraint follows from the AM-GM inequality. QED
Dimension Reduction for L1? • The lower bound shows linear maps can't provide dimension reduction for L1 in general… • But are there cases where they are still useful? • We identify one important case: metrics where the coordinates of each point have small support, i.e. very few coordinates are non-zero. • (Note that in our bad set, the Q's had very large support.)
Dimension Reduction for Small-Support Embeddings • Suppose we have any number of points embedded in d-dimensional L1 space, such that: • Each point has at most k non-zero coordinates. • Then we can embed these points into O((k/ε)² log d) dimensions with distortion 1+ε • Key ingredient: Combinatorial Designs. • Linear embeddings: ||f(p)−f(q)||1 = ||f(p−q)||1. • p−q has at most 2k non-zero coordinates.
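The paper's construction uses combinatorial designs; the random-hashing sketch below is only meant to illustrate the mechanism (add subsets of coordinates) and is an assumption, not the paper's scheme. If the at most 2k nonzero coordinates of p − q land in distinct buckets, the L1 distance is preserved exactly, and the map never expands:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, m = 10_000, 5, 400  # ambient dimension, sparsity, number of buckets

# Hash each coordinate into one of m buckets; the image sums each bucket.
h = rng.integers(m, size=d)

def embed(x):
    y = np.zeros(m)
    np.add.at(y, h, x)  # y[b] = sum of x over coordinates hashed to bucket b
    return y

p = np.zeros(d); p[rng.choice(d, k, replace=False)] = rng.normal(size=k)
q = np.zeros(d); q[rng.choice(d, k, replace=False)] = rng.normal(size=k)
diff = p - q

# The map never expands L1 distances (triangle inequality within each bucket)...
assert np.abs(embed(p) - embed(q)).sum() <= np.abs(diff).sum() + 1e-9
# ...and preserves them exactly when the nonzeros avoid bucket collisions:
if len(set(h[np.flatnonzero(diff)])) == np.count_nonzero(diff):
    assert np.isclose(np.abs(embed(p) - embed(q)).sum(), np.abs(diff).sum())
```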
Small-Support Embeddings… • [Figure: the original d-dimensional vector p−q is mapped to a new, lower-dimensional vector by adding subsets of coordinates and scaling.]
Application: Tree Metrics (Here, Improved Result due to Gupta, pers. comm. '02) • [Figure: a 10-node tree written as a sum of spider subtrees.]
Tree Metrics: Spider Decomp. • Can show that every tree decomposes into O(log n) spiders (straightforward induction). • Each spider has a trivial embedding in L1, with support size = 1. • Thus, each spider embeds into O((log n)/ε²) dimensions. • Yields an O((log² n)/ε²)-dimensional embedding for tree metrics with distortion 1+ε
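The "trivial embedding with support size 1" for a spider can be sketched directly: send the point at distance t on leg i to t·e_i; then same-leg pairs get |t−s| and cross-leg pairs get t+s, exactly the spider metric. The toy spider below is an illustrative example:

```python
from itertools import combinations

# A spider: a center with several paths ("legs") attached. Represent each
# point as (leg, t) = distance t from the center along that leg.
points = [("center", 0)] + [(leg, t) for leg in range(3) for t in (1, 2, 5)]

def spider_dist(p, q):
    (lp, tp), (lq, tq) = p, q
    # Same leg (or one endpoint is the center): walk along the leg;
    # different legs: go through the center.
    return abs(tp - tq) if lp == lq or 0 in (tp, tq) else tp + tq

def embed(p):
    leg, t = p              # support size 1: value t on coordinate "leg"
    v = [0.0] * 3
    if t:
        v[leg] = float(t)
    return v

l1 = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))
assert all(l1(embed(p), embed(q)) == spider_dist(p, q)
           for p, q in combinations(points, 2))  # the embedding is isometric
```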
Embeddings into Trees? • Since Tree Metrics can be embedded into L1 with low distortion, does this mean any embedding into distribution of tree metrics yields low-dim L1 embedding? • “Proof:” Sample a few trees from distribution, then use tree embeddings just described. • Problem: High-Variance distances cause too many samples to be needed. • Example: Cycle
Embeddings into Trees?… • Embed Cycle into Trees (Lines) by chopping edges: • Problem: Chopped edges lead to huge variance in distances. Need to sample Ω(n) trees. • In particular, max stretch in each embedding is huge.
Embeddings into Trees?… • Need alternative embeddings into trees which induce small variance in distances (limit stretch). • Example: "Flattening" a cycle: [figure: the cycle A-B-C-D-E folded flat onto the line A-E-B-D-C] • Just 2 "flattenings" can give a constant-distortion approximation to the cycle metric
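The flattening idea can be checked on a discrete cycle: fold the cycle at two points a quarter-cycle apart, mapping each vertex to its cycle distances from the two fold points. This 2-coordinate L1 embedding empirically achieves constant distortion (the fold points and cycle size below are illustrative choices):

```python
n = 16  # discrete cycle of circumference n

def cyc(x, y):
    return min(abs(x - y), n - abs(x - y))

# Two "flattenings": fold the cycle at a point p, i.e. map x to its cycle
# distance from p. Use two fold points a quarter-cycle apart.
f = lambda x: (cyc(x, 0), cyc(x, n // 4))

ratios = []
for x in range(n):
    for y in range(x + 1, n):
        u, v = f(x), f(y)
        ratios.append((abs(u[0] - v[0]) + abs(u[1] - v[1])) / cyc(x, y))

print(max(ratios) / min(ratios))  # 2.0: constant distortion from just 2 coordinates
```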
Outerplanar and K2,3-free Graphs • Outerplanar graphs (more generally, K2,3-free graphs) are basically concatenations of cycles. • [GNRS99] gave a constant-distortion embedding of such graphs into (high-variance) distributions of trees • We use "flattening" to adapt the construction to yield a constant-distortion embedding into just 2 trees (stretch-limited embeddings). • This yields an O(log² n)-dimensional embedding in L1 with constant distortion.
Conclusions • Basic Question Still Open: JL Lemma for L1? • We show: Any such embedding must be non-linear. • Also introduce collection of techniques for building low-dimension, small-distortion embeddings into L1
Future Directions • Is there an oblivious dimension reduction technique for L1? • Oblivious embeddings with better guarantees than linear embeddings? • Does there exist a set of n points in L1 such that any embedding into polylog dimensions incurs distortion more than 1+ε? • Do trees need Ω(log n) dimensions for distortion 1+ε? • Can circular-decomposable metrics and metrics on K2,3-free graphs be embedded in polylog dimensions with distortion 1+ε? • Is it always possible to embed n points in L1 in polylog dimensions with constant distortion? • Can we do this for metrics on series-parallel graphs?
Recent Developments • Is it always possible to embed n points in L1 in polylog dimensions with constant distortion? NO! • Can we do this for metrics on series-parallel graphs? NO! • [Brinkman, Charikar '03] There exists a set of n points in L1 such that any embedding with distortion δ requires n^Ω(1/δ²) dimensions; any embedding with distortion 1+ε requires n^(1/2−O(ε log(1/ε))) dimensions