This talk explores dimension reduction for lp spaces (1 < p < 2), building on the Johnson-Lindenstrauss Lemma and the JL transform for l2. It presents two techniques, bounded-range (single-scale) dimension reduction and snowflake embeddings, with proofs, observations, and applications to clustering.
Dimension reduction techniques for lp (1 < p < 2), with applications • Yair Bartal (Hebrew U.) • Lee-Ad Gottlieb (Ariel University)
Introduction • Fundamental result in dimension reduction: the Johnson-Lindenstrauss Lemma (JL-84) for Euclidean space. • Given: a set S of n points in Rd • There exists: • f : Rd → Rk, k = O(ln n / ε²) • such that for all u,v in S: ║u−v║2 ≤ ║f(u)−f(v)║2 ≤ (1+ε)║u−v║2
Introduction • The JL Lemma is specific to l2. • Dimension reduction for other lp spaces? • Impossible for l∞ and l1. • Not known for other lp spaces. • This paper: • Dimension reduction techniques for lp (1 < p < 2) • Specifically, single-scale and snowflake embeddings
JL transform • Given: a set S of n points in Rd • There exists: • f : Rd → Rk, k = O(ln n / ε²) • such that for all u,v in S: ║u−v║2 ≤ ║f(u)−f(v)║2 ≤ (1+ε)║u−v║2
JL transform • Proof by (randomized) construction • f : Rd → Rk: multiply each vector by a random d × k matrix • Matrix entries can be ±1 or Gaussians
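A minimal sketch of this construction (not the authors' code); the constant 4 in the target dimension and the 1/√k scaling are common choices, not taken from the slides:

```python
import numpy as np

def jl_transform(points, eps, rng):
    """Project rows of `points` from R^d to R^k, k = O(ln n / eps^2)."""
    n, d = points.shape
    k = int(np.ceil(4 * np.log(n) / eps ** 2))  # constant 4: one common choice
    # Random Gaussian matrix; the 1/sqrt(k) scaling makes the map
    # norm-preserving in expectation.
    A = rng.normal(size=(d, k)) / np.sqrt(k)
    return points @ A

rng = np.random.default_rng(0)
S = rng.normal(size=(1000, 500))         # n = 1000 points in R^500
f_S = jl_transform(S, eps=0.5, rng=rng)  # maps into roughly R^111
```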
JL transform • Prove: with constant probability, for all u,v in S • ║u−v║2 ≤ ║f(u)−f(v)║2 ≤ (1+ε)║u−v║2 • Observation: • f is linear • if w = u−v, then f(w) = f(u−v) = f(u)−f(v) • So it suffices to prove • ║w║2 ≤ ║f(w)║2 ≤ (1+ε)║w║2
JL transform • Consider an embedding into R1, with entries g ~ N(0,1) • e.g. for w = (a,b,c): f(w) = ag1 + bg2 + cg3 • Normals are 2-stable: • If X,Y ~ N(0,1), then aX ~ N(0,a²) • Also: aX + bY ~ N(0,a²+b²) ~ √(a²+b²)·N(0,1) • So: ∑i wigi ~ √(∑i wi²)·N(0,1) = ║w║2·N(0,1)
JL transform • Even a single coordinate preserves magnitude: • Each coordinate is distributed as ║w║2·N(0,1) • So (up to scaling) E[║f(w)║2] = ║w║2 • Need this to hold simultaneously for all point pairs • Multiple coordinates: • ║f(w)║2² ~ ║w║2²·∑ N²(0,1) ~ ║w║2²·χ²(k) • The sum of k squared coordinates is tightly concentrated around its mean • Can show: when k = O(ln n / ε²), all point pairs are preserved simultaneously
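A quick empirical check of this concentration claim, under the same Gaussian-matrix setup as the sketch above (not from the slides): the ratio ║f(w)║2²/║w║2² is distributed as χ²(k)/k, with mean 1 and standard deviation √(2/k).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 100
w = rng.normal(size=d)                  # an arbitrary fixed vector in R^d
for k in (10, 100, 1000):
    ratios = []
    for _ in range(2000):
        A = rng.normal(size=(d, k)) / np.sqrt(k)
        ratios.append(np.sum((w @ A) ** 2) / np.sum(w ** 2))
    # mean ~ 1, std ~ sqrt(2/k): the estimate tightens as k grows
    print(k, np.mean(ratios), np.std(ratios))
```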
Dimension reduction for lp? • JL works well for l2. • Let's try to do the same for lp (1 < p < 2) • Hint: it won't work… but it will be instructive • p-stable distributions: • If X,Y ~ Fp (p ≤ 2) • Then aX + bY ~ (|a|^p + |b|^p)^(1/p)·Fp [Johnson-Schechtman 82, Datar-Immorlica-Indyk-Mirrokni 04, Mendel-Naor 04]
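The slides do not specify a sampler; one standard way to draw from a symmetric p-stable distribution is the Chambers-Mallows-Stuck method, sketched here (note the scale convention differs from N(0,1) by a constant factor when p = 2):

```python
import numpy as np

def p_stable(p, size, rng):
    """Symmetric p-stable samples (Chambers-Mallows-Stuck), 0 < p <= 2."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(p * V) / np.cos(V) ** (1 / p)
            * (np.cos((1 - p) * V) / W) ** ((1 - p) / p))

# Empirical check of p-stability: aX + bY should match (a^p + b^p)^(1/p) Z in law.
rng = np.random.default_rng(2)
p, a, b = 1.5, 2.0, 3.0
X, Y, Z = (p_stable(p, 200_000, rng) for _ in range(3))
lhs = a * X + b * Y
rhs = (a ** p + b ** p) ** (1 / p) * Z
print(np.quantile(lhs, [0.25, 0.5, 0.75]))
print(np.quantile(rhs, [0.25, 0.5, 0.75]))  # the two quantile rows should roughly agree
```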
Dimension reduction for lp? • Suppose we embed into R1, with entries g ~ Fp • ║f(w)║p is distributed as ║w║p·Fp • So (up to scaling) E[║f(w)║p] = ║w║p • Multiple coordinates, from lp into lp or lq (q ≤ p): • ║f(w)║p^p = ║w║p^p·∑i |gi|^p • ║f(w)║q^q = ║w║p^q·∑i |gi|^q • Looks good! But what are E[|g|^p] and E[|g|^q]?
p-stable distributions • Familiar examples: • Gaussian: 2-stable • Cauchy: 1-stable • Density function: • Unimodal [SY-78, Y-78, H-84] • Bell-shaped [G-84] • Heavy-tailed when p < 2: h(x) ≈ 1/(1+x^(p+1)) • When p < 2: E[|g|^q] = ∫0∞ x^q·h(x) dx ≈ ∫0^1 x^q dx + ∫1^∞ x^(q−p−1) dx • For 0 < q < p: E[|g|^q] ≈ 1/(p−q) ← OK • For q ≥ p: E[|g|^q] = ∞ ← Problem
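A numeric illustration of this moment calculation, reusing the hypothetical p_stable() sampler from the earlier sketch: for q < p the empirical q-th moment stabilizes, while for q > p it keeps growing with the sample size as the heavy tail dominates.

```python
import numpy as np
# reuses p_stable() from the sketch above

rng = np.random.default_rng(3)
p = 1.5
for n in (10 ** 4, 10 ** 5, 10 ** 6):
    g = np.abs(p_stable(p, n, rng))
    # q = 1.0 < p: converges to a constant; q = 1.9 > p: diverges with n
    print(n, np.mean(g ** 1.0), np.mean(g ** 1.9))
```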
Dimension reduction for lp? • Problems using p-stables for dimension reduction: • Heavy tails when p < 2: E[|g|^p] = ∞ • When q < p, E[|g|^q] is finite, but how many coordinates are needed?
Dimension reduction for lp? • What's known for non-Euclidean spaces? • For l1: bounded-range dimension reduction [OR-02] • Dimension: O(R log n / ε³) • Distortion: distances in the range [1,R] retained to within (1+ε) • Expansion: distances < 1 remain smaller • Contraction: distances > R remain larger • Used as a subroutine for clustering and approximate nearest neighbor search (ANNS)
Dimension reduction for lp? • Our contributions for lp (1 < p < 2): • Bounded-range dimension reduction (lp → lq, q ≤ p) • Dimension: Oε(R log n) • Distortion: distances in [1,R] retained to within (1+ε) • Expansion: distances < 1 remain smaller • Contraction: distances > R remain larger • Snowflake embedding: • ║x−y║p ↦ (1±ε)·║x−y║p^α, α < 1 • Dimension: O(ddim²), where ddim is the doubling dimension • Previously known only for l1, with dimension O(2^(2·ddim)) • Both embeddings have applications to clustering.
Single-scale dimension reduction • Our single-coordinate embedding is as follows (see the sketch below): • F : Rd → R1 • s: upper distance threshold (~ R) • φ: random phase (angle) • gi: i.i.d. p-stable • F(v) = Fφ,s(v) = s·sin(φ + (1/s)∑i givi) • Motivated by [Mendel-Naor 04] • Intuition: sin(ε) ≈ ε • Small values retained • Large values truncated
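A minimal sketch of this single-coordinate map as we read it from the slide (parameter names ours; g drawn with the hypothetical p_stable() sampler from earlier):

```python
import numpy as np
# reuses p_stable() from the earlier sketch

def F(v, s, phi, g):
    """Single coordinate: F_{phi,s}(v) = s * sin(phi + <g, v> / s)."""
    return s * np.sin(phi + np.dot(g, v) / s)

# For ||u - v||_p << s the sine is near-linear, so |F(u) - F(v)| behaves
# like |<g, u - v>|; for large distances the output is truncated at scale s.
```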
Single-scale dimension reduction • F(v) = Fφ,s(v) = s·sin(φ + (1/s)∑i givi) • E[|F(u)−F(v)|^q] = s^q·E[|sin(φ + (1/s)∑i giui) − sin(φ + (1/s)∑i givi)|^q] = (2s)^q·E[|sin((1/(2s))∑i gi(ui−vi))·cos(φ + (1/(2s))∑i gi(ui+vi))|^q] = c·(2s)^q·E[|sin((1/(2s))∑i gi(ui−vi))|^q] • (the random phase φ makes the cosine term contribute only a constant factor c) • Multiple dimensions: repeat k = s^O(1)·log n times; tight bounds via Bernstein's inequality (a schematic multi-coordinate sketch follows below) • Final embedding, with w = ║u−v║p: • Threshold: ║F(u)−F(v)║q = O(s) • Distortion: when 1 ≤ w ≤ εs, ║F(u)−F(v)║q = (1±ε)·║u−v║p • Expansion: when w < 1, ║F(u)−F(v)║q ≤ (1+ε)·║u−v║p
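A schematic multi-coordinate version (a sketch only: the number of coordinates k is left as a free parameter standing in for s^O(1)·log n, and normalization constants are omitted):

```python
import numpy as np
# reuses p_stable() from the earlier sketch

def single_scale_embed(points, s, k, p, rng):
    """k independent copies of F_{phi,s}; returns an (n, k) array."""
    n, d = points.shape
    G = p_stable(p, (k, d), rng)           # one p-stable vector per output coordinate
    phi = rng.uniform(0.0, 2 * np.pi, k)   # one random phase per output coordinate
    return s * np.sin(phi + points @ G.T / s)

def lq_dist(x, y, q):
    """Normalized l_q distance between two embedded rows."""
    return np.mean(np.abs(x - y) ** q) ** (1 / q)

# Example: a pair at small distance vs. a pair at distance >> s
rng = np.random.default_rng(4)
pts = np.stack([np.zeros(20), np.full(20, 0.1), np.full(20, 100.0)])
emb = single_scale_embed(pts, s=10.0, k=4000, p=1.5, rng=rng)
print(lq_dist(emb[0], emb[1], q=1.0))  # small distance: roughly preserved (up to a constant)
print(lq_dist(emb[0], emb[2], q=1.0))  # huge distance: saturates at O(s)
```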
Snowflake embedding • The snowflake embedding is created by concatenating many single-scale embeddings • An idea due to Assouad (84) • Requires several properties of the single-scale embedding: threshold, smoothness, fidelity • A schematic sketch follows below.
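As an illustration only (a hypothetical weighting scheme, not taken from the slides): weighting the scale-s embedding by s^(α−1) lets a pair at distance r pick up magnitude about r·r^(α−1) = r^α at its own scale, while geometrically smaller and larger scales contribute strictly less, summing to O(r^α).

```python
import numpy as np
# reuses p_stable() and single_scale_embed() from the earlier sketches

def snowflake_embed(points, alpha, p, k, rng):
    """Concatenate weighted single-scale embeddings over geometric scales."""
    scales = [2.0 ** i for i in range(-5, 6)]  # illustrative range only
    blocks = [s ** (alpha - 1) * single_scale_embed(points, s, k, p, rng)
              for s in scales]
    return np.concatenate(blocks, axis=1)
```

Thank you!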