  1. Summer School on Hashing’14: Dimension Reduction. Alex Andoni (Microsoft Research)

  2. Nearest Neighbor Search (NNS) • Preprocess: a set P of n points • Query: given a query point q, report the point p ∈ P with the smallest distance to q
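To make the problem concrete, here is a minimal brute-force baseline in NumPy (a sketch with made-up data; the name nns_linear_scan is illustrative, not from the slides):

```python
import numpy as np

def nns_linear_scan(P, q):
    """Exact NNS by a trivial linear scan: O(n*d) per query."""
    dists = np.linalg.norm(P - q, axis=1)   # distance from q to every point
    i = int(np.argmin(dists))
    return i, dists[i]

rng = np.random.default_rng(0)
P = rng.random((10_000, 128))               # n = 10,000 points in d = 128 dimensions
q = rng.random(128)
print(nns_linear_scan(P, q))
```

The rest of the talk is about doing better than this O(nd) scan, mainly by reducing d.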

  3. Motivation • Generic setup: • Points model objects (e.g. images) • Distance models (dis)similarity measure • Application areas: • machine learning: k-NN rule • speech/image/video/music recognition, vector quantization, bioinformatics, etc… • Distance can be: • Hamming, Euclidean, edit distance, Earth-mover distance, etc… • Primitive for other problems: • find the similar pairs in a set, clustering… (figure: example points shown as binary strings such as 000000, 011100, 010100, …)

  4. 2D case • Compute the Voronoi diagram • Given a query q, perform point location • Performance: • Space: O(n) • Query time: O(log n)
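A small 2D illustration, assuming SciPy is available: SciPy builds the Voronoi diagram but ships no point-location structure for it, so the query below is answered with a k-d tree, which solves the same 2D nearest-neighbor query (the cell containing q belongs to the site nearest to q).

```python
import numpy as np
from scipy.spatial import Voronoi, KDTree

rng = np.random.default_rng(1)
P = rng.random((1000, 2))        # n points in the plane

vor = Voronoi(P)                 # the Voronoi diagram has O(n) cells in 2D
print(len(vor.regions))

q = np.array([0.3, 0.7])
dist, idx = KDTree(P).query(q)   # which site is nearest to q
print(idx, dist)
```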

  5. High-dimensional case • All exact algorithms degrade rapidly with the dimension

  6. Dimension Reduction • If high dimension is an issue, reduce it?! • “flatten” the dimension d into a dimension k ≪ d • Not possible in general: packing bound • But can if we only need to preserve distances within a fixed subset of ℝ^d • Johnson-Lindenstrauss Lemma [JL’84] • Application: NNS in ℝ^d • Trivial scan: O(nd) query time • Reduce to O(nk + T) time if we preprocess the points, where T = the time to reduce the dimension of the query point

  7. Johnson-Lindenstrauss Lemma • There is a randomized linear map A: ℝ^d → ℝ^k that preserves the distance between two fixed vectors x, y • up to a 1 ± ε factor: ‖Ax − Ay‖ = (1 ± ε) · ‖x − y‖ • with probability at least 1 − e^(−Cε²k) (C some constant) • Preserves all distances among n points for k = O(ε⁻² · log n) • Time to compute the map: O(dk) per point
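A quick numeric sketch of what the lemma buys: the target dimension depends only on n and ε, not on the original dimension d (the constant C = 8 below is an illustrative choice, not part of the lemma):

```python
import math

def jl_dimension(n, eps, C=8.0):
    """Target dimension k = C * log(n) / eps**2; the lemma only says k = O(eps^-2 log n)."""
    return math.ceil(C * math.log(n) / eps**2)

print(jl_dimension(n=10**6, eps=0.1))   # ~11,000 dimensions, whatever the original d was
```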

  8. Idea: • Project onto a random subspace of dimension k!

  9. 1D embedding (Gaussian pdf: p(s) = e^(−s²/2) / √(2π)) • How about one dimension (k = 1)? • Map f(x) = ⟨g, x⟩ = g₁x₁ + … + g_d x_d, • where g₁, …, g_d are iid normal (Gaussian) random variables • Why Gaussian? • Stability property: g₁x₁ + … + g_d x_d is distributed as ‖x‖ · g, where g is also Gaussian • Equivalently: the vector (g₁, …, g_d) is spherically symmetric, i.e., has a random direction, and the projection of x on a random direction depends only on the length of x

  10. 1D embedding (continued) • Map f(x) = ⟨g, x⟩, • for any x, y ∈ ℝ^d • Linear: f(x) − f(y) = f(x − y) • Want: |f(x) − f(y)| ≈ ‖x − y‖ • OK to consider just f(z) for z = x − y, since f is linear • Claim: for any z, we have • Expectation: E[f(z)²] = ‖z‖² • Standard deviation of f(z)²: √2 · ‖z‖² • Proof of the expectation: E[f(z)²] = E[(Σᵢ gᵢzᵢ)²] = Σᵢ,ⱼ zᵢzⱼ · E[gᵢgⱼ] = Σᵢ zᵢ² = ‖z‖², since E[gᵢgⱼ] = 1 if i = j and 0 otherwise
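A small empirical check of the claim, assuming NumPy: sample many independent Gaussian vectors g and compare the moments of f(z) = ⟨g, z⟩ against ‖z‖.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
z = rng.standard_normal(d)                 # a fixed vector z
G = rng.standard_normal((100_000, d))      # many independent Gaussian vectors g
fz = G @ z                                 # samples of f(z) = <g, z>

print(np.mean(fz**2), np.linalg.norm(z)**2)   # E[f(z)^2] ~= ||z||^2
print(np.std(fz), np.linalg.norm(z))          # f(z) ~ N(0, ||z||^2), by 2-stability
```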

  11. Full Dimension Reduction • Just repeat the 1D embedding k times! • A(x) = (1/√k) · G · x, where G is a k × d matrix of iid Gaussian random variables • Again, want to prove: ‖A(z)‖ = (1 ± ε) · ‖z‖ • for a fixed z = x − y • with probability 1 − e^(−Ω(ε²k))
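A minimal sketch of the full map, assuming NumPy and SciPy: a k × d Gaussian matrix scaled by 1/√k, applied to n random points, with the pairwise-distance distortion checked empirically.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
n, d, k = 100, 10_000, 500
X = rng.standard_normal((n, d))                # n arbitrary points in R^d

G = rng.standard_normal((k, d)) / np.sqrt(k)   # A(x) = G x, with G a k x d Gaussian matrix
Y = X @ G.T

ratios = pdist(Y) / pdist(X)                   # distortion of all pairwise distances
print(ratios.min(), ratios.max())              # empirically close to 1 on both sides
```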

  12. Concentration • ‖A(z)‖² is distributed as (‖z‖²/k) · (h₁² + … + h_k²), • where each hᵢ is distributed as a standard Gaussian • The quantity h₁² + … + h_k² • is called the chi-squared distribution with k degrees of freedom • Fact: chi-squared is very well concentrated: • (1/k) · (h₁² + … + h_k²) is equal to 1 ± ε with probability 1 − e^(−Ω(ε²k)) • Akin to the central limit theorem
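A quick simulation of the concentration fact, assuming NumPy: the normalized chi-squared (1/k)·(h₁² + … + h_k²) has mean 1 and standard deviation about √(2/k), so it tightens as k grows.

```python
import numpy as np

rng = np.random.default_rng(4)
for k in (10, 100, 1000):
    h = rng.standard_normal((100_000, k))
    chi2_over_k = (h**2).sum(axis=1) / k             # (1/k) * sum of k squared Gaussians
    print(k, chi2_over_k.mean(), chi2_over_k.std())  # mean ~ 1, std ~ sqrt(2/k)
```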

  13. Johnson-Lindenstrauss: wrap-up • ‖A(x) − A(y)‖ = (1 ± ε) · ‖x − y‖ for all n points with high probability, for k = O(ε⁻² · log n) • Beyond Gaussians: • Can use random ±1 entries instead of Gaussians [AMS’96, Ach’01, TZ’04…]
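A sketch of the ±1 variant, assuming NumPy; only the entry distribution changes relative to the Gaussian construction:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 10_000, 500
x = rng.standard_normal(d)

S = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)   # random signs instead of Gaussians
print(np.linalg.norm(S @ x), np.linalg.norm(x))         # agree up to a small relative error
```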

  14. Faster JL? • Time to compute A·x: O(dk) = O(d · ε⁻² · log n) • Can we do it in roughly O(d log d) time? • Yes! • [AC’06, AL’08’11, DKS’10, KN’12…] • Will show: an O(d log d)-type bound [Ailon-Chazelle’06]

  15. Fast JL Transform • Costly because the matrix is dense • Meta-approach: use a sparse matrix P instead? • Suppose we sample only s entries per row • Analysis of one row Pᵢ: • want ⟨Pᵢ, x⟩² ≈ ‖x‖² with good probability • Expectation of ⟨Pᵢ, x⟩²: equal to ‖x‖² once the normalization constant is set appropriately • What about the variance??

  16. Fast JLT: sparse projection • The variance of ⟨Pᵢ, x⟩² can be large • Bad case: x is sparse • think: x = (1, 1, 0, …, 0) • Even when each coordinate of x goes somewhere (s · k ≈ d), • two nonzero coordinates collide in the same row (bad) with probability ~ 1/k • but we want failure probability exponentially small in k • so really we would need a (nearly) dense matrix • But, the take-away: a sparse projection may work if x is “spread around” • New plan: • first “spread around” x • then use a sparse projection

  17. FJLT: the full construction A = P·H·D • D = a d × d diagonal matrix with random ±1 r.v. on the diagonal • H = the Hadamard matrix: • H·x can be computed in O(d log d) time • composed of only ±1 entries (up to normalization) • P = a sparse projection matrix as before, of size k × d • (figure: x passes through the diagonal D, then the Hadamard transform H, which acts like a Fast Fourier Transform and does the “spreading around”, and finally the sparse projection P)
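A simplified end-to-end sketch of P·H·D, assuming NumPy; the Hadamard transform is applied in O(d log d) via the standard fast Walsh-Hadamard recursion, and (as in slide 21) P is taken to be plain sampling of k random coordinates rather than the sparse Gaussian projection of [AC’06]:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform in O(d log d); len(x) must be a power of two."""
    y = x.astype(float)
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            a, b = y[i:i + h].copy(), y[i + h:i + 2 * h].copy()
            y[i:i + h], y[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return y / np.sqrt(len(y))                # normalize so the transform preserves the norm

def fjlt(x, k, rng):
    d = len(x)
    D = rng.choice([-1.0, 1.0], size=d)       # random +-1 diagonal
    y = fwht(D * x)                           # H D x: same norm as x, but "spread around"
    idx = rng.integers(0, d, size=k)          # P: sample k coordinates (simplified projection)
    return y[idx] * np.sqrt(d / k)            # rescale so E[||result||^2] = ||x||^2

rng = np.random.default_rng(6)
x = np.zeros(1024); x[0] = 1.0                # a sparse x, the hard case for naive sparse projections
print(np.linalg.norm(fjlt(x, k=256, rng=rng)), np.linalg.norm(x))
```

The sketch assumes the input length is a power of two; in general one pads x with zeros, which does not change its norm.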

  18. Spreading around: intuition • Idea for the Hadamard/Fourier transform: • “Uncertainty principle”: if the original x is sparse, then its transform is dense! • Though the transform can also “break” x’s that are already dense (this is what the random diagonal D guards against)

  19. Spreading around: proof • Suppose ‖x‖ = 1 • Ideal spreading around: every coordinate of HDx has magnitude about 1/√d • Lemma: with probability at least 1 − δ, for each coordinate i, |(HDx)ᵢ| ≤ O(√(log(d/δ)/d)) • Proof: • (HDx)ᵢ = Σⱼ vⱼ · Dⱼⱼ · xⱼ for some vector v with entries ±1/√d • Hence (HDx)ᵢ is approximately Gaussian with standard deviation 1/√d • Hence the bound follows from the Gaussian tail plus a union bound over the d coordinates
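An empirical check on the worst case for sparse projections, assuming SciPy's hadamard helper: for a 1-sparse unit vector x, every coordinate of HDx has magnitude exactly 1/√d, comfortably inside the lemma's bound.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(7)
d = 1024
H = hadamard(d) / np.sqrt(d)          # orthonormal Hadamard matrix
D = rng.choice([-1.0, 1.0], size=d)

x = np.zeros(d); x[0] = 1.0           # ||x|| = 1, as sparse as possible
y = H @ (D * x)
print(np.abs(y).max(), np.sqrt(np.log(d) / d))   # max coordinate 1/sqrt(d), under the lemma's bound
```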

  20. Why a projection P? • Why aren't we done? • Choose the first few coordinates of HDx: • each has the same distribution: approximately Gaussian • Issue: the coordinates are not independent • Nevertheless: ‖HDx‖ = ‖x‖, • since HD is a change of basis (a rotation in ℝ^d)

  21. Projection • Have: each coordinate of y = HDx is at most O(√(log(d/δ)/d)) · ‖y‖ in magnitude, • with probability at least 1 − δ • P = projection onto just k random coordinates (suitably rescaled)! • Proof: standard concentration • Chernoff: since every term yᵢ² is small relative to ‖y‖², sampling roughly ε⁻² · log(1/δ) · log(d/δ) terms gives a 1 ± ε approximation to ‖y‖² • Hence k of that order suffices

  22. FJLT: wrap-up • Obtain: ‖PHDx‖ = (1 ± ε) · ‖x‖ • with probability at least 1 − δ • the dimension of PHDx is k′ = O(ε⁻² · log(1/δ) · log(d/δ)) • time: O(d log d + k′) • This dimension is not optimal: apply a regular (dense) JL map to PHDx to reduce it further to O(ε⁻² · log(1/δ)) • Final time: O(d log d) plus lower-order terms polynomial in ε⁻¹ and log(1/δ) • [AC’06, AL’08’11, WDLSA’09, DKS’10, KN’12, BN, NPW’14]

  23. Dimension Reduction: beyond Euclidean • Johnson-Lindenstrauss: for Euclidean space (ℓ₂) • dimension O(ε⁻² · log n), oblivious (the map does not depend on the point set) • Other norms, such as ℓ₁? • Essentially no: [CS’02, BC’03, LN’04, JN’10…] • For n points and approximation D, dimension n^Ω(1/D²) is required [BC’03, NR’10, ANN’10…] • even if the map is allowed to depend on the dataset! • But we can do weak dimension reduction

  24. Towards dimension reduction for ℓ₁ • Can we do the “analog” of Euclidean projections? • For ℓ₂, we used: the Gaussian distribution • it has a stability property: • g₁z₁ + … + g_d z_d is distributed as ‖z‖₂ · g, where g is also Gaussian • Is there something similar for the 1-norm? • Yes: the Cauchy distribution! • 1-stable: • c₁z₁ + … + c_d z_d is distributed as ‖z‖₁ · c, where c is also Cauchy • What’s wrong then? • Cauchy random variables are heavy-tailed… • |c| doesn’t even have a finite expectation
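A quick simulation of 1-stability, assuming NumPy: the quantiles of ⟨c, z⟩ / ‖z‖₁ match those of a standard Cauchy (moments are useless here, since the Cauchy has none).

```python
import numpy as np

rng = np.random.default_rng(8)
d = 32
z = rng.standard_normal(d)
C = rng.standard_cauchy((200_000, d))
fz = C @ z / np.linalg.norm(z, 1)     # should be distributed as a standard Cauchy (1-stability)

p = np.array([0.25, 0.5, 0.75])
print(np.quantile(fz, p))             # compare with the exact Cauchy quantiles below
print(np.tan(np.pi * (p - 0.5)))      # standard Cauchy quantile function: -1, 0, +1
```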

  25. Weak embedding [Indyk’00] • Still, we can consider the map A(x) = (⟨c₁, x⟩, …, ⟨c_k, x⟩) as before, now with iid Cauchy entries • Each coordinate is distributed as ‖x‖₁ × Cauchy • The norm ‖A(x)‖ does not concentrate at all, but… • Can estimate ‖x‖₁ by: • the median of the absolute values of the k coordinates! • Concentrates because |Cauchy| is in the correct range about 90% of the time! • Gives a sketch (not an actual embedding into ℓ₁) • OK for nearest neighbor search
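A minimal sketch of the median estimator, assuming NumPy (the names l1_sketch and estimate_l1 are illustrative, not from [Indyk’00]):

```python
import numpy as np

def l1_sketch(x, k, rng):
    """k Cauchy projections of x (in practice the same matrix is reused for all points)."""
    return rng.standard_cauchy((k, len(x))) @ x

def estimate_l1(sketch):
    """The median of |Cauchy| is 1, so the median of |<c_i, x>| estimates ||x||_1."""
    return np.median(np.abs(sketch))

rng = np.random.default_rng(9)
x = rng.standard_normal(1000)
print(estimate_l1(l1_sketch(x, k=400, rng=rng)), np.linalg.norm(x, 1))
```

To compare two points, one sketches their difference (or, by linearity, subtracts their sketches) and applies the same median estimate, which is what makes this usable for nearest neighbor search.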
