1 / 10

Near(est) Neighbor in High Dimensions

Alexandr Andoni (slides by Piotr Indyk). Near(est) Neighbor in High Dimensions. Nearest Neighbor. Given: A set P of points in R d Goal: build data structure which, for any query q, returns a point p  P minimizing ||p-q||. q. Solution for d=2 (sketch). Compute Voronoi diagram

hilda
Download Presentation

Near(est) Neighbor in High Dimensions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Alexandr Andoni (slides by Piotr Indyk) Near(est) Neighbor in High Dimensions

  2. Nearest Neighbor • Given: • A set P of points in Rd • Goal: build data structure which, for any query q, returns a point pP minimizing ||p-q|| q

  3. Solution for d=2 (sketch) • Compute Voronoi diagram • Given q, perform point location • Performance: • Space: O(n) • Query time: O(log n) (see 6.838 for details)

  4. NN in Rd • Exact algorithms use • Either nO(d) space, • Or O(dn) time • Approximate algorithms: • Space/time exponential in d[Arya-Mount-et al], [Kleinberg’97], [Har-Peled’02] • Space/time polynomial in d[Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98],…

  5. Eigenfaces 0.01 - 0.2 + 0.03 +……….. = = 0.02 + 0.03 +0.01 +……….. ?

  6. (Approximate) Near Neighbor • Near neighbor: • Given: • A set P of points in Rd, r>0 • Goal: build data structure which, for any query q, returns a point pP, ||p-q|| ≤ r (if it exists) • c-Approximate Near Neighbor: • Goal: build data structure which, for any query q: • If there is a point pP, ||p-q|| ≤ r • it returns p’P, ||p-q|| ≤ cr r q cr

  7. Locality-Sensitive Hashing [Indyk-Motwani’98] q • Idea: construct hash functions g: Rd U such that for any points p,q: • If ||p-q|| ≤ r, then Pr[g(p)=g(q)] is “high” • If ||p-q|| >cr, then Pr[g(p)=g(q)] is “small” • Then we can solve the problem by hashing p • “not-so-small”

  8. LSH • A family H of functions h: Rd → U is called (P1,P2,r,cr)-sensitive, if for any p,q: • if ||p-q|| <r then Pr[ h(p)=h(q) ] > P1 • if ||p-q|| >cr then Pr[ h(p)=h(q) ] < P2 • We hash using g(p)=h1(p).h2(p)…hk(p) • Intuition: amplify the probability gap • We can solve a c-approximate NN with: • Number of hash functions: n,  =log1/P2(1/P1) • Query time: O(d nlog n) • Space: O(n+1 +dn)

  9. LSH for Hamming metric [IM’98] • Functions: h(p)=pi, i.e., the i-th bit of p • We have Pr[ h(p)=h(q) ] = 1-D(p,q)/d • This gives exponent =1/c • Used for genetic sequence retrieval, motif finding, video sequence retrieval, compression, web clustering, pose estimation, etc.

  10. Other norms • Can embed l1d with coordinates in {1…M} into dM-dimensional Hamming space

More Related