
  1. CS 361A (Advanced Data Structures and Algorithms) Lecture 19 (Dec 5, 2005) Nearest Neighbors: Dimensionality Reduction and Locality-Sensitive Hashing Rajeev Motwani

  2. Metric Space
  • Metric Space (M,D): for points p,q in M, D(p,q) is the distance from p to q
    – the only reasonable model for high-dimensional geometric space
  • Defining Properties
    – Reflexive: D(p,q) = 0 if and only if p = q
    – Symmetric: D(p,q) = D(q,p)
    – Triangle Inequality: D(p,q) ≤ D(p,r) + D(r,q)
  • Interesting Cases
    – M = points in d-dimensional space
    – D = Hamming distance or a Euclidean Lp-norm

  3. High-Dimensional Near Neighbors
  • Nearest Neighbors Data Structure
    – Given – N points P = {p1, …, pN} in metric space (M,D)
    – Queries – "Which point p ∈ P is closest to point q?"
    – Complexity – trade off preprocessing space against query time
  • Applications
    – vector quantization
    – multimedia databases
    – data mining
    – machine learning
    – …

  4. Known Results
  • Some expressions are approximate
  • Bottom line – exponential dependence on d

  5. Approximate Nearest Neighbor
  • Exact Algorithms
    – Benchmark – brute force needs space O(N), query time O(N)
    – Known Results – exponential dependence on dimension
    – Theory/Practice – no better than brute-force search
  • Approximate Near-Neighbors
    – Given – N points P = {p1, …, pN} in metric space (M,D)
    – Given – error parameter ε > 0
    – Goal – for query q with nearest neighbor p, return r such that D(q,r) ≤ (1+ε)·D(q,p)
  • Justification
    – Mapping objects to a metric space is heuristic anyway
    – Get tremendous performance improvement

  6. Results for Approximate NN
  • Will show the main ideas of the last 3 results
  • Some expressions are approximate

  7. Approximate r-Near Neighbors
  • Given – N points P = {p1, …, pN} in metric space (M,D)
  • Given – error parameter ε > 0, distance threshold r > 0
  • Query
    – If no point p has D(q,p) < r, return FAILURE
    – Else, return any p' with D(q,p') < (1+ε)r
  • Application – solving Approximate Nearest Neighbor (see the sketch below)
    – Assume maximum distance is R
    – Run in parallel for thresholds r = 1, (1+ε), (1+ε)², …, R
    – Time/space – O(log R) overhead
    – [Indyk-Motwani] – reduce to O(polylog N) overhead
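
A minimal sketch of this reduction, assuming a black-box solver r_near(q, r) for the approximate r-near-neighbor problem; the function name and the radius bounds are illustrative, not from the slides:

```python
def approx_nn(q, r_near, eps, r_min=1.0, r_max=1024.0):
    """Reduce approximate NN to approximate r-near neighbors by trying
    geometrically increasing thresholds r_min, (1+eps)*r_min, ... up to
    r_max: O(log R) calls, matching the slide's overhead."""
    r = r_min
    while r <= r_max:
        p = r_near(q, r)     # returns a point within (1+eps)*r, or None
        if p is not None:
            return p         # within (1+eps)^2 of the true NN distance
        r *= 1.0 + eps
    return None              # no point within distance r_max of q
```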

  8. Hamming Metric
  • Hamming Space
    – Points in M: bit-vectors {0,1}^d (can generalize to {0,1,2,…,q}^d)
    – Hamming Distance: D(p,q) = number of positions where p,q differ
  • Remarks
    – Simplest high-dimensional setting
    – Still useful in practice
    – In theory, as hard (or easy) as Euclidean space
    – Trivial in low dimensions
  • Example – hypercube in d = 3 dimensions: {000, 001, 010, 011, 100, 101, 110, 111}
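
A two-line illustration of the metric in Python, using the d = 3 hypercube example above:

```python
def hamming(p, q):
    """Number of positions where equal-length bit-strings p and q differ."""
    return sum(1 for a, b in zip(p, q) if a != b)

print(hamming("000", "111"))  # 3 -- opposite corners of the cube
print(hamming("011", "010"))  # 1 -- adjacent corners
```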

  9. Dimensionality Reduction
  • Overall Idea
    – Map from high to low dimensions
    – Preserve distances approximately
    – Solve Nearest Neighbors in the new space
    – Performance improvement at the cost of approximation error
  • Mapping?
    – Hash function family H = {H1, …, Hm}
    – Each Hi: {0,1}^d → {0,1}^t with t << d
    – Pick HR from H uniformly at random
    – Map each point in P using the same HR
    – Solve NN problem on HR(P) = {HR(p1), …, HR(pN)}

  10. Reduction for Hamming Spaces
  Theorem: For any r and small ε > 0, there is a hash family H such that for any p,q and random HR ∈ H, with probability > 1 − δ:
  • D(p,q) < r ⇒ D(HR(p), HR(q)) < t(c + ε/12)
  • D(p,q) > (1+ε)r ⇒ D(HR(p), HR(q)) > t(c + ε/12)
  provided t ≥ (C/ε²)·log(1/δ) for some constant C.

  11. Remarks
  • For a fixed threshold r, can distinguish between
    – Near: D(p,q) < r
    – Far: D(p,q) > (1+ε)r
  • For N points, need error probability δ < 1/N², i.e. t = O(log N / ε²)
  • Yet, can reduce to O(log N)-dimensional space while approximately preserving distances
  • Works even if the points are not known in advance

  12. Hash Family
  • Projection Function
    – Let S be an ordered multiset of s indexes from {1,…,d}
    – p|S: {0,1}^d → {0,1}^s projects p onto an s-dimensional subspace
    – Example: d = 5, p = 01100; s = 3, S = {2,2,4} ⇒ p|S = 110
  • Choosing hash function HR in H
    – Repeat for i = 1,…,t
      · Pick Si: s indexes at random (with replacement) from {1,…,d}
      · Pick a random hash function fi: {0,1}^s → {0,1}
      · hi(p) = fi(p|Si)
    – HR(p) = (h1(p), h2(p), …, ht(p))
  • Remark – note the similarity to Bloom Filters
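
A sketch of this construction for points given as '0'/'1' strings; each random function fi is realized lazily with a memo table of coin flips (an implementation convenience, not part of the slides), and positions are 0-indexed:

```python
import random

def make_hr(d, s, t, seed=0):
    """Build HR: {0,1}^d -> {0,1}^t from t random projections
    and t random boolean functions, as on the slide."""
    rng = random.Random(seed)
    S = [[rng.randrange(d) for _ in range(s)] for _ in range(t)]  # multisets Si
    memo = [dict() for _ in range(t)]  # lazy random fi: {0,1}^s -> {0,1}

    def hr(p):
        bits = []
        for i in range(t):
            proj = "".join(p[j] for j in S[i])        # p|Si
            if proj not in memo[i]:                   # first time seen:
                memo[i][proj] = rng.randrange(2)      # flip fi's coin
            bits.append(memo[i][proj])                # hi(p) = fi(p|Si)
        return tuple(bits)

    return hr

HR = make_hr(d=5, s=3, t=4)
print(HR("01100"))  # e.g. (1, 0, 0, 1) -- a point in {0,1}^4
```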

  13. Illustration of Hashing
  [Figure: a point p ∈ {0,1}^d is projected to p|S1, …, p|St ∈ {0,1}^s; applying f1, …, ft gives the bits h1(p), …, ht(p) that make up HR(p).]

  14. Analysis I
  • Choose a random index-set S
  • Claim: for any p,q: Pr[p|S = q|S] = (1 − D(p,q)/d)^s
  • Why?
    – p,q differ in D(p,q) bit positions
    – Need all s indexes of S to avoid these positions
    – Sampling with replacement from {1,…,d}: each index avoids them with probability 1 − D(p,q)/d

  15. Analysis II
  • Choose s = d/r
  • Since 1 − x < e^(−x) for |x| < 1, we obtain Pr[p|S = q|S] = (1 − D(p,q)/d)^(d/r) ≈ e^(−D(p,q)/r)
  • Thus
    – D(p,q) < r ⇒ Pr[p|S = q|S] > e^(−1) (approximately)
    – D(p,q) > (1+ε)r ⇒ Pr[p|S = q|S] < e^(−(1+ε))

  16. Analysis III
  • Recall hi(p) = fi(p|Si)
  • Thus Pr[hi(p) ≠ hi(q)] = ½·Pr[p|Si ≠ q|Si] = ½(1 − Pr[p|Si = q|Si])
  • Choosing c = ½(1 − e^(−1)):
    – D(p,q) < r ⇒ Pr[hi(p) ≠ hi(q)] < c
    – D(p,q) > (1+ε)r ⇒ Pr[hi(p) ≠ hi(q)] > c + ε/6 (for small ε)

  17. Analysis IV
  • Recall HR(p) = (h1(p), h2(p), …, ht(p))
  • D(HR(p), HR(q)) = number of i's where hi(p), hi(q) differ
  • By linearity of expectation: E[D(HR(p), HR(q))] = t·Pr[hi(p) ≠ hi(q)]
  • Theorem almost proved – the two expectations are separated by εt/6
  • For a high-probability bound, need the Chernoff Bound

  18. Chernoff Bound
  • Consider Bernoulli random variables X1, X2, …, Xn
    – Values are 0-1
    – Pr[Xi = 1] = x and Pr[Xi = 0] = 1 − x
  • Define X = X1 + X2 + … + Xn with E[X] = nx
  • Theorem: For independent X1, …, Xn and any 0 < λ < 1, Pr[|X − nx| > λnx] < 2e^(−λ²nx/3)
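
A worked instantiation of the bound as it is used on the next slide, with n = t indicators of hash disagreement and mean x = c; the algebra is a sketch following the slides' approximate constants:

```latex
\Pr\bigl[\,|X - tc| > \lambda\, tc\,\bigr] \;\le\; 2e^{-\lambda^{2} tc/3},
\qquad\text{with } \lambda = \frac{\varepsilon}{12c}:\qquad
\Pr\Bigl[X > t\bigl(c + \tfrac{\varepsilon}{12}\bigr)\Bigr]
\;\le\; e^{-\varepsilon^{2} t/(432c)} .
```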

  19. Analysis V
  • Define
    – Xi = 0 if hi(p) = hi(q), and 1 otherwise
    – n = t
  • Then X = X1 + X2 + … + Xt = D(HR(p), HR(q))
  • Case 1 [D(p,q) < r ⇒ x ≤ c]: Chernoff gives Pr[X > t(c + ε/12)] ≤ e^(−ε²t/(432c))
  • Case 2 [D(p,q) > (1+ε)r ⇒ x ≥ c + ε/6]: Chernoff gives Pr[X < t(c + ε/12)] ≤ e^(−ε²t/(432c))
  • Observe – sloppy bounding of constants in Case 2

  20. Putting it all together
  • Recall – each of the two cases fails with probability at most e^(−ε²t/(432c))
  • Thus, the error probability is at most 2e^(−ε²t/(432c))
  • Choosing C = 1200/c and t ≥ (C/ε²)·log(1/δ) drives this below δ
  • Theorem is proved!!

  21. Algorithm I
  • Set the error probability δ (small enough to union-bound over all points)
  • Select hash HR and map points p → HR(p)
  • Processing query q
    – Compute HR(q)
    – Find nearest neighbor HR(p) of HR(q)
    – If D(HR(q), HR(p)) < t(c + ε/12), then return p, else FAILURE
  • Remarks
    – Brute force for finding HR(p) implies query time O(Nt)
    – Need another approach for lower dimensions

  22. Algorithm II
  • Fact – exact nearest neighbors in {0,1}^t requires
    – Space O(2^t)
    – Query time O(t)
  • How? (see the toy version below)
    – Precompute/store the answers to all possible queries
    – Number of possible queries is 2^t
  • Since t = O(log N / ε²), we obtain:
  • Theorem – in Hamming space {0,1}^d, can solve approximate nearest neighbor with
    – Space 2^t = N^(O(1/ε²))
    – Query time O(t) = O(log N / ε²)
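
A toy version of the precompute-everything idea, assuming a small t and a brute-force helper for filling the table (ties broken by list order):

```python
from itertools import product

def build_table(points):
    """Precompute the exact nearest neighbor for all 2^t possible
    queries in {0,1}^t: space O(2^t), lookups in O(t)."""
    t = len(points[0])
    def nearest(q):
        return min(points, key=lambda p: sum(a != b for a, b in zip(p, q)))
    return {"".join(q): nearest(q) for q in product("01", repeat=t)}

table = build_table(["0110", "1111", "0000"])
print(table["0111"])  # '0110' -- at Hamming distance 1
```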

  23. Different Metric
  • Many applications have "sparse" points
    – Many dimensions but few 1's
    – Example – points = documents, dimensions = words
  • Better to view the points as sets
    – The previous approach would require a large s
  • For sets A,B, define sim(A,B) = |A ∩ B| / |A ∪ B|
  • Observe
    – A = B ⇒ sim(A,B) = 1
    – A,B disjoint ⇒ sim(A,B) = 0
  • Question – handling D(A,B) = 1 − sim(A,B)?

  24. Min-Hash
  • Random permutations π1, …, πt of the universe (the dimensions)
  • Define the mapping hj(A) = min{πj(a) : a ∈ A}
  • Fact: Pr[hj(A) = hj(B)] = sim(A,B)
  • Proof? – already seen!!
  • Overall hash-function HR(A) = (h1(A), h2(A), …, ht(A))
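
A direct implementation sketch, assuming a small explicit universe so the permutations can be materialized:

```python
import random

def make_minhash(universe, t, seed=0):
    """Build HR(A) = (h1(A),...,ht(A)) where hj(A) = min over a in A
    of pij(a), for t random permutations pij of the universe."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(t):
        order = list(universe)
        rng.shuffle(order)                              # random permutation
        ranks.append({item: i for i, item in enumerate(order)})
    return lambda A: tuple(min(rank[a] for a in A) for rank in ranks)

HR = make_minhash(range(100), t=200)
A, B = set(range(0, 60)), set(range(30, 90))
agree = sum(x == y for x, y in zip(HR(A), HR(B)))
print(agree / 200)  # estimates sim(A,B) = 30/90, i.e. about 0.33
```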

  25. Min-Hash Analysis
  • Select t = O(log N / ε²)
  • Hamming Distance
    – D(HR(A), HR(B)) = number of j's such that hj(A) ≠ hj(B)
    – E[D(HR(A), HR(B))] = t·D(A,B)
  • Theorem: For any A,B, D(HR(A), HR(B))/t = D(A,B) ± ε with high probability
  • Proof? – exercise (apply the Chernoff Bound)
  • Obtain – an ANN algorithm similar to the earlier result

  26. Generalization
  • Goal
    – Abstract the technique used for Hamming space
    – Enable application to other metric spaces
    – Handle Dynamic ANN
  • Dynamic Approximate r-Near Neighbors
    – Fix – threshold r
    – Query – if any point lies within distance r of q, return any point within distance (1+ε)r
    – Allow insertions/deletions of points in P
  • Recall – the earlier method required preprocessing all possible queries in the hash range space…

  27. Locality-Sensitive Hashing
  • Fix – metric space (M,D), threshold r, error ε
  • Choose – probability parameters Q1 > Q2 > 0
  • Definition – a hash family H = {h: M → S} for (M,D) is called (r, (1+ε)r, Q1, Q2)-sensitive if, for random h and any p,q in M:
    – D(p,q) < r ⇒ Pr[h(p) = h(q)] ≥ Q1
    – D(p,q) > (1+ε)r ⇒ Pr[h(p) = h(q)] ≤ Q2
  • Intuition
    – p,q near ⇒ likely to collide
    – p,q far ⇒ unlikely to collide

  28. Examples
  • Hamming Space M = {0,1}^d
    – point p = b1…bd
    – H = {hi(b1…bd) = bi, for i = 1…d} – sampling one bit at random
    – Pr[hi(q) = hi(p)] = 1 − D(p,q)/d
  • Set Similarity D(A,B) = 1 − sim(A,B)
    – Recall sim(A,B) = |A ∩ B| / |A ∪ B|
    – H = {min-hash functions hπ(A) = min{π(a) : a ∈ A}}
    – Pr[h(A) = h(B)] = 1 − D(A,B)

  29. Multi-Index Hashing
  • Overall Idea – see the sketch after slide 31
    – Fix an LSH family H
    – Boost the Q1, Q2 gap by defining G = H^k
    – Using G, each point hashes into l buckets
  • Intuition
    – r-near neighbors are likely to collide
    – few non-near pairs land in any bucket
  • Define
    – G = { g | g(p) = h1(p)h2(p)…hk(p) }
    – Hamming metric ⇒ sample k random bits

  30. Example (l = 4)
  [Figure: a point p and query q hashed by g1, g2, g3, g4, each g a concatenation h1…hk; the circle of radius r around q marks the near-neighbor region.]

  31. Overall Scheme
  • Preprocessing
    – Prepare a hash table for the range of G
    – Select l hash functions g1, g2, …, gl
  • Insert(p) – add p to buckets g1(p), g2(p), …, gl(p)
  • Delete(p) – remove p from buckets g1(p), g2(p), …, gl(p)
  • Query(q)
    – Check buckets g1(q), g2(q), …, gl(q)
    – Report the nearest of (say) the first 3l points
  • Complexity
    – Assume – computing D(p,q) needs O(d) time
    – Assume – storing p needs O(d) space
    – Insert/Delete/Query time – O(dlk)
    – Preprocessing/Storage – O(dN + Nlk)
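
A compact sketch of the whole scheme for the Hamming metric, where each gj concatenates k randomly sampled bit positions (the family from slide 28); parameter choices are left to the caller:

```python
import random
from collections import defaultdict

class LSHIndex:
    """Multi-index LSH over '0'/'1' strings: l tables, each keyed by a
    concatenation of k sampled bits."""
    def __init__(self, d, k, l, seed=0):
        rng = random.Random(seed)
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
        self.tables = [defaultdict(list) for _ in range(l)]

    def _key(self, j, p):
        return "".join(p[i] for i in self.coords[j])   # gj(p)

    def insert(self, p):
        for j in range(len(self.tables)):
            self.tables[j][self._key(j, p)].append(p)

    def delete(self, p):
        for j in range(len(self.tables)):
            self.tables[j][self._key(j, p)].remove(p)

    def query(self, q):
        # Check buckets g1(q),...,gl(q); examine at most 3l candidates
        # and report the nearest one seen (None on total failure).
        limit = 3 * len(self.tables)
        cand = []
        for j in range(len(self.tables)):
            cand.extend(self.tables[j].get(self._key(j, q), []))
            if len(cand) >= limit:
                break
        dist = lambda p: sum(a != b for a, b in zip(p, q))
        return min(cand[:limit], key=dist, default=None)
```

With N points one would pick k ≈ log_(1/Q2) N and l ≈ N^z, as derived on the next slides.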

  32. Collision Probability vs. Distance
  [Figure: collision probability decreasing from 1 to 0 as distance grows; it is at least Q1 at distance r and at most Q2 at distance (1+ε)r.]

  33. Multi-Index versus Error
  • Set l = N^z where z = ln(1/Q1) / ln(1/Q2)
  • Theorem: For l = N^z, any query returns an r-near neighbor correctly with probability at least 1/6
  • Consequently (ignoring k = O(log N) factors)
    – Time O(dN^z)
    – Space O(N^(1+z))
  • Hamming Metric – z ≈ 1/(1+ε)
  • Boost Probability – use several parallel hash tables
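
A worked example under the Hamming-bit family of slide 28, for which z ≈ 1/(1+ε); the concrete numbers are illustrative:

```latex
\varepsilon = 1,\; N = 10^{6}:\qquad
z \approx \frac{1}{1+\varepsilon} = \frac{1}{2},\qquad
l = N^{z} = 10^{3},\qquad
\text{time } O(dN^{1/2}),\qquad
\text{space } O(N^{3/2}).
```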

  34. Analysis
  • Define (for a fixed query q)
    – p* – any point with D(q,p*) < r
    – FAR(q) – all p with D(q,p) > (1+ε)r
    – BUCKET(q,j) – all p with gj(p) = gj(q)
  • Event Esize: at most 3l points from FAR(q) fall in q's buckets (query cost bounded by O(dl))
  • Event ENN: gj(p*) = gj(q) for some j (the nearest point in the l buckets is an r-near neighbor)
  • Analysis
    – Show: Pr[Esize] = x > 2/3 and Pr[ENN] = y > 1/2
    – Thus: Pr[not(Esize & ENN)] ≤ (1−x) + (1−y) < 5/6

  35. Analysis – Bad Collisions
  • Choose k = log_(1/Q2) N
  • Fact – for p in FAR(q): Pr[gj(p) = gj(q)] ≤ Q2^k = 1/N
  • Clearly – X = total number of far points in the l buckets has E[X] ≤ l·N·(1/N) = l
  • Markov Inequality – Pr[X > r·E[X]] < 1/r, for X > 0
  • Lemma 1: Pr[Esize] = Pr[X ≤ 3l] > 2/3

  36. Analysis – Good Collisions
  • Observe – Pr[gj(p*) = gj(q)] ≥ Q1^k = N^(−z)
  • Since l = N^z: Pr[gj(p*) ≠ gj(q) for all j] ≤ (1 − N^(−z))^(N^z) ≤ 1/e < 1/2
  • Lemma 2: Pr[ENN] > 1/2

  37. Euclidean Norms
  • Recall – x = (x1, x2, …, xd) and y = (y1, y2, …, yd) in R^d
  • L1-norm: D(x,y) = |x1−y1| + … + |xd−yd|
  • Lp-norm (for p > 1): D(x,y) = (|x1−y1|^p + … + |xd−yd|^p)^(1/p)

  38. Extension to L1-Norm
  • Round coordinates to {1,…,M}
  • Embed L1-{1,…,M}^d into Hamming-{0,1}^(dM)
  • Unary Mapping – coordinate value a becomes 1^a 0^(M−a), so L1 distance equals Hamming distance
  • Apply the algorithm for Hamming Spaces
  • Error due to rounding – roughly 1/M
  • Space/time overhead – due to the blow-up from d to dM dimensions
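
A sketch of the unary embedding; the helper name is ours:

```python
def unary_embed(x, M):
    """Map x in {1,...,M}^d to {0,1}^(dM): value a becomes a ones
    followed by M - a zeros, so L1 distance becomes Hamming distance."""
    return "".join("1" * a + "0" * (M - a) for a in x)

x, y, M = [2, 5, 1], [4, 5, 3], 5
l1 = sum(abs(a - b) for a, b in zip(x, y))                          # 4
ham = sum(a != b for a, b in zip(unary_embed(x, M), unary_embed(y, M)))
print(l1, ham)  # 4 4 -- the embedding preserves distance exactly
```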

  39. Extension to L2-Norm
  • Observe
    – Little difference between L1-norm and L2-norm for high d
    – Additional error is small
  • More generally – Lp, for 1 ≤ p ≤ 2
  • [Figiel et al 1977, Johnson-Schechtman 1982]
    – Can embed Lp into L1
    – Dimensions: d → O(d)
    – Distances preserved within factor (1+α)
  • Key Idea – random rotation of the space

  40. Improved Bounds
  • [Indyk-Motwani 1998]
    – For any Lp-norm
    – Query Time – O(log³ N)
    – Space – N^(O(1/ε²))
  • Problem – impractical
  • Today – only a high-level sketch

  41. Better Reduction
  • Recall
    – Reduced Approximate Nearest Neighbors to Approximate r-Near Neighbors
    – Space/Time Overhead – O(log R)
    – R = maximum distance in the metric space
  • Ring-Cover Trees
    – Remove the dependence on R
    – Reduce the overhead to O(polylog N)

  42. Approximate r-Near Neighbors
  • Idea
    – Impose a regular grid on R^d
    – Decompose into cubes of side length s
    – Label cubes with the points at distance < r
  • Data Structure
    – Query q – determine the cube containing q
    – Cube labels – candidate r-near neighbors
  • Goals
    – Small s ⇒ lower error
    – Fewer cubes ⇒ smaller storage
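
A small sketch of the grid structure for r = 1 in the L2 norm, with s = ε/√d as chosen in the grid analysis (slide 44). For brevity it labels every cube in a box of radius about r around each point, which over-counts slightly; it is feasible only for small d, which is exactly the exponential space the theorem reflects:

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, eps):
    """Label each cube near a point with that point; query = one lookup."""
    d = len(points[0])
    s = eps / math.sqrt(d)                      # cube side length
    reach = int(math.ceil(1.0 / s))             # cubes within distance ~r=1
    grid = defaultdict(list)
    for p in points:
        base = tuple(int(math.floor(c / s)) for c in p)
        for off in product(range(-reach, reach + 1), repeat=d):
            grid[tuple(b + o for b, o in zip(base, off))].append(p)
    return grid, s

def query(grid, s, q):
    return grid.get(tuple(int(math.floor(c / s)) for c in q), [])
```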

  43. [Figure: a regular grid over points p1, p2, p3; each cube is labeled with the points within distance r of it.]

  44. Grid Analysis
  • Assume r = 1
  • Choose s = ε/√d
  • Cube diameter = s√d = ε
  • Number of cubes to label per point = O(1/ε)^d
  • Theorem – For any Lp-norm, can solve Approximate r-Near Neighbor using
    – Space – N·O(1/ε)^d
    – Time – O(d)

  45. Dimensionality Reduction
  • [Johnson-Lindenstrauss 84, Frankl-Maehara 88] For any 0 < ε < 1, can map the points in P into a subspace of dimension O(log N / ε²) while preserving all inter-point distances to within a factor 1+ε
  • Proof idea – project onto random lines
  • Result for NN
    – Space – N^(O(1/ε²))
    – Time – O(polylog N)
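
A minimal realization of the random-projection idea with Gaussian coordinates, one common way to instantiate the lemma; the 1/√t scaling keeps squared distances unbiased:

```python
import math
import random

def jl_project(points, t, seed=0):
    """Project d-dimensional points onto t random directions."""
    rng = random.Random(seed)
    d = len(points[0])
    lines = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(t)]
    return [[sum(r[i] * p[i] for i in range(d)) / math.sqrt(t)
             for r in lines] for p in points]
```

With t = O(log N / ε²), the lemma guarantees all pairwise distances are preserved up to a 1+ε factor with high probability.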

  46. References
  • P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998.
  • A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. VLDB 1999.
