Lecture 10: Sketching. S3: Nearest Neighbor Search
Plan
• PS2 due yesterday, 7pm
• Sketching
• Nearest Neighbor Search
• Scriber?
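Sketching
• $\mathrm{sk}: \mathbb{R}^d \to$ short bit-strings
• given $\mathrm{sk}(x)$ and $\mathrm{sk}(y)$, should be able to estimate some function of $x$ and $y$
  • With 90% success probability ($\delta = 0.1$)
• $\ell_2$, $\ell_1$ norm: $O(\epsilon^{-2})$ words
• Decision version: given $r$ in advance…
  • Lemma: $\ell_2$, $\ell_1$ norm: $O(\epsilon^{-2})$ bits
[Figure: $x$ and $y$ are mapped to short bit-strings, e.g. 010110 and 010101. General version: estimate $\|x-y\|$. Decision version: distinguish between $\|x-y\| \le r$ and $\|x-y\| > (1+\epsilon) r$.]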
Sketching: decision version
Consider Hamming space: $x, y \in \{0,1\}^d$
Lemma: for any $r > 0$, can achieve $O(1/\epsilon^2)$-bit sketch.
[on blackboard]
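The proof of the lemma was done on the blackboard and is not part of these slides. As a rough illustration only, the Python sketch below shows one standard way to get such a decision sketch in Hamming space: each sketch bit is the parity of a random subset of coordinates, in the style of [KOR98]. The function names (`differ_prob`, `make_sketch`, `decide_close`), the sampling rate $1/(2r)$, and the midpoint threshold are assumptions made for this example, not necessarily the construction used in the lecture. It assumes $r \ge 2$.

```python
import random

def differ_prob(dist, r):
    """Probability that one sketch bit differs for two points at Hamming distance `dist`."""
    return (1.0 - (1.0 - 1.0 / r) ** dist) / 2.0

def make_sketch(d, r, k, seed=0):
    """Return a function mapping x in {0,1}^d to a k-bit sketch (assumes r >= 2)."""
    rng = random.Random(seed)
    # Each coordinate joins each parity mask independently with probability 1/(2r).
    masks = [[i for i in range(d) if rng.random() < 1.0 / (2 * r)] for _ in range(k)]

    def sketch(x):
        # Bit j is the parity of x restricted to mask j.
        return [sum(x[i] for i in mask) % 2 for mask in masks]

    return sketch

def decide_close(sx, sy, r, eps):
    """True if the sketches suggest distance <= r rather than > (1+eps)*r."""
    # Fraction of sketch bits on which the two points disagree.
    frac = sum(a != b for a, b in zip(sx, sy)) / len(sx)
    # Assumed threshold: midpoint between the two expected disagreement rates.
    threshold = (differ_prob(r, r) + differ_prob((1 + eps) * r, r)) / 2.0
    return frac <= threshold
```

Both points must be sketched with the same function returned by `make_sketch`, so that they share the random masks. The two expected disagreement rates differ by $\Theta(\epsilon)$, which is why on the order of $1/\epsilon^2$ bits are enough for a constant success probability.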
Conclusion
• Dimension reduction:
  • [Johnson-Lindenstrauss’84]:
    • a random linear projection into $k$ dimensions
    • preserves $\|x-y\|_2$, up to $1+\epsilon$ approximation
    • with probability $\ge 1 - e^{-\Omega(\epsilon^2 k)}$
  • Random linear projection:
    • Can be $Gx$ where $G$ is Gaussian, or $\pm 1$ entry-wise (scaled by $1/\sqrt{k}$)
  • Hence: preserves distance between $n$ points as long as $k = \Theta(\epsilon^{-2} \log n)$
  • Can do faster than $O(kd)$ time
    • Using Fast Fourier Transform
• In $\ell_1$: no dimension reduction
  • But can do sketching
  • Using $p$-stable distributions (Cauchy for $p=1$)
• Sketching: decision version, constant $\epsilon = 0.1$:
  • For $\ell_1, \ell_2$, can do with $O(1)$ bits!
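To make the dimension-reduction bullet concrete, here is a minimal pure-Python sketch of a Gaussian random projection. The names `gaussian_projection` and `l2` and the seed handling are hypothetical, and this is the plain $O(kd)$-time version, not the FFT-based fast transform.

```python
import math
import random

def gaussian_projection(d, k, seed=0):
    """Return a linear map R^d -> R^k given by a k x d Gaussian matrix scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    G = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

    def project(x):
        # Plain matrix-vector product, O(k*d) time.
        return [sum(g * xi for g, xi in zip(row, x)) for row in G]

    return project

def l2(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For a fixed pair of points, `l2(project(x), project(y)) / l2(x, y)` concentrates around 1 as `k` grows. Because the map is linear, preserving the norm of every difference vector is the same as preserving all pairwise distances.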
Section 3: Nearest Neighbor Search
Approximate NNS
$c$-approximate $r$-near neighbor: given a query point $q$
• assuming there is a point $p^*$ in the dataset $D$ within distance $r$ of $q$,
• report a point $p' \in D$ s.t. $\|p' - q\| \le cr$
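The definition can be spelled out as a small brute-force checker. This is only an illustration of the guarantee, under the assumption of Euclidean distance; the helper names `euclid` and `is_valid_answer` are made up for this example.

```python
import math

def euclid(p, q):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def is_valid_answer(points, q, r, c, answer):
    """Check whether `answer` satisfies the c-approximate r-near neighbor guarantee for q."""
    # The guarantee only applies when some data point lies within distance r of q.
    promise_holds = any(euclid(p, q) <= r for p in points)
    if not promise_holds:
        return True  # nothing is required of the data structure in this case
    # Otherwise the answer must be a data point within distance c*r of q.
    return answer is not None and answer in points and euclid(answer, q) <= c * r
```

Note that a valid answer does not have to be the true nearest neighbor: any data point within distance $cr$ is acceptable.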
NNS: approach 1
• Boosted sketch:
  • Let $S$ = sketch for the decision version (90% success probability)
  • new sketch $W$:
    • keep $k = O(\log n)$ copies of $S$
    • estimator is the majority answer of the $k$ estimators
  • Sketch size: $O(\log n)$ bits
  • Success probability: $1 - n^{-2}$ (Chernoff)
• Preprocess: compute sketches $W(p)$ for all the points $p \in D$
• Query: compute sketch $W(q)$, and compute distance to all points using sketch
• Time: improved from $O(nd)$ to $O(n \log n)$
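The boosting step can be checked numerically. The sketch below is an illustration under stated assumptions: each copy fails independently with probability 0.1, and the number of copies is chosen from the standard Chernoff bound with the KL divergence in the exponent. The function names are hypothetical. It computes the exact probability that at least half of the copies fail, which upper-bounds the failure probability of the majority vote.

```python
import math

def majority(answers):
    """Majority vote over boolean answers; ties count as False."""
    return sum(answers) * 2 > len(answers)

def majority_failure_prob(k, p_fail=0.1):
    """Exact probability that at least half of k independent copies fail."""
    need = math.ceil(k / 2)
    return sum(
        math.comb(k, i) * p_fail**i * (1 - p_fail) ** (k - i)
        for i in range(need, k + 1)
    )

def copies_needed(n, p_fail=0.1):
    """Number of copies for which the Chernoff bound gives failure at most n^-2."""
    # KL divergence between Bernoulli(1/2) and Bernoulli(p_fail).
    kl = 0.5 * math.log(0.5 / p_fail) + 0.5 * math.log(0.5 / (1 - p_fail))
    return math.ceil(2 * math.log(n) / kl)
```

For example, `copies_needed(10**6)` is 55, roughly $3.9 \ln n$. With failure probability at most $n^{-2}$ per comparison, a union bound over the $n$ data points leaves the whole linear scan correct with probability at least $1 - 1/n$.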
NNS: approach 2
• Query time below $n$?
• Theorem [KOR98]: $O(d \log n)$ query time and $n^{O(1)}$ space for $1.1$ approximation.
• Proof:
  • Note that $W(q)$ has $w = O(\log n)$ bits
  • Only $2^w = n^{O(1)}$ possible sketches!
  • Store an answer for each of $2^w = n^{O(1)}$ possible inputs
• In general:
  • if a distance has constant-size sketch, admits a poly-space NNS data structure!
• Space closer to linear?
  • approach 3 will require more specialized sketches…
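The proof idea, precomputing an answer for every possible sketch value, can be illustrated with a toy lookup table. Everything here is an assumption made for the example: the sketch is a stand-in that samples `w` coordinates of a binary vector (it is not the sketch from [KOR98]), sketches are compared by Hamming distance, and `build_table` and `query` are hypothetical names.

```python
import random
from itertools import product

def build_table(points, w, threshold, seed=0):
    """Precompute an answer for each of the 2^w possible sketch values."""
    rng = random.Random(seed)
    d = len(points[0])
    coords = [rng.randrange(d) for _ in range(w)]  # toy sketch: w sampled coordinates

    def sketch(x):
        return tuple(x[i] for i in coords)

    point_sketches = [sketch(p) for p in points]

    def sketch_dist(s, t):
        return sum(a != b for a, b in zip(s, t))

    table = {}
    for sigma in product((0, 1), repeat=w):
        # Data point whose sketch is closest to this sketch value.
        best = min(range(len(points)), key=lambda j: sketch_dist(sigma, point_sketches[j]))
        close = sketch_dist(sigma, point_sketches[best]) <= threshold
        table[sigma] = best if close else None
    return sketch, table

def query(q, sketch, table):
    """Answer a query with one sketch computation and one table lookup."""
    return table[sketch(q)]
```

The table has $2^w$ entries, so it has polynomial size exactly when $w = O(\log n)$. The query never scans the data set: its cost is the time to compute the sketch plus one lookup.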