Optimal lower bounds for Locality Sensitive Hashing (except when q is tiny). Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)
Locality Sensitive Hashing [Indyk-Motwani '98]. h : objects → sketches. H: family of hash functions h s.t. "similar" objects collide w/ high prob., "dissimilar" objects collide w/ low prob.
Min-wise hash functions [Broder '98]. Represent documents A, B ⊆ {word 1, …, word d} as bit vectors (which words are present). Jaccard similarity: |A ∩ B| / |A ∪ B|. Invented simple H s.t. Pr[h(A) = h(B)] = |A ∩ B| / |A ∪ B|.
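A minimal sketch of min-wise hashing in Python (our code; the sets and trial count are made up for illustration): h maps a set to its first element under a random permutation of the vocabulary, and the empirical collision rate matches the Jaccard similarity.

```python
import random

def minhash(word_set, perm):
    """h(A) = the element of A appearing first in the permutation perm."""
    return min(word_set, key=perm.__getitem__)

d = 1000                                      # vocabulary size (our choice)
A = set(random.sample(range(d), 80))
B = set(A)
B.symmetric_difference_update(random.sample(range(d), 40))  # perturb A a bit

jaccard = len(A & B) / len(A | B)

trials, collisions = 5000, 0
for _ in range(trials):
    perm = list(range(d))
    random.shuffle(perm)                      # a fresh random h from H
    collisions += minhash(A, perm) == minhash(B, perm)

print(f"Jaccard similarity: {jaccard:.3f}")
print(f"Pr[h(A) = h(B)]  ~  {collisions / trials:.3f}")   # should match
```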
Indyk-Motwani '98 (1000+ cites): Defined LSH. Invented very simple H good for {0,1}^d under Hamming distance. Showed good LSH implies good nearest-neighbor-search data structs.
Charikar '02, STOC: Proposed alternate H ("simhash") for Jaccard similarity. Patented by Google.
Theory: [Broder '97], [Indyk–Motwani '98], [Gionis–Indyk–Motwani '98], [Charikar '02], [Datar–Immorlica–Indyk–Mirrokni '04], [Motwani–Naor–Panigrahy '06], [Andoni–Indyk '06], [Terasawa–Tanaka '07], [Andoni–Indyk '08, CACM], [Neylon '10], …
Practice: free code base [AI '04]; sequence comparison in bioinformatics; association-rule finding in data mining; collaborative filtering; clustering nouns by meaning in NLP; pose estimation in vision; …
Definition of LSH. Given: distance space (X, dist), "radius" r > 0, "approx factor" c > 1. Goal: family H of functions X → S (S can be any finite set) s.t. ∀ x, y ∈ X:
dist(x, y) ≤ r ⟹ Pr_{h∈H}[h(x) = h(y)] ≥ q^ρ,
dist(x, y) ≥ cr ⟹ Pr_{h∈H}[h(x) = h(y)] ≤ q.
(The smaller the exponent ρ, the better.)
Theorem [IM'98, GIM'98]: Given LSH family for (X, dist), can solve "(r, cr)-near-neighbor search" for n points with data structure of size O(n^(1+ρ)) and query time Õ(n^ρ) hash fcn evals.
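How this data structure is typically realized — a rough sketch with our function names and parameter choices, not the talk's code: concatenating k ≈ log_{1/q} n basic hashes drives the far-pair collision probability below 1/n, and L ≈ n^ρ independent tables make a near pair survive in some table with constant probability.

```python
import math, random
from collections import defaultdict

def build_lsh_index(points, hash_family, q, rho, n):
    """Sketch of the [IM'98] structure: L ~ n^rho tables, each keyed by a
    concatenation g(x) = (h_1(x), ..., h_k(x)) of k basic hashes, so that
    far pairs collide in a table w.p. q^k <= 1/n."""
    k = math.ceil(math.log(n) / math.log(1 / q))
    L = math.ceil(n ** rho)
    tables = []
    for _ in range(L):
        g = [random.choice(hash_family) for _ in range(k)]
        buckets = defaultdict(list)
        for x in points:
            buckets[tuple(h(x) for h in g)].append(x)
        tables.append((g, buckets))
    return tables

def query(tables, y, dist, cr):
    """Scan y's bucket in each table; any bucket-mate within distance cr
    is a valid answer to (r, cr)-near-neighbor search."""
    for g, buckets in tables:
        for x in buckets[tuple(h(y) for h in g)]:
            if dist(x, y) <= cr:
                return x
    return None
```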
Example: X = {0,1}^d, dist = Hamming, r = εd, c = 5 (distinguish dist ≤ εd from dist ≥ 5εd). [IM'98]: H = {h_1, h_2, …, h_d}, h_i(x) = x_i — "output a random coord."
Analysis: Pr[h(x) = h(y)] = 1 − dist(x, y)/d, so q^ρ = 1 − ε and q = 1 − 5ε. Since (1 − 5ε)^(1/5) ≤ 1 − ε, ∴ ρ ≤ 1/5. In general, achieves ρ ≤ 1/c, ∀c (∀r).
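A quick numeric check of this analysis (our script): for the random-coordinate family, ρ = ln(1 − ε)/ln(1 − cε), which is at most 1/c because Bernoulli's inequality gives (1 − ε)^c ≥ 1 − cε.

```python
import math

# Numeric check: q^rho = 1 - eps and q = 1 - c*eps, so
# rho = ln(1 - eps) / ln(1 - c*eps).  Bernoulli's inequality
# (1 - c*eps) <= (1 - eps)^c forces rho <= 1/c.
for c in (2, 5, 10):
    for eps in (0.001, 0.01, 0.05):
        rho = math.log(1 - eps) / math.log(1 - c * eps)
        assert rho <= 1 / c
        print(f"c={c:2d}  eps={eps:.3f}  rho={rho:.4f}  1/c={1/c:.4f}")
```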
Optimal upper bound?! ({0,1}^d, Ham), r > 0, c > 1. S ≝ {0,1}^d ∪ {✔}, H ≝ {h_ab : dist(a, b) ≤ r}, where h_ab(x) = ✔ if x = a or x = b, and h_ab(x) = x otherwise. Pairs at dist ≤ r collide w/ positive (but tiny) prob.; pairs at dist ≥ cr collide w/ prob. 0. Formally: ρ = 0.
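A back-of-the-envelope check of just how tiny that collision probability is (our arithmetic): a fixed near pair (a, b) collides only under h_ab itself, so its collision probability is 1/|H| = 2^(−Θ(d)).

```python
import math

# For H = {h_ab : dist(a, b) <= r} on {0,1}^d, a fixed near pair (a, b)
# collides only under h_ab itself, so its collision prob. is 1/|H|.
d, r = 100, 10
ball = sum(math.comb(d, i) for i in range(1, r + 1))  # points at distance 1..r from a
H_size = (2 ** d) * ball // 2                         # unordered pairs {a, b}, dist <= r
print(f"near-pair collision prob = 1/|H| ~ 2^-{math.log2(H_size):.0f}")
```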
Wait, what? Theorem [IM'98, GIM'98]: Given LSH family for (X, dist), can solve "(r, cr)-near-neighbor search" for n points with data structure of size O(n^(1+ρ)) and query time Õ(n^ρ) hash fcn evals — assuming q ≥ n^(−o(1)) ("not tiny").
More results. For ℝ^d with ℓ_p-distance: ρ ≤ 1/c^p, when p = 1 [IM'98], 0 < p < 1 [DIIM'04], p = 2 [AI'06]. For Jaccard similarity: ρ ≤ 1/c [Bro'98]. For {0,1}^d with Hamming distance: ρ ≥ 0.462/c − o_d(1) [MNP'06] (assuming q ≥ 2^(−o(d))); immediately gives ρ ≥ 0.462/c^p for ℓ_p-distance.
Our Theorem. For {0,1}^d with Hamming distance: ρ ≥ 1/c − o_d(1) (∃ r s.t.; assuming q ≥ 2^(−o(d))); immediately gives ρ ≥ 1/c^p for ℓ_p-distance. Proof also yields ρ ≥ 1/c for Jaccard.
Proof: noise stability is log-convex.
Proof: a definition, and two lemmas.
Definition: noise stability at e^(−τ). Fix any function h : {0,1}^d → S. Pick x ∈ {0,1}^d uniformly at random; form y by flipping each bit of x independently w.p. (1 − e^(−2τ))/2. def: K_h(τ) ≝ Pr[h(x) = h(y)].
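A Monte Carlo sketch of this definition (our code and parameters): sample x, form the τ-noisy copy y, and estimate K_h(τ) = Pr[h(x) = h(y)]. For a coordinate function h_i from the [IM'98] family, the exact value (1 + e^(−2τ))/2 is available for comparison.

```python
import math, random

def noisy_copy(x, tau):
    """Flip each bit of x independently w.p. (1 - e^{-2*tau}) / 2."""
    p = (1 - math.exp(-2 * tau)) / 2
    return [b ^ (random.random() < p) for b in x]

def estimate_K(h, d, tau, trials=20_000):
    """Monte Carlo estimate of K_h(tau) = Pr[h(x) = h(y)]."""
    hits = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(d)]
        hits += h(x) == h(noisy_copy(x, tau))
    return hits / trials

# Example h: output one fixed coordinate (a member of the [IM'98] family),
# whose exact noise stability is (1 + e^{-2*tau}) / 2.
d = 50
h = lambda x: x[0]
for tau in (0.05, 0.1, 0.2):
    print(f"tau={tau}:  K_h ~ {estimate_K(h, d, tau):.3f}"
          f"  (exact: {(1 + math.exp(-2 * tau)) / 2:.3f})")
```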
Lemma 1: For x ⇝_τ y (y a τ-noisy copy of x), dist(x, y) = τd ± o(d) w.v.h.p., when τ ≪ 1. Proof: Chernoff bound and Taylor expansion.
Lemma 2: K_h(τ) is a log-convex function of τ (for any h). Proof uses Fourier analysis of Boolean functions.
Fourier transform. Theorem: every f : {0,1}^d → ℝ can be uniquely written as f = Σ_{S⊆[d]} f̂(S)·χ_S, where the basis fcns are χ_S(x) = Π_{i∈S} (−1)^(x_i) and the Fourier coefs are f̂(S) = E_x[f(x)·χ_S(x)]. Proof: {χ_S}_{S⊆[d]} is an orthonormal basis of {f : {0,1}^d → ℝ}.
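A brute-force illustration of the expansion (our helper names; feasible only for tiny d): compute each f̂(S) = E_x[f(x)·χ_S(x)] and confirm that Σ_S f̂(S)·χ_S reconstructs f, as uniqueness promises.

```python
import itertools

def chi(S, x):
    """Basis fcn chi_S(x) = prod_{i in S} (-1)^{x_i}."""
    return (-1) ** sum(x[i] for i in S)

def subsets(idx):
    for r in range(len(idx) + 1):
        yield from itertools.combinations(idx, r)

def fourier_coeffs(f, d):
    """f_hat(S) = E_x[f(x) * chi_S(x)], by brute force over {0,1}^d."""
    cube = list(itertools.product((0, 1), repeat=d))
    return {S: sum(f(x) * chi(S, x) for x in cube) / 2 ** d
            for S in subsets(range(d))}

d = 3
f = lambda x: max(x)                      # OR of three bits
fhat = fourier_coeffs(f, d)
for x in itertools.product((0, 1), repeat=d):
    recon = sum(c * chi(S, x) for S, c in fhat.items())
    assert abs(recon - f(x)) < 1e-9
print("expansion reconstructs f on all of {0,1}^3")
```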
Lemma 2: K_h(τ) is a log-convex function of τ. Proof: Let h_i(x) = 1[h(x) = i]. Then
K_h(τ) = Pr[h(x) = h(y)] = Σ_i E[h_i(x)·h_i(y)] = Σ_i Σ_{S⊆[d]} ĥ_i(S)²·e^(−2τ|S|),
using E[χ_S(x)·χ_S(y)] = e^(−2τ|S|). A non-neg. comb. of log-convex fcns of τ ∴ log-convex. ∎
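A numeric sanity check of Lemma 2 (our code; exact enumeration, so tiny d only): compute K_h(τ) directly from the noise distribution and test the midpoint form of log-convexity, K_h(t)·K_h(t′) ≥ K_h((t + t′)/2)².

```python
import itertools, math

def K(h, d, tau):
    """Exact K_h(tau): average over x of Pr[h(y) = h(x)] for a tau-noisy y."""
    p = (1 - math.exp(-2 * tau)) / 2
    cube = list(itertools.product((0, 1), repeat=d))
    total = 0.0
    for x in cube:
        for y in cube:
            if h(x) == h(y):
                dist = sum(a != b for a, b in zip(x, y))
                total += p ** dist * (1 - p) ** (d - dist)
    return total / 2 ** d

d = 4
h = lambda x: (x[0] ^ x[1], x[2])         # an arbitrary h : {0,1}^4 -> S
for t1, t2 in [(0.05, 0.4), (0.1, 1.0), (0.3, 2.0)]:
    mid = (t1 + t2) / 2
    assert K(h, d, t1) * K(h, d, t2) >= K(h, d, mid) ** 2 - 1e-12
print("midpoint log-convexity holds for this h")
```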
Lemma 1: For x ⇝_τ y, dist(x, y) = τd ± o(d) w.v.h.p., when τ ≪ 1.
Lemma 2: K_h(τ) is a log-convex function of τ (for any h).
Theorem: LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).
Proof: Say H is an LSH family for {0,1}^d with params (εd + o(d), cεd − o(d), q^ρ, q), i.e., radius r = εd + o(d) and approx factor c − o(1). def: K_H(τ) ≝ E_{h∈H}[K_h(τ)]. (Non-neg. lin. comb. of log-convex fcns ∴ K_H(τ) is also log-convex.) By Lemma 1, for x ⇝_ε y: dist(x, y) ≈ εd w.v.h.p. ∴ K_H(ε) ≳ q^ρ; for x ⇝_cε y: dist(x, y) ≈ cεd w.v.h.p. ∴ K_H(cε) ≲ q (in truth, q + 2^(−Θ(d)); we assume q not tiny).
ln K_H(0) = 0 (since K_H(0) = 1), ln K_H(ε) ≳ ρ ln q, ln K_H(cε) ≲ ln q. K_H(τ) is log-convex, i.e., ln K_H(τ) is convex in τ, and ε = (1 − 1/c)·0 + (1/c)·cε, so ln K_H(ε) ≤ (1/c)·ln K_H(cε) ≤ (1/c)·ln q. ∴ ρ ln q ≤ (1/c)·ln q, and dividing by ln q < 0: ρ ≥ 1/c. ∎
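The whole argument in numbers (our sketch; K_bitsample is our name): for the [IM'98] bit-sampling family, K_H(τ) = (1 + e^(−2τ))/2 exactly, and the implied ρ = ln K_H(ε)/ln K_H(cε) tends to 1/c as ε → 0, showing the lower bound is tight.

```python
import math

def K_bitsample(tau):
    """K_H(tau) for the [IM'98] family H = {h_1, ..., h_d}, h_i(x) = x_i:
    each h_i has noise stability exactly (1 + e^{-2*tau}) / 2."""
    return (1 + math.exp(-2 * tau)) / 2

# With q = K_H(c*eps) and q^rho = K_H(eps), the implied exponent is
# rho = ln K_H(eps) / ln K_H(c*eps), which tends to 1/c as eps -> 0.
c = 5
for eps in (0.1, 0.01, 0.001):
    rho = math.log(K_bitsample(eps)) / math.log(K_bitsample(c * eps))
    print(f"eps={eps}:  rho = {rho:.4f}   (1/c = {1 / c})")
```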