Navigating Nets: Simple algorithms for proximity search Robert Krauthgamer (IBM Almaden) Joint work with James R. Lee (UC Berkeley)
A classical problem
Fix a metric space (X,d):
• X = set of points.
• d = distance function over X.
Near-neighbor search (NNS) [Minsky-Papert]:
• Preprocess a given n-point subset S ⊆ X.
• Given a query point q ∈ X, quickly compute the closest point to q among S.

Variations on NNS
• (1+ε)-approximate nearest neighbor search:
  • Find a ∈ S such that d(q,a) ≤ (1+ε) · d(q,S).
• Dynamic case:
  • Allow updates to S (insertions and deletions).
• Distributed case:
  • No central index (e.g., nodes in a network).
  • Other cost measures (e.g., communication, stretch, load).

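
As a point of reference for the definitions above, here is a minimal Python sketch of the brute-force baseline: exact NNS by linear scan, plus a check of the (1+ε) condition. The function names, the example points, and the line metric are illustrative assumptions, not part of the talk; the check assumes the answer a must belong to S.

```python
def nearest_neighbor(S, q, d):
    """Exact NNS by linear scan: one distance-oracle call per point of S."""
    return min(S, key=lambda s: d(q, s))


def is_approx_nn(a, S, q, d, eps):
    """True if a is in S and d(q, a) <= (1 + eps) * d(q, S)."""
    dist_to_S = min(d(q, s) for s in S)
    return a in S and d(q, a) <= (1 + eps) * dist_to_S


# Hypothetical example: points on a line with d(x, y) = |x - y|.
def d_line(x, y):
    return abs(x - y)


S = [0, 3, 10, 11]
exact = nearest_neighbor(S, 4, d_line)
```

The scan costs n distance computations per query, which is the cost the data structure in this talk aims to avoid.
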
General metrics
• Only oracle access to distance function d(·,·).
• Models a complicated metric or on-demand measurement.
• No “hashing of coordinates” or tuning for a specific metric.
• Goal: efficient query (sublinear or polylog time).
• Impossible, even if the data set S is a path metric.
[Figure: path metric on points 1, 2, …, n; labels n, n−1, n.]
What about approximate NNS?

Approximate NNS
• Hard even for (near) uniform metrics:
  • d(x,y) = 1 for all x,y ∈ S.
[Figure: uniform metric; labels 1, 1, 1.]
But many data sets lack large uniform subsets. Can we quantify this?

Abstract dimension
• The doubling constant λ_X of a metric (X,d) is the minimum λ such that every ball can be covered by λ balls of half the radius.
• The metric is doubling if λ_X = O(1).
• The (abstract) dimension is dim(X) = log_2 λ_X.
• Immediate properties:
  • dim_A(R^d, ‖·‖_2) = O(d).
  • dim_A(X′) ≤ dim_A(X) for all X′ ⊆ X.
  • dim_A(X) ≤ log |X|. (Equality for a uniform metric.)

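
To make the definition concrete, the following Python sketch computes an upper bound on the doubling constant of a small finite metric by covering each ball greedily with half-radius balls. It is only an illustration under stated assumptions: the function names are made up here, ball centers are restricted to points of X, and the greedy cover need not be minimum, so the result can overestimate the true value.

```python
def ball(X, d, center, r):
    """Closed ball B(center, r) within the finite point set X."""
    return [x for x in X if d(center, x) <= r]


def greedy_cover_size(X, d, center, r):
    """Number of radius-r/2 balls a greedy pass uses to cover B(center, r)."""
    uncovered = ball(X, d, center, r)
    count = 0
    while uncovered:
        p = uncovered[0]  # open a new half-radius ball centered at p
        uncovered = [x for x in uncovered if d(p, x) > r / 2]
        count += 1
    return count


def doubling_constant_upper_bound(X, d):
    """Max greedy cover size over all centers and all realized radii."""
    best = 1
    for x in X:
        for r in {d(x, y) for y in X if y != x}:
            best = max(best, greedy_cover_size(X, d, x, r))
    return best


# Two hypothetical test metrics: a uniform metric and a path metric.
def d_uniform(x, y):
    return 0 if x == y else 1


def d_path(x, y):
    return abs(x - y)


uniform_bound = doubling_constant_upper_bound(list(range(8)), d_uniform)
path_bound = doubling_constant_upper_bound(list(range(16)), d_path)
```

On the uniform metric every half-radius ball holds a single point, so the bound equals the number of points, in line with the equality case noted on the slide; on the path it stays small regardless of the number of points.
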
Illustration
• Grid with missing piece
• Low-dimensional manifold (bounded curvature)
• Union of curves in Euclidean space

Embedding doubling metrics
• Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix 0 < ε < 1, and let (X,d) be a doubling metric. Then (X,d^ε) can be embedded with O(1) distortion into ℓ_2^{O(1)}.
• Not true for ε = 1 [Semmes, 1996].
• Motivation: Embed S and then apply Euclidean NNS.

Our results
• Simple data structure for maintaining S:
  • (1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log D (for ε < ½), where D = d_max/d_min is the normalized diameter of S (typically D = n^{O(1)}).
  • Space: n · 2^{O(dim(S))}.
• Dynamic maintenance of S:
  • Insertion / deletion time: 2^{O(dim(S))} · log D · loglog D.
• Additional properties:
  • Best possible dependency on dim(S) (in a certain model).
  • Oblivious to dim(S) and robust against “bad localities”.
  • Matches/improves known (more specialized) results.

Nets
• Definition: An r-net of X is a subset Y with
  1. d(y_1,y_2) ≥ r for all distinct y_1,y_2 ∈ Y.
  2. d(x,Y) < r for all x ∈ X∖Y.
  (I.e., a maximal r-separated subset.)
• Note: Compare vs. ε-net.
• Running example, a path metric:
  • A 16-net
  • An 8-net
  • A 4-net

More nets
• Definition: An r-net of X is a subset Y with
  1. d(y_1,y_2) ≥ r for all distinct y_1,y_2 ∈ Y.
  2. d(x,Y) < r for all x ∈ X∖Y.
  (I.e., a maximal r-separated subset.)
• Note: Compare vs. ε-net.
[Figure: an r-net; net points labeled Y, radius labeled r.]

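
The definition suggests a simple greedy construction: scan the points and keep each one that is at distance at least r from everything kept so far. The Python sketch below illustrates this on a path metric; the function names and the choice of the path 0, 1, …, 32 are assumptions made for the example, and the talk does not say how its nets are built.

```python
def greedy_r_net(X, r, d):
    """Return a maximal r-separated subset of X, built in one greedy pass."""
    net = []
    for x in X:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net


def is_r_net(Y, X, r, d):
    """Check both conditions of the r-net definition."""
    separated = all(d(a, b) >= r for i, a in enumerate(Y) for b in Y[i + 1:])
    covering = all(min(d(x, y) for y in Y) < r for x in X if x not in Y)
    return separated and covering


# Hypothetical running example: the path 0, 1, ..., 32 with d(x, y) = |x - y|.
def d_path(x, y):
    return abs(x - y)


path = list(range(33))
net16 = greedy_r_net(path, 16, d_path)
net8 = greedy_r_net(path, 8, d_path)
net4 = greedy_r_net(path, 4, d_path)
```

A point is skipped only when some kept point lies within distance less than r, which is exactly the covering condition, so the output is always an r-net of the input.
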
The data structure
• For every r = 2^i, let Y_r be an r-net of S.
  • Only O(log D) values of r are non-trivial.
• For every y ∈ Y_r maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}.
[Figure: the running example with a 16-net, an 8-net, and a 4-net.]

More on the data structure
• For every r = 2^i, let Y_r be an r-net of S.
  • Only O(log D) values of r are non-trivial.
• For every y ∈ Y_r maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}.
[Figure: points of Y_r and Y_{r/2}; label 3r.]

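
A static version of this structure can be sketched in a few lines of Python. The sketch takes the slide's definitions literally (each Y_r is an r-net of S, lists use radius 2r) and builds the nets greedily; the function names, the dictionary layout keyed by the exponent i, and the choice of level range are assumptions made for illustration. It needs at least two distinct points and does not attempt the dynamic maintenance or the improved space bound.

```python
import math


def greedy_r_net(points, r, d):
    """Maximal r-separated subset of points (one greedy pass)."""
    net = []
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net


def build_navigating_nets(S, d):
    """Return (lo, hi, nets, lists) with nets[i] = Y_r for r = 2**i and
    lists[(i, y)] = L_{y,r} = [z in Y_{r/2} with d(y, z) <= 2r]."""
    dists = [d(x, y) for i, x in enumerate(S) for y in S[i + 1:]]
    lo = math.floor(math.log2(min(dists)))     # at this scale Y_r = S
    hi = math.ceil(math.log2(max(dists))) + 1  # at this scale |Y_r| = 1
    nets = {i: greedy_r_net(S, 2.0 ** i, d) for i in range(lo, hi + 1)}
    lists = {
        (i, y): [z for z in nets[i - 1] if d(y, z) <= 2 * 2.0 ** i]
        for i in range(lo + 1, hi + 1)
        for y in nets[i]
    }
    return lo, hi, nets, lists


# Hypothetical example: the path 0, 1, ..., 16 with d(x, y) = |x - y|.
def d_path(x, y):
    return abs(x - y)


lo, hi, nets, lists = build_navigating_nets(list(range(17)), d_path)
longest_list = max(len(v) for v in lists.values())
```

Only the levels between lo and hi are stored, which reflects the remark that O(log D) values of r are non-trivial; on the path example every navigation list stays short.
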
Space requirement
Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r ≥ 0.
Proof:
• L_{y,r} is contained in a ball of radius 2r.
• This ball can be covered by λ_S^3 balls of radius r/4 (halve the radius three times; λ_S is the doubling constant of S).
• Every point in L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball.
• Hence, |L_{y,r}| ≤ λ_S^3 = 2^{3 dim(S)}.
Corollary: Total space is 2^{O(dim(S))} · n · log D.
• We actually improve it to 2^{O(dim(S))} · n.

Back to running example
• A 16-net
• An 8-net
• A 4-net

Navigating nets
• Let q denote the query point.
• Initially, z_16 = the only point in Y_16.
• Find z_8 = the closest Y_8 point to q.
• Find z_4 = the closest Y_4 point to q, etc.

How to find z_{r/2}?
• Assume each z_r ∈ Y_r is the closest point to a (instead of to q), where a is the point of S nearest to q.
  • Then d(z_r,z_{r/2}) ≤ r + r/2 = 3r/2.
  • And z_{r/2} must be in z_r's list L_{z_r,r}.
• For z_r to be the closest Y_r point to q,
  • It suffices that d(q,a) ≤ r/4.
  • And then z_r's list L_{z_r,r} contains z_{r/2}.
• Note: d(q,z_r) ≤ 3r/2.
[Figure: points z_r, a, q, z_{r/2}; distance labels ≤ r, ≤ r/2, ≤ r/4.]

Stopping point
• If we find a point z_r with d(q,z_r) ≤ 3r/2,
• But not a point z_{r/2} with d(q,z_{r/2}) ≤ 3r/4,
• We know that d(q,S) > r/4,
• Yielding 6-NNS, since d(q,z_r) ≤ 3r/2 = 6 · (r/4) < 6 · d(q,S), with query time 2^{O(dim(S))} · log D.
• This can be extended to (1+ε)-NNS.
• Similar principles yield insertions and deletions.

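
The descent and the stopping rule can be put together in a short Python sketch. It follows the simplified single-point walk described on these slides, with greedy nets and lists of radius 2r; the function names and the path example are assumptions, and the sketch does not verify the 6-approximation guarantee, which presumes d(q,z_r) ≤ 3r/2 holds at every step. It is not the authors' implementation.

```python
import math


def greedy_r_net(points, r, d):
    """Maximal r-separated subset of points (one greedy pass)."""
    net = []
    for x in points:
        if all(d(x, y) >= r for y in net):
            net.append(x)
    return net


def build(S, d):
    """nets[i] = Y_r for r = 2**i; lists[(i, y)] = L_{y,r} with radius 2r."""
    dists = [d(x, y) for i, x in enumerate(S) for y in S[i + 1:]]
    lo = math.floor(math.log2(min(dists)))
    hi = math.ceil(math.log2(max(dists))) + 1
    nets = {i: greedy_r_net(S, 2.0 ** i, d) for i in range(lo, hi + 1)}
    lists = {
        (i, y): [z for z in nets[i - 1] if d(y, z) <= 2 * 2.0 ** i]
        for i in range(lo + 1, hi + 1)
        for y in nets[i]
    }
    return lo, hi, nets, lists


def query(q, d, lo, hi, nets, lists):
    """Walk down the scales; return the closest point seen along the way."""
    z = nets[hi][0]  # the only point of the coarsest net
    best = z
    for i in range(hi, lo, -1):
        r = 2.0 ** i
        z_next = min(lists[(i, z)], key=lambda p: d(q, p))
        if d(q, z_next) < d(q, best):
            best = z_next
        if d(q, z_next) > 3 * r / 4:
            break  # stopping rule: no z_{r/2} within 3r/4 of q
        z = z_next
    return best


# Hypothetical example: the path 0, 1, ..., 16 with d(x, y) = |x - y|.
def d_path(x, y):
    return abs(x - y)


index = build(list(range(17)), d_path)
answer = query(5.2, d_path, *index)
```

Each step scans a single navigation list, so the number of distance computations per query is at most the number of levels times the longest list, matching the shape of the 2^{O(dim(S))} · log D bound.
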
Near-optimality
• The basic idea:
  • Consider a uniform metric on λ points.
  • Let the query point be at distance 1 from all of them,
  • Except for one point whose distance is 1−ε.
  • Finding this point requires (in an oracle model) computing all λ distances to q.
• Can happen at every distance scale r.
• We get a lower bound of 2^{Ω(dim(S))} · log D.

Related work – general metrics
• Let K_X be the smallest K such that |B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r ≥ 0.
• Define the KR-dimension as dim_KR(X) = log_2 K_X.
• Randomized exact NNS [Karger-Ruhl’02, Hildrum et al.’04]:
  • Space: n · 2^{O(dim(S))} · log D.
  • Query time: 2^{O(dim(S))} · log D.
  • If dim_KR(S) = O(1) the log D term is actually O(log n).
• Our results extend to this setting:
  1. KR-metrics are doubling: dim(X) ≤ 4 dim_KR(X).
  2. Our algorithms actually give exact NNS.
• Assumptions on query distribution [Clarkson’99].

Related work – Euclidean metrics
• Exact NNS for R^d:
  • O(d^5 log n) query time and O(n^{d+δ}) space [Meiser’93].
• (1+ε)-NNS for R^d:
  • O((d/ε)^d log n) query time and O(dn) space by quad-tree like decompositions [AMNSW’94].
  • Our algorithm achieves similar bounds.
  • O(d polylog(dn)) query time and (dn)^{O(1)} space is useful for higher dimensions [IM’98, KOR’98].

Concluding remarks
• Our approach:
  • A “decision tree” that is not really a tree (saves space).
• In progress:
  • A different (static) scheme where log D is replaced by log n.
  • Bounds on the help of “ambient” space points.
• Our data structure yields a spanner of the metric:
  • Immediate: O(1) stretch with average degree 2^{dim(S)}.
  • More work: O(1) stretch with maximum degree 2^{dim(S)}.
• [Guibas,’04] applied the nets data structure for moving points in the plane.

Thank you!