Algorithmic Frontiers of Doubling Metric Spaces Robert Krauthgamer Weizmann Institute of Science Based on joint works with Yair Bartal, Lee-Ad Gottlieb, Aryeh Kontorovich
The Traveling Salesman Problem: Low-dimensionality implies PTAS Robert Krauthgamer Weizmann Institute of Science Joint work with Yair Bartal and Lee-Ad Gottlieb
Traveling Salesman Problem (TSP) • Definition: Given a set of cities (points), find a minimum-length tour that visits all points • Classic, well-studied NP-hard problem [Karp'72; Papadimitriou-Vempala'06] • Mentioned in a handbook from 1832! • Common benchmark for optimization methods • Many books devoted to TSP… • Numerous variants • Closed/open tour • Multiple tours • Average visit time (repairman) • Etc. (Figure: an optimal tour)
Metric TSP • Basic assumptions on distances • Symmetric • d(x,y) = d(y,x) • Metric • Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z) • Easy 2-approximation via MST • Since OPT ≥ MST • Can do better… • MST + matching gives a 3/2-approximation [Christofides'76] (Figure: an MST)
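To make the MST-based 2-approximation concrete, here is a minimal Python sketch (not from the talk; function names and the dense distance-matrix input are illustrative choices): build an MST with Prim's algorithm and shortcut a preorder walk of it. By the triangle inequality the resulting tour has length at most 2·MST ≤ 2·OPT.

```python
def mst_parents(dist):
    """Prim's algorithm on a full symmetric distance matrix; returns parent pointers."""
    n = len(dist)
    in_tree = [False] * n
    parent = [0] * n
    best = [float('inf')] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return parent

def tsp_2_approx(dist):
    """Double the MST and shortcut: a 2-approximate tour for metric TSP."""
    n = len(dist)
    parent = mst_parents(dist)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    tour, stack = [], [0]
    while stack:                      # preorder walk of the MST = "double and shortcut"
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
    return tour, length
```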
Euclidean TSP • Sanjeev Arora [JACM'98] and Joe Mitchell [SICOMP'99]: Euclidean TSP in fixed dimension admits a PTAS • Find a (1+ε)-approximate tour • In time n·(log n)^{ε^{-Õ(dimension)}}, where n = #points • (Extends to other norms) • They were awarded the 2010 Gödel Prize for this discovery
PTAS Beyond Euclidean? • To achieve a PTAS, two properties were assumed • Euclidean space (at least approximately) • Fixed dimension • Are both these assumptions required? • Fixed dimension is necessary • No PTAS for O(log n) dimensions unless P=NP [Trevisan'00] • Is Euclidean necessary? • Consider metric spaces with low intrinsic dimension…
Doubling Dimension • Definition: The ball B(x,r) = all points within distance r from x • The doubling constant λ(M) of a metric M is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius • First used by [Assouad'83], algorithmically by [Clarkson'97] • The doubling dimension is ddim(M) = log₂ λ(M) [Gupta-K.-Lee'03] • M is called doubling if its doubling dimension is constant • Packing property of doubling spaces • A set with diameter D > 0 and inter-point distance ≥ a contains at most (D/a)^{O(ddim)} points (Figure: here λ ≤ 7)
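The definition can be tested directly on a finite metric. Below is a rough Python sketch (my own, not from the talk) that upper-bounds the doubling constant λ of a distance matrix: for each ball B(x,r) it greedily covers the ball with half-radius balls centered at data points and records the largest cover used; greedy covering may overshoot the optimum, so this is only an estimate.

```python
def doubling_constant_upper_bound(dist, radii):
    """Greedy upper bound on the doubling constant of a finite metric.

    dist: symmetric n x n distance matrix; radii: list of scales r to test.
    For each ball B(x, r), pick uncovered points as centers until every point
    of the ball lies within r/2 of some chosen center.
    """
    n = len(dist)
    worst = 1
    for r in radii:
        for x in range(n):
            uncovered = {p for p in range(n) if dist[x][p] <= r}
            centers = 0
            while uncovered:
                c = uncovered.pop()      # any uncovered point becomes a center
                centers += 1
                uncovered = {p for p in uncovered if dist[c][p] > r / 2}
            worst = max(worst, centers)
    return worst   # every tested ball was covered by at most `worst` half-radius balls
```

Taking log₂ of the returned value gives a corresponding rough estimate of ddim.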
Applications of Doubling Dimension • Nearest neighbor search • [K.-Lee'04; Har-Peled-Mendel'06; Beygelzimer-Kakade-Langford'06; Cole-Gottlieb'06] • Spanners, routing • [Talwar'04; Kleinberg-Slivkins-Wexler'04; Abraham-Gavoille-Goldberg-Malkhi'05; Konjevod-Richa-Xia-Yu'07; Gottlieb-Roditty'08; Elkin-Solomon'12] • Distance oracles • [Har-Peled-Mendel'06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein'11] • Dimension reduction • [Bartal-Recht-Schulman'11; Gottlieb-K.'11] • Machine learning and statistics • [Bshouty-Li-Long'09; Gottlieb-Kontorovich-K.'10,'12]
PTAS for Metric TSP? • Does TSP on doubling metrics admit a PTAS? • Arora and Mitchell made strong use of Euclidean properties • "Most fascinating problem left open in this area" [James Lee, tcsmath blog, June '10] • Some attempts • Quasi-PTAS [Talwar'04] (first description of the problem) • Quasi-PTAS for TSP with neighborhoods [Mitchell'07; Chan-Elbassioni'11] • Subexponential-time approximation scheme, under a weaker assumption [Chan-Gupta'08] • Our result: TSP on doubling metrics admits a PTAS • Find a (1+ε)-approximate tour • In time n · 2^{O(ddim)} · 2^{ε^{-Õ(ddim)} · 2^{O(ddim²)} · log^{1/2} n} • Euclidean (to compare): n·(log n)^{ε^{-Õ(dimension)}} • Throughout, think of ddim and ε as constants
Metric Partition • A quadtree-like hierarchy [Bartal'96; Gupta-K.-Lee'03; Talwar'04] • At level i: • Random radii R_i ∈ [2^i, 2·2^i] • Centers are 2^i apart, processed in arbitrary order
Metric Partition (2) • A quadtree-like hierarchy [Bartal'96; Gupta-K.-Lee'03; Talwar'04] • Recursively to level i−1: • Caveat: O(log n) hierarchical levels suffice • Ignore tiny distances < 1/n² • Random radii R_{i−1} ∈ [2^{i−1}, 2·2^{i−1}]
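A minimal Python sketch (mine, not the talk's code) of a single level of this random partition: centers are chosen greedily to be more than 2^i apart, each draws a radius uniformly from [2^i, 2·2^i], and, processed in arbitrary order, each center claims the not-yet-assigned points inside its ball.

```python
import random

def partition_level(points, dist, i, rng=random):
    """One level of the quadtree-like random partition.

    points: list of point ids; dist(u, v): metric distance; i: level (scale 2**i).
    Returns a list of clusters (lists of point ids) covering all points.
    """
    scale = 2 ** i
    centers = []                       # greedy net: centers pairwise > scale apart
    for p in points:
        if all(dist(p, c) > scale for c in centers):
            centers.append(p)
    clusters, unassigned = [], set(points)
    for c in centers:                  # arbitrary order of centers
        r = rng.uniform(scale, 2 * scale)   # random radius R_i in [2^i, 2*2^i]
        cluster = [p for p in unassigned if dist(c, p) <= r]
        unassigned -= set(cluster)
        if cluster:
            clusters.append(cluster)
    return clusters                    # every point lies within `scale` of some center
```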
Dense Areas • Key observation: • The points (metric space) can be decomposed into sparse areas • Call a level-i ball "dense" if • the local tour weight (i.e., the weight inside the R_i-ball) is ≥ R_i/ε • Such a ball can be removed, solving each sub-problem separately • Cost to join the tours is relatively small: • only about R_i
Sparsification • Sparse decomposition: • Search the hierarchy bottom-up for dense balls • Remove each dense ball: • The ball is composed of 2^{O(ddim)} sparse sub-balls • So it is only barely dense, i.e., local tour weight ≤ 2^{O(ddim)}·R_{i−1}/ε • Recurse on the remaining point set • But how do we know the local weight of the tour in a ball? • It can be estimated using the local MST • Modulo caveats like "long" edges… • weight(OPT ∩ B(u,R)) ≤ O(MST(S)) • weight(OPT ∩ B(u,3R)) ≥ Ω(MST(S)) − ε^{-O(ddim)}·R • Henceforth, we assume the input is sparse
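A rough Python sketch (my own, ignoring the "long edges" caveat above) of the density test: estimate the local tour weight inside a ball by the MST of the points it contains, and declare the ball dense when that estimate reaches R/ε.

```python
def mst_weight(pts, dist):
    """Prim's algorithm: total MST weight of the point ids in `pts`."""
    pts = list(pts)
    if len(pts) < 2:
        return 0.0
    best = {p: dist(pts[0], p) for p in pts[1:]}
    total = 0.0
    while best:
        p = min(best, key=best.get)    # attach the closest remaining point
        total += best.pop(p)
        for q in best:
            best[q] = min(best[q], dist(p, q))
    return total

def is_dense(center, radius, points, dist, eps):
    """Heuristic density test: local MST weight >= radius / eps."""
    inside = [p for p in points if dist(center, p) <= radius]
    return mst_weight(inside, dist) >= radius / eps
```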
Light Tours • Definition: A tour is (m,r)-light with respect to a hierarchy if it enters each cell (cluster) • at most r times, and • only via m designated portals • Choose the portals to be (2^i/M)-net points • Then m = M^{O(ddim)} (Figure: portals at spacing 2^{i−1}/M)
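The portals are just the points of a net at scale δ = 2^i/M. A small Python sketch (mine) of the standard greedy net construction: chosen points are pairwise more than δ apart and every point is within δ of a chosen one, so by the packing property a cluster of diameter ≈ 2^i contains only M^{O(ddim)} of them.

```python
def greedy_net(points, dist, delta):
    """Greedy delta-net: net points are > delta apart and cover everything within delta."""
    net = []
    for p in points:
        if all(dist(p, q) > delta for q in net):
            net.append(p)
    return net

# Hypothetical usage: portals of a level-i cluster at spacing 2**i / M
# portals = greedy_net(cluster_points, dist, 2 ** i / M)
```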
Optimizing over Light Tours • Theorem [Arora'98; Talwar'04]: Given a hierarchical partition, a minimum-length (m,r)-light tour for it can be computed exactly • In time m^{r·O(ddim)}·n·log n • Via dynamic programming • Join tours for small clusters into a tour for the larger cluster • Typically both m,r ≈ polylog(n/ε), thus m^r ≈ n^{polylog n}
Better Partitions and Lighter Tours • Our Theorem: For every (optimal) tour T, there is a partition admitting an (m,r)-light tour T' such that • M = ddim·log n/ε • m = M^{O(ddim)} = (log n/ε)^{Õ(ddim)} • r = ε^{-O(ddim)}·loglog n • and length(T') ≤ (1+ε)·length(T) • If the partition were known, then a tour like T' could be found in time • m^{r·O(ddim)}·n·log n = n·2^{ε^{-Õ(ddim)}·loglog²n} • Now m^r ≈ poly(n) • It remains to prove the Theorem, and then to show how to find the partition (a bit later)
Constructing Light Tours • Modify a tour T to be (m,r)-light [Arora'98; Talwar'04] • Part I: Focus on m (i.e., the net points) • Move cut edges to be incident on net points • Expected cost at one level (for an edge of unit length): • Radius R_{i−1} ≈ 2^{i−1} • Pr[edge is cut] ≤ O(ddim/R_{i−1}) • Expected cost ≤ (R_{i−1}/M)·(ddim/R_{i−1}) = ddim/M = ε/log n • Expected cost to the edge over all levels: ≤ log n · ε/log n = ε • We thus constructed a (1+ε)-approximate tour (Figure: portal spacing 2^{i−1}/M)
Constructing Light Tours (2) • Modify a tour to be (m,r)-light [Arora'98; Talwar'04] • Part II: Focus on r (i.e., the number of crossing edges) • Reduce the number of crossings • Patching step: reroute (almost all) crossings back into the cluster • Cost ≈ length of a tour on the patched endpoints ≈ MST of these points • MST Theorem [Talwar'04]: For a set S of points, MST(S) ≤ diam(S)·|S|^{1−1/ddim} • Cost per point ≤ diam(S)/|S|^{1/ddim}
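The MST theorem is easy to sanity-check empirically. The following Python sketch (mine, using numpy/scipy) compares MST(S) with the bound diam(S)·|S|^{1−1/ddim} for random points in the unit square, taking ddim ≈ 2 for the plane.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
ddim = 2                                    # doubling dimension of the plane (roughly)
for n in (100, 400, 1600):
    pts = rng.random((n, 2))                # random points in the unit square
    d = distance_matrix(pts, pts)
    mst = minimum_spanning_tree(d).sum()    # total MST weight
    bound = d.max() * n ** (1 - 1 / ddim)   # diam(S) * |S|^(1 - 1/ddim)
    print(f"n={n:5d}  MST={mst:7.2f}  bound={bound:7.2f}")
```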
Constructing Light Tours (3) • Modify a tour to be (m,r)-light [Arora'98; Talwar'04] • Part II: Focus on r (i.e., the number of crossing edges) • Reduce the number of crossings • Expected cost to an edge at level i−1: • Radius R_{i−1} ≈ 2^{i−1} • Pr[edge is patched] ≤ Pr[edge is cut] • Expected cost ≤ (R_{i−1}/r^{1/ddim})·(ddim/R_{i−1}) = ddim/r^{1/ddim} • As before, we want this to be ≤ ε/log n (because we sum over log n levels) • Could take r = (ddim·log n/ε)^{ddim} • But the dynamic program runs in time m^r → only a QPTAS! [Talwar'04] • Challenge: a smaller value for r
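For completeness, the value of r above comes from solving the displayed inequality for r:

```latex
\frac{\mathrm{ddim}}{r^{1/\mathrm{ddim}}} \le \frac{\varepsilon}{\log n}
\;\Longleftrightarrow\;
r^{1/\mathrm{ddim}} \ge \frac{\mathrm{ddim}\cdot\log n}{\varepsilon}
\;\Longleftrightarrow\;
r \ge \Big(\frac{\mathrm{ddim}\cdot\log n}{\varepsilon}\Big)^{\mathrm{ddim}}.
```

With this r, the dynamic program's m^r factor is 2^{polylog n}, i.e., quasi-polynomial in n, which is why this argument alone yields only a QPTAS.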
Patching in Sparse Areas • Suppose the tour is q-sparse with respect to the hierarchy • Every R-ball contains tour weight at most qR (for all R = 2^i) • In expectation, a random R-ball cuts weight ≈ qR/R = q • A cluster is formed by cuts from many levels • In expectation, weight ≈ q is cut per level • Take r = 2q·loglog n • In expectation, level-(i−1) patching then includes only edges cut at much higher levels • Charge only the "top" half of the patched edges • Each charged edge pays about 2R_{i−1} • Pr[edge is charged for patching] ≤ Pr[edge is cut at level i+loglog n] ≤ ddim/(R_{i−1}·log n)
Wrapping Up (Patching Sparse Areas) • Modify a tour to be (m,r)-light [Arora'98; Talwar'04] • Part II: Focus on r (i.e., the number of crossing edges) • Reduce the number of crossings • Expected cost at level i−1: • Expected cost ≤ (R_{i−1}/r^{1/ddim})·(ddim/(R_{i−1}·log n)) = ddim/(log n·r^{1/ddim}) • As before, we want this term to equal ε/log n • Take r = (ddim/ε)^{ddim} • Obtain a PTAS!
Technical Subtleties • Outstanding problem: • The previous analysis assumed a ball cuts only ≈ q edges • True in expectation… but that is not good enough • Solution: try many hierarchies • Choose at random log n radii for each ball and try all their combinations! • WHP, some hierarchy cuts ≈ q edges in every ball • This drives up the runtime of the dynamic program
Algorithmic Frontiers of Doubling Metrics Robert Krauthgamer Weizmann Institute of Science Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich
Machine Learning in Doubling Metrics • Large-margin classification in metric spaces [von Luxburg-Bousquet'04] • Unknown distribution D of labeled points (x,y) ∈ M×{−1,+1} • M is a metric space (generalizes R^dim) • Labels are L-Lipschitz: |y_i − y_j| ≤ L·d(x_i,x_j) (generalizes margin) • Resource: a sample of labeled points • Goal: build a hypothesis f: M → {−1,+1} that has (1−ε)-agreement with D • Statistical complexity: how many samples are needed? • Computational complexity: running time? • Extensions: • A small fraction of labels are wrong (adversarial noise) • Real-valued labels y ∈ [−1,1] (metric regression) (Figure: +1 and −1 regions separated by margin 2/L, with classifier f)
Generalization Bounds • Our approach: Assume M is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler'97; Bartlett-ShaweTaylor'99] • Example: Earthmover distance (EMD) in the plane between sets of size k has ddim ≤ O(k log k) • Standard algorithm: pick a hypothesis that fits all/most observed samples • Theorem: The class of L-Lipschitz functions has fat-shattering dimension fsdim ≤ (c·L·diam(M))^{ddim} • Corollary: If f is L-Lipschitz and classifies n samples correctly, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ O(fsdim·(log n)²/n). Similarly, if f correctly classifies all but an η-fraction, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ η + O(fsdim·(log n)²/n)^{1/2} • These bounds are incomparable to those of [von Luxburg-Bousquet'04]
Algorithmic Aspects (noise-free) • Computing a hypothesis f from the samples (x_i,y_i), in terms of S+ and S−, the positively and negatively labeled samples • Lemma (Lipschitz extension): If the labels are L-Lipschitz, so is f • Evaluating f(x) requires solving Nearest Neighbor Search • Explains a common classification heuristic, e.g. [Cover-Hart'67] • But might require Ω(n) time… • We show how to use (1+ε)-approximate Nearest Neighbor Search instead • This can be solved quickly in doubling metrics • We prove a similar generalization bound by sandwiching sgn(f(x))
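As a simple stand-in consistent with the nearest-neighbor heuristic mentioned above (the precise formula for f is not reproduced here), this Python sketch (mine) labels a query by whether the nearest positive or nearest negative sample is closer; any exact or (1+ε)-approximate nearest-neighbor routine can be plugged in.

```python
def classify(x, S_plus, S_minus, dist, nearest=None):
    """Label x by comparing its distance to the positive and negative samples.

    nearest(x, S) may be any exact or (1+eps)-approximate nearest-neighbor
    search routine; by default we scan S linearly (Omega(n) time).
    """
    if nearest is None:
        nearest = lambda x, S: min(S, key=lambda s: dist(x, s))
    d_plus = dist(x, nearest(x, S_plus))
    d_minus = dist(x, nearest(x, S_minus))
    return +1 if d_plus <= d_minus else -1
```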
Extensions (noisy case) • 1. A small fraction of labels are wrong (adversarial noise) • How to compute a hypothesis? • Build a bipartite graph (on S+ ∪ S−) of all violations of the Lipschitz condition (an edge between two points at distance < 2/L) • Compute a minimum vertex cover (or, faster, a 2-approximation) and discard the covered points • 2. Real-valued labels y ∈ [−1,1] (metric regression) • Minimize the risk (expected loss) E_{x,y}|f(x)−y| • Extend the statistical framework by similar ideas • But how to compute a hypothesis? • Write an LP: minimize Σ_i |f(x_i)−y_i| subject to |f(x_i)−f(x_j)| ≤ L·d(x_i,x_j) ∀ i,j • Reduce the number of constraints from O(n²) to O(ε^{-ddim}·n) using a (1+ε)-spanner on the x_i's • Apply a fast approximate LP solver (a toy LP sketch follows below)
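A toy Python sketch (mine) of the regression LP, written for scipy's generic LP solver; unlike the approach on the slide it keeps all O(n²) Lipschitz constraints and uses no spanner or fast approximate solver, so it only illustrates the formulation.

```python
import numpy as np
from scipy.optimize import linprog

def metric_regression(d, y, L):
    """Solve: minimize sum_i |f_i - y_i|  s.t.  |f_i - f_j| <= L * d[i][j] for all i, j.

    d: n x n distance matrix, y: observed labels in [-1, 1], L: Lipschitz constant.
    Variables are (f_1..f_n, t_1..t_n) with slacks t_i >= |f_i - y_i|.
    """
    n = len(y)
    c = np.concatenate([np.zeros(n), np.ones(n)])     # minimize sum of slacks
    rows, rhs = [], []
    for i in range(n):
        r = np.zeros(2 * n); r[i], r[n + i] = 1, -1   #  f_i - t_i <=  y_i
        rows.append(r); rhs.append(y[i])
        r = np.zeros(2 * n); r[i], r[n + i] = -1, -1  # -f_i - t_i <= -y_i
        rows.append(r); rhs.append(-y[i])
    for i in range(n):
        for j in range(n):
            if i != j:                                #  f_i - f_j <= L * d_ij
                r = np.zeros(2 * n); r[i], r[j] = 1, -1
                rows.append(r); rhs.append(L * d[i][j])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(-1, 1)] * n + [(0, None)] * n)
    return res.x[:n]                                  # fitted values f(x_i)
```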
Conclusion • General paradigm: low-dimensional Euclidean spaces ↔ doubling metric spaces • Mathematically, the latter is a different (strictly bigger) family • Not even low-distortion embeddings [Laakso'00,'01] • For algorithmic efficiency, there is a strong analogy/similarity • E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning • Research directions: • Other computational tasks or application areas? • Particularly in machine learning, data structures • Scenarios where the analogy fails? • E.g. [Indyk-Naor'05], which uses random projections • Other metric models? E.g. hyperbolic…