Algorithmic Frontiers of Doubling Metric Spaces Robert Krauthgamer Weizmann Institute of Science Based on joint works with Yair Bartal, Lee-Ad Gottlieb, Aryeh Kontorovich
The Traveling Salesman Problem: Low-dimensionality implies PTAS Robert Krauthgamer Weizmann Institute of Science Joint work with Yair Bartal and Lee-Ad Gottlieb
Traveling Salesman Problem (TSP) • Definition: Given a set of cities (points), find a minimum-length tour that visits all points • Classic, well-studied NP-hard problem • [Karp'72; Papadimitriou-Vempala'06] • Mentioned in a handbook from 1832! • Common benchmark for optimization methods • Many books devoted to TSP… • Numerous variants • Closed/open tour • Multiple tours • Average visit time (repairman) • Etc…
Metric TSP • Basic assumptions on distances • Symmetric • d(x,y) = d(y,x) • Metric • Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z) • Easy 2-approximation via MST • Since OPT ≥ MST • Can do better… • MST + Matching gives a 1.5-approximation [Christofides'76]
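Since the slide only states the MST bound, here is a minimal Python sketch of that classic 2-approximation (shortcut a preorder walk of the MST); the function and variable names are ours, not from the talk, and `dist` is assumed to be a symmetric matrix obeying the triangle inequality.

```python
# Minimal sketch of the MST-based 2-approximation for metric TSP:
# build an MST, walk it in preorder, and shortcut repeated visits.

def mst_tsp_2approx(dist):
    n = len(dist)
    # Prim's algorithm: grow an MST from vertex 0, recording each vertex's parent.
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [0] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u

    # Children lists of the MST, then a preorder walk = shortcut Euler tour.
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))

    length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, length

if __name__ == "__main__":
    # Tiny usage example on 4 points of a path metric.
    d = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    print(mst_tsp_2approx(d))  # a tour visiting all points, of length <= 2 * OPT
```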
Euclidean TSP • Sanjeev Arora [JACM'98] and Joe Mitchell [SICOMP'99]: Euclidean TSP with fixed dimension admits a PTAS • Find a (1+ε)-approximate tour • In time n·(log n)^{ε^{-Õ(dimension)}}, where n = #points • (Extends to other norms) • They were awarded the 2010 Gödel Prize for this discovery
PTAS Beyond Euclidean? • To achieve a PTAS, two properties were assumed • Euclidean space (at least approximately) • Fixed dimension • Are both these assumptions required? • Fixed dimension is necessary • No PTAS for (log n)-dimensions unless P=NP [Trevisan'00] • Is Euclidean necessary? • Consider metric spaces with low intrinsic dimension…
Doubling Dimension • Definition: Ball B(x,r) = all points within distance r from x. • The doubling constant (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius • First used by [Assouad'83], algorithmically by [Clarkson'97]. • The doubling dimension is ddim(M) = log₂ λ(M) [Gupta-K.-Lee'03] • M is called doubling if its doubling dimension is constant • Packing property of doubling spaces • A set with diameter D>0 and minimum inter-point distance a contains at most (D/a)^{O(ddim)} points • In the figure: λ ≤ 7.
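To make the definition concrete, here is a rough Python illustration (our own, not from the talk) that estimates the doubling constant of a finite point set by greedily covering each ball B(x,r) with balls of half the radius; the greedy cover only upper-bounds the optimal one, so this gives an estimate of λ, and hence of ddim = log₂ λ.

```python
# Estimate the doubling constant of a finite metric by greedy half-radius covers.

def doubling_constant_estimate(points, dist, radii):
    lam = 1
    for x in points:
        for r in radii:
            ball = [p for p in points if dist(x, p) <= r]
            centers = []
            for p in ball:  # greedy: open a new half-radius ball only when needed
                if all(dist(p, c) > r / 2 for c in centers):
                    centers.append(p)
            lam = max(lam, len(centers))
    return lam  # estimated doubling constant; ddim estimate = log2(lam)

if __name__ == "__main__":
    import math, random
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]  # planar points
    d = lambda a, b: math.dist(a, b)
    lam = doubling_constant_estimate(pts, d, radii=[0.1, 0.2, 0.4])
    print(lam, math.log2(lam))  # a small constant, as expected for the plane
```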
Applications of Doubling Dimension • Nearest neighbor search • [K.-Lee'04; Har-Peled-Mendel'06; Beygelzimer-Kakade-Langford'06; Cole-Gottlieb'06] • Spanners, routing • [Talwar'04; Kleinberg-Slivkins-Wexler'04; Abraham-Gavoille-Goldberg-Malkhi'05; Konjevod-Richa-Xia-Yu'07; Gottlieb-Roditty'08; Elkin-Solomon'12] • Distance oracles • [Har-Peled-Mendel'06; Bartal-Gottlieb-Roditty-Kopelowitz-Lewenstein'11] • Dimension reduction • [Bartal-Recht-Schulman'11; Gottlieb-K.'11] • Machine learning and statistics • [Bshouty-Yi-Long'09; Gottlieb-Kontorovich-K.'10,'12]
PTAS for Metric TSP? • Does TSP on doubling metrics admit a PTAS? • Arora and Mitchell made strong use of Euclidean properties • "Most fascinating problem left open in this area" [James Lee, tcsmath blog, June '10] • Some attempts • Quasi-PTAS [Talwar'04] (first description of the problem) • Quasi-PTAS for TSP w/neighborhoods [Mitchell'07; Chan-Elbassioni'11] • Subexponential-time approximation scheme, under a weaker assumption [Chan-Gupta'08] • Our result: TSP on doubling metrics admits a PTAS • Find a (1+ε)-approximate tour • In time n^{2^{O(ddim)}} · 2^{ε^{-Õ(ddim)} · 2^{O(ddim²)} · log^{1/2} n} • Euclidean (to compare): n·(log n)^{ε^{-Õ(dimension)}} • Throughout, think of ddim and ε as constants
Metric Partition • A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04] • At level i: • Random radii R_i ∈ [2^i, 2·2^i] • Centers are 2^i apart, taken in arbitrary order
Metric Partition (2) • A quadtree-like hierarchy [Bartal'96, Gupta-K.-Lee'03, Talwar'04] • Recursively to level i-1: • Caveat: log(n) hierarchical levels suffice • Ignore tiny distances < 1/n² • Random radii R_{i-1} ∈ [2^{i-1}, 2·2^{i-1}]
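A minimal Python sketch of one level of this random partition (our own naming and simplifications, e.g. a random rather than arbitrary center order): centers form a greedy 2^i-net, each gets a random radius R ∈ [2^i, 2·2^i], and every point joins the first ball that covers it.

```python
# One level of the quadtree-like random partition over a finite point set.
import random

def partition_level(points, dist, i):
    scale = 2 ** i
    order = list(points)
    random.shuffle(order)                   # arbitrary (here: random) center order
    centers = []
    for p in order:                         # greedy 2^i-net: keep p only if it is
        if all(dist(p, c) > scale for c in centers):   # far from existing centers
            centers.append(p)
    radius = {c: random.uniform(scale, 2 * scale) for c in centers}
    clusters = {c: [] for c in centers}
    for p in points:
        for c in centers:                   # first ball (in center order) that covers p
            if dist(p, c) <= radius[c]:
                clusters[c].append(p)
                break
    return clusters                         # every point is covered, since centers form a net
```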
Dense Areas • Key observation: • The points (metric space) can be decomposed into sparse areas • Call a level-i ball "dense" if • the local tour weight (i.e. inside the R_i-ball) is ≥ R_i/ε • Such a ball can be removed, solving each sub-problem separately • Cost to join the tours is relatively small: • only about R_i
Sparsification • Sparse decomposition: • Search the hierarchy bottom-up for dense balls. • Remove a dense ball: • The ball is composed of 2^{O(ddim)} sparse sub-balls • So it is barely dense, i.e. local tour weight ≤ 2^{O(ddim)}·R_{i-1}/ε • Recurse on the remaining point set • But how do we know the local weight of the tour in a ball? • It can be estimated using the local MST • Modulo caveats like "long" edges… • OPT ∩ B(u,R) ≤ O(MST(S)) • OPT ∩ B(u,3R) ≥ Ω(MST(S)) − ε^{-O(ddim)}·R • Henceforth, we assume the input is sparse
Light Tours • Definition: A tour is (m,r)-light on a hierarchy if it enters every cell (cluster) • At most r times, and • Only via m designated portals • Choose portals as (2^i/M)-net points • Then m = M^{O(ddim)}
Optimizing over Light Tours • Theorem [Arora'98, Talwar'04]: Given a hierarchical partition, a minimum-length (m,r)-light tour for it can be computed exactly • In time m^{r·O(ddim)}·n·log n • Via dynamic programming • Join the tours for small clusters into a tour for the larger cluster • Typically both m,r ≈ polylog(n/ε), thus m^r ≈ n^{polylog n}
Better Partitions and Lighter Tours • Our Theorem: For every (optimal) tour T, there is a partition with an (m,r)-light tour T' such that • M = ddim·log n/ε • m = M^{O(ddim)} = (log n/ε)^{Õ(ddim)} • r = ε^{-O(ddim)}·loglog n • And length(T') ≤ (1+ε)·length(T) • If the partition were known, then a tour like T' could be found in time • m^{r·O(ddim)}·n·log n = n·2^{ε^{-Õ(ddim)}·loglog²n} • It remains to prove the Theorem (next), and to show how to find the partition (a bit later) • Note: m^r is now ≈ poly(n)
Constructing Light Tours • Modify a tour T to be (m,r)-light [Arora'98, Talwar'04] • Part I: Focus on m (i.e. net points) • Move cut edges to be incident on net points • Expected cost at one level (for an edge of unit length) • Radius R_{i-1} ≈ 2^{i-1} • Pr[edge is cut] ≤ O(ddim/R_{i-1}) • Expected cost ≤ (R_{i-1}/M)·(ddim/R_{i-1}) = ddim/M = ε/log n • Expected cost to the edge over all levels: ≤ log n · ε/log n = ε • We thus constructed a (1+ε)-approximate tour
Constructing Light Tours (2) • Modify a tour to be (m,r)-light [Arora'98, Talwar'04] • Part II: Focus on r (i.e. the number of crossing edges) • Reduce the number of crossings • Patching step: Reroute (almost all) crossings back into the cluster • Cost ≈ length of a tour on the patched endpoints ≈ MST of these points • MST Theorem [Talwar'04]: For a set S of points, MST(S) ≤ diam(S)·|S|^{1-1/ddim} • Cost per point ≤ diam(S)/|S|^{1/ddim}
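The slide uses Talwar's MST bound as a black box; a rough derivation from the packing property (our own sketch, with constants in the exponents suppressed) goes as follows.

```latex
% Let D = diam(S), n = |S|, and let N_j be a (D/2^j)-net of S, so that
% |N_j| <= 2^{O(ddim * j)} by the packing property. Connecting every point of
% N_{j+1} to its nearest point in N_j costs at most D/2^j per point, and after
% j* ~ (log n)/ddim levels every point of S is a net point. Hence
\[
  \mathrm{MST}(S) \;\le\; \sum_{j=0}^{j^*} |N_{j+1}| \cdot \frac{D}{2^{j}}
  \;\lesssim\; n \cdot \frac{D}{2^{j^*}}
  \;\approx\; D \cdot n^{1-1/\mathrm{ddim}},
\]
% since the sum grows geometrically in j and is dominated by its last term,
% where 2^{j*} ~ n^{1/ddim}.
```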
Constructing Light Tours (3) • Modify a tour to be (m,r)-light [Arora'98, Talwar'04] • Part II: Focus on r (i.e. the number of crossing edges) • Reduce the number of crossings • Expected cost to an edge at level i-1 • Radius R_{i-1} ≈ 2^{i-1} • Pr[edge is patched] ≤ Pr[edge is cut] • Expected cost ≤ (R_{i-1}/r^{1/ddim})·(ddim/R_{i-1}) = ddim/r^{1/ddim} • As before, want this to be ≤ ε/log n (because we sum over log n levels) • Could take r = (ddim·log n/ε)^{ddim} • But the dynamic program runs in time m^r ⇒ QPTAS! [Talwar'04] • Challenge: a smaller value for r
Patching in Sparse Areas • Suppose a tour is q-sparse with respect to the hierarchy • Every R-ball contains tour weight ≤ qR (for all R = 2^i) • Expectation: a random R-ball cuts weight qR/R = q • A cluster is formed by cuts from many levels • Expectation: weight q is cut per level • If r = 2q·loglog n • Expectation: level i-1 patching includes edges cut at much higher levels • Charge only the "top" half of the patched edges • Each is charged about 2R_{i-1} • Pr[edge is charged for patching] ≤ Pr[edge is cut at level i + loglog n] ≤ ddim/(R_{i-1}·log n)
Wrapping Up (Patching Sparse Areas) • Modify a tour to be (m,r)-light [Arora'98, Talwar'04] • Part II: Focus on r (i.e. the number of crossing edges) • Reduce the number of crossings • Expected cost at level i-1 • Expected cost ≤ (R_{i-1}/r^{1/ddim})·(ddim/(R_{i-1}·log n)) = ddim/(log n·r^{1/ddim}) • As before, want this term to equal ε/log n • Take r = (ddim/ε)^{ddim} • Obtain a PTAS!
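As a quick sanity check of the arithmetic on this slide (our own summary), plugging r = (ddim/ε)^{ddim} into the per-level charge and summing over the log n levels of the hierarchy gives:

```latex
\[
  \underbrace{\frac{R_{i-1}}{r^{1/\mathrm{ddim}}}\cdot
              \frac{\mathrm{ddim}}{R_{i-1}\log n}}_{\text{per-level charge to a unit edge}}
  \;=\; \frac{\mathrm{ddim}}{r^{1/\mathrm{ddim}}\,\log n}
  \;=\; \frac{\varepsilon}{\log n},
  \qquad
  \sum_{\text{levels}} \frac{\varepsilon}{\log n}
  \;=\; \log n \cdot \frac{\varepsilon}{\log n} \;=\; \varepsilon ,
\]
% so each edge of T is charged only an eps-fraction of its length in expectation,
% i.e. E[length(T')] <= (1 + O(eps)) * length(T).
```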
Technical Subtleties • Outstanding problem: • Previous analysis assumed a ball cuts only q edges • True in expectation… not good enough • Solution: try many hierarchies • Choose at random log n radii for each ball and try all their combinations! • WHP, some hierarchy cuts q edges in every ball • Drives up the runtime of the dynamic program
Algorithmic Frontiers of Doubling Metrics Robert Krauthgamer Weizmann Institute of Science Joint work with Lee-Ad Gottlieb and Aryeh Kontorovich
Machine Learning in Doubling Metrics • Large-margin classification in metric spaces [von Luxburg-Bousquet'04] • Unknown distribution D of labeled points (x,y) ∈ M × {-1,1} • M is a metric space (generalizes R^dim) • Labels are L-Lipschitz: |y_i - y_j| ≤ L·d(x_i,x_j) (generalizes margin) • Resource: Sample of labeled points • Goal: Build a hypothesis f: M → {-1,1} that has (1-ε)-agreement with D • Statistical complexity: How many samples are needed? • Computational complexity: Running time? • Extensions: • A small fraction of labels are wrong (adversarial noise) • Real-valued labels y ∈ [-1,1] (metric regression)
Generalization Bounds • Our approach: Assume M is doubling and use generalized VC-theory [Alon-BenDavid-CesaBianchi-Haussler'97, Bartlett-Shawe-Taylor'99] • Example: Earthmover distance (EMD) in the plane between point sets of size k has ddim ≤ O(k log k) • Standard algorithm: pick a hypothesis that fits all/most observed samples • Theorem: The class of L-Lipschitz functions has fat-shattering dimension fsdim ≤ (c·L·diam(M))^{ddim}. • Corollary: If f is L-Lipschitz and classifies n samples correctly, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ O(fsdim·(log n)²/n). Similarly, if f correctly classifies all but an η-fraction, then WHP Pr_D[sgn(f(x)) ≠ y] ≤ η + O(fsdim·(log n)²/n)^{1/2}. • Bounds are incomparable to [von Luxburg-Bousquet'04]
Algorithmic Aspects (noise-free) • Computing a hypothesis f from the samples (x_i, y_i): • where S+ and S- are the positively and negatively labeled samples • Lemma (Lipschitz extension): If the labels are L-Lipschitz, then so is f. • Evaluating f(x) requires solving Nearest Neighbor Search • Explains a common classification heuristic, e.g. [Cover-Hart'67] • But might require Ω(n) time… • We show how to use (1+ε)-Nearest Neighbor Search • This can be solved quickly in doubling metrics • We prove a similar generalization bound by sandwiching sgn(f(x))
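As an illustration only (not necessarily the paper's exact construction), a standard Lipschitz extension such as the McShane formula f(x) = min_i [y_i + L·d(x, x_i)] is L-Lipschitz, agrees with L-Lipschitz labels on the sample, and evaluating sgn(f(x)) boils down to a nearest-neighbor-type computation over the labeled points. The Python sketch below uses our own (hypothetical) names.

```python
# Illustrative Lipschitz-extension classifier over a labeled sample.

def lipschitz_extension(samples, L, dist):
    """samples: list of (x_i, y_i) with y_i in {-1, +1}; returns f: x -> real."""
    def f(x):
        # McShane extension: agrees with the labels on the sample when they are L-Lipschitz.
        return min(y + L * dist(x, xi) for xi, y in samples)
    return f

def classify(f, x):
    return 1 if f(x) >= 0 else -1   # predict with sgn(f)

if __name__ == "__main__":
    dist = lambda a, b: abs(a - b)              # a toy 1-dimensional metric
    samples = [(0.0, -1), (1.0, 1), (2.0, 1)]   # these labels are 2-Lipschitz
    f = lipschitz_extension(samples, L=2.0, dist=dist)
    print([classify(f, x) for x in (0.1, 0.9, 1.5)])   # [-1, 1, 1]
```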
Extensions (noisy case) 1. A small fraction of labels are wrong (adversarial noise) • How to compute a hypothesis? • Build a bipartite graph (on S+ ∪ S-) of all violations of the Lipschitz condition (an edge between two oppositely labeled points at distance < 2/L). • Compute a minimum vertex cover (or faster: a 2-approximation) 2. Real-valued labels y ∈ [-1,1] (metric regression) • Minimize risk (expected loss) E_{x,y}|f(x)-y| • Extend the statistical framework by similar ideas • But how to compute a hypothesis? • Write an LP: minimize Σ_i |f(x_i)-y_i| subject to |f(x_i)-f(x_j)| ≤ L·d(x_i,x_j) for all i,j • Reduce #constraints from O(n²) to O(ε^{-ddim}·n) using a (1+ε)-spanner on the x_i's • Apply a fast approximate LP solver
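A small sketch of the metric-regression LP from this slide (our own formulation and names, not the paper's code): slack variables t_i encode |f(x_i) - y_i|, the Lipschitz condition becomes linear constraints, and the LP is handed to scipy's HiGHS solver. For simplicity it uses all O(n²) pair constraints rather than a spanner, and an exact rather than approximate LP solver.

```python
import numpy as np
from scipy.optimize import linprog

def metric_regression(D, y, L):
    """D: n x n distance matrix, y: labels in [-1, 1], L: Lipschitz constant."""
    n = len(y)
    c = np.concatenate([np.zeros(n), np.ones(n)])       # minimize sum of slacks t_i
    rows, b = [], []
    for i in range(n):
        r = np.zeros(2 * n); r[i], r[n + i] = 1, -1      #  f_i - t_i <= y_i
        rows.append(r); b.append(y[i])
        r = np.zeros(2 * n); r[i], r[n + i] = -1, -1     # -f_i - t_i <= -y_i
        rows.append(r); b.append(-y[i])
    for i in range(n):
        for j in range(n):
            if i != j:
                r = np.zeros(2 * n); r[i], r[j] = 1, -1  # f_i - f_j <= L * d(i, j)
                rows.append(r); b.append(L * D[i][j])
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(b), bounds=bounds,
                  method="highs")
    return res.x[:n]                                      # fitted values f(x_i)

if __name__ == "__main__":
    D = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
    print(metric_regression(D, y=[-1.0, 0.2, 1.0], L=0.5))
```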
Conclusion • General paradigm: low-dimensional Euclidean spaces ↔ doubling metric spaces • Mathematically, the latter is different (a strictly bigger family) • Not even low-distortion embeddings [Laakso'00,'01] • For algorithmic efficiency, a strong analogy/similarity • E.g., nearest neighbor search, distributed computing and networking, combinatorial optimization, machine learning • Research directions: • Other computational tasks or application areas? • Particularly in machine learning, data structures • Scenarios where the analogy fails? • E.g. [Indyk-Naor'05], which uses random projections • Other metric models? E.g. hyperbolic…