1 / 39

Streaming Algorithms for Geometric Problems

Streaming Algorithms for Geometric Problems. Piotr Indyk MIT. Streaming: The Model. Single pass over the data: e 1 , e 2 , …, e n Bounded storage Fast processing time per element. Recap: Norm estimation. Norm estimation: Stream elements: (i,b) , i=1…m Interpretation: x i =x i +b

nemo
Download Presentation

Streaming Algorithms for Geometric Problems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Streaming Algorithms for Geometric Problems Piotr Indyk MIT

  2. Streaming: The Model • Single pass over the data: e1, e2, …,en • Bounded storage • Fast processing time per element

  3. Recap: Norm estimation • Norm estimation: • Stream elements: (i,b) , i=1…m • Interpretation: xi=xi+b • Want to (1+)-approximate||x||p • Note: the frequency moment Fp is a special case • Algorithms with (log n+1/)O(1) space: • p=2: [Alon-Matias-Szegedy’96] • p(0,2]: [Indyk’00] • Idea: maintain Ax

  4. Recap: Other algorithms • Compute the number of distinct elements F0 • Exactly: (n) bits of space • (1+) -approximation: O(1/2 *log n) bits [Flajolet-Martin, JCSS’85] ,… • Compute the median • Exactly: (n) • (50%  ) -approximation: O(1/ *polylog n)[Paterson-Munro, TCS’80] ,…

  5. Geometric Data Stream Algorithms as Data Structures • Data structures that support: • Insert(p) to P • Possibly: Delete(p) from P • Compute-Some-Property(P) • Use space that is sub-linear in |P|

  6. Dichotomy • Insertions + Deletions • Randomized linear • mappings [FM’85], • [AMS’96] • Randomized • Insertions • Merge and Reduce • [MP’80], • core-sets • Deterministic

  7. Insertions and Deletions

  8. Minimum Weight Bi-chromatic Matching • Estimate the cost of MWBM

  9. Minimum Weight Matching • Estimate the cost of MWM

  10. Minimum Spanning Tree • Estimate the cost of MST

  11. Facility Location • Goal: choose a set F of facilities to minimize the • sum of the distances to nearest facility plus • the number of facilities times f • Again, report the cost

  12. Approach • Assume P{1…}2 • Reduce to vector problems • Impose square grids G0…Gk, with side lengths 20,21, …, 2k, shifted at random. • For each square cell c in Gi, let nP(c) be the number of points from P in c. • The algorithms will maintain certain statistics over nP(.), which will allow it to approximately solve the problems 1 2 1 3 1 5 1 1

  13. Estimators • MST: ∑i 2i ∑c Gi [nP(c)>0] • MWM: ∑i 2i∑c Gi [nP(c) is odd] • MWBM: ∑i 2i ∑c Gi |nG(c)-nB(c)| • Fac. Loc.: ∑i 2i∑c Gi min[nP(c), Ti] • K-median: ∑i 2i∑c Gi - B(Q, 2^i)nP(c) (const. factor) Maintain #non-zero entries in nP[FM’85] Maintain L1 difference [I’00]

  14. Results [Indyk’04]  1+ [Frahling-Indyk-Sohler’05] Space: (log  +log n)O(1) *follows from [Charikar’02,Indyk-Thaper’03]

  15. Probabilistic embeddings into HST’s T 1 2 1 3 1 5 1 1 • Known[Bartal’96, Charikar-Chekuri-Goel-Guha-Plotkin’98]: • ||p-q|| ≤ Dtree (p,q) • E[ Dtree(p,q) ] ≤ ||p-q|| * O(log )

  16. MST • E[Cost(MST in T)] ≤ O(log ) Cost(MST) • Cost(MST in T)  Cost(T) • How to compute Cost(T) ? • Sum over all levels i, of the #nodes at i, times 2i • Node c exists iff ni(c)>0 1 2 1 3 1 5 1 1

  17. Matching • Algorithm: • Match what you can at the current level • Odd leftovers wait for the next level • Repeat • Optimal on the HST • Cost=∑i 2i ∑c Gi [nP(c) is odd] 1 0 1 1 1 0 1 1 0

  18. Insertions-only

  19. k-median/k-center • k is given • Goal: choose k medians/centers to minimize: • k-median: the sum of the distances • k-center: the max distance

  20. Metric clustering problems • k-center [Charikar-Chekuri-Feder-Motwani’97] • k-median [Guha-Mishra-Motwani-O’Callaghan’00, Meyerson’01, Charikar-O’Callaghan-Panigrahy’03] • Bounds: • Poly(K,log n) space • O(1)-approximation

  21. Geometric Problems • Diameter, Minimum Enclosing Ball [Agarwal-Har-Peled’01, Feigenbaum-Kannan-Zhang’02 (Algorithmica), Hershberger-Suri’04] • K-center [AHP’01] • K-median [Har-Peled-Mazumdar’04] • Range searching via -approximations: • [Suri-Toth-Zhou’04] • [Bagchi-Chaudhary-Eppstein-Goodrich’04]

  22. Dominant Approach: Merge and Reduce • Main ideas: • Design an (off-line) algorithm that computes a “sketch” of the input • Small size • Sufficient to solve the problem • A sketch of sketches is a sketch

  23. Tree Computation p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16

  24. Algorithm • Space: (sketch size)*log n • Time: sketch computation time • Question: Where do sketches come from ?

  25. Idea I: solution=sketch • Consider k-median • [Guha-Mishra-Motwani-O’Callaghan’00] : approximate k-median of approximate weighted k-medians is an approximate k-median • Result: • Constant depth tree • Space: kn, >0 • O(1) -approximation • Works for any metric space 3 2 1 3 2 1 k=3

  26. Use the solution, ctd. • -Approximations: find a subset SP , such that for any rectangle/halfspace/etc R, |RS|/|S|=|RP|/|P| • [Matousek] : approximation of a union of approximations is an approximation • [BCEG’04] : convert it into streaming algorithm, applications • 1/2space • [STZ’04] : better/optimal bounds for rectangles and halfspaces

  27. Idea 2: Core-Sets [AHP’01] • Assume we want to minimize CP(o) • SP is an -core-set for P, if for any o, and a set T: CPT (o) < (1+) CST (o) • Note: this must hold for all o, not just the optimal one • The latter is called a “weak coreset” [Badoiu-HarPeled-Indyk’02] o

  28. Example: Core-set for MEB • Compute extremal points: • Choose “densely” spaced direction v1 …vk • I.e., for any u there is vi such that u*vi ≥ ||u||2 / (1+) • For each direction maintain extremal point • k=O(1/)(d-1)/2suffice

  29. Stream Algorithms via Core-sets • Diameter/MEB/width: O(1/)(d-1)/2 log n space [AHPV’01+02+04] • k-center: O(k/d) log n [HP’01] • k-median: O(k/d) log n [HPM’04] (k+d+log n+1/)O(1) [Chen’06] • Faster algorithms and other results: [Chan’04], [Suri-Hershberger’03]

  30. Core-sets + Randomized Maps • (1+)-approximate k-median under insertions and deletions [Frahling-Sohler’05]

  31. Insertions and Deletions, ctd

  32. Histograms • View x as a function x:[1…n]  [1…M] • Approximate it using piecewise constant function h, with B pieces (buckets) • Problem can be formulated in 2D as well (buckets become rectangular tiles)

  33. Results: 1D • [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss, STOC’02] : • Maintains h with B pieces such that ||x-h||2 ≤ (1+)||x-hOPT||2 • Under increments/decrements of x • Space: poly(B,1/,log n) • Time: poly(B,1/,log n)

  34. Results: 2D • [Thaper-Guha-Indyk-Koudas, SIGMOD’02] : • Maintains h with Blog (nM) tiles such that ||x-h||2 ≤ (1+)||x-hOPT||2 • Under increments/decrements of x • Space/Update time: poly(B,1/,log n) • Histogram reconstruction time: poly(B,1/, n) • [Muthukrishnan-Strauss, FSTTCS’03] : • Maintains h with 4B tiles • Time: poly(B,1/, log(nM))

  35. General Approach • Maintain sketches Ax of x • This allows us to estimate the error of any given h, via ||x-h||  ||Ax-Ah|| • Construct h: • Enumeration • Greedy • Dynamic Programming

  36. Conclusions • Algorithms for geometric data streams • Insertions-only: merge and reduce • Insertions and deletions: randomized linear embeddings

  37. Open Problems • High dimensions: • Diameter: • 21/2-approx, O(d2 n1/2 ) space, follows from [Goel-Indyk-Varadarajan, SODA’01] • c-approx, O( dn1/(c2 - 1) )[Indyk, SODA’03] • Conjecture: 21/2-approx, O(d polylog n) space • Min-width cylinder: 18-approx, O(d) space [Chan’04] • Other problems ?

  38. Open Problems • Range queries: • General lower bounds ? (Not just for - approximations) • (1/2) -bit bound for general queries follows from LB for dot product [Indyk-Woodruff, FOCS’03] , and is tight (for randomized algorithms) • What about e.g., half-space queries ? O(1/4/3) is known [STZ’04] • Other problems [STZ’04]

  39. Open Problems • Matchings, Facility Location, etc: • Replace log  by O(1) or even 1+ • Possible for MST [Frahling-Indyk-Sohler’??] • Related to computing bi-chromatic matching [Agarwal-Varadarajan’04] • Min-sum clustering ?

More Related