
Improved approximation for k-median

Improved approximation for k-median. Shi Li, Department of Computer Science, Princeton University, Princeton, NJ 08540. 04/20/2013.


Presentation Transcript


  1. Improved approximation for k-median. Shi Li, Department of Computer Science, Princeton University, Princeton, NJ 08540. 04/20/2013

  2. minimize transportation cost + maintenance cost (figure: example with costs $20, $10, $50, $100, $130, $30, $30)

  3. Facility Location Problem. BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248. KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses. STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif. STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

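  4. Uncapacitated Facility Location (UFL). Facilities and clients. F : potential facility locations; C : set of clients; f_i, i ∈ F : cost for opening i; d : metric over F ∪ C. Find S ⊆ F minimizing facility cost + connection cost, Σ_{i ∈ S} f_i + Σ_{j ∈ C} d(j, S), where d(j, S) = min_{i ∈ S} d(i, j). (Figure: example with costs $100, $100, $30, $20, $100, $100.)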
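The UFL objective on this slide can be made concrete with a small sketch. The code below is only an illustration under assumptions: the function names `ufl_cost` and `ufl_brute_force` and the toy instance (points on a line) are hypothetical and not from the talk, and the exhaustive search is exponential, so it is useful only for checking tiny examples.

```python
from itertools import combinations


def ufl_cost(S, f, d, clients):
    """Facility cost plus connection cost of opening the facility set S."""
    facility_cost = sum(f[i] for i in S)
    # Each client connects to its nearest open facility.
    connection_cost = sum(min(d(i, j) for i in S) for j in clients)
    return facility_cost + connection_cost


def ufl_brute_force(f, d, clients):
    """Try every non-empty S ⊆ F. Exponential time: toy instances only."""
    facilities = sorted(f)
    best_cost, best_set = None, None
    for size in range(1, len(facilities) + 1):
        for S in combinations(facilities, size):
            cost = ufl_cost(S, f, d, clients)
            if best_cost is None or cost < best_cost:
                best_cost, best_set = cost, set(S)
    return best_cost, best_set


# Hypothetical toy instance: all points lie on a line, d is the absolute difference.
pos = {"u": 0, "v": 10, "c1": 1, "c2": 9, "c3": 10}
f = {"u": 5, "v": 5}                      # opening costs f_i
d = lambda p, q: abs(pos[p] - pos[q])     # a metric over F ∪ C
print(ufl_brute_force(f, d, ["c1", "c2", "c3"]))
```

On this instance opening both facilities costs 10 + 2 = 12, which beats opening only `u` (25) or only `v` (15).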

  5. Wal-mart Stores in New Jersey. Question: Suppose you have the budget for 50 stores; how would you select the 50 locations?

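  6. k-median. Facilities and clients. F : potential facility locations; C : set of clients; d : metric over F ∪ C; k : number of facilities to open. Find S ⊆ F with |S| = k minimizing the connection cost Σ_{j ∈ C} d(j, S). Compared with UFL, the opening costs f_i, i ∈ F, are replaced by the constraint |S| = k.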
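As a point of comparison with UFL, here is a minimal sketch of the k-median objective with the cardinality constraint |S| = k. The names `kmedian_cost` and `kmedian_brute_force` and the toy instance are assumptions made for illustration; enumerating all size-k subsets is exponential in general and is not an algorithm from the talk.

```python
from itertools import combinations


def kmedian_cost(S, clients, d):
    """Total distance from each client to its nearest facility in S."""
    return sum(min(d(i, j) for i in S) for j in clients)


def kmedian_brute_force(facilities, clients, d, k):
    """Enumerate every S ⊆ F with |S| = k. Toy instances only."""
    best_cost, best_set = None, None
    for S in combinations(facilities, k):
        cost = kmedian_cost(S, clients, d)
        if best_cost is None or cost < best_cost:
            best_cost, best_set = cost, set(S)
    return best_cost, best_set


# Hypothetical toy instance: facilities and clients are positions on a line.
facilities = [0, 5, 10]
clients = [0, 1, 9, 10]
d = lambda p, q: abs(p - q)
print(kmedian_brute_force(facilities, clients, d, k=2))
```

With k = 2 the best choice here is the two end points, since the middle facility serves neither cluster well.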

  7. k-median clustering

  8. Known Results: UFL • O(log n)-approximation [Hoc82] • constant approximations • 3.16 [STA98] • 2.41 [GK99] • 3 [JV99] • 1.853 [CG99] • 1.728 [CG99] • 5+ε [Kor00] • 1.861 [MMSV01] • 1.736 [CS03] • 1.61 [JMS02] • 1.582 [Svi02] • 1.52 [MYZ02] • 1.50 [Byr07] • 1.488 [Li11] • 1.463-hardness of approx. [GK98]

  9. Deterministic rounding of linear programs • 4.5 The uncapacitated facility location problem • Random sampling and randomized rounding of linear programs • 5.8 The uncapacitated facility location problem • The primal-dual method • 7.6 The uncapacitated facility location problem • Further uses of greedy and local search algorithms • 9.1 A local search algorithm for the uncapacitated facility location problem • 9.4 A greedy algorithm for the uncapacitated facility location problem • 12 Further uses of random sampling and randomized rounding of linear programs • 12.1 The uncapacitated facility location problem

  10. Known Results: k-median • pseudo-approximation • 1-approx. with O(k log n) facilities [Hoc82] • 2(1+ε)-approx. with (1+1/ε)k facilities [LV92] • super-constant approximation • O(log n log log n) [Bar96, Bar98] • O(log k log log k) [CCGS98]

  11. Known Results: k-median • constant approximation • Local Search: 3+ε [AGK+01] • LP rounding: 6.667 [CGTS99], 3.25 [CL12] • Primal-Dual: 6 [JV99], 4 [JMS03], 4 [CG99], 1+√3+ε [LS13] • (1+2/e)-hardness of approximation [JMS03]

  12. Lloyd Algorithm [Lloyd82] • k-means clustering: minimize total squared distances • k-means vs. k-median • clustering: k-means is more often used • Walmart example: k-median is more appropriate • approximation: k-median is “easier”

  13. Local Search • Can we improve the solution by p swaps? • No: stop • Yes: swap and repeat • Approximation: • k-median: 3+2/p [AGK+01] • k-means: (3+2/p)^2 [KMN+02]
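The swap loop on this slide can be sketched for the simplest case p = 1. This is a hedged illustration, not the procedure analysed in [AGK+01]: the function names and the toy instance are hypothetical, and the sketch accepts any strictly improving swap, whereas a polynomial-time variant would typically require each swap to improve the cost by a noticeable factor.

```python
def kmedian_cost(S, clients, d):
    """Total distance from each client to its nearest facility in S."""
    return sum(min(d(i, j) for i in S) for j in clients)


def local_search_single_swap(facilities, clients, d, k):
    """Single-swap (p = 1) local search for k-median, a minimal sketch."""
    S = set(facilities[:k])  # arbitrary starting solution (an assumption)
    improved = True
    while improved:
        improved = False
        current = kmedian_cost(S, clients, d)
        for out in list(S):
            for inn in facilities:
                if inn in S:
                    continue
                T = (S - {out}) | {inn}
                # "Yes: swap and repeat" if the swap strictly lowers the cost.
                if kmedian_cost(T, clients, d) < current:
                    S, improved = T, True
                    break
            if improved:
                break
    # "No: stop" once no single swap improves the solution.
    return S


# Hypothetical toy instance: positions on a line.
facilities = [0, 5, 10]
clients = [0, 1, 9, 10]
d = lambda p, q: abs(p - q)
print(local_search_single_swap(facilities, clients, d, k=2))
```

The loop terminates because the cost strictly decreases with every accepted swap and there are finitely many size-k subsets.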

  14. LP for k-median. y_i : whether to open facility i; x_{i,j} : whether to connect client j to facility i. Constraints: open at most k facilities; each client j must be connected; client j can only be connected to an open facility. The integrality gap is at least 2 and at most 3 (proof non-constructive).
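The LP itself did not survive in the transcript. The following is the standard k-median relaxation written to match the variables and the three constraints named on the slide; the exact form shown in the talk is an assumption.

```latex
\begin{align*}
\min \quad & \sum_{i \in F} \sum_{j \in C} d(i,j)\, x_{i,j} \\
\text{s.t.} \quad & \sum_{i \in F} y_i \le k
    && \text{(open at most $k$ facilities)} \\
& \sum_{i \in F} x_{i,j} = 1 \quad \forall j \in C
    && \text{(client $j$ must be connected)} \\
& x_{i,j} \le y_i \quad \forall i \in F,\ j \in C
    && \text{(connect only to an open facility)} \\
& x_{i,j},\, y_i \ge 0 \quad \forall i \in F,\ j \in C
\end{align*}
```

Restricting x and y to {0, 1} recovers the integral problem; the integrality gap quoted on the slide compares the two optima.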

  15. (1+√3+ε)-approximation on k-median

  16. k-median and UFL • f = cost of a facility • as f increases, the number of open facilities decreases. Given a black-box α-approximation A for UFL. Naïve try: find an f such that A opens k facilities. Does this give an α-approximation for k-median? It cannot in general: α ≈ 1.488 for UFL, while α > 1.736 for k-median.

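  17. k-median and UFL. Naïve try: find an f such that A opens k facilities. 2 issues with the naïve try: 1. need an LMP α-approximation for UFL. α-approximation: facility cost + connection cost ≤ α · OPT; LMP α-approximation: α · (facility cost) + connection cost ≤ α · OPT. LMP = Lagrangian Multiplier Preserving.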

  18. k-median and UFL. Naïve try: find an f such that A opens k facilities. 2 issues with the naïve try: 1. need an LMP α-approximation for UFL; 2. cannot find f s.t. A opens exactly k facilities. S1 : set of k1 < k facilities; S2 : set of k2 > k facilities; together they form a bi-point solution.

  19. k-median and UFL. 2 issues with the naïve try: 1. need an LMP α-approximation for UFL; 2. cannot find f s.t. A opens exactly k facilities. • [JV]: LMP approx. factor 3; bi-point → integral ×2; final ratio for k-median 6 • [JMS]: LMP approx. factor 2; bi-point → integral ×2; final ratio for k-median 4 • our result: LMP approx. factor 2 (do not know how to improve this); bi-point → integral: the factor of 2 is tight!

  20. bi-point solution. k1 = |S1| < k ≤ |S2| = k2. a, b : a·k1 + b·k2 = k, a + b = 1. Bi-point solution: aS1 + bS2. cost(aS1 + bS2) = a·cost(S1) + b·cost(S2)
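Solving the two equations on this slide gives a = (k2 − k)/(k2 − k1) and b = (k − k1)/(k2 − k1). The sketch below computes these weights exactly with rational arithmetic; the function names `bipoint_weights` and `bipoint_cost` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def bipoint_weights(k1, k, k2):
    """Solve a*k1 + b*k2 = k and a + b = 1 for (a, b), with k1 < k <= k2."""
    if not (k1 < k <= k2):
        raise ValueError("need k1 < k <= k2")
    a = Fraction(k2 - k, k2 - k1)
    b = Fraction(k - k1, k2 - k1)
    return a, b


def bipoint_cost(a, b, cost_S1, cost_S2):
    """cost(aS1 + bS2) = a*cost(S1) + b*cost(S2)."""
    return a * cost_S1 + b * cost_S2


a, b = bipoint_weights(k1=1, k=3, k2=4)
print(a, b, bipoint_cost(a, b, cost_S1=4, cost_S2=0))
```

Using `Fraction` avoids floating-point drift when checking that a·k1 + b·k2 equals k exactly.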

  21. gap-2 instance. k1 = 1, k2 = k+1; cost(S1) = k+1, cost(S2) = 0; cost of integral solution = 2. (Figure: S1 and S2, with labels 0, 1 and k+1.)
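The gap can be checked by hand from the numbers on this slide together with the bi-point equations a·k1 + b·k2 = k, a + b = 1. The derivation below is a worked sketch, not text from the talk.

```latex
\begin{align*}
a \cdot 1 + b\,(k+1) = k,\quad a + b = 1
  \;&\Longrightarrow\; a = \frac{1}{k},\quad b = 1 - \frac{1}{k}, \\
\mathrm{cost}(aS_1 + bS_2)
  &= a\,(k+1) + b \cdot 0 = \frac{k+1}{k}, \\
\frac{\text{cost of integral solution}}{\mathrm{cost}(aS_1 + bS_2)}
  &= \frac{2}{(k+1)/k} = \frac{2k}{k+1} \;\longrightarrow\; 2
  \quad (k \to \infty).
\end{align*}
```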

  22. k-median and UFL. bi-point → pseudo-integral. Main Lemma 2: bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities (the factor of 2 for bi-point → integral is tight!). Main Lemma 1: suffices to give an α-approximate solution with k+O(1) facilities.

  23. Main Lemma 1. A : black-box α-approximation with k+c open facilities; A' : (α+ε)-approximation with k open facilities. A' calls A n^{O(c/ε)} times. Bad instance: with k+1 open facilities, cost = 0; with k open facilities, cost is huge.

  24. Dense Facility. B_i : set of clients in a small ball around i. i is A-dense if the connection cost of B_i in OPT is ≥ A. In this instance, i is A-dense for A ≈ opt. (Figure: facility i and ball B_i.)

  25. Dense Facility. The reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε). Can reduce to such an instance in n^{O(t)} time. (Figure: facility i and ball B_i.)

  26. Lemma 1 from [ABS]. Main Lemma 1: suffices to give an α-approximate solution with k+O(1) facilities • k-median clustering is easy in practice • reason: there is a “meaningful” clustering. [Awasthi-Blum-Sheffet]: for constants ε, δ > 0, if OPT_{k-1} ≥ (1+δ) OPT_k, can find a (1+ε)-approximation.

  27. Lemma 1 from [ABS]. [ABS]: OPT_{k-1} ≥ (1+δ) OPT_k ⇒ (1+ε)-approximation. A : α-approximation algorithm for k-median with k+c medians • Algorithm • Apply A to (k-c, F, C, d) → solution with k facilities of cost ≤ α OPT_{k-c} • Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1 • Output the best of the c+1 solutions • Proof • If OPT_{k-c} ≤ (1+ε) OPT_k, then done. • Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i} • [ABS] on (k-i, F, C, d) → solution of cost (1+ε) OPT_{k-i} ≤ (1+ε)^2 OPT_k
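The last inequality of the proof compresses one step. A sketch of the missing chain, assuming OPT_k > 0 and writing i for the smallest index chosen in the proof (so i ≤ c−1, and the gap condition fails for every i' < i):

```latex
\begin{align*}
\mathrm{OPT}_{k-i'-1} &< (1+\varepsilon)^{1/c}\,\mathrm{OPT}_{k-i'}
  \qquad \text{for all } 0 \le i' < i, \\
\mathrm{OPT}_{k-i}
  &= \mathrm{OPT}_k \prod_{i'=0}^{i-1} \frac{\mathrm{OPT}_{k-i'-1}}{\mathrm{OPT}_{k-i'}}
   \le (1+\varepsilon)^{i/c}\,\mathrm{OPT}_k
   \le (1+\varepsilon)\,\mathrm{OPT}_k, \\
(1+\varepsilon)\,\mathrm{OPT}_{k-i} &\le (1+\varepsilon)^2\,\mathrm{OPT}_k .
\end{align*}
```

At index i the gap condition holds with δ = (1+ε)^{1/c} − 1, which is a constant for constant c and ε, so [ABS] applies to (k−i, F, C, d) and returns a solution with k−i ≤ k facilities.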

  28. [JV]: bi-point solution of cost C → solution of cost 2C • based on improving the [JV] algorithm. Main Lemma 2: bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

  29. JV algorithm. Given: bi-point solution aS1 + bS2. τ_i = nearest facility of i. Select S’2 ⊆ S2 with |S’2| = |S1| = k1. With prob. a, open S1; with prob. b, open S’2. Randomly open k−k1 facilities in S2 \ S’2. Guarantee: either i is open, or τ_i is open.
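The rounding steps on this slide can be sketched as follows. Several details are assumptions rather than statements from the talk: τ_i is read as the facility of S2 nearest to i ∈ S1, S’2 is taken to be the image of τ padded arbitrarily up to size k1, S1 and S2 are assumed disjoint, and the names `jv_round` and `rng` are hypothetical.

```python
import random


def jv_round(S1, S2, k, d, rng=random):
    """Round a bi-point solution aS1 + bS2 to exactly k open facilities (sketch)."""
    k1, k2 = len(S1), len(S2)
    assert k1 < k <= k2, "need k1 < k <= k2"
    a = (k2 - k) / (k2 - k1)  # weight of S1; b = 1 - a

    # tau[i]: nearest facility in S2 to i in S1 (assumed reading of the slide).
    tau = {i: min(S2, key=lambda i2: d(i, i2)) for i in S1}

    # S2': the image of tau, padded arbitrarily so that |S2'| = k1.
    S2_prime = set(tau.values())
    for i2 in S2:
        if len(S2_prime) == k1:
            break
        S2_prime.add(i2)

    # With prob. a open S1, otherwise (prob. b) open S2'.
    opened = set(S1) if rng.random() < a else set(S2_prime)

    # Open k - k1 of the remaining k2 - k1 facilities of S2 uniformly at random.
    rest = [i2 for i2 in S2 if i2 not in S2_prime]
    opened |= set(rng.sample(rest, k - k1))
    return opened, tau


# Hypothetical toy instance: facilities are distinct positions on a line.
S1, S2 = [0.0, 10.0], [1.0, 2.0, 9.0, 11.0]
d = lambda p, q: abs(p - q)
opened, tau = jv_round(S1, S2, k=3, d=d, rng=random.Random(0))
print(sorted(opened))
```

Under these assumptions every facility of S1 is opened with probability a and every facility of S2 with probability b, since (k − k1)/(k2 − k1) = b, and for each i ∈ S1 either i or τ_i ends up open.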

  30. Analysis of JV algorithm. i1 ∈ S1, i3 ∈ S’2; either i1 or i3 is open. (Figure: client j with facilities i1, i2, i3 and distances d1, d2; d(i1, i3) ≤ d1+d2.) If i2 is open, connect j to i2; otherwise, if i1 is open, connect j to i1; otherwise connect j to i3. E[cost of j] ≤ 2 × [cost of j in aS1+bS2]

  31. Our Algorithm. On average, d1 >> d2. (Figure: client j with facilities i1, i2, i3 and distances d1, d2; the bounds d1+2d2 and 2d1+d2 on d(j, i3); an edge of length ≤ d1+d2.) If i2 is open, connect j to i2; otherwise, if i1 is open, connect j to i1; otherwise connect j to i3. E[cost of j] ≤ 2 × [cost of j in aS1+bS2]

  32. Our Algorithm. Need to guarantee: either i is open, or τ_i is open; for a star, either the center is open, or all leaves are open. • first try • open each star independently? • with prob. a, open the center, • with prob. b, open the leaves • problem: cannot bound the number of open facilities • idea: • big stars: always open the center, open each leaf with prob. ≈ b • group small stars of the same size, dependent rounding • for each group, open 3 more facilities than expected

  33. small stars. Small star: star of size ≤ 2/(abε). M_h : set of stars of size h, m = |M_h|. Roughly: for am stars, open the center; for bm stars, open the leaves. More accurately: permute the stars and the facilities; open the top centers, open the bottom leaves.

  34. big stars. Size h > 2/(abε). Always open the center; randomly open leaves (≈ bh of them for a big star).

  35. Lemma: we open at most k + 6/(abε) facilities. For a big star of size h: FRAC: a+bh; ALG: […]. For a group of m small stars of size h: FRAC: m(a+bh); ALG: […]. There are at most 2/(abε) groups.

  36. Summary. bi-point → pseudo-integral. Main Lemma 2: bi-point solution of cost C → solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities. Main Lemma 1: suffices to give an α-approximate solution with k+O(1) facilities.

  37. Open Problems • gap between an integral solution with k+1 open facilities and the LP value (with k open facilities)? • tight analysis? • does the algorithm work for k-means? Thank you! Questions?
