
A Dependent LP-Rounding Approach for the k-Median Problem

A Dependent LP-Rounding Approach for the k-Median Problem. Moses Charikar 1 Shi Li 1 1 Department of Computer Science Princeton University ICALP 2012, Warwick, UK. Outline. Introduction Linear Programming Relaxation Simple Pseudo-Approx. for k -median Our Algorithm for k -median.





Presentation Transcript


  1. A Dependent LP-Rounding Approach for the k-Median Problem • Moses Charikar and Shi Li, Department of Computer Science, Princeton University • ICALP 2012, Warwick, UK

  2. Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median

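  3. k-Median as a Clustering Problem • Given: metric $(X, d)$ and an integer $k$ • Partition $X$ into $k$ clusters • Select a center for each cluster • Minimize the sum of distances to the centers: $\sum_{j \in X} d(j, c_j)$, where $c_j$ is the center of the cluster containing $j$ • Quantifies how well a set can be divided into $k$ partitions • (Figure: example with $k = 4$)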

  4. k-Median in Operations Research • Given: metric $(F \cup C, d)$ and an integer $k$ • $F$: set of facilities • $C$: set of clients • Open $k$ facilities • Connect each client to its nearest open facility • Minimize total connection cost • (Figure: example with $k = 4$)
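The objective on this slide can be made concrete with a small sketch. The function names, the line metric, and the toy instance below are illustrative assumptions, not part of the talk; the brute-force search is only there to show what "optimal" means on a tiny input.

```python
from itertools import combinations

def k_median_cost(clients, open_facilities, dist):
    """Total connection cost: each client pays the distance to its nearest open facility."""
    return sum(min(dist(i, j) for i in open_facilities) for j in clients)

def brute_force_k_median(facilities, clients, k, dist):
    """Try every set of k facilities (exponential; for tiny illustrations only)."""
    return min(combinations(facilities, k),
               key=lambda S: k_median_cost(clients, S, dist))

# Hypothetical toy instance: points on a line, identified with their coordinates.
line_dist = lambda a, b: abs(a - b)
points = [0, 1, 2, 10, 11, 12]
best = brute_force_k_median(points, points, 2, line_dist)
print(best, k_median_cost(points, best, line_dist))
```

Here the two natural clusters {0, 1, 2} and {10, 11, 12} are each served by their middle point.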

  5. Related Problem: Facility Location Problem • Given: metric $(F \cup C, d)$ and facility costs $\{f_i \ge 0 : i \in F\}$ • $F$: set of facilities • $C$: set of clients • $f_i$: cost of opening facility $i$ • Open a set $F' \subseteq F$ of facilities • Connect each client to its nearest open facility • Minimize the sum of facility cost and connection cost

  6. Known Results • Local search: if switching $p$ facilities cannot improve a solution, then the solution is a $(3 + 2/p)$-approximation • The integrality gap of the natural linear programming relaxation is between 2 and 3 • The proof of the upper bound 3 is non-constructive

  7. Our Results • An LP-rounding approach for k-median • Prove a 3.25 approximation ratio • Thus give a constructive proof for the 3.25 integrality gap • Faster running time compared to the local search algorithm • Potential to improve the $3+\varepsilon$ approximation • The upper bound 3.25 is not tight • Our algorithm may already give an approximation ratio smaller than 3

  8. Our Results • k-facility location: the facility location problem with the constraint that at most $k$ facilities can be open • Matroid median: the set of open facilities must be an independent set of a given matroid • Knapsack median: each facility has a cost, and the total cost of open facilities cannot exceed a budget $B$

  9. Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median

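  10. Natural LP Relaxation • $y_i \in \{0,1\}$, $i \in F$: whether facility $i$ is open • $x_{i,j} \in \{0,1\}$, $i \in F$, $j \in C$: whether client $j$ is connected to facility $i$ • Minimize $\sum_{i \in F,\, j \in C} d(i,j)\, x_{i,j}$ subject to: • $x_{i,j} \le y_i$ for all $i \in F$, $j \in C$ (client $j$ can only be connected to an open facility) • $\sum_{i \in F} x_{i,j} = 1$ for all $j \in C$ (every client $j$ must be connected to 1 facility) • $\sum_{i \in F} y_i \le k$ (we can open at most $k$ facilities) • The relaxation replaces $\{0,1\}$ with $[0,1]$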
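As a minimal sketch of the three constraints described on this slide, the following checks whether a given fractional solution is feasible and evaluates its connection cost. The dictionary layout (`x` keyed by `(facility, client)`, `y` keyed by facility) and the function names are assumptions made for illustration; no LP solver is involved.

```python
def lp_value(x, dist):
    """Objective: sum over (facility i, client j) of d(i, j) * x[i, j]."""
    return sum(v * dist(i, j) for (i, j), v in x.items())

def is_feasible(x, y, clients, k, tol=1e-9):
    """Check the three constraint families of the natural k-median LP."""
    # At most k facilities are (fractionally) open.
    if sum(y.values()) > k + tol:
        return False
    # Every client is connected to a total extent of exactly 1.
    for j in clients:
        if abs(sum(x.get((i, j), 0.0) for i in y) - 1.0) > tol:
            return False
    # A client can only be connected to a facility to the extent it is open.
    return all(0 <= v <= y[i] + tol for (i, j), v in x.items())

# Hypothetical instance on a line: facilities at 0 and 4, clients at 0, 2, 4, k = 1.
dist = lambda a, b: abs(a - b)
y = {0: 0.5, 4: 0.5}
x = {(i, j): 0.5 for i in (0, 4) for j in (0, 2, 4)}
print(is_feasible(x, y, [0, 2, 4], k=1), lp_value(x, dist))
```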

  11. Canonical Instance • $km$ facilities • Every client $j$ is connected to its nearest $m$ facilities • In the LP solution, $y_i = 1/m$ and $x_{i,j} \in \{0, 1/m\}$ • (Figure: facilities, clients, and a client $j$)

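  12. Canonical Instance • $F_j$: the set of $m$ facilities that $j$ is connected to • $d_{av}(j) = \frac{1}{m} \sum_{i \in F_j} d(i, j)$: average distance from $j$ to $F_j$ • $d_{max}(j) = \max_{i \in F_j} d(i, j)$: maximum distance from $j$ to $F_j$ • LP value $= \sum_{j \in C} d_{av}(j)$ • (Figure: facilities, clients, and a client $j$)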
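A short sketch of these quantities, assuming each client's set of m facilities is given as a list and that points live on a line (both are illustrative choices, not part of the talk):

```python
def d_av(j, F_j, dist):
    """Average distance from client j to the facilities it is connected to."""
    return sum(dist(i, j) for i in F_j) / len(F_j)

def d_max(j, F_j, dist):
    """Maximum distance from client j to the facilities it is connected to."""
    return max(dist(i, j) for i in F_j)

def canonical_lp_value(F, dist):
    """LP value of a canonical instance; F maps each client j to its list F_j."""
    return sum(d_av(j, F_j, dist) for j, F_j in F.items())

# Hypothetical instance with m = 3: clients at 0 and 10.
line = lambda a, b: abs(a - b)
F = {0: [1, 2, 6], 10: [9, 11, 13]}
print(d_av(0, F[0], line), d_max(0, F[0], line), canonical_lp_value(F, line))
```

Because every connection variable equals 1/m in a canonical instance, each client's LP cost is exactly its average distance to its m facilities.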

  13. Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median

  14. Pseudo-Approximation • An $(\alpha, c)$-pseudo approximation is a solution that opens at most $\alpha k$ facilities and whose connection cost is at most $c$ times the optimal cost • A warm-up: a $(1 + \varepsilon, O(1/\varepsilon))$-pseudo approximation for k-median

  15. Pseudo-Approximation • Let $m' = m / (1+\varepsilon)$ and $y'_i = (1+\varepsilon)\, y_i = 1/m'$ • Every client only needs to connect to $m'$ facilities • We fractionally open $km \cdot (1/m') = (1+\varepsilon)k$ facilities • Define $F'_j$, $d'_{av}(j)$, $d'_{max}(j)$ similarly • (Figure: facilities, clients, and a client $j$)

  16. Pseudo-Approximation • Two clients $j$ and $j'$ conflict if $F'_j \cap F'_{j'} \neq \emptyset$ • Select a set $C'$ of clients such that no two clients in $C'$ conflict with each other • (Figure: facilities and clients $j$, $j'$)

  17. Pseudo-Approximation • Greedily construct $C' \subseteq C$ with no conflicts • While $C \neq \emptyset$: • select $j \in C$ with the minimum $d_{av}(j)$ • add $j$ to $C'$ • remove $j$ and all clients that conflict with $j$ from $C$ • (Figure: facilities and clients)
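The greedy loop on this slide can be sketched as follows. The input layout (a dict of facility sets per client and a dict of average distances) and the tie-breaking by client label are assumptions made for a deterministic illustration.

```python
def greedy_filter(clients, F_prime, d_av):
    """Pick a conflict-free subset C' of clients, smallest d_av first.

    Two clients conflict when their facility sets F'_j intersect.
    Ties in d_av are broken by client label (an assumption, for determinism).
    """
    remaining = set(clients)
    chosen = []
    while remaining:
        j = min(remaining, key=lambda c: (d_av[c], c))
        chosen.append(j)
        # Drop j itself and every client whose facility set meets F'_j.
        remaining = {c for c in remaining
                     if c != j and not (F_prime[c] & F_prime[j])}
    return chosen

# Hypothetical instance: "a" and "b" share facility 2; "c" and "d" share facility 5.
F_prime = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5}, "d": {5, 6}}
d_av = {"a": 1.0, "b": 0.5, "c": 2.0, "d": 3.0}
print(greedy_filter(F_prime.keys(), F_prime, d_av))
```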

  18. Pseudo-Approximation • Open facilities: • For every $j \in C'$, randomly open 1 of the $m'$ facilities in $F'_j$ • For any facility $i$ that is not inside $\bigcup_{j \in C'} F'_j$, open $i$ with probability $1/m'$ • Connect each client to its nearest open facility • Fact: every facility is open with probability $1/m'$ • (Figure: facilities and clients)
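A minimal sketch of this opening step, assuming the sets F'_j for clients in C' are pairwise disjoint and each has exactly m' elements (as the filtering guarantees on the slides). The function name and the explicit random generator argument are illustrative choices.

```python
import random

def open_facilities(facilities, C_prime, F_prime, m_prime, rng):
    """Open exactly one facility per selected client, the rest independently."""
    opened = set()
    covered = set()
    for j in C_prime:
        covered |= F_prime[j]
        # One uniformly random facility out of the m' facilities in F'_j.
        opened.add(rng.choice(sorted(F_prime[j])))
    for i in facilities:
        # Facilities outside every F'_j are opened independently with prob. 1/m'.
        if i not in covered and rng.random() < 1.0 / m_prime:
            opened.add(i)
    return opened

# Hypothetical instance with m' = 2 and eight facilities labelled 0..7.
F_prime = {"b": {0, 1}, "c": {2, 3}}
print(open_facilities(range(8), ["b", "c"], F_prime, 2, random.Random(0)))
```

Each facility is opened with probability 1/m' in both branches, which matches the fact stated on the slide; the dependence lies in forcing exactly one open facility inside each F'_j.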

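  19. Pseudo-Approximation • Lemma: $E[C_j] \le O(1/\varepsilon)\, d_{av}(j)$, where $C_j$ is the connection cost of $j$ • Proof: it is enough to consider $j \notin C'$ • $\exists\, j' \in C'$ s.t. $F'_j \cap F'_{j'} \neq \emptyset$ and $d'_{av}(j') \le d'_{av}(j)$ • $E[C_j] \le E[C_{j'}] + d(j, j') \le E[C_{j'}] + d'_{max}(j) + d'_{max}(j') \le d'_{av}(j') + (1/\varepsilon)\, d'_{av}(j') + (1/\varepsilon)\, d'_{av}(j') \le (1 + 2/\varepsilon)\, d'_{av}(j) \le (1 + 2/\varepsilon)\, d_{av}(j)$ • (Figure: clients $j$, $j'$ with overlapping sets $F'_j$, $F'_{j'}$)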

  20. Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median

  21. Barrier to Obtaining a True Approximation • If $\varepsilon = 0$, then $F'_j = F_j$ • It may be that $d_{max}(j) \gg d_{av}(j)$ • With non-zero probability, $j$ will be connected to facilities in $F_{j'}$ • The expected connection cost of $j$ is unbounded compared to $d_{av}(j)$ • (Figure: clients $j$, $j'$ with sets $F_j$, $F_{j'}$)

  22. Remove the Barrier • Solution: $j$ only “claims” close facilities in $F_j$ • Let $U_j$ be the set of claimed facilities • Use $U_j$ to replace $F_j$ in the algorithm • New barrier: $|U_j| < m$ might happen • Cannot guarantee that a facility in $U_j$ is always open • (Figure: client $j$ with $U_j \subseteq F_j$)

  23. Remove the New Barrier • Can guarantee $|U_j| \ge m/2$ • $|U_j \cup U_{j'}| \ge m$ if $U_j$ and $U_{j'}$ are disjoint • Pair the clients in $C'$ • Always open 1 facility (possibly 2 facilities) in $U_j \cup U_{j'}$ for a matched pair $(j, j')$ • (Figure: clients $j$, $j'$ with sets $U_j$, $U_{j'}$)

  24. Remove the New Barrier • How to open facilities for a matched pair? • $m$ boxes in a line • Permute the facilities in $U_j$ and put them in the leftmost $|U_j|$ boxes • Permute the facilities in $U_{j'}$ and put them in the rightmost $|U_{j'}|$ boxes • Open the facilities in a randomly selected box • (Figure: $m$ boxes, with $U_j$ filling from the left and $U_{j'}$ from the right)
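The box construction can be sketched as below, assuming the permutations and the box choice are uniformly random and that the two claimed sets are disjoint with sizes between m/2 and m. The function name and the explicit random generator argument are illustrative.

```python
import random

def round_pair(U_j, U_jp, m, rng):
    """Return the facilities to open for a matched pair (j, j')."""
    left, right = list(U_j), list(U_jp)
    rng.shuffle(left)
    rng.shuffle(right)
    boxes = [[] for _ in range(m)]
    # U_j fills the leftmost |U_j| boxes, one facility per box.
    for pos, i in enumerate(left):
        boxes[pos].append(i)
    # U_j' fills the rightmost |U_j'| boxes, one facility per box.
    for pos, i in enumerate(right):
        boxes[m - len(right) + pos].append(i)
    # Open whatever sits in one uniformly random box.
    return boxes[rng.randrange(m)]

# Hypothetical pair with m = 4: |U_j| = 3 and |U_j'| = 2, so one box holds two facilities.
print(round_pair(["a", "b", "c"], ["d", "e"], 4, random.Random(1)))
```

Since |U_j| + |U_j'| ≥ m, every box is non-empty, so at least one facility opens; each facility sits in exactly one box, so it opens with probability 1/m.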

  25. The Algorithm • Filtering • Two clients $j$ and $j'$ conflict if $d(j, j') \le 4 \max\{d_{av}(j), d_{av}(j')\}$ • While $C \neq \emptyset$: • select $j \in C$ that minimizes $d_{av}(j)$ • add $j$ to $C'$ • remove $j$ and all clients that conflict with $j$ from $C$
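A minimal sketch of the filtering phase with the distance-based conflict rule from this slide. The list-based bookkeeping and first-minimum tie-breaking are assumptions; the toy values of d_av are made up for illustration.

```python
def filter_clients(clients, d_av, dist):
    """Filtering phase: build C' greedily by increasing d_av."""
    remaining = list(clients)
    C_prime = []
    while remaining:
        j = min(remaining, key=lambda c: d_av[c])
        C_prime.append(j)
        # Keep only clients that do not conflict with j:
        # conflict means d(c, j) <= 4 * max(d_av(c), d_av(j)).
        remaining = [c for c in remaining
                     if c != j and dist(c, j) > 4 * max(d_av[c], d_av[j])]
    return C_prime

# Hypothetical clients on a line, identified with their coordinates.
dist = lambda a, b: abs(a - b)
d_av = {0: 1.0, 3: 2.0, 20: 1.5, 100: 5.0}
print(filter_clients([0, 3, 20, 100], d_av, dist))
```

In this example client 3 is removed by client 0, because their distance 3 is at most 4 · max(1.0, 2.0) = 8.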

  26. The Algorithm • Filtering • Claiming • For any $j \in C'$, let $2R_j$ be the distance between $j$ and its nearest neighbor in $C'$ • A facility $i$ is claimed by $j$ if $i \in F_j$ and $d(i, j) \le R_j$, i.e., $U_j = F_j \cap \mathrm{Ball}(j, R_j)$ • Fact: any client $j \in C'$ will claim at least $m/2$ and at most $m$ facilities
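The claiming rule can be sketched as follows, assuming C' has at least two clients and that each F_j is given as a list. Names and the toy instance are illustrative.

```python
def claim(C_prime, F, dist):
    """Claiming phase: U_j = F_j intersected with Ball(j, R_j)."""
    U = {}
    for j in C_prime:
        # 2 * R_j is the distance from j to its nearest neighbor in C'.
        R_j = min(dist(j, jp) for jp in C_prime if jp != j) / 2
        U[j] = {i for i in F[j] if dist(i, j) <= R_j}
    return U

# Hypothetical instance on a line with m = 3 facilities per client.
dist = lambda a, b: abs(a - b)
F = {0: [1, 2, 12], 20: [18, 21, 29], 100: [90, 95, 150]}
print(claim([0, 20, 100], F, dist))
```

Because the balls Ball(j, R_j) of different clients in C' have radius half the distance to the nearest neighbor, the claimed sets cannot overlap except possibly on the boundary.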

  27. The Algorithm • Filtering • Claiming • Matching • While there are at least 2 unmatched clients in $C'$: • select 2 unmatched clients $j$ and $j'$ that minimize $d(j, j')$ • match $j$ and $j'$
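The greedy matching loop can be sketched as below. It recomputes the closest unmatched pair from scratch in every round, which is quadratic per round and chosen only for clarity; the function name and return shape are assumptions.

```python
from itertools import combinations

def greedy_match(C_prime, dist):
    """Repeatedly match the closest pair of still-unmatched clients."""
    unmatched = list(C_prime)
    pairs = []
    while len(unmatched) >= 2:
        j, jp = min(combinations(unmatched, 2),
                    key=lambda p: dist(p[0], p[1]))
        pairs.append((j, jp))
        unmatched = [c for c in unmatched if c not in (j, jp)]
    # At most one client is left over when |C'| is odd.
    return pairs, unmatched

# Hypothetical clients on a line, identified with their coordinates.
dist = lambda a, b: abs(a - b)
print(greedy_match([0, 20, 100, 130, 500], dist))
```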

  28. The Algorithm • Filtering • Claiming • Matching • Rounding • For each matched pair $(j, j')$, open 1 or 2 facilities in $U_j \cup U_{j'}$ • If there is an unmatched client $j$, open 0 or 1 facility in $U_j$ • For each facility $i$ that is not inside any $U_j$, open $i$ with probability $1/m$ • Connect each client to its nearest open facility

  29. Proof of Constant Approx. Ratio • Lemma: $E[C_j] \le O(1)\, d_{av}(j)$, where $C_j$ is the connection cost of $j$ • Proof: it is enough to assume $j \in C'$ • If $j \notin C'$, there exists a client $j' \in C'$ such that $d(j', j) \le 4 d_{av}(j)$ and $d_{av}(j') \le d_{av}(j)$ • Assume $E[C_{j'}] \le \alpha\, d_{av}(j')$ • Then $E[C_j] \le d(j, j') + E[C_{j'}] \le 4 d_{av}(j) + \alpha\, d_{av}(j') \le (4 + \alpha)\, d_{av}(j)$ • W.l.o.g., assume $d_{av}(j) = 1$

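  30. Proof of Constant Approx. Ratio • Let $j_1$ be the nearest neighbor of $j$ in $C'$, and let $j_2$ be the client matched with $j_1$ • There is always 1 facility open in $U_{j_1} \cup U_{j_2}$ • Any facility in $U_{j_1} \cup U_{j_2}$ is at most $2R_j + 2R_{j_1} + R_{j_2} \le 5R_j$ away from $j$ • $|U_j| \ge m(1 - 1/R_j)$ • With probability $1 - 1/R_j$, connect to a random facility in $U_j$ • Only with probability $1/R_j$, connect to a facility that is $5R_j$ away • $E[C_j] \le 5$ • (Figure: clients $j$, $j_1$, $j_2$ with sets $U_j$, $U_{j_1}$, $U_{j_2}$, radii $R_j$, $R_{j_1}$, $R_{j_2}$, and distances $2R_j$ and $2R_{j_1} \le 2R_j$)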

  31. Proof of 3.25 Approx. Ratio • Complicated, details omitted • Rough idea: for a client $j \notin C'$: • $j_1 \in C'$ is the client that conflicts with and removes $j$ in the filtering phase • $j_2 \in C'$ is the nearest neighbor of $j_1$ in $C'$ • $j_3 \in C'$ is the client matched with $j_2$ • Consider the nearest open facility of $j$ in $F_j \cup F_{j_1} \cup U_{j_2} \cup U_{j_3}$ • Our algorithm opens $k$ facilities in expectation • It can be easily transformed so that it always opens $k$ facilities • The algorithm naturally extends to the k-FL problem

  32. Ongoing Work • Joint work with Svensson, improved the best approximation ratio (3+ε) for k-median

  33. Summary • We introduced an LP-rounding algorithm for the k-median problem • Proved a 3.25 approximation ratio for the problem • It has potential to improve the decade-long $3+\varepsilon$ approximation • Improved approximation algorithms for the following problems: • k-facility location problem: 3.25 • Matroid median problem: 9 • Knapsack median problem: 34

  34. Thanks
