A Dependent LP-Rounding Approach for the k-Median Problem. Moses Charikar¹ and Shi Li¹ (¹Department of Computer Science, Princeton University). ICALP 2012, Warwick, UK
Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median
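k-Median as a Clustering Problem • Given: a metric (X, d) and an integer k • Partition X into k clusters • Select a center for each cluster • Minimize the sum of distances to the centers: $\sum_{v \in X} d(v, c(v))$, where $c(v)$ is the center of the cluster containing v • Quantifies how well a set can be divided into k clusters • (Figure: an example with k = 4)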
k-Median in Operations Research • Given: a metric (F ∪ C, d) and an integer k • F: set of facilities • C: set of clients • Open k facilities • Connect each client to its nearest open facility • Minimize the total connection cost • (Figure: an example with k = 4)
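To make the objective concrete, here is a minimal Python sketch that computes the k-median connection cost and finds an optimum by brute force on a tiny instance. It assumes clients and facilities are points in the Euclidean plane (the slides allow any metric), and the function names are hypothetical; exhaustive search is exponential in k and is only meant for illustration.

```python
import math
from itertools import combinations

# Assumption: clients and facilities are points in the plane, and math.dist
# (Euclidean distance) plays the role of the metric d. Names are hypothetical.
Point = tuple[float, float]


def connection_cost(clients: list[Point], open_facilities: list[Point]) -> float:
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(math.dist(c, f) for f in open_facilities) for c in clients)


def brute_force_k_median(facilities: list[Point], clients: list[Point], k: int):
    """Try every set of k facilities; exponential in k, for tiny instances only."""
    best = min(combinations(facilities, k),
               key=lambda S: connection_cost(clients, list(S)))
    return list(best), connection_cost(clients, list(best))
```

For instance, `brute_force_k_median([(0, 0), (10, 0), (5, 0)], [(0, 1), (10, 1)], 2)` opens (0, 0) and (10, 0), serving each client at distance 1 for a total cost of 2.0.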
Related Problem: Facility Location Problem • Given: a metric (F ∪ C, d) and facility costs $\{f_i \ge 0 : i \in F\}$ • F: set of facilities • C: set of clients • $f_i$: cost of opening facility i • Instead of opening k facilities, open any set $F' \subseteq F$ of facilities • Connect each client to its nearest open facility • Minimize the sum of facility cost and connection cost, $\sum_{i \in F'} f_i + \sum_{j \in C} d(j, F')$, where $d(j, F') = \min_{i \in F'} d(i, j)$
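A matching sketch of the facility location objective, under the same assumptions (planar points as the metric, hypothetical names): the open set is arbitrary, and every open facility adds its opening cost $f_i$.

```python
import math

# Assumption: facilities and clients are points in the plane, math.dist is the
# metric, and facility_cost maps each facility to f_i >= 0. Names are hypothetical.


def facility_location_cost(clients, open_facilities, facility_cost):
    """Opening cost of the chosen facilities plus each client's distance
    to its nearest open facility."""
    opening = sum(facility_cost[i] for i in open_facilities)
    connection = sum(min(math.dist(i, j) for i in open_facilities) for j in clients)
    return opening + connection
```

With clients at (0, 1) and (10, 1) and $f_i = 3$ for facilities at (0, 0) and (10, 0), opening both costs 6 + 2 = 8, while opening only (0, 0) costs 3 + 1 + √101 ≈ 14.05.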
Known Results • *Local search: if swapping p facilities cannot improve a solution, then the solution is a (3 + 2/p)-approximation • The integrality gap of the natural linear programming relaxation is between 2 and 3 • The proof of the upper bound of 3 is non-constructive
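The local search result refers to swap-based search. Below is a sketch of the simplest case p = 1, where by the bound above a local optimum is a 5-approximation. Assumptions: planar points as the metric, hypothetical names, and the first k facilities as the starting solution. Because it accepts any strict improvement, this sketch carries no polynomial running-time guarantee.

```python
import math
from itertools import product

# Assumption: points in the plane with math.dist as the metric; names are hypothetical.


def cost(clients, S):
    """k-median connection cost of the open set S."""
    return sum(min(math.dist(c, f) for f in S) for c in clients)


def local_search_single_swap(facilities, clients, k):
    """Local search with p = 1: start from the first k facilities and apply any
    strictly improving swap (close one open facility, open one closed one)
    until no single swap lowers the cost."""
    S = list(facilities[:k])
    improved = True
    while improved:
        improved = False
        current = cost(clients, S)
        for out_f, in_f in product(S, facilities):
            if in_f in S:
                continue  # in_f is already open
            T = [f for f in S if f != out_f] + [in_f]
            if cost(clients, T) < current:
                S, improved = T, True
                break  # restart the scan from the new solution
    return S
```

Starting from facilities (5, 0) and (0, 0) with clients at (0, 1) and (10, 1), one swap replaces (5, 0) by (10, 0), after which no single swap helps.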
Our Results • An LP-rounding approach for k-median • Proves a 3.25 approximation ratio • Thus gives a constructive proof that the integrality gap is at most 3.25 • Faster running time than the local search algorithm • Potential to improve on the 3 + ε approximation • The 3.25 upper bound is not tight • Our algorithm may already achieve an approximation ratio smaller than 3
Our Results • k-facility location: the facility location problem with the constraint that at most k facilities can be open • Matroid median: the set of open facilities must be an independent set of a given matroid • Knapsack median: each facility has a cost, and the total cost of open facilities cannot exceed a budget B
Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median
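Natural LP Relaxation • $y_i \in \{0,1\}$, $i \in F$: whether facility i is open • $x_{i,j} \in \{0,1\}$, $i \in F$, $j \in C$: whether client j is connected to facility i • The relaxation replaces the integrality constraints by $x_{i,j}, y_i \ge 0$:
$$\min \sum_{i \in F}\sum_{j \in C} d(i,j)\,x_{i,j} \quad \text{s.t.} \quad x_{i,j} \le y_i \;\; \forall i \in F, j \in C; \qquad \sum_{i \in F} x_{i,j} = 1 \;\; \forall j \in C; \qquad \sum_{i \in F} y_i \le k$$
• $x_{i,j} \le y_i$: client j can only be connected to an open facility • $\sum_{i} x_{i,j} = 1$: every client j must be connected to 1 facility • $\sum_{i} y_i \le k$: we can open at most k facilities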
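A small feasibility checker makes the constraints concrete. This is a sketch, assuming planar points and dictionaries keyed by facility and by (facility, client) pair; it only evaluates a given fractional solution and does not solve the LP. The function name is hypothetical.

```python
import math

# Assumption: points in the plane with math.dist as the metric d.
# x maps (i, j) -> x_{i,j} (missing pairs count as 0); y maps i -> y_i.


def check_lp(facilities, clients, x, y, k, tol=1e-9):
    """Return (feasible, objective) for a fractional solution of the natural LP."""
    feasible = sum(y[i] for i in facilities) <= k + tol                    # sum_i y_i <= k
    feasible = feasible and all(y[i] >= -tol for i in facilities)          # y_i >= 0
    for j in clients:
        row = [x.get((i, j), 0.0) for i in facilities]
        if any(v < -tol or v > y[i] + tol for v, i in zip(row, facilities)):
            feasible = False                                               # 0 <= x_{i,j} <= y_i
        if abs(sum(row) - 1.0) > tol:
            feasible = False                                               # sum_i x_{i,j} = 1
    objective = sum(math.dist(i, j) * v for (i, j), v in x.items())
    return feasible, objective
```

For one client at (1, 0), facilities at (0, 0) and (2, 0), and k = 1, setting every $y_i$ and $x_{i,j}$ to 1/2 is feasible with LP value 1.0.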
Canonical Instance • km facilities • Every client j is connected to its nearest m facilities • In the LP solution, $y_i = 1/m$ and $x_{i,j} \in \{0, 1/m\}$, so $\sum_i y_i = km \cdot (1/m) = k$
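Canonical Instance • $F_j$: the set of m facilities that j is connected to • Average distance from j to $F_j$: $d_{av}(j) = \frac{1}{m}\sum_{i \in F_j} d(i,j)$ • Maximum distance from j to $F_j$: $d_{max}(j) = \max_{i \in F_j} d(i,j)$ • LP value $= \sum_{j \in C} d_{av}(j)$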
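In the canonical form the LP value reduces to $\sum_j d_{av}(j)$, since $\sum_i d(i,j)\,x_{i,j} = \frac{1}{m}\sum_{i \in F_j} d(i,j)$. A sketch that computes these quantities, assuming planar points and that $F_j$ is the set of j's m nearest facilities (the function name is hypothetical):

```python
import math

# Assumption: points in the plane with math.dist as the metric; x_{i,j} = 1/m
# on F_j, the m nearest facilities of client j.


def canonical_stats(facilities, clients, m):
    """Return {j: (d_av(j), d_max(j))} and the LP value sum_j d_av(j)."""
    stats = {}
    for j in clients:
        dists = sorted(math.dist(i, j) for i in facilities)[:m]  # distances to F_j
        stats[j] = (sum(dists) / m, max(dists))
    lp_value = sum(d_av for d_av, _ in stats.values())
    return stats, lp_value
```

With facilities at x = 0, 3, 10, 20 on a line, a client at x = 1 and m = 2, $F_j$ holds the facilities at 0 and 3, so $d_{av}(j) = 1.5$ and $d_{max}(j) = 2$.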
Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median
Pseudo-Approximation • An (α, c)-pseudo-approximation is a solution that opens at most αk facilities and whose connection cost is at most c times the optimal cost • A warm-up: a (1 + ε, O(1/ε))-pseudo-approximation for k-median
Pseudo-Approximation • Let $m' = m/(1+\varepsilon)$ and $y'_i = (1+\varepsilon)\,y_i = 1/m'$ • Every client only needs to connect to m' facilities • We fractionally open $km \cdot (1/m') = (1+\varepsilon)k$ facilities • Define $F'_j$ (the m' nearest facilities in $F_j$), $d'_{av}(j)$ and $d'_{max}(j)$ similarly
Pseudo-Approximation • Two clients j and j' conflict if $F'_j \cap F'_{j'} \ne \emptyset$ • Select a set C' of clients such that no two clients in C' conflict with each other
Pseudo-Approximation • Greedily construct a conflict-free set C' ⊆ C: • while C ≠ ∅: • select j ∈ C with the minimum $d_{av}(j)$ • add j to C' • remove j and all clients that conflict with j from C
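The greedy filtering of the warm-up can be sketched as follows. Assumptions: planar points, m' = m/(1+ε) is an integer, $F'_j$ is the set of j's m' nearest facilities, and clients are ordered by $d_{av}(j)$ as on the slide; the function names are hypothetical.

```python
import math

# Assumption: points in the plane with math.dist as the metric; m_prime = m/(1+eps)
# is an integer. Function names are hypothetical.


def nearest_facilities(j, facilities, count):
    return sorted(facilities, key=lambda i: math.dist(i, j))[:count]


def select_nonconflicting(clients, facilities, m, m_prime):
    """Conflicts are defined by overlapping F'_j (m' nearest facilities);
    clients are picked in increasing order of d_av(j) over F_j (m nearest)."""
    F_prime = {j: set(nearest_facilities(j, facilities, m_prime)) for j in clients}
    d_av = {j: sum(math.dist(i, j) for i in nearest_facilities(j, facilities, m)) / m
            for j in clients}
    remaining = set(clients)
    selected = []
    while remaining:
        j = min(remaining, key=lambda c: d_av[c])
        selected.append(j)
        # Drop j and every remaining client whose F'_c overlaps F'_j.
        remaining = {c for c in remaining if c != j and not (F_prime[c] & F_prime[j])}
    return selected, F_prime
```

A client is removed only by a selected client with no larger $d_{av}$, so every removed j has some j' ∈ C' with $F'_j \cap F'_{j'} \ne \emptyset$ and $d_{av}(j') \le d_{av}(j)$, which is what the proof below needs.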
Pseudo-Approximation • Open facilities: • For every j ∈ C', open 1 of the m' facilities in $F'_j$ uniformly at random • Open every facility i that is not in $\bigcup_{j \in C'} F'_j$ with probability 1/m' • Connect each client to its nearest open facility • Fact: every facility is open with probability 1/m'
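A sketch of the warm-up rounding step, assuming the sets $F'_j$ of the selected clients are disjoint (true after filtering) and that facilities outside them are opened independently; the slide only fixes the probability 1/m'. The function name and the explicit random generator are hypothetical.

```python
import random

# Assumption: F_prime[j] are disjoint for the selected clients; facilities outside
# their union open independently. Names are hypothetical.


def warmup_round(selected, F_prime, facilities, m_prime, rng: random.Random):
    """Open one uniformly random facility of F'_j for each selected j, and every
    other facility with probability 1/m'."""
    opened, covered = set(), set()
    for j in selected:
        opened.add(rng.choice(sorted(F_prime[j])))  # exactly one facility of F'_j
        covered |= F_prime[j]
    for i in facilities:
        if i not in covered and rng.random() < 1.0 / m_prime:
            opened.add(i)
    return opened
```

Every facility ends up open with probability 1/m', so the expected number of open facilities is km/m' = (1 + ε)k.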
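Pseudo-Approximation Proof • Lemma: $E[C_j] \le O(1/\varepsilon)\, d_{av}(j)$, where $C_j$ is the connection cost of j • If j ∈ C', exactly one facility of $F'_j$ opens, chosen uniformly, so $E[C_j] \le d'_{av}(j) \le d_{av}(j)$; hence it is enough to consider j ∉ C' • Then ∃ j' ∈ C' s.t. $F'_j \cap F'_{j'} \ne \emptyset$ and $d_{av}(j') \le d_{av}(j)$ • The $m - m'$ facilities of $F_j \setminus F'_j$ are each at distance at least $d'_{max}(j)$ from j, so $d'_{max}(j) \le \frac{m}{m - m'}\, d_{av}(j) = \frac{1+\varepsilon}{\varepsilon}\, d_{av}(j)$ • $E[C_j] \le E[C_{j'}] + d(j, j') \le d'_{av}(j') + d'_{max}(j) + d'_{max}(j') \le d_{av}(j') + \frac{1+\varepsilon}{\varepsilon}\bigl(d_{av}(j) + d_{av}(j')\bigr) \le \bigl(3 + \frac{2}{\varepsilon}\bigr)\, d_{av}(j)$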
Outline • Introduction • Linear Programming Relaxation • Simple Pseudo-Approx. for k-median • Our Algorithm for k-median
Barrier to Obtaining a True Approximation • If ε = 0, then $F'_j = F_j$ • $d_{max}(j)$ can be much larger than $d_{av}(j)$ • With non-zero probability, j will be connected to a facility in $F_{j'}$ • The expected connection cost of j cannot be bounded in terms of $d_{av}(j)$
Remove the Barrier • Solution: j only "claims" the close facilities in $F_j$ • Let $U_j$ be the set of facilities claimed by j • Use $U_j$ in place of $F_j$ in the algorithm • New barrier: $|U_j| < m$ might happen • So we cannot guarantee that a facility in $U_j$ is always open
Remove the New Barrier • We can guarantee $|U_j| \ge m/2$ • Hence $|U_j \cup U_{j'}| \ge m$ if $U_j$ and $U_{j'}$ are disjoint • Pair up the clients in C' • For a matched pair (j, j'), always open 1 facility (possibly 2 facilities) in $U_j \cup U_{j'}$
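Remove the New Barrier • How do we open facilities for a matched pair? • Take m boxes in a line • Randomly permute the facilities in $U_j$ and put them in the leftmost $|U_j|$ boxes • Randomly permute the facilities in $U_{j'}$ and put them in the rightmost $|U_{j'}|$ boxes • Open the facilities in a uniformly random box • Since $|U_j| + |U_{j'}| \ge m$, every box holds 1 or 2 facilities, and each facility is open with probability 1/m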
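The box construction can be sketched as below, assuming $U_j$ and $U_{j'}$ are disjoint with $|U_j|, |U_{j'}| \le m \le |U_j| + |U_{j'}|$. The function name is hypothetical, and facilities can be any sortable labels.

```python
import random

# Assumption: U_j and U_jp are disjoint sets of sortable facility labels with
# |U_j|, |U_jp| <= m <= |U_j| + |U_jp|. The function name is hypothetical.


def open_pair(U_j, U_jp, m, rng: random.Random):
    """Box rounding for a matched pair: U_j fills the leftmost |U_j| boxes and
    U_j' the rightmost |U_j'| boxes, each in random order; one box is opened."""
    if not (len(U_j) <= m and len(U_jp) <= m <= len(U_j) + len(U_jp)):
        raise ValueError("need |U_j|, |U_j'| <= m <= |U_j| + |U_j'|")
    boxes = [[] for _ in range(m)]
    left = rng.sample(sorted(U_j), len(U_j))     # random permutation of U_j
    right = rng.sample(sorted(U_jp), len(U_jp))  # random permutation of U_j'
    for pos, i in enumerate(left):
        boxes[pos].append(i)
    for pos, i in enumerate(right):
        boxes[m - len(right) + pos].append(i)
    return set(boxes[rng.randrange(m)])          # 1 or 2 facilities
```

Each facility sits in exactly one box, so it opens with probability 1/m; and since no box holds two facilities of the same set, at most one facility of $U_j$ (and of $U_{j'}$) is open.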
The Algorithm • Filtering • Two clients j and j' conflict if $d(j, j') \le 4\max\{d_{av}(j), d_{av}(j')\}$ • while C ≠ ∅: • select j ∈ C that minimizes $d_{av}(j)$ • add j to C' • remove j and all clients that conflict with j from C
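A sketch of the filtering phase, assuming the values $d_{av}(j)$ have already been computed from the LP solution and are passed in as a dictionary; planar points and the function name are hypothetical.

```python
import math

# Assumption: clients are points in the plane (math.dist is the metric) and
# d_av maps each client to d_av(j) from the LP. The name is hypothetical.


def filter_clients(clients, d_av):
    """Keep the remaining client with the smallest d_av, then drop every client
    within 4 * max(d_av(j), d_av(j')) of it; repeat until none remain."""
    remaining = set(clients)
    kept = []
    while remaining:
        j = min(remaining, key=lambda c: d_av[c])
        kept.append(j)
        remaining = {c for c in remaining
                     if c != j and math.dist(c, j) > 4 * max(d_av[c], d_av[j])}
    return kept
```

For example, with $d_{av}$ values 1, 2 and 1.5 for clients at x = 0, 3 and 20, the client at 3 is within 4 · 2 = 8 of the client at 0 and is removed, while the client at 20 survives.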
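The Algorithm • Filtering • Claiming • For any j ∈ C', let $2R_j$ be the distance between j and its nearest neighbor in C' • A facility i is claimed by j if $i \in F_j$ and $d(i, j) \le R_j$, i.e., $U_j = F_j \cap \mathrm{Ball}(j, R_j)$ • Fact: any client j ∈ C' claims at least m/2 and at most m facilities (since j does not conflict with its nearest neighbor in C', $R_j > 2d_{av}(j)$, so fewer than m/2 facilities of $F_j$ lie farther than $R_j$ from j)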
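A sketch of the claiming phase, assuming `F` maps each client of C' to its set $F_j$ and planar points as the metric. Taking $R_j = \infty$ when C' has a single client is an assumption the slide does not cover; the function name is hypothetical.

```python
import math

# Assumption: points in the plane with math.dist as the metric; F maps j -> F_j.
# If C' has one client, R_j is taken to be infinite (not specified on the slide).


def claim_facilities(kept, F):
    """2R_j is the distance from j to its nearest other client in C';
    j claims U_j = F_j ∩ Ball(j, R_j)."""
    U = {}
    for j in kept:
        R = min((math.dist(j, o) for o in kept if o != j), default=math.inf) / 2
        U[j] = {i for i in F[j] if math.dist(i, j) <= R}
    return U
```

For example, with C' = {(0, 0), (10, 0)} each $R_j$ is 5. If $F_{(0,0)}$ holds the facilities at (1, 0), (0, 1), (-1, 0) and (6, 0), then $d_{av} = 2.25$ and $R_j = 5 > 2 \cdot 2.25$, and the client claims the three facilities within distance 5, which is at least m/2 = 2.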
The Algorithm • Filtering • Claiming • Matching • while there are at least 2 unmatched clients in C': • select the 2 unmatched clients j and j' that minimize d(j, j') • match j and j'
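The greedy matching can be sketched directly; this simple version takes $O(|C'|^3)$ time. Planar points and the function name are assumptions.

```python
import math
from itertools import combinations

# Assumption: clients of C' are points in the plane; the name is hypothetical.


def greedy_matching(kept):
    """Repeatedly match the two closest unmatched clients; return the pairs
    and the leftover client (None if |C'| is even)."""
    unmatched = set(kept)
    pairs = []
    while len(unmatched) >= 2:
        # sorted() makes tie-breaking deterministic
        a, b = min(combinations(sorted(unmatched), 2), key=lambda p: math.dist(*p))
        pairs.append((a, b))
        unmatched -= {a, b}
    leftover = next(iter(unmatched), None)
    return pairs, leftover
```

For clients at x = 0, 1, 10, 12 and 30, the pairs are (0, 1) and (10, 12), and the client at 30 stays unmatched.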
The Algorithm • Filtering • Claiming • Matching • Rounding • For each matched pair (j, j'), open 1 or 2 facilities in $U_j \cup U_{j'}$ • If there is an unmatched client j, open 0 or 1 facility in $U_j$ • Open each facility i that is not in any $U_j$ with probability 1/m • Connect each client to its nearest open facility
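A sketch of the remaining rounding steps, assuming the matched pairs were already handled by the box construction and their open facilities are passed in. Treating the unmatched client as a box construction whose right side is empty is an assumption consistent with "open 0 or 1 facility"; independence for the unclaimed facilities is also assumed. Names are hypothetical.

```python
import math
import random

# Assumption: points in the plane with math.dist as the metric; the unmatched
# client uses the box scheme with an empty right side. Names are hypothetical.


def finish_rounding(U_unmatched, unclaimed, opened_by_pairs, clients, m,
                    rng: random.Random):
    """Open 0 or 1 facility of the unmatched client's U_j (each with probability
    1/m), open each unclaimed facility with probability 1/m, then assign every
    client to its nearest open facility (raises ValueError if none is open)."""
    opened = set(opened_by_pairs)
    if U_unmatched and rng.random() < len(U_unmatched) / m:
        opened.add(rng.choice(sorted(U_unmatched)))
    for i in sorted(unclaimed):
        if rng.random() < 1.0 / m:
            opened.add(i)
    assignment = {j: min(opened, key=lambda i: math.dist(i, j)) for j in clients}
    return opened, assignment
```

As in the pair case, every facility of the unmatched client's $U_j$ opens with probability 1/m, and at most one of them opens.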
Proof of Constant Approx. Ratio • Lemma: $E[C_j] \le O(1)\, d_{av}(j)$, where $C_j$ is the connection cost of j • Proof: it is enough to assume j ∈ C' • If j ∉ C', there exists a client j' ∈ C' such that $d(j', j) \le 4d_{av}(j)$ and $d_{av}(j') \le d_{av}(j)$ • Assume $E[C_{j'}] \le \alpha\, d_{av}(j')$ • Then $E[C_j] \le d(j, j') + E[C_{j'}] \le 4d_{av}(j) + \alpha\, d_{av}(j') \le (4+\alpha)\, d_{av}(j)$ • W.l.o.g., assume $d_{av}(j) = 1$
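Proof of Constant Approx. Ratio • Let $j_1$ be the nearest neighbor of j in C' (so $d(j, j_1) = 2R_j$), and let $j_2$ be the client matched with $j_1$ • There is always 1 facility open in $U_{j_1} \cup U_{j_2}$ • Any facility in $U_{j_1} \cup U_{j_2}$ is at most $2R_j + 2R_{j_1} + R_{j_2} \le 5R_j$ away from j • $|U_j| \ge m(1 - 1/R_j)$ (by Markov's inequality, since $d_{av}(j) = 1$) • With probability at least $1 - 1/R_j$, j connects to a random facility in $U_j$ • Only with probability at most $1/R_j$ does j connect to a facility that is up to $5R_j$ away • $E[C_j] \le 5$ • (Figure: clients $j_2$, j, $j_1$ with claimed sets $U_{j_2}$, $U_j$, $U_{j_1}$; $2R_{j_1} \le 2R_j$)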
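The bound $E[C_j] \le 5$ can be written out as follows, assuming (as in the box construction) that at most one facility of $U_j$ is open, each with probability 1/m, and using $d_{av}(j) = 1$ together with $d(i,j) > R_j$ for every $i \in F_j \setminus U_j$:

```latex
\begin{align*}
\mathbb{E}[C_j]
  &\le \sum_{i \in U_j} \frac{d(i,j)}{m} + \Bigl(1 - \frac{|U_j|}{m}\Bigr)\, 5R_j \\
  &= \underbrace{\sum_{i \in U_j} \frac{d(i,j)}{m} + \Bigl(1 - \frac{|U_j|}{m}\Bigr) R_j}_{\le\, d_{av}(j) \,=\, 1}
     \;+\; 4\Bigl(1 - \frac{|U_j|}{m}\Bigr) R_j \\
  &\le 1 + 4 \cdot 1 = 5 .
\end{align*}
```

The underbraced term is at most $d_{av}(j)$ because each of the $m - |U_j|$ facilities of $F_j \setminus U_j$ contributes more than $R_j/m$ to $d_{av}(j)$; the same bound gives $(1 - |U_j|/m)\,R_j \le 1$. Combined with the reduction on the previous slide, this yields $E[C_j] \le (4 + 5)\, d_{av}(j) = 9\, d_{av}(j)$ for every client.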
Proof of 3.25 Approx. Ratio • Complicated; details omitted • Rough idea: for a client j ∉ C' • $j_1 \in C'$ is the client that conflicts with and removes j in the filtering phase • $j_2 \in C'$ is the nearest neighbor of $j_1$ in C' • $j_3 \in C'$ is the client matched with $j_2$ • Consider the nearest open facility of j in $F_j \cup F_{j_1} \cup U_{j_2} \cup U_{j_3}$ • Our algorithm opens k facilities in expectation • It can easily be transformed so that it always opens k facilities • The algorithm naturally extends to the k-facility location problem
Ongoing Work • Joint work with Svensson: improving on the best known (3 + ε) approximation ratio for k-median
Summary • We introduced an LP-rounding algorithm for the k-median problem • Proved a 3.25 approximation ratio for the problem • It has the potential to improve on the decade-old (3 + ε) approximation • Improved approximation algorithms for the following problems: • k-facility location problem: 3.25 • Matroid median problem: 9 • Knapsack median problem: 34