
The Effectiveness of Lloyd-type Methods for the k-means Problem



  1. The Effectiveness of Lloyd-type Methods for the k-means Problem. Chaitanya Swamy (University of Waterloo). Joint work with Rafi Ostrovsky (UCLA), Yuval Rabani (Technion), and Leonard Schulman (Caltech).

  2. The k-means Problem. [Figure: a point set partitioned into three clusters X1, X2, X3 with centers c1, c2, c3.]
  Given: X ⊆ R^d, a set of n points in d-dimensional space (|X| = n); d(·,·) is the L2 distance.
  • Partition X into k clusters X1, …, Xk.
  • Assign each point in Xi to a common center ci ∈ R^d.
  Goal: minimize ∑i ∑_{x∈Xi} d(x, ci)².
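
To make the objective concrete, here is a minimal sketch of the cost function in Python/NumPy (the name kmeans_cost and the array layout are ours, purely for illustration):

```python
import numpy as np

def kmeans_cost(X, centers, labels):
    """k-means objective: the sum over all points of the squared L2
    distance to the center of the cluster the point is assigned to.

    X       : (n, d) array of points
    centers : (k, d) array of centers c_1, ..., c_k
    labels  : (n,)  array, labels[j] = cluster index of point X[j]
    """
    diffs = X - centers[labels]        # displacement of each point from its own center
    return float((diffs ** 2).sum())
```

This is the quantity that all the algorithms below try to minimize over the choice of the partition and the centers.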

  3. k-means (contd.)
  • Given the ci’s, the best clustering assigns each point to its nearest center: Xi = {x ∈ X : ci is the center nearest to x}.
  • Given the Xi’s, the best choice of centers is ci = center of mass of Xi = ctr(Xi) = ∑_{x∈Xi} x/|Xi|.
  ⇒ An optimal solution satisfies both properties.
  The problem is NP-hard even for k = 2 (n, d not fixed).

  4. Related Work. The k-means problem dates back to Steinhaus (1956).
  a) Approximation algorithms ≡ algorithms with provable guarantees.
  • PTASs with varying runtime dependence on n, d, k: poly/linear in n, but possibly exponential in d and/or k.
    • Matousek: poly(n), exp(d, k)
    • Kumar, Sabharwal & Sen (KSS04): lin(n, d), exp(k)
  • O(1)-approximation algorithms for k-median: any point set with any metric, runtime poly(n, d, k); the guarantees also translate to k-means.
    • Charikar, Guha, Tardos & Shmoys
    • Arya et al. + Kanungo et al.: (9+ε)-approximation

  5. b) Heuristics: Lloyd’s method, invented in 1957, remains an extremely popular heuristic even today.
  • 1) Start with k initial / “seed” centers c1, …, ck.
  • 2) Iterate the following Lloyd step:
    • Assign each point to its nearest center ci to obtain a clustering X1, …, Xk.
    • Update ci ← ctr(Xi) = ∑_{x∈Xi} x/|Xi|.
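
As a minimal sketch of the method just described (Python/NumPy; the function names lloyd_step and lloyd are ours, and the seeds are deliberately a parameter since choosing them is the subject of the rest of the talk):

```python
import numpy as np

def lloyd_step(X, centers):
    """One Lloyd step: assign every point to its nearest center, then
    move every center to the mean of the points assigned to it."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for i in range(len(centers)):
        pts = X[labels == i]
        if len(pts):                    # leave a center untouched if its cluster is empty
            new_centers[i] = pts.mean(axis=0)
    return new_centers

def lloyd(X, seeds, iters=20):
    """Iterate Lloyd steps starting from the given seed centers."""
    centers = np.asarray(seeds, dtype=float)
    for _ in range(iters):
        centers = lloyd_step(X, centers)
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels
```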


  8. Lloyd’s method: What’s known?
  • Some bounds on the number of iterations of Lloyd-type methods: Inaba-Katoh-Imai; Har-Peled-Sadri; Arthur-Vassilvitskii (’06).
  • Performance is very sensitive to the choice of seed centers; there is a large literature on finding “good” seeding methods for Lloyd.
  • But there is almost no analysis that proves performance guarantees on the quality of the final solution for arbitrary k and dimension.
  Our Goal: analyze Lloyd and try to prove rigorous performance guarantees for Lloyd-type methods.

  9. Our Results. Main Theorem: If the data has a “meaningful k-clustering”, then there is a simple, efficient seeding method such that Lloyd-type methods return a near-optimal solution.
  • Introduce a clusterability or separation condition.
  • Give a novel, efficient sampling process for seeding Lloyd’s method with initial centers.
  • Show that if the data satisfies our clusterability condition:
    • seeding + 1 Lloyd step yields a constant-factor approximation in time linear in n and d and polynomial in k; this is potentially faster than Lloyd variants that require multiple reseedings;
    • seeding + KSS04-sampling gives a PTAS; the algorithm is faster and simpler than the PTAS in KSS04.

  10. “Meaningful k-Clustering”. Settings where one would NOT consider the data to possess a meaningful k-clustering:
  1) If near-optimum cost can be achieved by two very different k-partitions of the data, then the identity of an optimal k-partition carries little meaning; it provides an ambiguous classification.
  2) If the cost of the best k-clustering ≈ the cost of the best (k−1)-clustering, then a k-clustering yields only a marginal benefit over the best (k−1)-clustering; one should use a smaller value of k here.
  Example: k = 3.

  11. We formalize 2). Let Δk²(X) = cost of the best k-clustering of X.
  X is ε-separated for k-means iff Δk²(X)/Δk−1²(X) ≤ ε².
  • A simple condition: the drop in k-clustering cost is already used by practitioners to choose the right k.
  • One can show that, roughly, X is ε-separated for k-means ⇔ any two low-cost k-clusterings disagree on only a small fraction of the data.
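
The condition is trivial to test once the two clustering costs are (approximately) known; a small illustrative check, assuming some routine kmeans_cost_estimate(X, k) that returns an estimate of Δk²(X), e.g. the best of several seeded Lloyd runs (both names here are ours):

```python
def is_eps_separated(X, k, eps, kmeans_cost_estimate):
    """Check the separation condition Delta_k^2(X) <= eps^2 * Delta_{k-1}^2(X),
    i.e. going from k-1 to k clusters reduces the cost by a factor of 1/eps^2."""
    cost_k = kmeans_cost_estimate(X, k)
    cost_k_minus_1 = kmeans_cost_estimate(X, k - 1)
    return cost_k <= (eps ** 2) * cost_k_minus_1
```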

  12. The 2-means problem (k=2). Assume X is ε-separated for 2-means.
  • X*1, X*2: the optimal clusters; c*i = ctr(X*i); D* = d(c*1, c*2).
  • ni = |X*i|; (r*i)² = ∑_{x∈X*i} d(x, c*i)²/ni = Δ1²(X*i)/ni = the average squared distance in cluster X*i.
  [Figure: clusters X*1 and X*2 with radii r*1, r*2 and centers c*1, c*2 at distance D*.]
  Lemma: For i = 1, 2, (r*i)² ≤ ε²/(1−ε²)·D*².
  Proof: Δ2²(X)/ε² ≤ Δ1²(X) = Δ2²(X) + (n1n2/n)·D*².
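
The one-line proof compresses a couple of steps. Spelling it out, using only the slide's definitions and the stated identity Δ1²(X) = Δ2²(X) + (n1n2/n)·D*² (our expansion of the slide's sketch):

\[
\frac{\Delta_2^2(X)}{\epsilon^2} \;\le\; \Delta_1^2(X) \;=\; \Delta_2^2(X) + \frac{n_1 n_2}{n} D^{*2}
\quad\Longrightarrow\quad
\Delta_2^2(X) \;\le\; \frac{\epsilon^2}{1-\epsilon^2}\cdot\frac{n_1 n_2}{n} D^{*2}.
\]
Since \(n_i (r^*_i)^2 = \Delta_1^2(X^*_i) \le \Delta_2^2(X)\) and \(n_1 n_2/(n\,n_i) \le 1\),
\[
(r^*_i)^2 \;\le\; \frac{\Delta_2^2(X)}{n_i}
\;\le\; \frac{\epsilon^2}{1-\epsilon^2}\cdot\frac{n_1 n_2}{n\,n_i} D^{*2}
\;\le\; \frac{\epsilon^2}{1-\epsilon^2}\, D^{*2}.
\]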

  13. The 2-means algorithm. Assume X is ε-separated for 2-means. [Figure: clusters X*1 and X*2 with centers c*1, c*2 at distance D*.]
  • 1) Sampling-based seeding procedure: pick two seed centers c1, c2 by picking the pair x, y ∈ X at random with probability proportional to d(x, y)².
  • 2) Lloyd step or simpler “ball k-means step”: for each ci, let Bi = {x ∈ X : d(x, ci) ≤ d(c1, c2)/3}; update ci ← ctr(Bi); return these as the final centers.
  The sampling can be implemented in O(nd) time, so the entire algorithm runs in O(nd) time.
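
A compact sketch of the whole 2-means algorithm in Python/NumPy (the function names are ours; the pair-sampling uses the standard identity ∑_y d(x, y)² = n·d(x, ctr(X))² + Δ1²(X) as one way to realize the O(nd) bound mentioned on the slide):

```python
import numpy as np

def sample_seed_pair(X, rng):
    """Pick a pair (x, y) from X with probability proportional to d(x, y)^2,
    in O(nd) time: sample x from its marginal, then y given x."""
    n = len(X)
    c = X.mean(axis=0)
    sq = ((X - c) ** 2).sum(axis=1)       # d(x, ctr(X))^2 for every x
    total = sq.sum()                      # Delta_1^2(X)
    wx = n * sq + total                   # marginal weight of x:  sum_y d(x, y)^2
    i = rng.choice(n, p=wx / wx.sum())
    dy = ((X - X[i]) ** 2).sum(axis=1)    # d(x_i, y)^2 for every y
    j = rng.choice(n, p=dy / dy.sum())
    return X[i], X[j]

def ball_kmeans_step(X, c1, c2):
    """Ball k-means step: re-center each seed on the points within a third
    of the seed separation; points outside both balls are ignored."""
    r2 = (np.linalg.norm(c1 - c2) / 3.0) ** 2
    new = []
    for c in (c1, c2):
        ball = X[((X - c) ** 2).sum(axis=1) <= r2]
        new.append(ball.mean(axis=0) if len(ball) else c)
    return new

def two_means(X, seed=0):
    rng = np.random.default_rng(seed)
    c1, c2 = sample_seed_pair(X, rng)
    return ball_kmeans_step(X, c1, c2)
```

(The sketch assumes X contains at least two distinct points, so the sampling weights are not all zero.)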

  14. 2-means: Analysis. [Figure: seeds c1, c2 landing in the cores of the optimal clusters X*1, X*2.]
  Let core(X*i) = {x ∈ X*i : d(x, c*i)² ≤ (r*i)²/ρ}, where ρ = Θ(ε²) < 1.
  Seeding lemma: With probability 1 − O(ρ), c1 and c2 lie in the cores of X*1 and X*2.
  Proof: |core(X*i)| ≥ (1−ρ)·ni for i = 1, 2 (by Markov’s inequality, since the average of d(x, c*i)² over X*i is (r*i)²).
  Let A = ∑_{x∈core(X*1), y∈core(X*2)} d(x, y)² ≈ (1−ρ)²·n1n2·D*², and B = ∑_{{x,y}⊆X} d(x, y)² = n·Δ1²(X) ≈ n1n2·D*².
  The success probability is A/B ≈ (1−ρ)² = 1 − O(ρ).

  15. 2-means analysis (contd.) [Figure: balls B1, B2 around the seeds, containing the cores of X*1, X*2.]
  Recall that Bi = {x ∈ X : d(x, ci) ≤ d(c1, c2)/3}.
  Ball-k-means lemma: For i = 1, 2, core(X*i) ⊆ Bi ⊆ X*i. Therefore d(ctr(Bi), c*i)² ≤ ρ·(r*i)²/(1−ρ).
  Intuitively, since Bi ⊆ X*i and Bi contains almost all of the mass of X*i, ctr(Bi) must be close to ctr(X*i) = c*i.

  16. 2-means analysis (contd.)
  Theorem: With probability 1 − O(ρ), the cost of the final clustering is at most Δ2²(X)/(1−ρ), ⇒ we get a (1/(1−ρ))-approximation algorithm.
  Since ρ = O(ε²), the approximation ratio → 1 as ε → 0, and the probability of success → 1 as ε → 0.

  17. Arbitrary k
  • The algorithm and analysis follow the same outline as in 2-means.
  • If X is ε-separated for k-means, one can again show that all clusters are well separated, that is, cluster radius << inter-cluster distance: r*i = O(ε)·d(c*i, c*j) ∀ i, j.
  • 1) Seeding stage: choose k initial centers and ensure that they lie in the “cores” of the k optimal clusters.
    • exploits the fact that the clusters are well separated
    • after the seeding stage, each optimal center has a distinct seed center very “near” it
  • 2) Now one can run either a Lloyd step or a ball-k-means step.
  • Theorem: If X is ε-separated for k-means, then one can obtain an α(ε)-approximation algorithm, where α(ε) → 1 as ε → 0.

  18. Schematic of the entire algorithm. [Flowchart: two seeding routes produce k well-placed seeds; a final local step then gives the answer.]
  • Simple sampling (success probability exp(−k)): pick k centers as follows. First pick 2 centers c1, c2 as in 2-means; to pick center ci+1, pick x ∈ X with probability proportional to min_{j≤i} d(x, cj)².
  • Oversampling + deletion (constant success probability, O(nkd + k³d) time): sample O(k) centers, then greedily delete until k remain. Greedy deletion: start with n centers and keep deleting the center whose removal causes the least increase in cost until k centers remain; run from all n points, greedy deletion alone takes O(n³d).
  • From k well-placed seeds: a ball-k-means or Lloyd step gives an O(1)-approximation; KSS04-sampling gives a PTAS.
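
A sketch of the two seeding primitives in the schematic, reusing sample_seed_pair from the 2-means sketch above (all names are ours; the greedy deletion below recomputes the cost from scratch at each step, so it is the simple, unoptimized variant):

```python
import numpy as np

def sample_k_seeds(X, k, rng):
    """Simple sampling: first two centers as in 2-means, then each further
    center is a point picked with probability proportional to its squared
    distance to the nearest center chosen so far."""
    c1, c2 = sample_seed_pair(X, rng)
    centers = [c1, c2]
    d2 = np.minimum(((X - c1) ** 2).sum(axis=1), ((X - c2) ** 2).sum(axis=1))
    while len(centers) < k:
        i = rng.choice(len(X), p=d2 / d2.sum())
        centers.append(X[i])
        d2 = np.minimum(d2, ((X - X[i]) ** 2).sum(axis=1))
    return np.array(centers)

def greedy_delete(X, centers, k):
    """Greedy deletion: repeatedly drop the center whose removal increases
    the assignment cost the least, until exactly k centers remain."""
    centers = [np.asarray(c, dtype=float) for c in centers]

    def cost(cs):
        cs = np.asarray(cs)
        d2 = ((X[:, None, :] - cs[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).sum()

    while len(centers) > k:
        best = min(range(len(centers)),
                   key=lambda i: cost(centers[:i] + centers[i + 1:]))
        del centers[best]
    return np.array(centers)
```

For the oversampling route one would call sample_k_seeds with O(k) centers and then greedy_delete down to k.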

  19. Open Questions
  • Deeper analysis of Lloyd: are there weaker conditions under which one can prove performance guarantees for Lloyd-type methods?
  • A PTAS for k-means with polynomial runtime dependence on n, k and d? Or is the problem APX-hard in the geometric setting?
  • A PTAS for k-means under our separation condition?
  • Other applications of the separation condition?

  20. Thank You.
