1 / 79

Efficiency of Random Swap Clustering for Optimizing Centroids in K-Means Algorithm

Explore the efficiency of a random swap algorithm for adjusting centroids in the K-means clustering method. The algorithm iterates through random swaps, reallocates vectors, and creates new clusters to improve clustering. We analyze time complexity, iterations, and swap types, using examples from image quantization and theoretical estimations.

cmoodie
Download Presentation

Efficiency of Random Swap Clustering for Optimizing Centroids in K-Means Algorithm

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pasi Fränti Random Swap algorithm 24.4.2018 P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 2018.

  2. Definitions and data Set of N data points: X={x1, x2, …, xN} Partition of the data: P={p1, p2, …, pk}, Set of k cluster prototypes (centroids): C={c1, c2, …, ck},

  3. Clustering problem Objective function: Optimality of partition: Optimality of centroid:

  4. K-means algorithm X = Data set C = Cluster centroids P = Partition K-Means(X, C) → (C, P) REPEAT Cprev←C; FOR i=1 TO N DOpi← FindNearest(xi, C); FOR j=1 TO k DO cj← Average of xipi = j; UNTIL C = Cprev Optimal partition Optimal centoids

  5. Problems of k-means

  6. Swapping strategy

  7. Pigeon hole principle CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao "Centroid index: cluster level similarity measure” Pattern Recognition, 47 (9), 3034-3045, September 2014, 2014.

  8. Pigeon hole principle S2

  9. Aim of the swap S2

  10. Random Swap algorithm

  11. Steps of the swap 1. Random swap: 2. Re-allocate vectors from old cluster: 3. Create new cluster:

  12. Swap

  13. Local re-partition Re-allocate vectors Create new cluster

  14. Iterate by k-means 1st iteration

  15. Iterate by k-means 2nd iteration

  16. Iterate by k-means 3rd iteration

  17. Iterate by k-means 16th iteration

  18. Iterate by k-means 17th iteration

  19. Iterate by k-means 18th iteration

  20. Iterate by k-means 19th iteration

  21. Final result 25 iterations

  22. Extreme example

  23. Dependency on initial solution

  24. Data sets

  25. Data sets

  26. Data sets visualizedImages House Bridge Miss America Europe RGB color 4x4 blocks 4x4 blocks frame differential Differential coordinates 16-d 16-d 3-d 2-d

  27. Data sets visualizedArtificial

  28. Data sets visualizedArtificial G2-2-30 G2-2-50 G2-2-70 A1 A2 A3

  29. Time complexity

  30. Efficiency of the random swap Total time to find correct clustering: Time per iteration  Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) K-means: IkN = O(IkN) Bottleneck!

  31. Efficiency of the random swap Total time to find correct clustering: Time per iteration  Number of iterations Time complexity of single iteration: Swap: O(1) Remove cluster: 2kN/k = O(N) Add cluster: 2N = O(N) Centroids: 2N/k + 2N/k + 2 = O(N/k) (Fast) K-means: 4N = O(N) 2 iterations only! T. Kaukoranta, P. Fränti and O. Nevalainen "A fast exact GLA based on code vector activity detection" IEEE Trans. on Image Processing, 9 (8), 1337-1342, August 2000.

  32. Estimated and observed steps Bridge N=4096, k=256, N/k=16, 8

  33. Processing time profile

  34. Processing time profile

  35. Effect of K-means iterations 1 5 2 3 4 Bridge

  36. Effect of K-means iterations 1 3 2 5 4 Birch2

  37. How many swaps?

  38. Three types of swaps • Trial swap • Accepted swap • Successful swap MSE improves CI improves Before swap Accepted Successful CI=2 CI=2 CI=1 20.12 20.09 15.87

  39. Accepted and successful swaps

  40. Number of swaps neededExample with 35 clusters A3 CI=4 CI=9

  41. Number of swaps neededExample from image quantization

  42. Statistical observations N=5000, k=15, d=2, N/k=333, 4.1

  43. Statistical observations N=5000, k=15, d=2, N/k=333, 4.8

  44. Theoretical estimation

  45. Probability of good swap • Select a proper prototype to remove: • There are k clusters in total: premoval=1/k • Select a proper new location: • There are N choices: padd=1/N • Only k are significantly different: padd=1/k • Both happens same time: • k2significantly different swaps. • Probability of each different swap is pswap=1/k2 • Open question: how many of these are good? p = (/k)(/k) = O(/k)2

  46. Probability of not finding good swap: Expected number of iterations • Estimated number of iterations:

  47. Probability of failure (q) depending on T

  48. Observed probability (%) of fail N=5000, k=15, d=2, N/k=333, 4.5

  49. Observed probability (%) of fail N=1024, k=16, d=32-128, N/k=64, 1.1

  50. Bounds for the iterations Upper limit: Lower limit similarly; resulting in:

More Related