Coffee Talk: Approximate Kernel k-means
KDD, pp. 895-903, ACM, 2011.

Outline
• Kernel k-means: not suitable for large data
• Two-step kernel k-means (baseline)
• Approximate kernel k-means (proposal)
Kernel k-means (remark)
Kernel k-means behaves like soft (fuzzy) k-means with Parzen density estimation used to obtain the object weights (the cluster-assignment posteriors).
Notation: n objects, d features, k clusters, l iterations.
• Fuzzy k-means: memory O(nd + nk), time O(ndkl)
• Kernel k-means: memory O(n²), time O(n²kl)
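The O(n²) memory and O(n²kl) time costs come from storing the full kernel matrix and recomputing every point-to-cluster distance from it each iteration. A minimal sketch (function name and the RBF test kernel are illustrative, not from the paper):

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=20, seed=0):
    """Plain kernel k-means on a precomputed n x n kernel matrix K.

    Squared distance from point i to the mean of cluster c in feature space:
      ||phi(x_i) - mu_c||^2 = K_ii - (2/|c|) sum_{j in c} K_ij
                              + (1/|c|^2) sum_{j,j' in c} K_jj'
    Storing K costs O(n^2) memory; each sweep costs up to O(n^2 k) time.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iter):
        dist = np.empty((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:            # re-seed an empty cluster
                idx = rng.integers(0, n, size=1)
            # K_ii is constant across clusters, so it can be dropped
            dist[:, c] = -2.0 * K[:, idx].mean(axis=1) + K[np.ix_(idx, idx)].mean()
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

For large n, materializing K is exactly the bottleneck the two slides below attack.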
Two-step kernel k-means
• Dataset with n points in d dimensions
• Sample m points, with k << m << n
Step 1: cluster the m sampled points with kernel k-means.
Step 2: assign every one of the n points to the nearest cluster mean.
Stop (no further iterations).
Memory: O(mn); time: O(mnd + m²kl + mnk)
Approximate kernel k-means
• Dataset with n points in d dimensions
• Sample m points, with k << m << n
Step 1: compute a linear subspace of the kernel space spanned by the m sampled points.
Step 2: run k-means over all n points within this subspace (l iterations).
Memory: O(mn); time: O(mnd + m³ + m²n + mnkl), where the m³ term comes from inverting an m×m matrix
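The proposal constrains the cluster centers to the span of the m sampled points in kernel space. A minimal sketch of this idea uses the closely related Nyström feature map rather than the paper's exact update rule (an assumption on our part): embed all n points via the sampled kernel columns, then run plain k-means on the embedding. The m³ eigendecomposition plays the role of the m×m inverse from the slide; all names are illustrative:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def approx_kernel_kmeans(X, k, m, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample = rng.choice(n, size=m, replace=False)
    Knm = rbf(X, X[sample])              # n x m, O(mnd) to build
    Kmm = rbf(X[sample], X[sample])      # m x m
    # Nystrom embedding: eigendecompose Kmm (the O(m^3) term) and map all
    # n points into the subspace spanned by the m sampled points.
    w, V = np.linalg.eigh(Kmm)
    keep = w > 1e-6 * w.max()            # drop near-null directions for stability
    Phi = Knm @ V[:, keep] / np.sqrt(w[keep])   # n x r embedding, O(m^2 n)
    # Farthest-point initialization, then plain k-means on Phi: O(mnkl)
    centers = np.empty((k, Phi.shape[1]))
    centers[0] = Phi[0]
    for c in range(1, k):
        d2 = ((Phi[:, None, :] - centers[None, :c, :]) ** 2).sum(-1).min(axis=1)
        centers[c] = Phi[d2.argmax()]
    for _ in range(n_iter):
        d2 = ((Phi[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for c in range(k):
            pts = Phi[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
    return labels
```

Unlike the two-step baseline, every one of the n points participates in all l k-means iterations, which is where the extra mnkl term (and the accuracy gain) comes from.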
[Figure: clustering error (MSE) reduction]
[Figure: Normalized Mutual Information w.r.t. the true class labels]