Boosting Algorithm for Clustering Kuan-ming Lin Dec. 1, 2005
Agenda • Motivation • Changes from Adaboost to boost-clustering • The boost-clustering algorithm • Examples • Discussions
Motivation: want to improve “weak clustering algorithms” • Most clustering algorithms favor certain shapes • e.g. k-means performs well for spherical shapes • We want to generalize them to fit more complex shapes • Though we lack labels, some instances seem harder to cluster than others • Learning from these instances might reinforce the clustering algorithm
Boost-clustering: counterpart of Adaboost, but more issues… • Essence of Adaboost • Identify instances not learned well • Add weight to them, re-learn on the weighted input • Output a combination of all learners • Analogy in boost-clustering • Identify instances not clustered well • how to define “well”? • Add weight to them, re-run the clustering algorithm • Output a combination of all clustering results • clusters are not consistently labeled -- how to combine different results?
A solution • In Pattern Recognition Letters ’04 • Frossyniotis, Likas, Stafylopatis. “A clustering method based on boosting” • Fix the number of clusters to facilitate cluster combination • Need a soft clustering method that generates membership degrees of each instance to all clusters • Define a pseudoloss to measure how well each instance is clustered • Cannot prove effectiveness mathematically • No such “error bound” exists for the clustering problem
The algorithm • Initialize weight w_i = 1/N for all instances i • For t = 1 to T do • Let x′ = bootstrap sample drawn according to probabilities w • Call WeakCluster(x′) to generate cluster set C • Get membership degree h_i,c for every instance i and cluster c • Renumber cluster indices -- by fractions of shared instances with the old clusters (see the sketch below)
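The renumbering step is what lets results from different iterations be combined. Below is a minimal NumPy sketch of one way to do it, assuming hard labels from the previous round are available; the function name and the greedy matching are illustrative, not taken from the paper.

```python
import numpy as np

def renumber_memberships(h_new, labels_old, k):
    """Permute the columns of the N x k membership matrix h_new so that each
    new cluster takes the index of the old cluster it shares the most
    instances with (greedy matching on hard labels)."""
    labels_new = h_new.argmax(axis=1)
    # overlap[a, b] = number of instances in new cluster a and old cluster b
    overlap = np.array([[np.sum((labels_new == a) & (labels_old == b))
                         for b in range(k)] for a in range(k)])
    perm = np.zeros(k, dtype=int)
    taken = set()
    for a in np.argsort(-overlap.max(axis=1)):   # strongest matches first
        for b in np.argsort(-overlap[a]):
            if b not in taken:
                perm[a] = b
                taken.add(b)
                break
    aligned = np.empty_like(h_new)
    aligned[:, perm] = h_new                     # column a becomes column perm[a]
    return aligned
```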
The algorithm continued • Define pseudoloss p_i for each instance i • Minmax: p_i = 1 − max_c h_i,c + min_c h_i,c • Entropy: p_i = −Σ_c h_i,c log h_i,c • Set ε = (Σ_i w_i·p_i)/2 and α = log((1−ε)/ε) • Set new weights w^(t+1) = normalize(w^t·exp(α·p)), so badly clustered instances gain weight • Set aggregate hypothesis H^t = Σ_t α_t·h^t • Return clusters according to H^T: assign each instance to its maximum-membership cluster
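Putting both slides together, here is a runnable sketch of the loop, assuming a soft weak clusterer that returns an N x k membership matrix whose rows sum to 1, and reusing the renumber_memberships helper above; the names (boost_clustering, weak_cluster) and the default T = 15 are illustrative.

```python
import numpy as np

def boost_clustering(X, weak_cluster, k, T=15, seed=0):
    """Boost-clustering loop as on the slides; weak_cluster(X_fit, X_all, k)
    must return an N x k membership matrix with rows summing to 1."""
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)                      # uniform initial weights
    H = np.zeros((N, k))                         # aggregate sum_t alpha_t * h^t
    labels = None
    for t in range(T):
        idx = rng.choice(N, size=N, p=w)         # bootstrap according to w
        h = weak_cluster(X[idx], X, k)           # memberships for all instances
        if labels is not None:                   # keep cluster indices consistent
            h = renumber_memberships(h, labels, k)
        p = 1.0 - h.max(axis=1) + h.min(axis=1)  # minmax pseudoloss, in [0, 1]
        eps = np.sum(w * p) / 2.0
        if eps == 0.0:                           # already clustered perfectly
            return h.argmax(axis=1)
        if eps >= 0.5:                           # weak clusterer stopped helping
            break
        alpha = np.log((1.0 - eps) / eps)
        w = w * np.exp(alpha * p)                # hard instances gain weight
        w = w / w.sum()
        H += alpha * h
        labels = H.argmax(axis=1)
    return H.argmax(axis=1)                      # clusters from the aggregate H
```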
Example: the effect of weighting (1) • [figures: clustering results at iteration 0 and iteration 1]
Example: the effect of weighting (2) • [figures: clustering results at iteration 5 and iteration 15, where the stopping criterion is met] • Problem: boundary instances form a cluster of their own
Banana example • Boost-clustering can loosen the shape restriction of k-means • Not quite: four clusters are needed here • [figures: k-means vs. k-means + boost-clustering]
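A usage sketch for an experiment of this flavor, with scikit-learn's make_moons standing in for the banana data and a distance-based soft k-means as the weak clusterer; soft_kmeans and its exp(−distance) membership rule are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

def soft_kmeans(X_fit, X_all, k):
    """Weak clusterer: fit k-means on the bootstrap sample, then derive soft
    memberships for all instances from distances to the learned centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_fit)
    d = np.linalg.norm(X_all[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    h = np.exp(-d)                               # closer center -> higher membership
    return h / h.sum(axis=1, keepdims=True)

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = boost_clustering(X, soft_kmeans, k=4)   # the slides suggest four clusters
```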
Discussions • Performance is only assessed by experiments • For easy cases, not much improvement • Could even be worse due to overemphasis on boundary instances • Some benefit for irregular shapes • Can build more complex clusters • For really hard cases, still limited by the nature of the weak clustering algorithm • e.g. can’t make k-means learn concentric circles
Thank you