Boosting Algorithm for Clustering Kuan-ming Lin Dec. 1, 2005
Agenda • Motivation • Changes from Adaboost to boost-clustering • The boost-clustering algorithm • Examples • Discussions
Motivation: want to improve “weak clustering algorithms” • Most clustering algorithms favor certain shapes • e.g. k-means performs well for spherical shapes • We want to generalize them to fit more complex shapes • Though we lack labels, some instances seem harder to cluster than others • Learning from these instances might reinforce the clustering algorithm
Boost-clustering: counterpart of Adaboost, but more issues… • Essence of Adaboost • Identify instances not learned well • Add weight to them, re-learn on the weighted input • Output a combination of all learners • Analogy in boost-clustering • Identify instances not clustered well • how to define “well”? • Add weight to them, re-run the clustering algorithm • Output a combination of all clustering results • clusters are not consistently labeled -- how to combine different results?
A solution • In Pattern Recognition Letters ’04 • Frossyniotis, Likas, Stafylopatis. “A clustering method based on boosting” • Fix the number of clusters to facilitate cluster combination • Need a soft clustering method that generates membership degrees of each instance to all clusters • Define a pseudoloss to measure how well each instance is clustered • Cannot prove effectiveness mathematically • No such “error bound” exists for the clustering problem
The algorithm • Initialize weight w_i = 1/N for all instances i • For t = 1 to T do • Let x′ = bootstrap sample drawn according to probabilities w • Call WeakCluster(x′) to generate cluster set C • Get membership degree h_i,c for every instance i and cluster c • Renumber cluster indices -- by fractions of shared instances with the old clusters (see the sketch below)
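The renumbering step is what lets results from different iterations be combined. Below is a minimal NumPy sketch of one way to do it, assuming hard labels from the previous round are available; the function name and the greedy matching are illustrative, not taken from the paper.

```python
import numpy as np

def renumber_memberships(h_new, labels_old, k):
    """Permute the columns of the N x k membership matrix h_new so that each
    new cluster takes the index of the old cluster it shares the most
    instances with (greedy matching on hard labels)."""
    labels_new = h_new.argmax(axis=1)
    # overlap[a, b] = number of instances in new cluster a and old cluster b
    overlap = np.array([[np.sum((labels_new == a) & (labels_old == b))
                         for b in range(k)] for a in range(k)])
    perm = np.zeros(k, dtype=int)
    taken = set()
    for a in np.argsort(-overlap.max(axis=1)):   # strongest matches first
        for b in np.argsort(-overlap[a]):
            if b not in taken:
                perm[a] = b
                taken.add(b)
                break
    aligned = np.empty_like(h_new)
    aligned[:, perm] = h_new                     # column a becomes column perm[a]
    return aligned
```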
The algorithm continued • Define pseudoloss p_i for each instance i • Minmax: p_i = 1 − max_c h_i,c + min_c h_i,c • Entropy: p_i = −Σ_c h_i,c log h_i,c • Set ε = (Σ_i w_i·p_i)/2 and α = log((1−ε)/ε) • Set new weights w^(t+1) = normalize(w^t·exp(α·p)), so badly clustered instances gain weight • Set aggregate hypothesis H^t = Σ_t α_t·h^t • Return clusters according to H^T: assign each instance to its maximum-membership cluster
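Putting both slides together, here is a runnable sketch of the loop, assuming a soft weak clusterer that returns an N x k membership matrix whose rows sum to 1, and reusing the renumber_memberships helper above; the names (boost_clustering, weak_cluster) and the default T = 15 are illustrative.

```python
import numpy as np

def boost_clustering(X, weak_cluster, k, T=15, seed=0):
    """Boost-clustering loop as on the slides; weak_cluster(X_fit, X_all, k)
    must return an N x k membership matrix with rows summing to 1."""
    rng = np.random.default_rng(seed)
    N = len(X)
    w = np.full(N, 1.0 / N)                      # uniform initial weights
    H = np.zeros((N, k))                         # aggregate sum_t alpha_t * h^t
    labels = None
    for t in range(T):
        idx = rng.choice(N, size=N, p=w)         # bootstrap according to w
        h = weak_cluster(X[idx], X, k)           # memberships for all instances
        if labels is not None:                   # keep cluster indices consistent
            h = renumber_memberships(h, labels, k)
        p = 1.0 - h.max(axis=1) + h.min(axis=1)  # minmax pseudoloss, in [0, 1]
        eps = np.sum(w * p) / 2.0
        if eps == 0.0:                           # already clustered perfectly
            return h.argmax(axis=1)
        if eps >= 0.5:                           # weak clusterer stopped helping
            break
        alpha = np.log((1.0 - eps) / eps)
        w = w * np.exp(alpha * p)                # hard instances gain weight
        w = w / w.sum()
        H += alpha * h
        labels = H.argmax(axis=1)
    return H.argmax(axis=1)                      # clusters from the aggregate H
```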
Example: the effect of weighting (1) • [figures: clustering results at iteration 0 and iteration 1]
Example: the effect of weighting (2) • [figures: clustering results at iteration 5 and iteration 15, where the stopping criterion is met] • Problem: boundary instances form a cluster of their own
Banana example • Boost-clustering can loosen the shape restriction of k-means • Not quite: four clusters are needed here • [figures: k-means vs. k-means + boost-clustering]
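A usage sketch for an experiment of this flavor, with scikit-learn's make_moons standing in for the banana data and a distance-based soft k-means as the weak clusterer; soft_kmeans and its exp(−distance) membership rule are assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

def soft_kmeans(X_fit, X_all, k):
    """Weak clusterer: fit k-means on the bootstrap sample, then derive soft
    memberships for all instances from distances to the learned centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_fit)
    d = np.linalg.norm(X_all[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    h = np.exp(-d)                               # closer center -> higher membership
    return h / h.sum(axis=1, keepdims=True)

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = boost_clustering(X, soft_kmeans, k=4)   # the slides suggest four clusters
```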
Discussions • Performance is only assessed by experiments • For easy cases, not much improvement • Could even be worse due to overemphasis on boundary instances • Some benefit for irregular shapes • Can build more complex clusters • For really hard cases, still limited by the nature of the weak clustering algorithm • e.g. can’t make k-means learn concentric circles
Thank you