Learning Multiple Nonredundant Clusterings Presenter: Wei-Hao Huang Authors: Ying Cui, Xiaoli Z. Fern, Jennifer G. Dy TKDD, 2010
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Data often contain multiple groupings that are reasonable and interesting from different perspectives. • Traditional clustering is restricted to finding only a single clustering.
Objectives • To propose a new clustering paradigm for finding all non-redundant clustering solutions of the data.
Methodology • Orthogonal clustering • Cluster space • Clustering in orthogonal subspaces • Feature space • Automatically Finding the number of clusters • Stopping criteria
Orthogonal Clustering Framework • Framework diagram, illustrated on the Face dataset X
Orthogonal clustering (Method 1) • Cluster X(t), then project every point onto the space orthogonal to its cluster centroid • Residue space: x_i(t+1) = (I − μ_j μ_jᵀ / (μ_jᵀ μ_j)) x_i(t), where μ_j is the centroid of the cluster containing x_i
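Below is a minimal sketch of Method 1 as summarized above, assuming the data matrix X has one row per sample and using scikit-learn's k-means; the function name, the fixed number of alternative views, and the restart count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def orthogonal_clustering(X, k, n_views=3):
    """Method 1 sketch: cluster, then move every point into the residue space
    orthogonal to its own cluster centroid, and repeat."""
    Xt = np.asarray(X, dtype=float).copy()
    views = []
    for _ in range(n_views):
        km = KMeans(n_clusters=k, n_init=10).fit(Xt)
        views.append(km.labels_)
        for i, j in enumerate(km.labels_):
            mu = km.cluster_centers_[j]
            denom = mu @ mu
            if denom > 0:
                # x <- (I - mu mu^T / (mu^T mu)) x : remove the centroid direction
                Xt[i] = Xt[i] - (Xt[i] @ mu) / denom * mu
    return views
```

Each entry of views is one clustering of the data; later entries describe structure that the earlier centroids could not explain.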
Clustering in orthogonal subspaces (Method 2) • Projection Y = AᵀX • Feature space A obtained by: • linear discriminant analysis (LDA) • singular value decomposition (SVD) • LDA vs. SVD: LDA finds A = argmax |AᵀSbA| / |AᵀSwA|, where Sb and Sw are the between-cluster and within-cluster scatter matrices; SVD keeps the top singular vectors of the cluster-centroid matrix
Clustering in orthogonal subspaces (Method 2) • A(t) = eigenvectors (top singular vectors) of the cluster-centroid matrix M(t) • Residue space: X(t+1) = (I − A(t)(A(t)ᵀA(t))⁻¹A(t)ᵀ) X(t)
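The following hedged sketch mirrors Method 2 under the same row-per-sample convention, building A(t) from the SVD of the cluster-centroid matrix (one of the two options listed on the previous slide); the function name and the default k−1 subspace dimension are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def orthogonal_subspace_clustering(X, k, n_views=3, n_dims=None):
    """Method 2 sketch: cluster, take the top singular vectors of the centroid
    matrix as A(t), then project the data onto the complement of span(A(t))."""
    Xt = np.asarray(X, dtype=float).copy()
    n_dims = n_dims if n_dims is not None else k - 1
    views = []
    for _ in range(n_views):
        km = KMeans(n_clusters=k, n_init=10).fit(Xt)
        views.append(km.labels_)
        M = km.cluster_centers_.T                    # columns are centroids
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        A = U[:, :n_dims]                            # A(t): top left singular vectors of M
        # Residue space: X(t+1) = (I - A (A^T A)^{-1} A^T) X(t)
        P = A @ np.linalg.solve(A.T @ A, A.T)
        Xt = Xt @ (np.eye(Xt.shape[1]) - P)          # rows are samples, so apply on the right
    return views
```

Swapping the SVD step for an LDA fit on the k-means labels would give the other variant from the previous slide.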
Comparing Method 1 and Method 2 • Residue-space projections P1 (Method 1) and P2 (Method 2) • If A(t) consists of eigenvectors spanning the same space as the centroids (M' = M), then P1 = P2 • Method 1 is therefore a special case of Method 2
Experiments • Use PCA to reduce dimensionality • Clustering • K-means clustering: keep the run with the smallest SSE • Gaussian mixture model (GMM) clustering: keep the run with the largest likelihood • Datasets • Synthetic • Real-world: Face, WebKB text, Vowel phoneme, Digit
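A minimal sketch of this experimental setup, assuming scikit-learn: PCA for dimensionality reduction, k-means keeping the restart with the smallest SSE (inertia), and a GMM keeping the fit with the largest likelihood; the component and restart counts are placeholders, not values from the paper.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def reduce_and_cluster(X, k, n_components=10, n_restarts=10):
    # PCA to reduce dimensionality before clustering
    Xr = PCA(n_components=n_components).fit_transform(X)
    # k-means: among n_init restarts, scikit-learn keeps the smallest-SSE solution
    km = KMeans(n_clusters=k, n_init=n_restarts).fit(Xr)
    # GMM: among n_init restarts, the fit with the largest likelihood bound is kept
    gmm = GaussianMixture(n_components=k, n_init=n_restarts).fit(Xr)
    return km.labels_, gmm.predict(Xr)
```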
Experiments Evaluation
Experiments Synthetic
Experiments Face dataset
Experiments WebKB dataset Vowel phoneme dataset
Experiments Digit dataset
Experiments • Finding the number of clusters • K-means: Gap statistic
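A rough sketch of picking K for k-means with the gap statistic named above: reference sets are drawn uniformly over the data's bounding box, and the K range, the number of reference sets, and the simplified "largest gap" rule (instead of the one-standard-error rule) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic_k(X, k_max=10, n_refs=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        w = KMeans(n_clusters=k, n_init=10).fit(X).inertia_
        # Dispersion of k-means on uniform reference data over the same bounding box
        w_refs = [KMeans(n_clusters=k, n_init=10)
                  .fit(rng.uniform(lo, hi, size=X.shape)).inertia_
                  for _ in range(n_refs)]
        gaps.append(np.mean(np.log(w_refs)) - np.log(w))
    return int(np.argmax(gaps)) + 1   # K with the largest gap
```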
Experiments • Finding the number of clusters • GMM: BIC • Stopping criteria • Stop when the SSE falls below 10% of the SSE at the first iteration • Stop when Kopt = 1 • If Kopt > Kmax, select Kmax • Gap statistic: choose K with the largest gap value • BIC: choose K that maximizes the BIC value
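To complement that, here is a hedged sketch of the GMM side and the stop rules listed above: K is chosen to maximize BIC (scikit-learn's bic() is a "lower is better" score, so it is negated), and iteration stops when the residue's SSE drops below 10% of the first iteration's SSE or when Kopt collapses to 1; the 10% threshold comes from the slide, everything else (names, Kmax default) is illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def choose_k_by_bic(X, k_max=10):
    # Evaluate K = 1..Kmax only, so Kopt is implicitly capped at Kmax
    scores = [-GaussianMixture(n_components=k, n_init=5).fit(X).bic(X)
              for k in range(1, k_max + 1)]
    return int(np.argmax(scores)) + 1        # K that maximizes BIC

def should_stop(sse_t, sse_first, k_opt, ratio=0.10):
    # Stop when the current residue's SSE is under 10% of the first iteration's
    # SSE, or when the estimated number of clusters falls to one
    return sse_t < ratio * sse_first or k_opt == 1
```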
Experiments Synthetic dataset
Experiments Face dataset
Experiments WebKB dataset
Conclusions • The framework discovers multiple varied, interesting, and meaningful clustering solutions. • Method 2 can be combined with any clustering algorithm and any dimensionality reduction technique.
Comments • Advantages • Finds multiple non-redundant clustering solutions • Applications • Data clustering