Automatically Determining the Number of Clusters in Unlabeled Data Sets
Presenter: Lin, Shu-Han
Authors: Liang Wang, Christopher Leckie, Kotagiri Ramamohanarao, and James Bezdek
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (TKDE), 2009
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • “Reordered dissimilarity image” (RDI) • How to automatically estimate the number of clusters in an unlabeled data set?
Objectives • Extract dark blocks
Methodology – VAT
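The slides show VAT only as figures. As a rough sketch, the reordering usually described for VAT (a Prim-like ordering of the objects, followed by permuting the dissimilarity matrix) might look like this in Python. The function names `vat_order` and `reorder` are illustrative and do not come from the slides.

```python
def vat_order(D):
    """Return a VAT-style ordering for a symmetric dissimilarity matrix D (list of lists)."""
    n = len(D)
    # Start from one endpoint of the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # dist[j] = smallest dissimilarity between j and any already selected object.
    dist = {j: D[start][j] for j in remaining}
    while remaining:
        # Pick the unselected object closest to the selected set (ties: lowest index).
        nxt = min(sorted(remaining), key=lambda j: dist[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            dist[j] = min(dist[j], D[nxt][j])
    return order


def reorder(D, order):
    """Permute rows and columns of D; displayed as an image this gives the RDI."""
    return [[D[i][j] for j in order] for i in order]
```

After reordering, objects from the same cluster tend to sit next to each other, which is what produces the dark blocks along the diagonal of the RDI.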
Methodology – DBE (steps 1 to 4)
Methodology – DBE 1. Dissimilarity transformation and image segmentation • f(t) • graythresh function (Matlab): σ • [Figure: before / after]
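Matlab's graythresh picks a global threshold with Otsu's method. The slide does not give the form of f(t), so the sketch below leaves the transformation out and only illustrates the thresholding part: an Otsu threshold computed in plain Python, then used to mark dark (low-dissimilarity) pixels. The names `otsu_threshold` and `segment_dark`, and the choice to encode dark pixels as 1, are assumptions made for illustration.

```python
def otsu_threshold(values, bins=256):
    """Otsu's threshold for values in [0, 1]; returns a level in (0, 1)."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(k * h for k, h in enumerate(hist))
    best_k, best_var = 0, -1.0
    w0 = 0    # pixel count in bins 0..k
    sum0 = 0  # sum of bin indices over those pixels
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance (unnormalized)
        if var_between > best_var:
            best_var, best_k = var_between, k
    return (best_k + 1) / bins  # upper edge of the last "dark" bin


def segment_dark(D):
    """Scale D to [0, 1] and mark pixels below the Otsu threshold as 1 (dark)."""
    dmax = max(max(row) for row in D)  # assumes D is not all zeros
    scaled = [[d / dmax for d in row] for row in D]
    t = otsu_threshold([v for row in scaled for v in row])
    return [[1 if v < t else 0 for v in row] for row in scaled]
```

On a reordered matrix with two well-separated groups, the result is a binary image with two blocks of 1s along the diagonal.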
Methodology – DBE 2. Directional morphological filtering of the binary image • Symmetric: along horizontal and vertical directions • Linear: along the same direction • [Figure: a = 2%, a = 1%]
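A minimal sketch of this step, assuming the filter is a morphological opening with a line-shaped structuring element whose length is about a·n pixels (n being the image side), applied first along rows and then along columns. Both the choice of opening and the way a sets the length are assumptions, and the names `open_1d` and `directional_open` are illustrative.

```python
def open_1d(line, length):
    """1-D binary opening with a line of `length` pixels: keep only runs of 1s at least that long."""
    out = [0] * len(line)
    i = 0
    while i < len(line):
        if line[i]:
            j = i
            while j < len(line) and line[j]:
                j += 1
            if j - i >= length:
                out[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return out


def directional_open(B, a):
    """Open a square binary image along rows, then along columns."""
    n = len(B)
    length = max(1, int(round(a * n)))  # assumed: element length is a fraction a of n
    rows = [open_1d(row, length) for row in B]                  # horizontal pass
    cols = [open_1d(list(col), length) for col in zip(*rows)]   # vertical pass
    return [list(row) for row in zip(*cols)]                    # transpose back
```

Small specks and thin streaks shorter than the element are removed, while blocks wider than the element survive unchanged.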
Methodology – DBE 3. Distance transform and diagonal projection of the filtered image • Nearest non-zero pixel
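The two operations named on this slide can be sketched as follows: a brute-force Euclidean distance transform (distance from each pixel to the nearest non-zero pixel), then a projection onto the main diagonal by summing along lines perpendicular to it. The sketch assumes the input image has dark-block pixels set to 0 and all other pixels set to 1, so that values grow toward block centers. The function names are illustrative, and the brute-force search is only practical for small images.

```python
import math


def distance_transform(img):
    """Euclidean distance from each pixel to the nearest non-zero pixel (brute force)."""
    n, m = len(img), len(img[0])
    # Assumes the image contains at least one non-zero pixel.
    nonzero = [(i, j) for i in range(n) for j in range(m) if img[i][j]]
    return [[min(math.hypot(i - p, j - q) for p, q in nonzero)
             for j in range(m)] for i in range(n)]


def diagonal_projection(dist):
    """Sum values over pixels with the same i + j (lines perpendicular to the main diagonal)."""
    n = len(dist)
    proj = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            proj[i + j] += dist[i][j]
    return proj
```

Each dark block on the diagonal then shows up as a bump in the projection signal, with valleys where one block ends and the next begins.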
Methodology – DBE 4. Detection of major peaks and valleys in the projection signal • Smooth (parameter: a) • Major “peaks/valleys” (parameter: a)
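One way this last step could be sketched: smooth the projection with a moving average, find local maxima, and merge maxima that lie too close together, so that the number of remaining major peaks is the estimated cluster count. The slide only says that both sub-steps depend on a. Using a times the signal length as both the smoothing width and the minimum peak spacing is an assumption, as are the function names.

```python
def smooth(signal, width):
    """Centered moving average; the window shrinks at both ends."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def major_peaks(signal, min_gap):
    """Indices of local maxima, keeping only peaks at least `min_gap` samples apart."""
    peaks = []
    for i in range(len(signal)):
        left = signal[i - 1] if i > 0 else float("-inf")
        right = signal[i + 1] if i + 1 < len(signal) else float("-inf")
        if signal[i] > left and signal[i] >= right:
            if peaks and i - peaks[-1] < min_gap:
                # Too close to the previous peak: keep the taller of the two.
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


def estimate_cluster_count(projection, a):
    """Count major peaks; the width derived from a is an assumed rule."""
    width = max(1, int(round(a * len(projection))))
    return len(major_peaks(smooth(projection, width), min_gap=width))
```

For example, a projection with two bumps separated by a clear valley, such as `[3, 4, 4, 2, 1, 0, 1, 2, 4, 4, 3]`, gives an estimate of two clusters for small values of a.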
Experiments – Compare with CCE • Synthetic data sets • Real data sets
Conclusions • Most methods prefer “larger” rather than “smaller” clusters • The DBE • (Nearly) automatically estimates the number of clusters • Has just one easy-to-set parameter: a
Comments • Advantage • A visual assessment of cluster tendency (VAT) • Combines the cluster analysis problem with image processing techniques • Drawback • … • Application • …