Determining the best K for clustering transactional datasets – A coverage density-based approach. Presenter: Lin, Shu-Han. Authors: Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae. Data & Knowledge Engineering (DKE) 68 (2009) 28–48.
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • Boolean values • Cluster the transactional datasets – a kind of special categorical data • Time complexity: O(dmN² log N)
Objectives • To design a method, ACTD (Agglomerative Clustering algorithm with Transactional-cluster-modes Dissimilarity), especially for transactional data • Instead of ACE (Agglomerative Categorical clustering with Entropy criterion) • Find best-K • More efficiently
Methodology – Overview of SCALE (Sampling, Clustering structure Assessment, cLustering & domain-specific Evaluation) • Diagram labels: Agglomerative, ACE, ACTD, BKPlot, DMDI
Methodology – ACTD: Intra-cluster similarity • Coverage Density (Nk, Mk) • Transactional-cluster-mode • A subset of items • In this case, only c is the transactional-cluster-mode
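The coverage density named on this slide can be illustrated with a small sketch. It assumes the usual definition from the authors' work: the number of filled item cells in a cluster divided by Nk × Mk, where Nk is the number of transactions and Mk is the number of distinct items in the cluster. The slide itself does not spell the formula out, and the function name and toy cluster below are hypothetical.

```python
from collections import Counter


def coverage_density(cluster):
    """Coverage density of one cluster (assumed definition, see lead-in).

    cluster: list of transactions, each an iterable of items.
    Returns filled cells / (Nk * Mk), or 0.0 for an empty cluster.
    """
    transactions = [set(t) for t in cluster]      # ignore repeated items in a transaction
    n_k = len(transactions)                       # Nk: number of transactions
    counts = Counter(item for t in transactions for item in t)
    m_k = len(counts)                             # Mk: number of distinct items
    if n_k == 0 or m_k == 0:
        return 0.0
    filled = sum(counts.values())                 # total item occurrences in the cluster
    return filled / (n_k * m_k)


# Hypothetical toy cluster: 3 transactions over items a, b, c.
toy_cluster = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
print(coverage_density(toy_cluster))  # 7 filled cells out of 3 * 3
```

A value of 1.0 means every transaction in the cluster contains every item of the cluster; sparser clusters score lower.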
Methodology – ACTD: Inter-cluster similarity • Transactional-cluster-mode dissimilarity, range [0, .5] • Time complexity: O(dmN² log N) vs. O(MN² log N)
Methodology – DMDI • Valleys, change dramatically
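The slide only says that the interesting points on the DMDI plot are valleys where the curve changes dramatically. As a minimal sketch of that reading, assume the DMDI curve is available as a mapping from K to a value and that a "valley" means a strict local minimum. The function name and the sample values are hypothetical and are not taken from the paper's experiments.

```python
def find_valleys(values_by_k):
    """Return the K values at strict local minima of a curve.

    values_by_k: dict mapping K (int) -> curve value (float).
    Endpoints are never reported because they have only one neighbour.
    """
    ks = sorted(values_by_k)
    valleys = []
    # Slide a window of three consecutive K values over the curve.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        here = values_by_k[k]
        if here < values_by_k[prev_k] and here < values_by_k[next_k]:
            valleys.append(k)
    return valleys


# Hypothetical curve, for illustration only.
curve = {2: 0.10, 3: -0.30, 4: 0.05, 5: 0.02, 6: -0.20, 7: 0.01}
print(find_valleys(curve))  # [3, 6]
```

Each reported K is only a candidate; how deep a valley must be to count as a dramatic change is a threshold choice this sketch leaves out.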
Experiments – Quality on sample dataset • With noise
Conclusions • ACTD: the Coverage Density-based method is promising for transactional datasets • Faster and more stable than the entropy-based method • The Agglomerative Hierarchical clustering algorithm and DMDI can help to find best-K
Comments • Advantage • … • Drawback • … • Application • …