SCALE: a scalable framework for efficiently clustering transactional data. Hua Yan · Keke Chen · Ling Liu · Zhang Yi DMKD 2010 Reported by Wen-Chung Liao, 2010/03/02. Outlines. Motivation Objective WCD clustering Evaluating clustering results Experiments Conclusions Comments.
SCALE: a scalable framework for efficiently clustering transactional data Hua Yan · Keke Chen · Ling Liu · Zhang Yi DMKD 2010 Reported by Wen-Chung Liao, 2010/03/02
Outline • Motivation • Objective • WCD clustering • Evaluating clustering results • Experiments • Conclusions • Comments
Motivation • Existing transactional data clustering algorithms require users to manually tune at least one or two parameters. • Cluster validation methods for evaluating the quality of transactional clustering results are lacking.
Objectives • Present a fast, memory-saving, and scalable clustering algorithm that can efficiently handle large transactional datasets without resorting to manual parameter settings. • SCALE framework
WCD clustering • transactional dataset • {abcd, bcd, ac, de, def}
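The slide gives only the example dataset, so the following is a minimal sketch of how coverage statistics could be computed on it. Everything beyond the five transactions is an assumption: the formulas for coverage density (occurrences divided by transactions × distinct items) and weighted coverage density (sum of squared item counts divided by occurrences × transactions) are stated here as assumed definitions, not taken from the slide, and the two-cluster split and all function names are hypothetical.

```python
from collections import Counter

def coverage_stats(cluster):
    """Return (N, M, S, counts) for a cluster of transactions.

    Each transaction is a string whose characters are items,
    e.g. "abcd" is the itemset {a, b, c, d}.
    N = number of transactions, M = number of distinct items,
    S = total number of item occurrences.
    """
    counts = Counter(item for transaction in cluster for item in transaction)
    return len(cluster), len(counts), sum(counts.values()), counts

def coverage_density(cluster):
    # Assumed definition: filled cells / total cells of the N x M item grid.
    n, m, s, _ = coverage_stats(cluster)
    return s / (n * m)

def weighted_coverage_density(cluster):
    # Assumed definition: each item's count is weighted by its share of
    # all occurrences, which gives sum(count^2) / (S * N).
    n, _, s, counts = coverage_stats(cluster)
    return sum(c * c for c in counts.values()) / (s * n)

# Dataset from the slide.
dataset = ["abcd", "bcd", "ac", "de", "def"]

# Hypothetical partition into two clusters, for illustration only.
cluster_1 = ["abcd", "bcd", "ac"]
cluster_2 = ["de", "def"]

for name, cluster in [("all", dataset), ("C1", cluster_1), ("C2", cluster_2)]:
    print(name, coverage_density(cluster), weighted_coverage_density(cluster))
```

Under these assumed definitions, both hypothetical clusters score higher than the unpartitioned dataset, which is the kind of signal a density-based criterion would use to prefer a partition that groups transactions sharing items.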
Experiments • Two synthetic datasets: • Tc30a6r1000_2L • TxI4Dx series: T10I4Dx, TxI4D100k • Three real datasets: • Zoo • Mushroom • Retail
[Figure panels: Tc30a6r1000_2L, Zoo]
Conclusions • Two unique features of SCALE: • the WCD clustering algorithm, a fast, memory-saving, and scalable method for clustering transactional data • two cluster evaluation measures specific to transactional data: LISR and AMI • Some promising directions: • Experimentally compare the WCD measure with the entropy measure. • Design a better algorithm for determining the best K for transactional data clustering. • Extend this work to handle transactional data streams.
Comments • Advantages • No parameter setting required • Shortcomings • Without BKPlot, K must be determined manually for WCD. • No description of how BKPlot generates K in the categorical case. • Applications • Transaction clustering • Web log clustering • …