
SCALE: a scalable framework for efficiently clustering transactional data

Hua Yan · Keke Chen · Ling Liu · Zhang Yi. DMKD 2010. Reported by Wen-Chung Liao, 2010/03/02.


Presentation Transcript


  1. SCALE: a scalable framework for efficiently clustering transactional data Hua Yan · Keke Chen · Ling Liu · Zhang Yi DMKD 2010 Reported by Wen-Chung Liao, 2010/03/02

  2. Outline
     • Motivation
     • Objectives
     • WCD clustering
     • Evaluating clustering results
     • Experiments
     • Conclusions
     • Comments

  3. Motivation
     • Existing transactional data clustering algorithms require users to manually tune at least one or two parameters.
     • There is a lack of cluster validation methods for evaluating the quality of transactional clustering results.

  4. Objectives
     • Present a fast, memory-saving, and scalable clustering algorithm that can efficiently handle large transactional datasets without resorting to manual parameter settings.
     • The SCALE framework

  5. WCD clustering
     • Transactional dataset: {abcd, bcd, ac, de, def}
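The slide gives only the example dataset, not the WCD formula. As a rough sketch, assume the weighted coverage density of a cluster is the sum of squared item occurrences divided by (total item occurrences × number of transactions), and the expected WCD of a partition is the size-weighted average over its clusters. Both definitions are assumptions recalled from the SCALE paper rather than taken from this transcript. The two-cluster partition and all function names are hypothetical, chosen only for illustration.

```python
from collections import Counter


def item_counts(cluster):
    """Count how many transactions in the cluster contain each item."""
    counts = Counter()
    for transaction in cluster:
        counts.update(set(transaction))  # each item counted once per transaction
    return counts


def coverage_density(cluster):
    """Assumed CD: filled cells / (transactions x distinct items)."""
    counts = item_counts(cluster)
    total_occurrences = sum(counts.values())
    return total_occurrences / (len(cluster) * len(counts))


def weighted_coverage_density(cluster):
    """Assumed WCD: sum of squared item occurrences / (S_k x N_k)."""
    counts = item_counts(cluster)
    total_occurrences = sum(counts.values())
    squared = sum(c * c for c in counts.values())
    return squared / (total_occurrences * len(cluster))


def expected_wcd(clusters):
    """Assumed EWCD: WCD of each cluster weighted by its share of transactions."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * weighted_coverage_density(c) for c in clusters)


# Dataset from the slide; each character is one item.
dataset = ["abcd", "bcd", "ac", "de", "def"]

# Hypothetical partition, not taken from the slides.
partition = [dataset[:3], dataset[3:]]

for cluster in partition:
    print(cluster, coverage_density(cluster), weighted_coverage_density(cluster))
print("EWCD:", expected_wcd(partition))
```

Under these assumed definitions, a clustering algorithm would prefer partitions with a higher expected WCD, since frequently shared items within a cluster push the value up.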

  6. Evaluating clustering results

  7. Experiments
     • Two synthetic datasets:
       • Tc30a6r1000_2L
       • TxI4Dx series: T10I4Dx, TxI4D100k
     • Three real datasets:
       • Zoo
       • Mushroom
       • Retail

  8. [Figure panels: Tc30a6r1000_2L, Zoo]

  9. Conclusion
     • Two unique features of SCALE:
       • the WCD clustering algorithm, a fast, memory-saving, and scalable method for clustering transactional data;
       • two transactional-data-specific cluster evaluation measures: LISR and AMI.
     • Some promising directions:
       • Perform an experimental comparison between the WCD measure and the entropy measure.
       • Design a better algorithm for determining the best K for transactional data clustering.
       • Extend the work to handle transactional data streams.

  10. Comments
     • Advantages
       • No parameter setting required
     • Shortcomings
       • Without BKPlot, K must be determined manually for WCD.
       • No description of how BKPlot generates K in the categorical case.
     • Applications
       • Transaction clustering
       • Web log clustering
       • …
