
HE-Tree: a framework for detecting changes in clustering structure for categorical data streams

Keke Chen · Ling Liu. VLDB, Vol. 18, 2009, pp. 1241–1260. Presenter: Wei-Shen Tai, 2010/8/4.


Presentation Transcript


  1. HE-Tree: a framework for detecting changes in clustering structure for categorical data streams Keke Chen · Ling Liu VLDB, Vol.18, 2009, pp. 1241–1260 Presenter : Wei-Shen Tai 2010/8/4

  2. Outline • Introduction • Entropy-based categorical clustering • BKPlot for determining the “Best K” for categorical clustering • HE-Tree: capturing cluster entropy of the categorical data stream • A monitoring framework based on the HE-Tree • Experiments • Conclusion • Comments

  3. Motivation • Problems of clustering categorical data streams • No existing method addresses the problem of monitoring changes of clustering structure in categorical data streams. • Most methods assume a fixed number of clusters in the data stream.

  4. Objective • Hierarchical Entropy Tree structure (HE-Tree) • It captures the entropy characteristics of clusters in a data stream and detects changes in the Best K.

  5. Entropy-based categorical clustering • Classical entropy definition • Optimal partition • Minimizes the weighted entropy of the clusters Ck • Incremental entropy (IE) • Merging two clusters in a partition does not reduce the expected entropy. • Clustering criterion: minimize the expected entropy.
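The formulas on this slide did not survive in the transcript. As a minimal sketch, the three quantities named above can be computed as follows, assuming (this is not stated in the transcript) that a cluster's entropy is estimated as the sum of per-attribute entropies in bits, that the expected entropy is the size-weighted average over clusters, and that the incremental entropy is the weighted entropy of the merged cluster minus the weighted entropies of its two parts. The function names are illustrative.

```python
import math
from collections import Counter


def cluster_entropy(records):
    """Entropy of one cluster: sum of per-attribute entropies, in bits.

    `records` is a list of equal-length tuples of categorical values.
    Summing over attributes is an assumption of this sketch.
    """
    n = len(records)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*records):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def expected_entropy(clusters):
    """Size-weighted average entropy of a partition (a list of clusters)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) * cluster_entropy(c) for c in clusters) / n


def incremental_entropy(cp, cq):
    """Increase in weighted entropy caused by merging clusters cp and cq."""
    merged = cp + cq
    return (len(merged) * cluster_entropy(merged)
            - len(cp) * cluster_entropy(cp)
            - len(cq) * cluster_entropy(cq))


if __name__ == "__main__":
    a = [("a", "x"), ("a", "x")]
    b = [("b", "y"), ("b", "y")]
    print(expected_entropy([a, b]))      # two pure clusters
    print(expected_entropy([a + b]))     # the same records as one cluster
    print(incremental_entropy(a, b))     # cost of merging them
```

Under these definitions the incremental entropy is never negative, which matches the slide's statement that merging does not reduce the expected entropy.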

  6. BKPlot for determining the “Best K” for categorical clustering • BKPlot method • Determines the candidate best K for static datasets. • Investigates the entropy difference between any two optimal neighboring partitions. • Second-order difference • ACE (entropy-based agglomerative hierarchical clustering) • Generates high-quality approximate BKPlots.
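The transcript names the second-order difference but does not give its formula. The sketch below assumes one plausible indexing convention: I(K) = H(K) − H(K+1) is the entropy difference between neighboring partitions, δI(K) = I(K) − I(K+1), and the plotted value is B(K) = δI(K−1) − δI(K). The convention, the function name `bkplot`, and the input values are all assumptions for illustration, not results from the paper.

```python
def bkplot(expected_entropy_by_k):
    """Second-order difference of an expected-entropy curve.

    `expected_entropy_by_k` maps K (number of clusters) to the expected
    entropy of the K-cluster partition. Returns a dict K -> B(K) for every
    K where all required neighbors are available.
    """
    h = expected_entropy_by_k
    # I(K): entropy difference between neighboring partitions K and K+1.
    inc = {k: h[k] - h[k + 1] for k in h if k + 1 in h}
    # First-order difference of I.
    d1 = {k: inc[k] - inc[k + 1] for k in inc if k + 1 in inc}
    # Second-order difference; peaks mark candidate best Ks.
    return {k: d1[k - 1] - d1[k] for k in d1 if k - 1 in d1}


if __name__ == "__main__":
    # Made-up curve: entropy drops sharply until K = 3, then flattens.
    curve = {1: 10.0, 2: 6.0, 3: 1.0, 4: 0.8, 5: 0.6, 6: 0.4}
    b = bkplot(curve)
    print(max(b, key=b.get))  # K with the largest second-order difference
```

In this toy curve, going from 3 clusters to 2 costs far more entropy than going from 4 to 3, so the peak lands at K = 3.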

  7. ACE • IE (incremental entropy) • It is a natural inter-cluster similarity measure, ready for constructing a hierarchical clustering algorithm. • Summary table • For conveniently counting occurrences of values. • M-table • For bookkeeping M(Cp, Cq) of any pair of clusters Cp and Cq. • M-heap • For maintaining the minimum M value in each step.
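How the three structures cooperate can be sketched as below. This is a simplified illustration, not the authors' implementation: the `Summary` class plays the role of the summary table, M(Cp, Cq) is taken to be the incremental entropy, and the M-table and M-heap are folded into a single heap with lazy invalidation of stale pairs. All names are hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


class Summary:
    """Summary table: record count plus per-attribute value counts."""

    def __init__(self, n, counts):
        self.n = n
        self.counts = counts  # one Counter per attribute

    @classmethod
    def of(cls, record):
        return cls(1, [Counter([v]) for v in record])

    def merge(self, other):
        return Summary(self.n + other.n,
                       [a + b for a, b in zip(self.counts, other.counts)])

    def weighted_entropy(self):
        """n * H(C), with H summed over attributes, in bits."""
        return -sum(c * math.log2(c / self.n)
                    for col in self.counts for c in col.values())


def m_value(p, q):
    """Incremental entropy of merging the clusters summarized by p and q."""
    return (p.merge(q).weighted_entropy()
            - p.weighted_entropy() - q.weighted_entropy())


def ace(records):
    """Agglomerative merging; returns [(clusters_left, merge_cost), ...]."""
    clusters = {i: Summary.of(r) for i, r in enumerate(records)}
    heap = [(m_value(clusters[i], clusters[j]), i, j)
            for i, j in combinations(clusters, 2)]
    heapq.heapify(heap)
    next_id = len(records)
    merges = []
    while len(clusters) > 1:
        cost, i, j = heapq.heappop(heap)
        if i not in clusters or j not in clusters:
            continue  # stale pair: one side was already merged away
        merged = clusters.pop(i).merge(clusters.pop(j))
        for k, other in clusters.items():
            heapq.heappush(heap, (m_value(merged, other), k, next_id))
        clusters[next_id] = merged
        next_id += 1
        merges.append((len(clusters), cost))
    return merges


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    for k, cost in ace(data):
        print(k, round(cost, 3))
```

The sequence of merge costs is the raw material for an approximate BKPlot: a jump in cost at some K suggests that merging below K clusters destroys real structure.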

  8. HE-Tree: capturing cluster entropy of the categorical data stream • Find the sub-tree most similar to sample e • Growing stage • If M(e, ei) = 0 then e is merged into entry ei • Else • If the leaf node has an empty entry then e is assigned to an empty one • Else split the leaf node • Absorbing stage • e is merged into the entry ei with min M(e, ei)
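The insertion rule can be illustrated on a single leaf node. The sketch below is a deliberate simplification and not the HE-Tree itself: it omits the descent through the tree and the node split, so a full leaf goes straight to absorbing. The `Leaf` class, its `capacity` parameter, and the returned labels are hypothetical, and M is again taken to be the incremental entropy over summary tables.

```python
import math
from collections import Counter


def weighted_entropy(entry):
    """n * H for an entry (n, [Counter per attribute]), in bits."""
    n, counts = entry
    return -sum(c * math.log2(c / n) for col in counts for c in col.values())


def merge(a, b):
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])])


def m_value(a, b):
    """Incremental entropy of merging entries a and b."""
    return weighted_entropy(merge(a, b)) - weighted_entropy(a) - weighted_entropy(b)


class Leaf:
    """One leaf node with a fixed number of entry slots (no splitting)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def insert(self, record):
        e = (1, [Counter([v]) for v in record])
        if not self.entries:
            self.entries.append(e)
            return "new"
        costs = [m_value(e, x) for x in self.entries]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] <= 1e-12:
            # Growing stage, M(e, ei) = 0: merge into the matching entry.
            self.entries[best] = merge(self.entries[best], e)
            return "merged"
        if len(self.entries) < self.capacity:
            # Growing stage: take an empty slot.
            self.entries.append(e)
            return "new"
        # A real HE-Tree would split the leaf while growing; this sketch
        # absorbs into the closest entry instead.
        self.entries[best] = merge(self.entries[best], e)
        return "absorbed"


if __name__ == "__main__":
    leaf = Leaf(capacity=2)
    for rec in [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y")]:
        print(rec, leaf.insert(rec))
```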

  9. A monitoring framework based on the HE-Tree • Time-decaying HE-Tree • Let the decaying rate λ, 0 < λ < 1, represent the proportion of the information that is preserved from the last window (record number, summary table and M-table). • Extended ACE • It takes sub-clusters as input and consecutively merges pairs of clusters.
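One way to read the decay step is as a uniform scaling of the stored counts at each window boundary. The sketch below assumes exactly that (the transcript does not spell out the update rule); `decay_entry` and its argument layout are hypothetical.

```python
from collections import Counter


def decay_entry(n, counts, lam):
    """Scale an entry's record number and summary table by lam.

    `n` is the record number, `counts` a list with one Counter per
    attribute, and `lam` the decaying rate with 0 < lam < 1.
    """
    if not 0 < lam < 1:
        raise ValueError("decaying rate must satisfy 0 < lam < 1")
    decayed = [Counter({v: c * lam for v, c in col.items()}) for col in counts]
    return n * lam, decayed


if __name__ == "__main__":
    n, counts = decay_entry(4, [Counter(a=3, b=1)], lam=0.5)
    print(n, dict(counts[0]))
```

Under this assumption the value proportions inside an entry are unchanged, so its entropy stays the same while its weight relative to newly arriving records shrinks. An incremental-entropy value between two entries that are both scaled by λ also scales by λ, so an M-table could be decayed by the same factor.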

  10. Experiments - detecting changes

  11. Effect of the time-decaying HE-Tree

  12. Conclusion • HE-Tree • Detects changes of clustering structure in categorical data streams. • A time-decaying HE-Tree makes the framework more sensitive to recently emerging clustering structures.

  13. Comments • Advantage • The proposed scheme provides a solution for detecting changes in categorical data streams. • The entropy-based HE-Tree and its decaying idea are intuitive. • Drawback • Because the summary table cannot handle mixed-type data at the same time, the proposed method was applied only to categorical data streams. • Is the decaying process still necessary once the fixed-interval window is replaced by a moving window? • Application • Categorical data stream clustering
