From Context to Distance: Learning Dissimilarity for Categorical Data Clustering

Presenter: Jian-Ren Chen. Authors: Dino Ienco, Ruggero G. Pensa, and Rosa Meo (2012, ACM TKDD). Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments.


Presentation Transcript


  1. From Context to Distance: Learning Dissimilarity for Categorical Data Clustering • Presenter: Jian-Ren Chen • Authors: Dino Ienco, Ruggero G. Pensa, and Rosa Meo • 2012, ACM TKDD

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Clustering data described by categorical attributes is a challenging task in data mining applications. • It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.

  4. Objectives • We present a new methodology to compute a context-based distance between values of a categorical variable. • We apply this technique to hierarchical clustering of categorical data.

  5. Methodology - Framework • DILCA (DIstance Learning for Categorical Attributes) • Select a suitable context, using either a parametric method or a fully automatic one • Compute the distance between any pair of values of a specific categorical attribute

  6. Methodology - Context Selection

  7. Methodology - Context Selection

  8. Methodology - Context Selection

  9. Methodology - Distance Computation
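
The slides name the two steps of the framework (context selection, then distance computation) but the transcript does not reproduce the formulas. The following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes that the parametric context selection keeps every attribute whose symmetric uncertainty with the target attribute is at least `sigma` times the mean, and that the distance between two values is a normalised Euclidean distance between their conditional probabilities given each context value. The function names, the `sigma` default, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * info_gain / (hx + hy)


def select_context(data, target, sigma=1.0):
    """Assumed parametric rule: keep attributes with SU >= sigma * mean SU."""
    su = {a: symmetric_uncertainty(col, data[target])
          for a, col in data.items() if a != target}
    if not su:
        return []
    mean_su = sum(su.values()) / len(su)
    return [a for a, s in su.items() if s >= sigma * mean_su]


def value_distance(data, target, context, yi, yj):
    """Assumed distance between two values yi, yj of the target attribute:
    root mean squared difference of P(yi | x) and P(yj | x) over every
    value x of every context attribute."""
    ys = data[target]
    total, n_values = 0.0, 0
    for a in context:
        xs = data[a]
        x_counts = Counter(xs)
        joint = Counter(zip(xs, ys))
        for xk, cnt in x_counts.items():
            p_i = joint[(xk, yi)] / cnt  # P(target = yi | a = xk)
            p_j = joint[(xk, yj)] / cnt  # P(target = yj | a = xk)
            total += (p_i - p_j) ** 2
        n_values += len(x_counts)
    return math.sqrt(total / n_values) if n_values else 0.0


# Hypothetical toy data set, stored column-wise.
data = {
    "city": ["Milan", "Milan", "Turin", "Turin", "Rome", "Rome"],
    "sex":  ["M", "F", "M", "F", "M", "F"],
    "team": ["A", "A", "B", "B", "A", "A"],
}
context = select_context(data, "city", sigma=1.0)
d_milan_rome = value_distance(data, "city", context, "Milan", "Rome")
d_milan_turin = value_distance(data, "city", context, "Milan", "Turin")
```

In this toy data `sex` is independent of `city` and `team` is not, so only `team` should survive the threshold. Milan and Rome co-occur with the same team values, which is why a context-based distance treats them as close even though the city names themselves carry no order.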

  10. Experiments - Datasets

  11. Experiments - Purity, NMI, ARI

  12. Experiments - Purity, NMI, ARI

  13. Experiments - Purity, NMI, ARI
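
Of the three evaluation measures named on these slides, purity is the simplest to state: each cluster is credited with its most frequent true class, and the credited objects are divided by the total number of objects. A minimal sketch follows, using the standard definition. The function name and the example labels are hypothetical and are not taken from the experiments.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("label sequences must have the same length")
    if not class_labels:
        raise ValueError("label sequences must not be empty")
    # Count the true classes that fall into each cluster.
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(class_labels)


# Hypothetical example: two clusters, each with one misplaced object.
example_purity = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"])
```

Purity rewards many small clusters (one cluster per object gives a purity of 1), which is one reason it is usually reported alongside NMI and ARI rather than on its own.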

  14. Experiments - Impact of σ on DILCA_M

  15. Experiments - Impact of σ on DILCA_M

  16. Experiments - Scalability

  17. Conclusions • DILCA is competitive with state-of-the-art approaches to categorical data clustering. • DILCA is scalable and has a low impact on the overall computational time of a clustering task.

  18. Comments • Advantages: scalable, with low computational time • Applications: computing a context-based distance between values of a categorical variable; hierarchical clustering of categorical data