From Context to Distance: Learning Dissimilarity for Categorical Data Clustering. Presenter: Jian-Ren Chen. Authors: DINO IENCO, RUGGERO G. PENSA, and ROSA MEO. 2012, ACM TKDD.
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Clustering data described by categorical attributes is a challenging task in data mining applications. • It is difficult to define a distance between pairs of values of a categorical attribute, since the values are not ordered.
Objectives • We present a new methodology to compute a context-based distance between values of a categorical variable. • We apply this technique to hierarchical clustering of categorical data.
Methodology: Framework • DILCA (DIstance Learning for Categorical Attributes) works in two steps: • select a suitable context for the target attribute, using either a parametric method or a fully automatic one • compute the distance between any pair of values of the target categorical attribute, based on that context
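The second step can be illustrated with a minimal sketch. The slides do not state the distance formula, so everything below is an assumption: the context is taken to be a hand-picked list of other attributes (the context-selection step is not sketched), and two values of the target attribute are considered close when they have similar conditional probabilities given each context value. The names `conditional_probs`, `context_distance`, and the toy `rows` data are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def conditional_probs(rows, target, ctx):
    """Estimate P(target value | context value) as {ctx_value: {target_value: p}}."""
    joint = defaultdict(Counter)
    for row in rows:
        joint[row[ctx]][row[target]] += 1
    return {
        x: {y: n / sum(counts.values()) for y, n in counts.items()}
        for x, counts in joint.items()
    }


def context_distance(rows, target, context):
    """Assumed context-based distance between every pair of target values.

    For each context value x, compare P(a | x) with P(b | x); sum the squared
    differences, normalise by the number of context values, take the root.
    """
    target_values = sorted({row[target] for row in rows})
    probs = {ctx: conditional_probs(rows, target, ctx) for ctx in context}
    n_ctx_values = sum(len(p) for p in probs.values())
    dist = {}
    for a, b in combinations(target_values, 2):
        total = sum(
            (by_target.get(a, 0.0) - by_target.get(b, 0.0)) ** 2
            for p in probs.values()
            for by_target in p.values()
        )
        dist[(a, b)] = sqrt(total / n_ctx_values)
    return dist


# Toy data (hypothetical): "drink" is the target, "venue" is its context.
rows = [
    {"drink": "beer", "venue": "pub"},
    {"drink": "ale", "venue": "pub"},
    {"drink": "beer", "venue": "pub"},
    {"drink": "ale", "venue": "pub"},
    {"drink": "tea", "venue": "cafe"},
    {"drink": "tea", "venue": "cafe"},
]

dist = context_distance(rows, "drink", ["venue"])
for pair, d in dist.items():
    print(pair, round(d, 4))
```

In this toy data, "ale" and "beer" occur in the same venues in the same proportions, so their distance is zero, while "tea" is far from both. A pairwise table like `dist` is the kind of input a hierarchical clustering step could build on.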
Conclusions • DILCA is competitive with state-of-the-art categorical data clustering approaches. • DILCA is scalable and has a low impact on the overall computational time of a clustering task.
Comments • Advantages • scalable, with low computational overhead • Applications • computing a context-based distance between values of a categorical variable • hierarchical clustering of categorical data