Dynamic hierarchical algorithms for document clustering Presenter : Wei-Hao Huang Authors : Reynaldo Gil-García, Aurora Pons-Porrata PRL, 2010
Outlines • Motivation • Objectives • Hierarchical clustering • Methodology • Experiments • Conclusions • Comments
Motivation • The World Wide Web and the number of text documents managed in organizational intranets continue to grow at an amazing speed. • In dynamic information environments, it is usually desirable to apply adaptive methods for document organization, such as clustering.
Objectives • Static clustering methods mainly rely on having the whole collection ready before applying the algorithm. • Dynamic algorithms that update the clustering without performing a complete reclustering. • Independent of the data order.
Hierarchical clustering • Agglomerative and divisive • Provides views of the data at different levels
Methodology • Dynamic hierarchical agglomerative framework • Specific algorithms: • Dynamic hierarchical compact (DHC) • Creates disjoint hierarchies of clusters • Dynamic hierarchical star (DHS) • Produces overlapping hierarchies
Dynamic hierarchical agglomerative framework • β-similarity: β is the minimum similarity threshold • Clusters i and j are β-similar if their similarity is ≥ β • Cluster i is β-isolated if its similarity with every other cluster is < β
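The two threshold definitions above can be illustrated with a minimal sketch. It assumes the pairwise cluster similarities are already available as a symmetric matrix; the slides do not say how cluster similarity is computed, so the matrix `SIM`, the function names `beta_similar_pairs` and `beta_isolated`, and the value β = 0.5 are all hypothetical and chosen only for illustration. This is not the authors' implementation.

```python
from typing import List, Set, Tuple


def beta_similar_pairs(sim: List[List[float]], beta: float) -> Set[Tuple[int, int]]:
    """Return all pairs (i, j) with i < j whose similarity is at least beta."""
    n = len(sim)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] >= beta
    }


def beta_isolated(sim: List[List[float]], beta: float) -> List[int]:
    """Return clusters whose similarity with every other cluster is below beta."""
    n = len(sim)
    return [
        i
        for i in range(n)
        if all(sim[i][j] < beta for j in range(n) if j != i)
    ]


# Hypothetical similarity matrix for three clusters (assumed, not from the slides).
SIM = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

pairs = beta_similar_pairs(SIM, beta=0.5)
isolated = beta_isolated(SIM, beta=0.5)
print(pairs)     # clusters 0 and 1 are β-similar
print(isolated)  # cluster 2 is β-isolated
```

With β = 0.5, only clusters 0 and 1 meet the threshold, so cluster 2 has no β-similar neighbor. Raising β makes more clusters β-isolated, which is why the threshold is one of the parameters examined in the sensitivity experiments.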
Experiments • Using 15 benchmark text collections • Clustering quality • Sensitivity to parameters • Balance • Efficiency
Conclusions • The methods are suitable for producing hierarchical clustering solutions in dynamic environments effectively and efficiently. • They achieve a better balance between depth and width. • They offer hierarchies that are easier to browse than those of traditional algorithms.
Comments • Advantages • Deal with dynamic data sets. • Effectiveness and the efficiency of the clustering. • Applications • Hierarchical clustering