150 likes | 275 Views
Extending the Growing Hierarchal SOM for Clustering Documents in Graphs domain. Presenter : Cheng- Hui Chen Authors : Mahmoud F. Hussin , Mahmoud R. farra and Yasser El- Sonbaty IJCNN, 2008. Outlines. Motivation Objectives Methodology Experiments Conclusions Comments.
E N D
Extending the Growing Hierarchal SOM for Clustering Documents in Graphs domain Presenter : Cheng-HuiChen Authors : Mahmoud F. Hussin, Mahmoud R. farra and Yasser El-Sonbaty IJCNN, 2008
Outlines • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation river mild rafting • The variants of SOM are limited by the fact that they use only the VSM for document representations. • It does not represent any relation between the words. • The space complexityto the VSM. • The sentences being broken down into their individual components without any representation of the sentence structure.
river Objectives rafting mild Using graphs to represent documents helped the salient features of data through using edges to represent relations and using vertices to represent words. The decrease the space complexity comparing to the VSM. The extend the GHSOM to work in the graph domain to enhance the quality of clusters.
Enhance the DIG to work with G-GHSOM river 3 • The Document Index Graph (DIG) model • the DIG for representing the document and Exploited it in the document clustering. • For example (the document table of word "river" is shown) • River rafting. (doc1) • mild river rafting. (doc2) • River fishing. (doc3)
Enhance the DIG to work with G-GHSOM 1 2 2 1 3 3 4 4
Enhance the DIG to work with G-GHSOM • Single-word similarity measure • Two document vectors similarity measure • The total similarity is the integration
Ment of the G-GHSOM to work with graph • Neuron Initialization • Detecting the matching list to calculate the phrase based similarity.
Experiments • Data set • Reuter’s news articles (RNA) • University of Waterloo and Canadian Web sites (UW-CAN)
Conclusions • The extend the GHSOM to a new graph based GHSOM: (G-GHSOM) to enhance the quality of the document clustering. • G-GHSOM works successfully with graph domain and achieves a better quality clustering than TGHSOM in document clustering. • The enhanced the DIG model to work with GHSOM algorithm.
Comments • Advantages • Enhance the quality of the document clustering • Application • SOM • Clustering