Multilingual document mining and navigation using self-organizing maps • Presenter: Yu-Ting LU • Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee • 2011. IPM
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • Web directories are generally constructed manually and may suffer from narrow coverage and inconsistency. • Most existing directories provide only monolingual hierarchies that organize Web pages in terms a user may not be familiar with.
Objectives • This work proposes an approach that automatically arranges multilingual Web pages into a multilingual Web directory, breaking the language barrier in Web navigation.
Methodology – Web directory generation • Web page preprocessing and encoding • English: word segmentation, stop-word elimination, stemming, keyword selection • Chinese: select only nouns as keywords
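The English preprocessing steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the suffix-stripping `naive_stem` helper, and the binary vector encoding in `encode` are all assumptions made for the example, and a real system would use a full stop-word list and a proper stemmer.

```python
import re

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on"}


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Segment text into words, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


def encode(pages):
    """Encode each page as a binary vector over the sorted keyword vocabulary.

    The binary encoding is an assumption for this sketch; the slide only
    says that pages are preprocessed and encoded.
    """
    keyword_sets = [set(extract_keywords(p)) for p in pages]
    vocabulary = sorted(set().union(*keyword_sets)) if keyword_sets else []
    vectors = [[1 if term in ks else 0 for term in vocabulary] for ks in keyword_sets]
    return vocabulary, vectors
```

For Chinese pages the slide states that only nouns are kept as keywords, which would require a part-of-speech tagger in place of the stop-word and stemming steps; that part is not sketched here.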
Methodology – Web directory generation • Feature map generation
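As a rough illustration of feature map generation, the sketch below trains a small self-organizing map on encoded page vectors and then assigns each page to its best-matching neuron. It is a generic textbook SOM, not necessarily the configuration used by the authors: the map size, the linear decay of learning rate and radius, and the Gaussian neighborhood are all assumptions chosen for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) to weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighborhood shrinks over time
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist = math.hypot(node[0] - bmu[0], node[1] - bmu[1])
                if grid_dist <= radius:
                    influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (x[i] - w[i])
    return weights


def label_documents(weights, vectors):
    """Group document indices by the neuron they are mapped to."""
    clusters = {}
    for idx, x in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, x), []).append(idx)
    return clusters
```

Pages that land on the same neuron form a document cluster; these clusters are the raw material for the hierarchy-building steps on the following slide.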
Methodology – Web directory generation • Super cluster construction • Determining dominating clusters • Constructing hierarchy • Parameter setting and discussions
Methodology – Web directory generation • Evaluation of the quality of generated hierarchies
Methodology – Multilingual Web directory generation • Alignment of monolingual Web directories • Calculating semantic similarity • Incorporating structural similarity • Overall similarity
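One way to picture how semantic and structural similarity might be merged into an overall similarity is a weighted sum, sketched below. The slides do not give the actual formulas, so everything here is an assumption: the cosine measure, the weight `alpha`, the precomputed similarity matrices, and the greedy best-match alignment are hypothetical choices for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def overall_similarity(semantic, structural, alpha=0.5):
    """Weighted combination; alpha is a hypothetical mixing parameter."""
    return alpha * semantic + (1 - alpha) * structural


def align_categories(semantic, structural, alpha=0.5):
    """For each source category i, pick the target j with the highest overall score.

    semantic[i][j] and structural[i][j] are assumed to be precomputed
    similarities between source category i and target category j.
    """
    alignment = {}
    for i, (sem_row, str_row) in enumerate(zip(semantic, structural)):
        scores = [overall_similarity(s, t, alpha) for s, t in zip(sem_row, str_row)]
        alignment[i] = max(range(len(scores)), key=scores.__getitem__)
    return alignment
```

With `alpha=1.0` the alignment relies on semantic similarity alone, and lowering `alpha` lets the position of a category within its hierarchy influence the match.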
Methodology – Multilingual Web directory generation • Alignment of monolingual Web directories
Methodology – Multilingual Web directory generation • Multilingual Web directory generation
Experiments – Hierarchy alignment and Web directory generation
Conclusions • The proposed multilingual hierarchy alignment method is fully automated and requires no human intervention. • A Web directory that provides multilingual category labels and categorizes multilingual Web pages will be convenient for users.
Comments • Advantages • Development of a multilingual hierarchy alignment method • Fully automated • Applications: SOM