Multilingual document mining and navigation using self-organizing maps
Presenter: Keng-Yu Lin
Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee
IPM, 2011
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
A monolingual interface may limit its reach among users who are unfamiliar with the language.
Objectives
• To propose an approach that automatically arranges multilingual Web pages into a multilingual Web directory, breaking the language barrier in Web navigation.
Methodology
• Preprocessing
  • Word segmentation
  • Stopword elimination
  • Stemming
  • Keyword selection
• Encoding
  • All keywords of all documents are collected to build a vocabulary VE.
  • A document is encoded as a binary vector according to the keywords that occur in it. Ex: Xi = [0,1,1,0,1,0,1,1]
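The encoding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes preprocessing has already reduced each document to a list of keywords, and the function names `build_vocabulary` and `encode` as well as the sample documents are hypothetical.

```python
def build_vocabulary(docs):
    """Collect every keyword of every document into one sorted vocabulary."""
    return sorted({kw for doc in docs for kw in doc})


def encode(doc, vocabulary):
    """Binary vector: 1 where the vocabulary keyword occurs in the document."""
    present = set(doc)
    return [1 if kw in present else 0 for kw in vocabulary]


# Hypothetical documents, already reduced to keywords by preprocessing.
docs = [
    ["map", "cluster", "web"],
    ["web", "directory"],
    ["cluster", "keyword", "map"],
]

vocabulary = build_vocabulary(docs)
vectors = [encode(doc, vocabulary) for doc in docs]
```

Sorting the vocabulary is only a convenience so that vector positions are reproducible; any fixed keyword order would serve.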
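Methodology
• SOM algorithm
  • The SOM is trained on the encoded document vectors.
  • Two maps are derived from the trained SOM: a document cluster map (DCM) and a keyword cluster map (KCM).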
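A generic SOM training loop and the two resulting maps might look like the sketch below. It is a textbook-style SOM under stated assumptions, not the paper's exact procedure: the learning-rate and neighborhood schedules, the grid size, and the KCM labeling rule (a keyword is attached to a neuron when its weight component exceeds a `threshold`) are all assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], x))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {grid position: weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)   # assumed shrinking radius
        for x in vectors:
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = sq_dist(pos, bmu)              # distance on the grid
                if d2 <= radius ** 2:
                    influence = math.exp(-d2 / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * influence * (x[i] - w[i])
    return weights


def document_cluster_map(weights, vectors):
    """DCM sketch: map each document index to its best-matching neuron."""
    return {i: best_matching_unit(weights, x) for i, x in enumerate(vectors)}


def keyword_cluster_map(weights, vocabulary, threshold=0.5):
    """KCM sketch (assumed rule): keywords whose weight exceeds threshold."""
    return {pos: [kw for kw, w in zip(vocabulary, wv) if w > threshold]
            for pos, wv in weights.items()}


vocabulary = ["cluster", "map", "directory", "web"]   # hypothetical keywords
vectors = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]  # hypothetical documents
weights = train_som(vectors, rows=2, cols=2)
dcm = document_cluster_map(weights, vectors)
kcm = keyword_cluster_map(weights, vocabulary)
```

Documents that share a neuron in `dcm` form one cluster, and the keywords listed for that neuron in `kcm` serve as a rough description of the cluster's theme.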
Methodology
• Algorithm for determining dominating clusters
Methodology
• Evaluation of the quality of generated hierarchies
  • Example pairwise values: (C1, C3) = 4, (C3, C5) = 3, (C1, C5) = 3
  • PK = (4 + 3 + 3) / 3 ≈ 3.33
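The arithmetic in this example can be reproduced with a few lines. The slide does not say what the pairwise values measure, so the sketch simply treats them as given scores and assumes that the divisor 3 is the number of cluster pairs; the variable names are hypothetical.

```python
# Pairwise values taken from the slide's example.
pair_values = {
    ("C1", "C3"): 4,
    ("C3", "C5"): 3,
    ("C1", "C5"): 3,
}

# PK as the mean over all cluster pairs (assumed reading of the divisor).
pk = sum(pair_values.values()) / len(pair_values)
print(round(pk, 2))  # 3.33
```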
Methodology
• Multilingual Web directory generation
  • Semantic similarity
  • Structural similarity
Conclusions
• The approach is fully automated and requires no human intervention.
• The result of the alignment can be applied to tasks such as multilingual information retrieval.
Comments
• Advantage
  • The research result can help people overcome language barriers.
• Applications
  • Multilingual information retrieval.