KwangHee Park 02/17 Link Distribution on Multilingual Wikipedia
Introduction
• Current Problem
  • Analyze link distribution on multilingual Wikipedia
• Goal
  • Find cultural intention from multilingual data for multilingual synchronization
Example
• Samsung
Methodology
• Topic modeling
  • Target: articles linked across the 5 languages
  • 34,577 articles from each language
  • English, Spanish, French, Chinese, Korean
• Linked terms
  • Easy to handle with respect to the term boundary recognition problem
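The point about linked terms can be illustrated with a minimal sketch: because a wiki link is already delimited by `[[` and `]]`, no tokenizer or term boundary detection is needed. The regular expression, the `linked_terms` helper, and the list of skipped namespace prefixes are illustrative assumptions, not the extraction code used in the experiment.

```python
import re

# Matches [[Target]] and [[Target|label]]; the target is captured up to a
# pipe, a section anchor (#), or the closing brackets.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Namespace prefixes treated as non-article links (an assumed, partial list).
SKIP_PREFIXES = ("file:", "image:", "category:")


def linked_terms(wikitext):
    """Return the link targets of a page, in order of appearance."""
    terms = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and not target.lower().startswith(SKIP_PREFIXES):
            terms.append(target)
    return terms


sample = (
    "[[Samsung Group|Samsung]] is based in [[Seoul]], "
    "[[South Korea#Economy|Korea]]. [[File:Logo.png]]"
)
print(linked_terms(sample))  # ['Samsung Group', 'Seoul', 'South Korea']
```

Each page then becomes a bag of link targets, which works the same way for Korean or Chinese text where word segmentation would otherwise be a separate problem.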
LDA approach
[Figure: a Korean Wiki page connected to an English Wiki page by an inter-language link]
Experiment
• LingPipe API
  • Supports LDA clustering
  • 20 topics
• Linked terms
  • English: random sample of about 330,000
  • Korean: about 220,000
• Documents
  • English: 1,000 articles
  • Korean: 3,185 articles
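For readers who want to see what LDA clustering over linked terms involves, here is a small collapsed Gibbs sampler in plain Python. It is a stand-in sketch, not the LingPipe API used in the experiment; the function name `lda_gibbs`, the priors `alpha` and `beta`, the iteration count, and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of terms.

    Returns (doc_topic, topic_term, vocab): per-document topic counts,
    per-topic term counts, and the vocabulary list.
    """
    rng = random.Random(seed)
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_term = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for term in doc:
            w = index[term]
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_term[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, term in enumerate(doc):
                w = index[term]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_term[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_term[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_term[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_term, vocab


# Toy documents: each one is the list of linked terms of a page.
docs = [
    ["Samsung", "Seoul", "Electronics", "Samsung"],
    ["Seoul", "Korea", "Han River"],
    ["Electronics", "Semiconductor", "Samsung"],
]
doc_topic, topic_term, vocab = lda_gibbs(docs, num_topics=2, iterations=50)
print(doc_topic)
```

In the experiment's setting `num_topics` would be 20 and each document would be the linked terms of one article. A pure-Python sampler like this is only practical for small samples.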
Problem
• Selecting the total number of topics
• Number of topics per document
  • Needs some threshold
• Evaluation
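One possible way to apply a per-document threshold is sketched below: keep only the topics whose share of a document's term assignments reaches a cutoff, up to a fixed maximum. The helper `select_topics`, the 0.2 cutoff, and the sample counts are assumptions for illustration; the slides do not state which threshold was used.

```python
def select_topics(topic_counts, threshold=0.2, max_topics=3):
    """Return the indices of topics whose share of the document's
    assignments is at least `threshold`, strongest first."""
    total = sum(topic_counts)
    if total == 0:
        return []
    shares = [(count / total, k) for k, count in enumerate(topic_counts)]
    kept = [(share, k) for share, k in shares if share >= threshold]
    # Sort by descending share; ties fall back to the lower topic index.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [k for _, k in kept[:max_topics]]


# Hypothetical per-topic assignment counts for one document.
counts = [12, 1, 7, 0, 5, 5]
print(select_topics(counts))                 # [0, 2]
print(select_topics(counts, threshold=0.1))  # [0, 2, 4]
```

A lower cutoff admits more topics per document, so the cap matters mainly when the threshold is loose.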
Evaluation
• Count overlapping terms in topic and in session
• Limit to 3 topics per document
• Label all topics and judge manually
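The overlap count could be computed along the following lines. This sketch assumes that a topic is represented by its highest-count terms and that a "session" is simply a collection of linked terms; both readings, the helper names, and the sample data are assumptions, since the slides do not define them.

```python
def top_terms(term_counts, n=10):
    """Return the n highest-count terms of a topic (ties broken alphabetically)."""
    ranked = sorted(term_counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def overlap_count(topic_terms, session_terms):
    """Count the distinct terms that appear in both collections."""
    return len(set(topic_terms) & set(session_terms))


# Hypothetical topic (term -> count) and session.
topic = {"Samsung": 9, "Electronics": 7, "Semiconductor": 4, "Seoul": 2}
session = ["Samsung", "Seoul", "Hyundai"]

print(overlap_count(top_terms(topic, n=3), session))  # 1
print(overlap_count(top_terms(topic, n=4), session))  # 2
```

The count depends on how many top terms represent each topic, so `n` should be fixed before comparing topics.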
Work plan
• Experiment
  • Apply to other languages
    • French, Chinese, Spanish
  • Compare with old documents
  • Analyze latent changes