Network Informal Language & Status Translation on Social network Aobo Wang
Introduction • What is NIL? • Informal language used in online chat, BBS, ICQ, email, tweets and SMS • Why NIL? • Comprehension of customer dialogues • Information filtering • Bridging language gaps among different online users
Agenda • State of the Field on NIL • NIL Classification • Related Work on SMS • Conclusion • NIL Application • Translation • Crowdsourcing on Social Networks
State of the Field on NIL • NIL Classification (both English and Chinese) • Abbreviation • “ASAP” refers to “as soon as possible” • “PL” (the pinyin initials of “漂亮”, piàoliang) means “beautiful” • Emoticons • :p :) ==!!! • Homophony • “88” (pronounced “bā bā” in Chinese) refers to “bye-bye”
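Abbreviations, emoticons and homophones such as “88” are usually fixed strings, so the simplest way to handle them is a lookup table. The sketch below assumes a hypothetical lexicon `NIL_LEXICON` built only from the examples on this slide and a hypothetical helper `normalize_by_lookup`; the emoticon tags such as `<smile>` are also made up for illustration.

```python
# Hypothetical NIL lexicon built from the slide's examples only;
# a real system would need a much larger, curated dictionary.
NIL_LEXICON = {
    "asap": "as soon as possible",  # abbreviation
    "pl": "漂亮",                    # pinyin-initial abbreviation of piaoliang, "beautiful"
    "88": "bye-bye",                # homophony: "ba ba" sounds like "bye-bye"
    ":p": "<tongue-out>",           # emoticon mapped to a hypothetical tag
    ":)": "<smile>",
}


def normalize_by_lookup(text: str) -> str:
    """Replace every whitespace-separated token that appears in NIL_LEXICON.

    Matching is case-insensitive; tokens not in the lexicon are kept as-is.
    """
    return " ".join(NIL_LEXICON.get(tok.lower(), tok) for tok in text.split())


print(normalize_by_lookup("see u ASAP 88 :)"))
```

Running it prints `see u as soon as possible bye-bye <smile>`. Note that “u” is left untouched because it is not in the lexicon, and punctuation glued to a token would defeat the lookup; a pure lookup table cannot cover open-ended creations, which is where the candidate-generation and statistical approaches on the following slides come in.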
State of the Field on NIL • NIL Classification (English) • Deletion of characters • “dont” refers to “do not”, “nxt” refers to “next” • Phonetic substitution • “4” refers to “for”, “U” refers to “you” • Spelling errors
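The two English patterns suggest two cheap ways to propose standard-word candidates for a NIL token: character deletion means the token is a subsequence of the standard word (“nxt” inside “next”), and phonetic substitution can be covered by a small table (“4” to “for”). This is a minimal sketch under assumptions: `VOCAB`, `PHONETIC`, the first-letter heuristic and the function names are all hypothetical, and the code only generates candidates without ranking them.

```python
# Hypothetical resources for illustration; a real system would use a full
# dictionary and a much larger substitution table.
VOCAB = ["next", "nest", "net", "not", "don't", "do", "for", "you"]
PHONETIC = {"4": "for", "u": "you", "2": "to", "r": "are"}


def is_subsequence(short: str, long: str) -> bool:
    """True if `short` can be obtained from `long` by deleting characters."""
    it = iter(long)
    # `ch in it` advances the iterator, so characters must appear in order.
    return all(ch in it for ch in short)


def candidates(token: str, vocab: list[str]) -> list[str]:
    """Propose standard words that could underlie a NIL token."""
    t = token.lower()
    if not t:
        return []
    if t in PHONETIC:          # phonetic substitution: whole-token table lookup
        return [PHONETIC[t]]
    if t in vocab:             # already a standard word
        return [t]
    # Deletion of characters: keep words that start with the same letter
    # (a hypothetical heuristic) and contain the token as a subsequence.
    return [w for w in vocab if w[0] == t[0] and is_subsequence(t, w)]


print(candidates("nxt", VOCAB), candidates("4", VOCAB), candidates("dont", VOCAB))
```

With this toy vocabulary, “nxt” yields only “next”, “4” yields “for” and “dont” yields “don't”; with a realistic vocabulary a token usually yields several candidates, so a separate scoring step is still needed to choose among them.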
State of the Field on NIL • NIL Classification (Chinese) • Similar pronunciation • “稀饭” refers to “喜欢”, which means “like” • Transliteration • “粉丝” refers to “fans” • Dialectal substitution • “神马” refers to “什么”, which means “what” • Homophony • “推” refers to “posting a comment through Twitter” • Illegal word • “囧” means “embarrassed” • Pseudo polyphone • “拽” means “arrogant” • Other cases…
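For the “similar pronunciation” and “dialectal substitution” cases, the NIL term and the standard word differ in characters but sound alike, so one plausible approach is to compare their toneless pinyin. The sketch assumes a tiny hand-written `PINYIN` table covering only the characters used here and a hypothetical candidate list `STANDARD_WORDS`; it compares words syllable by syllable with Levenshtein distance and returns the closest standard word of the same length.

```python
# Hypothetical mini pronunciation table (toneless pinyin) for this example only;
# a real system would need a full character-to-pinyin dictionary.
PINYIN = {"稀": "xi", "饭": "fan", "喜": "xi", "欢": "huan",
          "神": "shen", "马": "ma", "什": "shen", "么": "me"}
STANDARD_WORDS = ["喜欢", "什么"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]


def to_pinyin(word: str) -> list[str]:
    # Raises KeyError for characters missing from the toy table.
    return [PINYIN[ch] for ch in word]


def closest_standard_word(nil_term: str):
    """Return the standard word whose pinyin is closest, or None if no
    candidate has the same number of syllables."""
    src = to_pinyin(nil_term)

    def distance(word: str) -> float:
        tgt = to_pinyin(word)
        if len(tgt) != len(src):
            return float("inf")
        return sum(edit_distance(s, t) for s, t in zip(src, tgt))

    best = min(STANDARD_WORDS, key=distance)
    return best if distance(best) != float("inf") else None


print(closest_standard_word("稀饭"), closest_standard_word("神马"))
```

Here “稀饭” (xi fan) is 2 edits from “喜欢” (xi huan) and “神马” (shen ma) is 1 edit from “什么” (shen me), so the sketch prints `喜欢 什么`. Plain string distance ignores which sounds are actually confusable in speech or dialect, so a weighted, phonetically informed distance would likely be needed in practice.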
State of the Field on NIL • Related Work on SMS (Latin-script languages) • Task: NIL term normalization
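Normalization is often framed as a noisy-channel problem: choose the standard word $w$ that maximizes $P(w)\,P(t \mid w)$ for an observed NIL token $t$, where $P(w)$ comes from a language model and $P(t \mid w)$ models how writers shorten or respell $w$. The slide does not say which model the cited SMS work used, so this is a generic sketch only; the probabilities in `LM` and `CHANNEL` are hand-set toy values, not estimates from any data.

```python
import math

# Toy, hand-set probabilities for illustration only (not learned from data).
LM = {"next": 0.004, "nest": 0.0005, "text": 0.003}   # P(w): language model prior
CHANNEL = {                                           # P(t | w): chance of writing w as t
    ("nxt", "next"): 0.3,
    ("nxt", "nest"): 0.01,
    ("nxt", "text"): 0.001,
}


def normalize_noisy_channel(token: str, lm=LM, channel=CHANNEL) -> str:
    """Return argmax_w log P(w) + log P(token | w); keep the token if no model entry applies."""
    best, best_score = token, -math.inf
    for word, p_w in lm.items():
        p_t_given_w = channel.get((token, word), 0.0)
        if p_t_given_w == 0.0:
            continue  # this word cannot produce the token under the toy model
        score = math.log(p_w) + math.log(p_t_given_w)
        if score > best_score:
            best, best_score = word, score
    return best


print(normalize_noisy_channel("nxt"), normalize_noisy_channel("hello"))
```

With these numbers “nxt” maps to “next” (0.004 × 0.3 outranks the other products) and the unmodelled token “hello” is returned unchanged. Estimating $P(t \mid w)$ is exactly where aligned parallel data is needed, which is why its scarcity matters.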
State of the Field on NIL • Related Work on SMS (Chinese) • Tasks • NIL term recognition • NIL term normalization
State of the Field on NIL • Conclusion • Statistical methods perform better than rule-based methods. • Aligned parallel training data is very hard to obtain. • Chinese NIL terms are mostly created by phonetic transcription. • English NIL terms are mostly created by deleting or substituting substrings.
NIL Application • Text-to-speech • Text mining applications (filtering, routing, IR) • Translation • Is normalization indispensable? • NIL terms are less ambiguous • The alignment between NIL terms and their translations is stable • Parallel training data can be collected directly from the source language to the target language
Crowdsourcing on Social Networks • Collect annotations from social network users • Collaborative translation • Of statuses and comments • More than a task, less than a game • Translate • Vote • Comment • Earn points, level up, climb the rankings, and other incentives • Motivation • Self-motivated to learn a foreign language • Enjoy a “game” • Toy demo
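The translate/vote/earn-points loop can be pictured as settling each status once votes are in: pick the most-voted translation and reward the people who produced and endorsed it. The slides do not describe a scoring rule, so everything here is hypothetical: the `Candidate` class, `settle_status`, the point values `WINNER_POINTS` and `VOTER_POINTS`, and the tie-breaking rule (earliest submission wins).

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical incentive values; the slides do not specify a point scheme.
WINNER_POINTS = 10
VOTER_POINTS = 1


@dataclass
class Candidate:
    translator: str                                   # user who submitted this translation
    text: str                                         # proposed translation of the status
    voters: list[str] = field(default_factory=list)   # users who voted for it


def settle_status(candidates: list[Candidate]):
    """Pick the most-voted translation of one status and award points.

    Ties go to the earliest submission, since max() returns the first maximum.
    Returns (winning text or None, Counter of points per user).
    """
    if not candidates:
        return None, Counter()
    winner = max(candidates, key=lambda c: len(c.voters))
    points = Counter()
    points[winner.translator] += WINNER_POINTS
    for voter in winner.voters:        # reward voters who backed the winner
        points[voter] += VOTER_POINTS
    return winner.text, points


text, pts = settle_status([
    Candidate("alice", "See you tomorrow!", ["bob", "carol"]),
    Candidate("dave", "C u 2morrow", ["erin"]),
])
print(text, dict(pts))
```

This prints `See you tomorrow! {'alice': 10, 'bob': 1, 'carol': 1}`. A real deployment would also need to block self-votes and vote stuffing. The accepted (status, translation) pairs are also the kind of parallel training data the previous slide says is hard to obtain.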
Thank you! Any comments are welcome. Aobo Wang