Spl chkng cmc txt (Spell Checking CMC Text)
Christopher Johnson
Introduction
• What is Computer Mediated Communication (CMC)?
  • Short Message Service (SMS)
  • Blogs (Twitter)
  • E-mail
  • Instant messages
• Observed language during such communication:
  • "Lo" (Microsoft Messenger)
  • "Happy bday hpe u hv a gd day x" (SMS)
  • "Awe! Ur so welcome! Sorry I was so sleepy! Lol" (Twitter)
What is the problem?
• Most people use some form of CMC
  • Children
  • Adults
• People can hide behind any persona they create for themselves
  • For example, paedophiles lure children by pretending to be other children
How can we solve it?
• A person reading every message? No
  • Would this even suffice?
• Automated processing of messages? Yes
  • At least, it appears to be the most appropriate approach
How can we do that?
• We need an understanding of the messages:
  • SMS
  • Blogs
  • E-mails
  • Others
• We know that abbreviations are used
  • But how can we expand these abbreviations back to standard text?
• What about misspellings?
• How do we obtain a large, real-world corpus to train and test on?
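The abbreviation question can be illustrated with the simplest possible baseline: a lookup table from CMC forms to standard words. This is a minimal sketch under that assumption, not the project's method; the table `ABBREVIATIONS` and the function `expand_abbreviations` are hypothetical, with entries taken from the example messages on the Introduction slide.

```python
import re

# Hypothetical lookup table, hand-built from the example messages on the
# Introduction slide; a real system would derive entries from a CMC corpus.
ABBREVIATIONS = {
    "bday": "birthday",
    "hpe": "hope",
    "u": "you",
    "hv": "have",
    "gd": "good",
    "ur": "you're",  # ambiguous: could equally be "your"
    "lol": "laughing out loud",
}


def expand_abbreviations(message: str, table: dict = ABBREVIATIONS) -> str:
    """Replace every word found in `table` (case-insensitive) with its expansion,
    leaving punctuation, spacing and unknown words unchanged."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expansion = table.get(word.lower())
        if expansion is None:
            return word
        # Keep an initial capital, e.g. "Ur" -> "You're".
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion

    return re.sub(r"[A-Za-z']+", replace, message)


if __name__ == "__main__":
    print(expand_abbreviations("Happy bday hpe u hv a gd day x"))
    # Happy birthday hope you have a good day x
    print(expand_abbreviations("Awe! Ur so welcome! Sorry I was so sleepy! Lol"))
    # Awe! You're so welcome! Sorry I was so sleepy! Laughing out loud
```

"Ur" shows the weakness of a fixed table: it could mean "your" or "you're", and the table cannot cope with misspellings or abbreviations it has never seen.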
What tools already exist?
• VARD
• NLP techniques
  • N-grams (this project will use bigrams and trigrams)
  • Phonetic algorithms
    • Soundex
    • Metaphone
• These tools are commonly used in spell checkers
  • But how well do they apply to CMC?
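One way a phonetic algorithm and n-grams might be combined: Soundex (coded following the National Archives rules listed in the References) proposes candidate words that sound like an abbreviation, and word-bigram counts pick the candidate that best fits the preceding word. This pairing is an assumption, not the project's confirmed design; `TOY_CORPUS`, `soundex` and `suggest` are hypothetical names, the corpus is a made-up stand-in for a real one, and trigrams and Metaphone are left out for brevity.

```python
from collections import Counter

# American Soundex digits; vowels, y, h and w carry no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for ch in letters
}


def soundex(word: str) -> str:
    """Four-character Soundex code of `word`, or "" if it has no ASCII letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a digit
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a digit repeated after it is kept
    return (code + "000")[:4]


# Hypothetical toy corpus standing in for the large corpus the project plans to build.
TOY_CORPUS = "hope you have a good day . i hope so . you have a good night . oh my god"
_tokens = TOY_CORPUS.split()
UNIGRAMS = Counter(_tokens)
BIGRAMS = Counter(zip(_tokens, _tokens[1:]))


def suggest(prev_word: str, token: str) -> str:
    """Return the corpus word with the same Soundex code as `token` that most often
    follows `prev_word`; ties fall back to overall frequency, then corpus order."""
    key = soundex(token)
    if not key:
        return token
    candidates = [w for w in UNIGRAMS if soundex(w) == key]
    if not candidates:
        return token
    return max(candidates, key=lambda w: (BIGRAMS[(prev_word, w)], UNIGRAMS[w]))


if __name__ == "__main__":
    print(soundex("gd"), soundex("good"), soundex("hv"), soundex("hpe"))  # G300 G300 H100 H100
    print(suggest("a", "gd"))    # good
    print(suggest("my", "gd"))   # god
    print(suggest("you", "hv"))  # have
    print(suggest("i", "hpe"))   # hope
```

"hv", "hpe", "have" and "hope" all share the code H100, so Soundex alone cannot choose between them; the bigram context is what separates them.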
Proposed Plan
• Research current techniques that could be applicable
• Create a large corpus of CMC text
• Improve techniques for very similar languages
  • (English CMC and CMC)
• Create a system that can distinguish between CMC text and unabridged text
  • Test the system's success rate
• Convert CMC to unabridged text
  • (Ambitious, so only if time permits)
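For the plan items on distinguishing CMC from unabridged text and testing the success rate, one very simple baseline is to flag a message as CMC when too large a fraction of its words is missing from a standard dictionary. This is a minimal sketch under that assumption; the word list `STANDARD_WORDS`, the threshold of 0.25 and the labelled examples are all hypothetical, and a real evaluation would use the proposed corpus and a full dictionary.

```python
import re

# Hypothetical miniature dictionary of standard English words.
STANDARD_WORDS = {
    "happy", "birthday", "i", "hope", "you", "have", "a", "good", "day",
    "so", "welcome", "sorry", "was", "sleepy", "thank", "for", "your", "help",
}


def oov_ratio(message: str) -> float:
    """Fraction of word tokens in `message` that are not in STANDARD_WORDS."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    return sum(w not in STANDARD_WORDS for w in words) / len(words)


def is_cmc(message: str, threshold: float = 0.25) -> bool:
    """Label `message` as CMC text if its out-of-vocabulary ratio exceeds `threshold`."""
    return oov_ratio(message) > threshold


def success_rate(labelled: list) -> float:
    """Proportion of (message, is_cmc_label) pairs that is_cmc labels correctly."""
    correct = sum(is_cmc(message) == label for message, label in labelled)
    return correct / len(labelled)


# Hypothetical labelled examples (True = CMC text).
LABELLED = [
    ("Happy bday hpe u hv a gd day x", True),
    ("Happy birthday, I hope you have a good day", False),
    ("Awe! Ur so welcome! Sorry I was so sleepy! Lol", True),
    ("Thank you for your help", False),
    ("Thank you, Dr Smith", False),  # misclassified: names are out of vocabulary
]

if __name__ == "__main__":
    for message, label in LABELLED:
        print(f"{oov_ratio(message):.2f} {is_cmc(message)} {label}  {message}")
    print("success rate:", success_rate(LABELLED))  # success rate: 0.8
```

The last example is misclassified because "Dr" and "Smith" are not in the word list, a reminder that names and other legitimate unknown words need separate handling.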
References
• National Education Association Health Information Network. The Real World.
  • http://bnetsavvy.org/wp/a-teen-talks-about-texting-and-what-parentseducators-need-to-know-about-it/
• Baron, Alistair. About VARD 2.
  • http://www.comp.lancs.ac.uk/~barona/vard2/
• Atkinson, Kevin. Lawrence Philips' Metaphone Algorithm.
  • http://aspell.net/metaphone/
• The National Archives. The Soundex Indexing System.
  • http://www.archives.gov/publications/general-info-leaflets/55.html