Language Change as a Constrained Multi-Objective Optimization

Indo-Australia Workshop on Optimization in Human Language Technology, 16th Dec 2012, IIT Patna. Language Change as a Constrained Multi-Objective Optimization. Monojit Choudhury, Microsoft Research Lab, India, monojitc@microsoft.com. A tale of the lazy tongue. Language Change.

Presentation Transcript


  1. Indo-Australia Workshop on Optimization in Human Language Technology, 16th Dec 2012, IIT Patna • Language Change as a Constrained Multi-Objective Optimization • Monojit Choudhury, Microsoft Research Lab, India, monojitc@microsoft.com • A tale of the lazy tongue

  2. Language Change

  3. Language Change • Change in the syntactic/semantic/phonological features of a language • Perpetual, universal, directional (?) • Phonological change: • Affects the sounds • Structured, independent of syntax/semantics • Example: loss of consonant clusters in Hindi: agni → aag, dugdha → dUdh, raatri → raat

  4. Effects of the “Lazy Tongue” • Assimilation • in+apt = inapt • in+decent = indecent • in+polite = impolite • in+mature = immature • in+legal = illegal • in+regular = irregular • Deletion • cannot → can’t • do not → don’t • will not → won’t • are not → ain’t • information → info
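The assimilation pairs on this slide all follow one pattern. The nasal of the prefix in- takes on the place or manner of the next consonant. Below is a minimal Python sketch of that pattern. It covers only the consonants in the examples above. `attach_in` is a hypothetical helper, and real English has exceptions that it ignores.

```python
def attach_in(stem: str) -> str:
    """Attach the negative prefix in- with nasal assimilation (simplified).

    Only the slide's cases are covered: n -> m before labials (p, b, m),
    n -> l before l, n -> r before r; otherwise the prefix stays in-.
    """
    first = stem[0]
    if first in "pbm":
        prefix = "im"
    elif first == "l":
        prefix = "il"
    elif first == "r":
        prefix = "ir"
    else:
        prefix = "in"
    return prefix + stem


for stem in ["apt", "decent", "polite", "mature", "legal", "regular"]:
    print(f"in+{stem} = {attach_in(stem)}")
```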

  5. Explanations for Change Exogenous causes • Language contact • Socio-political factors • Communication medium Endogenous causes • Functional • Phonetic error-based • Frequency drifts • Evolutionary

  6. Functional Explanation of Language Change • There are three evolutionary forces on any linguistic system: • Minimization of effort (energy) • Maximization of perceptual distinctiveness (Minimization of ambiguity) • Maximization of learnability Language is a perpetually evolving system shaped by these three conflicting forces

  7. Outline of the Talk • Morpho-phonological change of Bangla Verb systems and emergence of dialect diversity • Approach: Multi-Objective Constrained Optimization • Technique: Multi-Objective Genetic Algorithm (MOGA) • Understanding Computer Mediated Communication • Normalization of Texting language • Romanization of Indian Language text

  8. Geography of Bangla: Standard Colloquial Bengali (SCB), Agartala Colloquial Bengali (ACB), Sylhetti

  9. History of Bangla (timeline from 1200 AD to 1800 AD)

  10. Bangla Verb Morphology: করেছিলাম = kar-echh-il-aam, i.e. verb root kar (do) + aspect -echh (perfect) + tense -il (past) + person -aam (first): “I had done”

  11. Cognates in the Dialects root: kar (to do)

  12. Atomic Phonological Operators (APOs): Deletion, Metathesis, Assimilation, Mutation • Example derivation: kariteChila →[Del(e/t_Ch)] karitChila →[Del(t/_Ch)] kariChila →[Met(ri/_Ch)] kairChila →[Asm(a→o/_i)] korChila →[Mut(a→o/_$)] korChilo
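Each APO can be read as a context-sensitive rewrite rule, X → Y / L _ R. The Python sketch below encodes that reading. The `APO` class, its fields, and the treatment of metathesis as the rewrite ri → ir are illustrative assumptions, not the study's implementation. The sketch reproduces the first three steps of the derivation and the final Mut step.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class APO:
    """Hypothetical encoding of an atomic phonological operator:
    rewrite `target` as `output` when preceded by `left` and followed by `right`."""
    name: str
    target: str
    output: str = ""  # "" means deletion
    left: str = ""    # literal left context ("" = no restriction)
    right: str = ""   # literal right context ("" = none, "$" = end of word)

    def apply(self, word: str) -> str:
        pattern = re.escape(self.target)
        if self.left:
            pattern = f"(?<={re.escape(self.left)})" + pattern
        if self.right == "$":
            pattern += "$"
        elif self.right:
            pattern += f"(?={re.escape(self.right)})"
        return re.sub(pattern, self.output, word)


def derive(word: str, ops: list) -> list:
    """Apply the operators in order and return every intermediate form."""
    forms = [word]
    for op in ops:
        forms.append(op.apply(forms[-1]))
    return forms


del_e = APO("Del(e/t_Ch)", "e", left="t", right="Ch")
del_t = APO("Del(t/_Ch)", "t", right="Ch")
met_ri = APO("Met(ri/_Ch)", "ri", "ir", right="Ch")  # metathesis as a rewrite
mut_a = APO("Mut(a->o/_$)", "a", "o", right="$")

print(derive("kariteChila", [del_e, del_t, met_ri]))
# ['kariteChila', 'karitChila', 'kariChila', 'kairChila']
print(mut_a.apply("korChila"))  # korChilo
```

Each rule fires only in its own context. For that reason, Del(t/_Ch) leaves kariteChila unchanged until Del(e/t_Ch) has removed the intervening e.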

  13. Hypothesis: A sequence of atomic phonological operators is preferred if the verb forms obtained by applying it to the classical forms have some functional benefit over the classical forms. Thus, all the modern dialects of Bangla have some functional advantage over the classical dialect.

  14. A Formal Model of Functional Explanation (figure: $f_1$, effort of articulation, plotted against $f_2$, $(\text{acoustic distinctiveness})^{-1}$, with regions marking unstable languages, metastable languages and impossible languages)

  15. Genetic Algorithm • Gene (a string of symbols) • What the solution actually looks like • GA: search for good solutions by mimicking nature (recombination and mutation of genes)

  16. Phenotype: a lexicon consisting of 28 forms of the verb kar

  17. Genotype: a sequence of atomic phonological operators

  18. Genotype → Phenotype
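Under this encoding, expressing a genotype means running every word of the lexicon through the operator sequence in order. The Python sketch below uses a four-operator stand-in for the repertoire and a two-word stand-in for the 28-form lexicon. All labels and patterns are hypothetical, and the second word is only a placeholder form.

```python
import re

# Hypothetical repertoire: operator label -> (regex pattern, replacement).
# The study's repertoire has 128 APOs; four are enough to illustrate the mapping.
REPERTOIRE = {
    "Del(e/t_Ch)": (r"(?<=t)e(?=Ch)", ""),
    "Del(t/_Ch)": (r"t(?=Ch)", ""),
    "Met(ri/_Ch)": (r"ri(?=Ch)", "ir"),
    "Mut(a->o/_$)": (r"a$", "o"),
}


def express(genotype, lexicon):
    """Genotype -> phenotype: rewrite every word with the operators, in order."""
    phenotype = []
    for word in lexicon:
        for label in genotype:
            pattern, replacement = REPERTOIRE[label]
            word = re.sub(pattern, replacement, word)
        phenotype.append(word)
    return phenotype


# Illustrative two-word lexicon (the second form is only a stand-in).
lexicon = ["kariteChila", "kariyaChila"]
print(express(["Del(e/t_Ch)", "Del(t/_Ch)", "Met(ri/_Ch)"], lexicon))
# ['kairChila', 'kariyaChila']
print(express(["Del(t/_Ch)", "Del(e/t_Ch)"], lexicon))
# ['karitChila', 'kariyaChila']
```

Swapping the first two operators changes the phenotype. This is why the genotype is modelled as an ordered sequence rather than as a set.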

  19. Crossover

  20. Mutation
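Crossover and mutation act on genotypes, not on word forms. The sketch below assumes one-point crossover and per-gene replacement mutation, which are two standard GA choices. The slides do not say which variants were used, and the operator labels are placeholders. Gene length 15 and the 128-operator repertoire follow the experiment settings on slide 29.

```python
import random


def one_point_crossover(parent1, parent2, rng):
    """Cut both genotypes at the same random point and swap their tails."""
    if len(parent1) != len(parent2) or len(parent1) < 2:
        raise ValueError("parents must have the same length (at least 2)")
    cut = rng.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


def point_mutation(genotype, repertoire, rate, rng):
    """Independently replace each gene, with probability `rate`, by a random operator."""
    return [rng.choice(repertoire) if rng.random() < rate else gene for gene in genotype]


rng = random.Random(0)  # fixed seed so the run is reproducible
repertoire = [f"APO{i}" for i in range(128)]  # placeholder labels for 128 operators
p1 = [rng.choice(repertoire) for _ in range(15)]  # gene length 15
p2 = [rng.choice(repertoire) for _ in range(15)]
c1, c2 = one_point_crossover(p1, p2, rng)
m1 = point_mutation(c1, repertoire, rate=1 / 15, rng=rng)
print(c1)
print(m1)
```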

  21. Multi-Objective GA

  22. Multi-Objective GA: Apply constraints

  23. Multi-Objective GA: Apply constraints

  24. Multi-Objective GA: Finding out good solutions

  25. Multi-Objective GA: But also keep some not-so-good solutions

  26. Multi-Objective GA: But also keep some not-so-good solutions

  27. Multi-Objective GA: After several iterations
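Keeping the good solutions while also retaining some not-so-good ones amounts to ranking the population into successive non-dominated fronts. The Python sketch below performs that ranking for objectives that are all minimized, for example effort and inverse distinctiveness. NSGA-II uses a faster bookkeeping version of this sort together with a crowding-distance tie-breaker, and this sketch reproduces neither. The sample points are made up.

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(points):
    """Partition solution indices into fronts. Front 0 is the Pareto-optimal set,
    front 1 is optimal once front 0 is removed, and so on."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical (f1, f2) = (effort, 1 / distinctiveness) values for five lexicons.
points = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4)]
print(non_dominated_fronts(points))  # [[0, 1, 2], [3], [4]]
```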

  28. Objective functions • Articulatory effort • fe(Λ): weighted sum of the number of syllables, the number of letters and the vowel-height differences, averaged over all words in the lexicon • Acoustic distinctiveness • fd(Λ): inverse of the mean edit distance between words • Learnability • fr(Λ): correlation between feature match and edit distance
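The distinctiveness objective can be computed directly from pairwise edit distances. The Python sketch below assumes unit-cost Levenshtein distance over the transliterated strings and a plain mean over all word pairs. This slide does not give the study's exact costs or normalization.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def f_d(lexicon):
    """Inverse of the mean pairwise edit distance (smaller = more distinctive lexicon)."""
    dists = [edit_distance(u, v) for u, v in combinations(lexicon, 2)]
    mean = sum(dists) / len(dists)
    return float("inf") if mean == 0 else 1.0 / mean


print(edit_distance("tomorrow", "tomorow"))    # 1
print(round(f_d(["abc", "abd", "xyz"]), 4))    # 0.4286
```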

  29. Experiments • NSGA-II: a package for fast MOGA • Gene length: 15 APOs • A repertoire of 128 APOs • Population: 1000; generations: 500 • 6 models with different combinations of constraints and objectives

  30. Pareto-optimal front (plot with the dialects SCB, Sylhetti, ACB and CB marked)

  31. Observations • The front has a vertical and a horizontal limb • Real dialects lie on the horizontal limb • Sound changes push the dialects from right to left (reducing effort) • but never up the limb • Why?

  32. Role of Constraints

  33. For more information • Choudhury et al., “Evolution, optimization, and language change: the case of Bengali verb inflections”, in Proceedings of ACL SIGMORPHON 9, Association for Computational Linguistics, 2007 • http://research.microsoft.com/people/monojitc/ • MOGA and NSGA-II: Kanpur Genetic Algorithms Laboratory, http://www.iitk.ac.in/kangal/index.shtml

  34. Food for Thought • Evaluation • Myriad possible dialects, but only a few observed in nature • A fixed set of predefined APOs: how to generalize to any change? • MOGA is an optimization tool that in no way simulates language change • How do languages optimize themselves?

  35. Outline of the Talk • Morpho-phonological change of Bangla Verb systems and emergence of dialect diversity • Approach: Multi-Objective Constrained Optimization • Technique: Multi-Objective Genetic Algorithm (MOGA) • Understanding Computer Mediated Communication • Normalization of Texting language • Romanization of Indian Language text

  36. Computer Mediated Communication Form

  37. Texting Language • A new genre of English and other languages used in chats, SMS, emails, blogs, tweets, Facebook posts, comments, etc. • Example: “dis is n eg 4 txtinlang” = “This is an example for Texting language”

  38. Texting Language • The shorter, the faster • Constraint: understandability • A new genre of English and other languages used in chats, SMS, emails, blogs, etc. • Ungrammatical, unconventional spellings • “dis is n eg 4 txtin lang” (24 characters) = “This is an example for Texting language” (39 characters)
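The numbers 24 and 39 are the character counts, spaces included, of the two versions of the sentence. They can be checked directly.

```python
texting = "dis is n eg 4 txtin lang"
standard = "This is an example for Texting language"

# Character counts include spaces.
print(len(texting), len(standard))             # 24 39
print(f"{len(texting) / len(standard):.0%}")   # 62%
```

In other words, the texting version needs about 62% of the characters of the standard spelling.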

  39. Analysis of Social Media • A hot topic in NLP • Normalization • Language identification • Sentiment/polarity detection • Summarization/trend prediction • Choudhury et al. (2007), Investigation and Modeling of the Structure of Texting Language, in IJCAI Workshop on Analytics of Noisy Data, 2007

  40. 2moro (9) tomoz (25) tomoro (12) tomrw (5) tom (2) tomra (2) tomorrow (24) tomora (4) tomm (1) tomo (3) tomorow (3) 2mro (2) morrow (1) tomor (2) tmorro (1) moro (1) Tomorrow never dies!!!

  41. Patterns or Compression Operators • Phonetic substitution (phoneme) • psycho → syco, then → den • Phonetic substitution (syllable) • today → 2day, see → c • Deletion of vowels • message → mssg, about → abt • Deletion of repeated characters • tomorrow → tomorow

  42. Patterns or Compression Operators • Truncation (deletion of tails) • introduction → intro, evaluation → eval • Common abbreviations • Bangalore → blr, text back → tb • Informal pronunciation • going to → gonna, better → betta
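Several of these operators are simple string transformations. The Python sketch below implements four of them and reproduces the slide's examples. The function names and the substitution table are hypothetical. The truncation length is supplied by hand because the slide gives no rule for choosing it.

```python
import re


def phonetic_substitute(word, table):
    """Replace a word-initial syllable or whole word by a shorter token: today -> 2day."""
    for source, target in table.items():
        if word.startswith(source):
            return target + word[len(source):]
    return word


def drop_vowels(word):
    """Delete non-initial vowels: message -> mssg, about -> abt."""
    return word[0] + re.sub(r"[aeiou]", "", word[1:])


def squeeze_repeats(word):
    """Collapse runs of the same character: tomorrow -> tomorow."""
    return re.sub(r"(.)\1+", r"\1", word)


def truncate(word, keep):
    """Delete the tail, keeping the first `keep` characters: evaluation -> eval."""
    return word[:keep]


# Hypothetical substitution table; a real system would learn such mappings.
SUBS = {"to": "2", "see": "c", "for": "4"}
print(phonetic_substitute("today", SUBS), phonetic_substitute("see", SUBS))  # 2day c
print(drop_vowels("message"), drop_vowels("about"))                        # mssg abt
print(squeeze_repeats("tomorrow"))                                         # tomorow
print(truncate("introduction", 5), truncate("evaluation", 4))              # intro eval
```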

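  43. HMMs for SMS Normalization (figure: HMM for the word TODAY, with graphemic states G1 to G5 emitting ‘T’, ‘O’, ‘D’, ‘A’, ‘Y’, phonemic states such as P2 /AH/ and P4 /AY/, syllabic states S0, S1 (emitting “2”) and S6, and ε transitions)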

  44. Bigram Examples • TL: would b gd 2 c u some time soon • Op: would be good to see you some time soon • TL: just wanted 2 say a big thanx 4 my bday card • Op: just wanted to say a big thanks for my today card • TL: me wel i fink bein at home makes me feel a lot more stressed den bein away from it • Op: me well i think being at home makes me feel a lot more stressed deny being away from it
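One way to produce outputs like these is to combine two scores and search over them. For each token, a channel probability P(token | word) is paired with a bigram language model P(word | previous word), and the Viterbi algorithm picks the best word sequence. The Python sketch below assumes that setup, with invented candidate lists and probabilities. Unseen bigrams receive a small floor probability. This is not the talk's system, whose word-level channel models are the HMMs of slide 43.

```python
import math


def normalize(tokens, candidates, bigram, floor=1e-6):
    """Viterbi search for the word sequence maximizing
    prod_i P(token_i | word_i) * P(word_i | word_{i-1}), computed in log space.
    Tokens without candidates are copied unchanged."""
    best = {"<s>": (0.0, [])}  # last word -> (best log score, best path)
    for tok in tokens:
        new_best = {}
        for word, p_channel in candidates.get(tok, {tok: 1.0}).items():
            new_best[word] = max(
                (score + math.log(bigram.get((prev, word), floor)) + math.log(p_channel),
                 path + [word])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


# Hypothetical channel candidates and bigram probabilities.
candidates = {
    "gd": {"good": 0.8, "god": 0.2},
    "2": {"to": 0.5, "two": 0.5},
    "c": {"see": 0.5, "sea": 0.5},
    "u": {"you": 1.0},
}
bigram = {
    ("<s>", "good"): 0.05, ("<s>", "god"): 0.01,
    ("good", "to"): 0.2, ("good", "two"): 0.001,
    ("to", "see"): 0.1, ("to", "sea"): 0.01,
    ("see", "you"): 0.3,
}
print(" ".join(normalize("gd 2 c u".split(), candidates, bigram)))  # good to see you
```

Here the channel cannot separate "to" from "two" or "see" from "sea", so the bigram context decides.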

  45. Use of Indian Languages on Online Social Media • Transliteration • Spelling change • Code mixing • Indian English

  46. Concluding Remarks • Languages are perpetually evolving and optimizing systems • Computational modeling of language change is still in its infancy • Lots of scope for research

  47. Thank You! • monojitc@microsoft.com • Questions?

  48. Why Computational Models? • For: exploration, virtual experimentation, formalization • Against: toy languages, simplified assumptions, intractability • Can we model real-world language change?

  49. Objectives and Constraints - 1 • Articulatory effort: $f_e(w) = \alpha_1 f_{e1}(w) + \alpha_2 f_{e2}(w) + \alpha_3 f_{e3}(w)$, where $f_{e1}(w) = |w|$, $f_{e2}(w) = \sum_i hr(\sigma_i)$ and $f_{e3}(w) = \sum_i |ht(V_i) - ht(V_{i+1})|$
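The sketch below makes the effort formula concrete by evaluating $f_e(w)$ on two verb forms from the derivation on slide 12. Everything beyond the formula itself is an assumption: the vowel-height scale, one syllable per vowel, $hr(\sigma) = 1$ for every syllable, and $\alpha_1 = \alpha_2 = \alpha_3 = 1$.

```python
# Hypothetical vowel heights (3 = high, 2 = mid, 1 = low).
HEIGHT = {"i": 3, "u": 3, "e": 2, "o": 2, "a": 1}


def vowel_heights(word):
    return [HEIGHT[ch] for ch in word if ch in HEIGHT]


def f_e(word, alphas=(1.0, 1.0, 1.0)):
    """f_e(w) = a1*|w| + a2*sum_i hr(sigma_i) + a3*sum_i |ht(V_i) - ht(V_{i+1})|.
    Assumes one syllable per vowel and hr(sigma) = 1, so the middle term counts syllables."""
    a1, a2, a3 = alphas
    heights = vowel_heights(word)
    fe1 = len(word)                                               # letters in the transliteration
    fe2 = len(heights)                                            # syllables (one per vowel)
    fe3 = sum(abs(x - y) for x, y in zip(heights, heights[1:]))  # vowel-height jumps
    return a1 * fe1 + a2 * fe2 + a3 * fe3


def f_e_lexicon(lexicon, alphas=(1.0, 1.0, 1.0)):
    """Average effort over all words of the lexicon (f_e(Lambda) on slide 28)."""
    return sum(f_e(w, alphas) for w in lexicon) / len(lexicon)


print(f_e("kariteChila"), f_e("korChilo"))  # 22.0 13.0
```

Under these assumptions, the modern form korChilo takes less effort than classical kariteChila. This fits the observation that sound changes push dialects toward lower effort.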

  50. Objectives and Constraints - 2 • Acoustic distinctiveness: $f_d(\Lambda) = \left(\frac{1}{N} \sum_{i<j} ed(w_i, w_j)\right)^{-1}$; $C_d(\Lambda) = -1$ if $ed(w_i, w_j) = 0$ for more than 2 pairs • Phonotactic constraints: $C_p(\Lambda) = -1$ if any of the words violates the phonotactic constraints of the language
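Both constraints can be written as feasibility checks on a whole lexicon. In the Python sketch below, the function names, the return value 0 for a satisfied constraint, and the forbidden-cluster list are all hypothetical. Edit distance is 0 exactly when two forms are identical, so $C_d$ reduces to counting identical pairs.

```python
from itertools import combinations

# Hypothetical phonotactic filter: substrings standing in for forbidden clusters.
FORBIDDEN = ("rtC", "kC")


def C_d(lexicon, max_identical_pairs=2):
    """-1 if more than `max_identical_pairs` word pairs have edit distance 0
    (i.e. are identical strings), otherwise 0 (no penalty)."""
    identical = sum(1 for u, v in combinations(lexicon, 2) if u == v)
    return -1 if identical > max_identical_pairs else 0


def C_p(lexicon, forbidden=FORBIDDEN):
    """-1 if any word contains a forbidden cluster, otherwise 0."""
    return -1 if any(bad in word for word in lexicon for bad in forbidden) else 0


print(C_d(["korChilo", "korChilo", "korChila"]))  # 0  (one identical pair)
print(C_d(["kor"] * 3))                          # -1 (three identical pairs)
print(C_p(["kartChila", "korChilo"]))            # -1 ("rtC" is forbidden here)
```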
