
Dual Decomposition Inference for Graphical Models over Strings

A general method for inferring strings from other strings using graphical models. Try it if you haven't observed all the words of a noisy or complex language.

brownjanice

Presentation Transcript


  1. Dual Decomposition Inference for Graphical Models over Strings. Nanyun (Violet) Peng, Ryan Cotterell, Jason Eisner. Johns Hopkins University

  2. Attention! • Don’t care about phonology? • Listen anyway. This is a general method for inferring strings from other strings (if you have a probability model). • So if you haven’t yet observed all the words of your noisy or complex language, try it!

  3. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Verbs (rows), observed forms only:
     • TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK: [hæk] [hæks] [hækt] [hækt]
     • CRACK: [kɹæks] [kɹækt]
     • SLAP: [slæp] [slæpt]

  4. Matrix Completion: Collaborative Filtering Movies -37 29 29 19 -36 67 22 77 -24 61 12 74 -79 -41 Users -52 -39

  5. Matrix Completion: Collaborative Filtering Movies [9,-2,1] [9,-7,2] [4,3,-2] • [-6,-3,2] -37 29 29 19 [4,1,-5] -36 67 22 77 [7,-2,0] -24 61 12 74 [6,-2,3] -79 -41 [-9,1,4] Users -52 -39 [3,8,-5]

  6. Matrix Completion: Collaborative Filtering Movies [9,-2,1] [9,-7,2] [4,3,-2] • [-6,-3,2] -37 29 29 19 [4,1,-5] -36 67 22 77 [7,-2,0] -24 61 12 74 [6,-2,3] 59 -79 -41 -80 [-9,1,4] Users -52 6 46 -39 [3,8,-5] Prediction!

  7. Matrix Completion: Collaborative Filtering [1,-4,3] · [-5,2,1] → Dot Product: -10 → Gaussian Noise → -11
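The arithmetic on this slide can be checked with a minimal sketch. The function name `predict_rating`, the noise scale, and the fixed seed are assumptions made for illustration; the slide only gives the two vectors, the dot product -10, and one noisy sample -11.

```python
import random

def predict_rating(user_vec, movie_vec, noise_std=1.0, rng=None):
    """Return (mean, noisy sample): dot product of the latent vectors, plus Gaussian noise."""
    rng = rng or random.Random(0)  # fixed seed is an assumption, only for repeatability
    mean = sum(u * m for u, m in zip(user_vec, movie_vec))
    return mean, mean + rng.gauss(0.0, noise_std)

# Vectors taken from the slide: 1*(-5) + (-4)*2 + 3*1 = -10
mean, noisy = predict_rating([1, -4, 3], [-5, 2, 1])
print(mean)
```

The observed rating is then modeled as a noisy draw around that mean, which is why the slide shows -11 rather than exactly -10.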

  8. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Verbs (rows), observed forms only:
     • TALK: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK: [hæk] [hæks] [hækt] [hækt]
     • CRACK: [kɹæks] [kɹækt]
     • SLAP: [slæp] [slæpt]

  9. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with observed forms:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæks] [kɹækt]
     • SLAP /slæp/: [slæp] [slæpt]

  10. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with observed forms:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæks] [kɹækt]
     • SLAP /slæp/: [slæp] [slæpt]

  11. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with the missing forms now filled in:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæk] [kɹæks] [kɹækt] [kɹækt]
     • SLAP /slæp/: [slæp] [slæps] [slæpt] [slæpt]
     Prediction!

  12. A Model of Phonology tɔk + s → Concatenate → tɔks “talks”

  13. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with observed forms:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæks] [kɹækt]
     • SLAP /slæp/: [slæp] [slæpt]
     • CODE /koʊd/: [koʊdz] [koʊdɪt]
     • BAT /bæt/: [bæt] [bætɪt]

  14. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with observed forms:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæks] [kɹækt]
     • SLAP /slæp/: [slæp] [slæpt]
     • CODE /koʊd/: [koʊdz] [koʊdɪt] (z instead of s)
     • BAT /bæt/: [bæt] [bætɪt] (ɪt instead of t)

  15. A Phonological Exercise
     Tenses (columns): 1P Pres. Sg., 3P Pres. Sg., Past Tense, Past Part.
     Suffixes (one per column): /Ø/ /s/ /t/ /t/
     Stems (one per row), with observed forms:
     • TALK /tɔk/: [tɔk] [tɔks] [tɔkt] [tɔkt]
     • THANK /θeɪŋk/: [θeɪŋk] [θeɪŋks] [θeɪŋkt] [θeɪŋkt]
     • HACK /hæk/: [hæk] [hæks] [hækt] [hækt]
     • CRACK /kɹæk/: [kɹæks] [kɹækt]
     • SLAP /slæp/: [slæp] [slæpt]
     • CODE /koʊd/: [koʊdz] [koʊdɪt]
     • BAT /bæt/: [bæt] [bætɪt]
     • EAT /it/: [it] [itən] [eɪt] (eɪt instead of itɪt)

  16. A Model of Phonology koʊd + s → Concatenate → koʊd#s → Apply Phonology → koʊdz “codes”. Modeling word forms using latent underlying morphs and phonology (Cotterell et al., TACL 2015)
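The two-step pipeline on this slide (concatenate, then apply phonology) can be sketched as below. This is only a toy: the deck's phonology step is a probabilistic finite-state transducer, whereas the single deterministic voicing rule, the `VOICELESS` set, and both function names here are assumptions made for illustration.

```python
# Hypothetical, incomplete set of voiceless consonants for the toy rule.
VOICELESS = set("ptkfθsʃ")

def concatenate(stem, suffix):
    """Join underlying morphs with a '#' boundary, as in koʊd#s."""
    return stem + "#" + suffix

def apply_phonology(underlying):
    """Toy rule: suffix /s/ surfaces as [z] unless the stem ends in a voiceless sound."""
    stem, suffix = underlying.split("#", 1)
    if suffix == "s" and stem and stem[-1] not in VOICELESS:
        suffix = "z"
    return stem + suffix

codes = apply_phonology(concatenate("koʊd", "s"))
talks = apply_phonology(concatenate("tɔk", "s"))
print(codes, talks)
```

A learned stochastic edit process would replace the hand-written rule and assign a probability to every possible surface form instead of returning just one.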

  17. A Model of Phonology rizaign + ation → Concatenate → rizaign#ation → Apply Phonology → rεzɪgneɪʃn “resignation”

  18. Fragment of Our Graph for English
     1) Morphemes: rizaign, z (the plural suffix), eɪʃən, dæmn
     Concatenation
     2) Underlying words: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
     Phonology
     3) Surface words: r,εzɪgn’eɪʃn “resignation”, riz’ajnz “resigns”, d,æmn’eɪʃn “damnation”, d’æmz “damns”

  19. Outline • A motivating example: phonology • General framework: • graphical models over strings • Inference on graphical models over strings • Dual decomposition inference • The general idea • Substring features and active set • Experiments and results

  20. Graphical Models over Strings? • Joint distribution over many strings • Variables • Range over Σ*, the infinite set of all strings • Relations among variables • Usually specified by (multi-tape) FSTs. References: A probabilistic approach to language change (Bouchard-Côté et al., NIPS 2008); Graphical models over multiple strings (Dreyer and Eisner, EMNLP 2009); Large-scale cognate recovery (Hall and Klein, EMNLP 2011)

  21. Graphical Models over Strings? • Strings are the basic units in natural languages. • Use • Orthographic (spelling) • Phonological (pronunciation) • Latent (intermediate steps not observed directly) • Size • Morphemes (meaningful subword units) • Words • Multi-word phrases, including “named entities” • URLs

  22. What relationships could you model? • spelling → pronunciation • word → noisy word (e.g., with a typo) • word → related word in another language (loanwords, language evolution, cognates) • singular → plural (for example) • root → word • underlying form → surface form

  23. Chains of relations can be useful • Misspelling or pun = spelling → pronunciation → spelling • Cognate = word → historical parent → historical child

  24. Factor Graph for phonology
     1) Morpheme URs: rizajgn, z, eɪʃən, dæmn
     Concatenation (e.g.)
     2) Word URs: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
     Phonology (PFST)
     3) Word SRs: r,εzɪgn’eɪʃn, riz’ajnz, d,æmn’eɪʃn, d’æmz
     log-probability. Let’s maximize it!

  25. Contextual Stochastic Edit Process. Stochastic contextual edit distance and probabilistic FSTs (Cotterell et al., ACL 2014)

  26. Inference on a Factor Graph ? ? ? ? 1) Morpheme URs ? ? ? 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  27. Inference on a Factor Graph bar foo s da 1) Morpheme URs ? ? ? 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  28. Inference on a Factor Graph bar foo s da 1) Morpheme URs bar#s bar#da bar#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  29. Inference on a Factor Graph 0.01 8e-3 0.05 0.02 bar foo s da 1) Morpheme URs bar#s bar#da bar#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  30. Inference on a Factor Graph 0.01 8e-3 0.05 0.02 bar foo s da 1) Morpheme URs bar#s bar#da bar#foo 2) Word URs 6e-1200 2e-1300 7e-1100 r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  31. Inference on a Factor Graph 0.01 8e-3 0.05 0.02 bar foo s da 1) Morpheme URs bar#s bar#da bar#foo 2) Word URs  6e-1200 2e-1300 7e-1100 r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  32. Inference on a Factor Graph ? far foo s da 1) Morpheme URs far#s far#da far#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  33. Inference on a Factor Graph ? size foo s da 1) Morpheme URs size#s size#da size#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  34. Inference on a Factor Graph ? … foo s da 1) Morpheme URs …#s …#da …#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  35. Inference on a Factor Graph • rizajn foo s da 1) Morpheme URs rizajn#s rizajn#da rizajn#foo 2) Word URs r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  36. Inference on a Factor Graph • rizajn foo s da 1) Morpheme URs rizajn#s rizajn#da rizajn#foo 2) Word URs 2e-5 0.01 0.008 r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  37. Inference on a Factor Graph • rizajn • eɪʃn s d 1) Morpheme URs rizajn#s rizajn#d • rizajn#eɪʃn 2) Word URs 0.001 0.01 0.015 r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  38. Inference on a Factor Graph • rizajgn • eɪʃn s d 1) Morpheme URs • rizajgn#s • rizajgn#da • rizajgn#foo 2) Word URs  0.008 0.008 0.013 r,εzɪgn’eɪʃn riz’ajnz riz’ajnd 3) Word SRs

  39. Inference on a Factor Graph rizajgn eɪʃn s d rizajgn#s rizajgn#d rizajgn#eɪʃn  0.013 0.008 0.008 riz’ajnz riz’ajnd r,εzɪgn’eɪʃn

  40. Challenges in Inference • Global discrete optimization problem. • Variables range over an infinite set: cannot be solved by ILP or even brute force. Undecidable! • Our previous papers used approximate algorithms: Loopy Belief Propagation, or Expectation Propagation. • Q: Can we do exact inference? • A: If we can live with 1-best and not marginal inference, then we can use Dual Decomposition … which is exact. • (If it terminates! The problem is undecidable in general …)

  41. Outline • A motivating example: phonology • General framework: • graphical models over strings • Inference on graphical models over strings • Dual decomposition inference • The general idea • Substring features and active set • Experiments and results

  42. Graphical Model for Phonology
     1) Morpheme URs: rizajgn, z, eɪʃən, dæmn
     Concatenation (e.g.)
     2) Word URs: rεzɪgn#eɪʃən, rizajn#z, dæmn#eɪʃən, dæmn#z
     Phonology (PFST)
     3) Word SRs: r,εzɪgn’eɪʃn, riz’ajnz, d,æmn’eɪʃn, d’æmz
     Jointly decide the values of the inter-dependent latent variables, which range over an infinite set.

  43. General Idea of Dual Decomp rizajgn • rεzign z eɪʃən eɪʃən dæmn dæmn#eɪʃən dæmn#z rεzɪgn#eɪʃən rizajn#z d’æmz r,εzɪgn’eɪʃn riz’ajnz d,æmn’eɪʃn

  44. General Idea of Dual Decomp eɪʃən rεzɪgn z dæmn eɪʃən dæmn z rizajn dæmn#eɪʃən dæmn#z rεzɪgn#eɪʃən rizajn#z d,æmn’eɪʃn d’æmz r,εzɪgn’eɪʃn riz’ajnz Subproblem 1 Subproblem 2 Subproblem 3 Subproblem 4

  45. General Idea of Dual Decomp I think it’s rεzɪgn I think it’s rizajn eɪʃən rεzɪgn z dæmn eɪʃən dæmn z rizajn dæmn#eɪʃən dæmn#z rεzɪgn#eɪʃən rizajn#z d,æmn’eɪʃn d’æmz r,εzɪgn’eɪʃn riz’ajnz Subproblem 1 Subproblem 2 Subproblem 3 Subproblem 4

  46. Outline • A motivating example: phonology • General framework: • graphical models over strings • Inference on graphical models over strings • Dual decomposition inference • The general idea • Substring features and active set • Experiments and results

  47. eɪʃən rεzɪgn z dæmn eɪʃən dæmn z rizajn dæmn#eɪʃən dæmn#z rεzɪgn#eɪʃən rizajn#z d,æmn’eɪʃn d’æmz r,εzɪgn’eɪʃn riz’ajnz Subproblem 1 Subproblem 2 Subproblem 3 Subproblem 4

  48. Substring Features and Active Set
     • Copy saying “I think it’s rizajn”: less i, a, j; more ε, ɪ, g (to match others)
     • Copy saying “I think it’s rεzɪgn”: less ε, ɪ, g; more i, a, j (to match others)
     Graph: rεzɪgn rizajn eɪʃən z dæmn eɪʃən dæmn z dæmn#eɪʃən dæmn#z rεzɪgn#eɪʃən rizajn#z d,æmn’eɪʃn d’æmz r,εzɪgn’eɪʃn riz’ajnz Subproblem 1 Subproblem 1 Subproblem 1 Subproblem 1
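The "less of this, more of that" messages on this slide can be read as Lagrange-multiplier updates on substring counts. Below is a minimal sketch under strong assumptions: each subproblem chooses from a small finite candidate list (the talk's setting searches over all strings), features are unigram counts only, and the local scores, step size, and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def features(s):
    """Unigram counts of a string (longer n-grams are omitted in this toy)."""
    return Counter(s)

def best(scores, lam):
    """Solve one subproblem: local score plus multiplier-weighted feature counts."""
    def adjusted(s):
        return scores[s] + sum(lam[g] * c for g, c in features(s).items())
    return max(scores, key=adjusted)

def dual_decomposition(subproblems, step=0.5, max_iters=50):
    """Subgradient updates until every copy of the shared string agrees."""
    lams = [defaultdict(float) for _ in subproblems]
    for it in range(max_iters):
        picks = [best(sc, lam) for sc, lam in zip(subproblems, lams)]
        if len(set(picks)) == 1:
            return picks[0], it          # agreement certifies optimality for these scores
        feats = [features(p) for p in picks]
        for g in set().union(*feats):
            mean = sum(f[g] for f in feats) / len(feats)
            for f, lam in zip(feats, lams):
                lam[g] -= step * (f[g] - mean)   # "less g" if above average, "more g" if below
    return None, max_iters

# Hypothetical log-scores: each subproblem likes a different stem, both tolerate rizajgn.
sub1 = {"rεzɪgn": -1.0, "rizajgn": -1.2, "rizajn": -6.0}
sub2 = {"rizajn": -1.0, "rizajgn": -1.2, "rεzɪgn": -6.0}
result, iterations = dual_decomposition([sub1, sub2])
print(result, iterations)
```

With these made-up scores the two copies first disagree (rεzɪgn versus rizajn), exchange one round of multiplier updates, and then both settle on rizajgn, which is also the string with the highest total score.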

  49. Features: “Active set” method • How many features? • Infinitely many possible n-grams! • Trick: Gradually increase feature set as needed. • Like Paul & Eisner (2012), Cotterell & Eisner (2015) • Only add features on which strings disagree. • Only add abcd once abc and bcd already agree. • Exception: Add unigrams and bigrams for free.
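One possible reading of these rules, sketched in code: an n-gram of length 3 or more becomes a new feature only when the current strings disagree on its count while already agreeing on the counts of its two (n-1)-gram parts. Unigrams and bigrams are treated as always active, so they are not returned. The function names and the `max_n` cutoff are assumptions for illustration, not taken from the slides.

```python
from collections import Counter

def ngram_counts(s, n):
    """Counts of all length-n substrings of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def disagree(strings, gram):
    """True if the strings contain different numbers of occurrences of gram."""
    return len({ngram_counts(s, len(gram))[gram] for s in strings}) > 1

def new_features(strings, max_n=4):
    """N-grams (n >= 3) to activate; unigrams and bigrams are assumed always active."""
    added = set()
    for n in range(3, max_n + 1):
        grams = set().union(*(ngram_counts(s, n) for s in strings))
        for g in grams:
            parts_agree = not disagree(strings, g[:-1]) and not disagree(strings, g[1:])
            if disagree(strings, g) and parts_agree:
                added.add(g)
    return added

# Both strings contain abc once and bcd once, but only the first contains abcd.
activated = new_features(["abcd", "abcbcd"])
print(activated)
```

In this example trigrams such as bcb are not activated yet, because the strings still disagree on the bigram bc; under this reading they would only become candidates after the shorter features have been reconciled.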

  50. Fragment of Our Graph for Catalan ? ? ? ? ? Stem of “grey” ? ? ? ? gris grizos grize grizes Separate these 4 words into 4 subproblems as before …
