
Probabilistic CKY

Roger Levy [thanks to Jason Eisner]


Presentation Transcript


  1. Probabilistic CKY Roger Levy [thanks to Jason Eisner]

  2. Managing Ambiguity • John saw Mary • Typhoid Mary • Phillips screwdriver Mary (note how rare rules interact) • I see a bird • is this 4 nouns – parsed like “city park scavenger bird”? (rare parts of speech, plus systematic ambiguity in noun sequences) • Time flies like an arrow • Fruit flies like a banana • Time reactions like this one • Time reactions like a chemist • or is it just an NP?

  3. Our bane: Ambiguity • John saw Mary • Typhoid Mary • Phillips screwdriver Mary (note how rare rules interact) • I see a bird • is this 4 nouns – parsed like “city park scavenger bird”? (rare parts of speech, plus systematic ambiguity in noun sequences) • Time | flies like an arrow (NP VP) • Fruit flies | like a banana (NP VP) • Time | reactions like this one (V[stem] NP) • Time reactions | like a chemist (S PP) • or is it just an NP?

  4. How to solve this combinatorial explosion of ambiguity? • First try parsing without any weird rules, throwing them in only if needed. • Better: every rule has a weight. A tree’s weight is the total weight of all its rules. Pick the overall lightest parse of the sentence. • Can we pick the weights automatically? We’ll get to this later …

  5. The plan for the rest of parsing • There’s probability (inference) and then there’s statistics (model estimation) • These two problems are logically separate, yet interrelated in practice • We’ll start with the problem of efficient inference (probabilistic bottom-up parsing) • Then we’ll move to the estimation of good (generative) models that permit efficient bottom-up parsing • Finally, we’ll mention state-of-the-art work that doesn’t permit exact inference (“reranking” approaches)

  6. The critical recursion • We’ll do bottom-up parsing (weighted CKY) • When combining constituents into a larger constituent: • The weight of the new constituent is the sum of the weights of the combined subconstituents… • …plus the weight of the rule used to combine the subconstituents
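
In symbols, the recursion on this slide can be sketched as follows. The chart notation best[X, i, k] (the lightest weight found for a constituent X spanning words i to k) is an assumption introduced here, not notation from the slides; the minimum reflects the earlier instruction to pick the lightest parse.

```latex
\[
\mathrm{best}[X,i,k] \;=\; \min_{\substack{X \to Y\,Z \\ i < j < k}}
\Bigl( w(X \to Y\,Z) \;+\; \mathrm{best}[Y,i,j] \;+\; \mathrm{best}[Z,j,k] \Bigr)
\]
```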

  7. Weighted grammar (weight, then rule) • 1 S → NP VP • 6 S → Vst NP • 2 S → S PP • 1 VP → V NP • 2 VP → VP PP • 1 NP → Det N • 2 NP → NP PP • 3 NP → NP NP • 0 PP → P NP

  8–29. (These slides repeat the grammar from slide 7 while the CKY chart is filled in step by step; the chart graphics are not preserved in this transcript.)

  30. Follow backpointers … • S

  31. [S NP VP]

  32. [S NP [VP VP PP]]

  33. [S NP [VP VP [PP P NP]]]

  34. [S NP [VP VP [PP P [NP Det N]]]]

  35. Which entries do we need?

  36. Which entries do we need?

  37. Not worth keeping …

  38. … since it just breeds worse options

  39. Keep only best-in-class! (“inferior stock”)

  40. Keep only best-in-class! (and backpointers so you can recover parse)
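
The chart-filling procedure from these slides can be sketched in Python as below. This is a minimal illustration, not the lecture's own code: the binary rule weights are the ones from the grammar slide, but the lexical weights in `LEXICON` are hypothetical (the transcript does not preserve them), and the names `cky` and `build_tree` are introduced here for illustration.

```python
from collections import defaultdict

# Binary rules from the slides: (parent, left child, right child) -> weight.
RULES = {
    ("S", "NP", "VP"): 1, ("S", "Vst", "NP"): 6, ("S", "S", "PP"): 2,
    ("VP", "V", "NP"): 1, ("VP", "VP", "PP"): 2,
    ("NP", "Det", "N"): 1, ("NP", "NP", "PP"): 2, ("NP", "NP", "NP"): 3,
    ("PP", "P", "NP"): 0,
}

# ASSUMPTION: hypothetical lexical weights, not given in the transcript.
LEXICON = {
    "time": {"NP": 3, "Vst": 3},
    "flies": {"NP": 4, "VP": 4},
    "like": {"P": 2, "V": 5},
    "an": {"Det": 1},
    "arrow": {"N": 8},
}


def cky(words):
    """Fill the chart bottom-up, keeping only the lightest entry per symbol and span."""
    n = len(words)
    best = defaultdict(dict)  # (i, k) -> {symbol: lightest weight}
    back = {}                 # (i, k, symbol) -> (split point, left symbol, right symbol)
    for i, word in enumerate(words):
        best[i, i + 1] = dict(LEXICON.get(word, {}))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (x, y, z), w in RULES.items():
                    if y in best[i, j] and z in best[j, k]:
                        # New weight = weights of the two parts + weight of the rule.
                        total = w + best[i, j][y] + best[j, k][z]
                        if total < best[i, k].get(x, float("inf")):
                            best[i, k][x] = total
                            back[i, k, x] = (j, y, z)
    return best, back


def build_tree(words, back, i, k, symbol):
    """Follow backpointers to recover the lightest tree as nested tuples."""
    if (i, k, symbol) not in back:
        return (symbol, words[i])
    j, left, right = back[i, k, symbol]
    return (symbol,
            build_tree(words, back, i, j, left),
            build_tree(words, back, j, k, right))


words = "time flies like an arrow".split()
best, back = cky(words)
print(best[0, len(words)]["S"])
print(build_tree(words, back, 0, len(words), "S"))
```

Because an entry is overwritten only when a strictly lighter one is found, each cell holds one weight and one backpointer per symbol, which is the "best-in-class" pruning described above. Ties are broken by whichever derivation is found first.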

  41. Probabilistic Trees • Instead of lightest weight tree, take highest probability tree • Given any tree, your assignment 1 generator would have some probability of producing it! • Just like using n-grams to choose among strings … • What is the probability of this tree? [S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]]

  42. Probabilistic Trees • Instead of lightest weight tree, take highest probability tree • Given any tree, your assignment 1 generator would have some probability of producing it! • Just like using n-grams to choose among strings … • What is the probability of this tree? • You rolled a lot of independent dice … • p([S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] | S)

  43. Chain rule: One word at a time • p(time flies like an arrow) = p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)

  44. Chain rule + backoff (to get trigram model) • p(time flies like an arrow) ≈ p(time) * p(flies | time) * p(like | time flies) * p(an | flies like) * p(arrow | like an)

  45. Chain rule – written differently • p(time flies like an arrow) = p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an) • Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

  46. Chain rule + backoff • p(time flies like an arrow) = p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an) • Proof: p(x,y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

  47. Chain rule: One node at a time • p([S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] | S) = p([S NP VP] | S) * p([S [NP time] VP] | [S NP VP]) * p([S [NP time] [VP VP PP]] | [S [NP time] VP]) * p([S [NP time] [VP [VP flies] PP]] | [S [NP time] [VP VP PP]]) * …

  48. Chain rule + backoff • p([S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] | S) ≈ p([S NP VP] | S) * p([NP time] | NP) * p([VP VP PP] | VP) * p([VP flies] | VP) * … (each factor now conditions only on the node being expanded, not on the whole partial tree)

  49. Simplified notation • p([S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] | S) = p(S → NP VP | S) * p(NP → time | NP) * p(VP → VP PP | VP) * p(VP → flies | VP) * …
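
The product on this slide can be computed by walking the tree and multiplying one rule probability per node. The sketch below assumes a nested-tuple tree encoding and a table `P` of rule probabilities; every number in `P` is hypothetical and chosen only for illustration, and `tree_prob` is a name introduced here.

```python
import math

# ASSUMPTION: hypothetical values of p(rule | parent), for illustration only.
P = {
    ("S", ("NP", "VP")): 0.5,
    ("NP", ("time",)): 0.1,
    ("VP", ("VP", "PP")): 0.25,
    ("VP", ("flies",)): 0.1,
    ("PP", ("P", "NP")): 1.0,
    ("P", ("like",)): 0.5,
    ("NP", ("Det", "N")): 0.5,
    ("Det", ("an",)): 0.5,
    ("N", ("arrow",)): 0.1,
}

# The tree from the slides, as (label, child, ...) with words as plain strings.
TREE = ("S",
        ("NP", "time"),
        ("VP",
         ("VP", "flies"),
         ("PP",
          ("P", "like"),
          ("NP", ("Det", "an"), ("N", "arrow")))))


def tree_prob(tree):
    """Multiply p(rule | parent) over every node of the tree."""
    label, *children = tree
    # The rule's right-hand side is the sequence of child labels (or the word itself).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = P[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child)
    return p


print(tree_prob(TREE))
```

Each factor depends only on the node being expanded, which is the independence ("independent dice") assumption the slides describe.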

  50. Already have a CKY alg for weights … • w([S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]]) = w(S → NP VP) + w(NP → time) + w(VP → VP PP) + w(VP → flies) + … • Just let w(X → Y Z) = -log p(X → Y Z | X) • Then lightest tree has highest prob
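
A small numeric check of this correspondence: with w = -log p, summing weights equals taking the negative log of the product of probabilities, so the lightest tree is also the most probable one. The two lists of rule probabilities below are hypothetical stand-ins for two competing parses, and the function names are introduced here for illustration.

```python
import math


def rule_weight(p):
    """Weight of a rule with probability p, as on the slide: w = -log p."""
    return -math.log(p)


def tree_weight(rule_probs):
    """Total weight of a tree = sum of the weights of its rules."""
    return sum(rule_weight(p) for p in rule_probs)


def tree_prob(rule_probs):
    """Probability of a tree = product of the probabilities of its rules."""
    return math.prod(rule_probs)


# ASSUMPTION: hypothetical rule probabilities for two competing parses.
parses = {
    "parse_a": [0.5, 0.1, 0.25],
    "parse_b": [0.5, 0.05, 0.2],
}

lightest = min(parses, key=lambda name: tree_weight(parses[name]))
most_probable = max(parses, key=lambda name: tree_prob(parses[name]))
print(lightest, most_probable)
```

Since -log is strictly decreasing, minimizing total weight and maximizing probability always select the same tree, which is why the weighted CKY algorithm can be reused unchanged.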
