
Statistical NLP Winter 2009

Lecture 10: Parsing I. Roger Levy. Thanks to Jason Eisner & Dan Klein for slides.



  1. Statistical NLP, Winter 2009 • Lecture 10: Parsing I • Roger Levy • Thanks to Jason Eisner & Dan Klein for slides

  2. Why is natural language parsing hard? • As language structure gets more abstract, computing it gets harder • Document classification • finite number of classes • fast computation at test time • Part-of-speech tagging (recovering label sequences) • Exponentially many possible tag sequences • But exact computation possible in O(n) • Parsing (recovering labeled trees) • Exponentially many, or even infinite, possible trees • Exact inference worse than tagging, but still within reach

  3. Why is parsing harder than tagging? • How many trees are there for a given string? • Imagine a rule VP → VP • …∞! • This is not a problem for inferring availability of structures (why?) • Nor is this a problem for inferring the most probable structure in a PCFG (why?)

  4. Why parsing is harder than tagging II • Ingredient 1: syntactic category ambiguity • Exponentially many category sequences, like tagging • Ingredient 2: attachment ambiguity • Classic case: prepositional-phrase (PP) attachment • 1 PP: no ambiguity • 2 PPs: some ambiguity

  5. Why parsing is harder than tagging III • 3 PPs: much more attachment ambiguity! • 5 PPs: 14 trees, 6 PPs: 42 trees, 7 PPs: 132 trees…

  6. Why parsing is harder than tagging IV • Tree-structure ambiguity grows like the Catalan numbers (Knuth, 1975; Church & Patil, 1982) • The Catalan numbers themselves grow exponentially (on the order of 4^n / n^(3/2)), so this is further exponential growth on top of the exponential growth associated with sequence label ambiguity
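As a quick check on the tree counts quoted on slide 5, the sketch below computes Catalan numbers from the standard closed form C_n = (2n choose n) / (n + 1). The function name `catalan` is illustrative and not from the lecture.

```python
from math import comb

def catalan(n: int) -> int:
    """Return the n-th Catalan number, C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# First eight Catalan numbers, C_0 through C_7.
first_eight = [catalan(n) for n in range(8)]
print(first_eight)  # [1, 1, 2, 5, 14, 42, 132, 429]
```

The counts 14, 42, and 132 from slide 5 appear here as the consecutive values C_4, C_5, and C_6.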

  7. Why parsing is still tractable • This all makes parsing look really bad • But there’s still hope • Those exponentially many parses are different combinations of common subparts

  8. How to parse tractably • Recall that we did HMM part-of-speech tagging by storing partial results in a trellis • An HMM is a special type of grammar with essentially two types of rules: • “Category Y can follow category X (with cost π)” • “Category X can be realized as word w (with cost η)” • The trellis is a graph whose structure reflects its rules • Edges between all sequentially adjacent category pairs
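The trellis idea on this slide can be sketched as a minimum-cost Viterbi search. This is only an illustration: the function name, the `<s>` start symbol, and the toy costs are assumptions, not values from the lecture. For brevity each trellis entry stores the whole best path rather than a backpointer.

```python
def viterbi(words, tags, trans_cost, emit_cost, start="<s>"):
    """Return (cost, tag path) of the cheapest tag sequence for `words`.

    trans_cost[(x, y)]: cost of category y following category x.
    emit_cost[(x, w)]:  cost of category x being realized as word w.
    Missing entries count as impossible (infinite cost).
    """
    INF = float("inf")
    # best[tag] = (cost of the cheapest path ending in tag, that path)
    best = {start: (0, [])}
    for w in words:
        new_best = {}
        for y in tags:
            for x, (cost, path) in best.items():
                total = (cost
                         + trans_cost.get((x, y), INF)
                         + emit_cost.get((y, w), INF))
                # Keep only the cheapest way of reaching tag y at this word.
                if total < new_best.get(y, (INF, None))[0]:
                    new_best[y] = (total, path + [y])
        best = new_best
    if not best:
        return INF, []
    return min(best.values(), key=lambda entry: entry[0])

# Hypothetical toy costs (illustrative only).
trans = {("<s>", "N"): 1, ("<s>", "V"): 3, ("N", "N"): 2,
         ("N", "V"): 1, ("V", "N"): 2, ("V", "V"): 4}
emit = {("N", "time"): 1, ("V", "time"): 3,
        ("N", "flies"): 3, ("V", "flies"): 1}

cost, path = viterbi(["time", "flies"], ["N", "V"], trans, emit)
print(cost, path)
```

Each word adds one column to the trellis, and each column only looks at the previous one, which is why the work is linear in sentence length for a fixed tag set.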

  9. How to parse tractably II • But a (weighted) CFG has more complicated rules: • (1) “Category X can rewrite as categories α (with cost π)” • (2) “Preterminal X can be realized as word w (with cost η)” • (Rule type 2 is really a special case of rule type 1) • A graph is not rich enough to reflect CFG/tree structure • Phrases need to be stored as partial results • We also need rule combination structure • We’ll do this with hypergraphs

  10. How to parse tractably III • Hypergraphs are like graphs, but have hyper-edges instead of edges • “We observe a DT as word 1 and an NN as word 2.” • “Together, these let us infer an NP spanning words 1-2.” • [Figure callouts: the start state allows us to infer each of these (the DT and the NN); both of these are needed to infer this (the NP)]
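One way to picture the DT/NN/NP example is as a small data structure in which a hyper-edge has one head node and several tail nodes, all of which must be known before the head can be inferred. The class names, the `START` node, and the `derivable` helper below are assumptions made for illustration, not part of the lecture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    label: str   # category, e.g. "NP"
    start: int   # first word covered (1-based, as on the slide)
    end: int     # last word covered

@dataclass(frozen=True)
class Hyperedge:
    head: Node    # what the edge lets us infer
    tails: tuple  # Nodes that must ALL be known first

START = Node("START", 0, 0)
dt = Node("DT", 1, 1)
nn = Node("NN", 2, 2)
np_node = Node("NP", 1, 2)

edges = [
    Hyperedge(dt, (START,)),      # start state lets us infer the DT
    Hyperedge(nn, (START,)),      # start state lets us infer the NN
    Hyperedge(np_node, (dt, nn)), # DT and NN together give the NP
]

def derivable(edges, axioms):
    """Return every node reachable from `axioms` by following hyper-edges."""
    known = set(axioms)
    changed = True
    while changed:
        changed = False
        for e in edges:
            if e.head not in known and all(t in known for t in e.tails):
                known.add(e.head)
                changed = True
    return known

print(np_node in derivable(edges, [START]))
```

Dropping either the DT edge or the NN edge makes the NP underivable, which is the sense in which a hyper-edge differs from an ordinary edge.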

  11. How to parse tractably IV • Hypergraph for “Bird shot flies” (only partial) • [Figure labels: Goal; spanning words 1-3; spanning words 1-2; spanning words 2-3] • Grammar: S → NP VP; VP → V NP; VP → V; NP → N; NP → N N

  12. How to parse tractably V • The nodes in the hypergraph can be thought of as being arranged in a triangle • For a sentence of length N, this is the upper right triangle of an N×N matrix • This matrix is called the parse chart
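The triangular layout can be made concrete by enumerating the chart cells. The sketch below assumes fencepost indexing, where cell (i, j) covers the words between positions i and j; the name `chart_cells` is illustrative.

```python
def chart_cells(n):
    """Yield (i, j) for every span of a length-n sentence, shortest spans first.

    Positions are fenceposts: (i, j) covers words i+1 through j.
    """
    for width in range(1, n + 1):
        for i in range(0, n - width + 1):
            yield (i, i + width)

cells = list(chart_cells(3))
print(cells)  # [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)]
```

A sentence of length N has N(N+1)/2 such cells, and visiting them shortest-first guarantees that every sub-span is filled in before the spans that contain it.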

  13. How to parse tractably VI • Before we study examples of parsing, let’s linger on the hypergraph for a moment • The goal of parsing is to fully interconnect all the evidence (words) and the goal • This could be done from the bottom up… • …or from the top down & left to right • These correspond to different parse strategies • Today: bottom-up (later: top-down)

  14. Bottom-up (CKY) parsing • Bottom-up parsing is the most straightforward efficient parsing algorithm to implement • Known as the Cocke-Kasami-Younger (CKY) algorithm • We’ll illustrate it for the weighted CFG case • Each rule has a weight (a negative log-probability) associated with it • We’re looking for the “lightest” (lowest-weight or, equivalently, highest-probability) tree T for sentence S • Implicitly this is Bayes’ rule!
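The equivalence between "lightest" and "most probable" follows from taking weights to be negative log-probabilities: adding weights corresponds to multiplying probabilities. A small sketch, where the base-2 logarithm and the three rule probabilities are assumptions chosen for illustration:

```python
import math

def weight(prob):
    """Weight of a rule: negative base-2 log of its probability (base assumed)."""
    return -math.log2(prob)

# Hypothetical probabilities of the rules used in one tree.
rule_probs = [0.5, 0.25, 0.125]

total_weight = sum(weight(p) for p in rule_probs)  # weights 1 + 2 + 3
tree_prob = math.prod(rule_probs)                  # product of the rule probabilities

print(total_weight, tree_prob)
```

Because the tree probability equals 2 raised to minus the total weight, minimizing the summed weight and maximizing the probability pick out the same tree.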

  15. CKY parsing II • Here’s the (partial) grammar we’ll use, with each rule’s weight in the first column:
        1  S → NP VP
        6  S → Vst NP
        2  S → S PP
        1  VP → V NP
        2  VP → VP PP
        1  NP → Det N
        2  NP → NP PP
        3  NP → NP NP
        0  PP → P NP
        3  NP → time
        4  NP → flies
        4  VP → flies
        3  Vst → time
        2  P → like
        5  V → like
        1  Det → an
        8  N → arrow
      • (Slide callout: imperative verb, as in “Do the dishes!”)
      • The sentence we’ll parse (see the ambiguity?): Time flies like an arrow
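A minimal sketch of weighted CKY over the slide-15 grammar follows. The data layout (`BINARY`, `LEXICON`, a dictionary-based chart) and the function names are assumptions made for this illustration; only the rules and weights come from the slide.

```python
from collections import defaultdict

# (weight, parent, left child, right child); lower weight = more probable.
BINARY = [
    (1, "S", "NP", "VP"), (6, "S", "Vst", "NP"), (2, "S", "S", "PP"),
    (1, "VP", "V", "NP"), (2, "VP", "VP", "PP"),
    (1, "NP", "Det", "N"), (2, "NP", "NP", "PP"), (3, "NP", "NP", "NP"),
    (0, "PP", "P", "NP"),
]
# word -> list of (weight, category)
LEXICON = {
    "time": [(3, "NP"), (3, "Vst")],
    "flies": [(4, "NP"), (4, "VP")],
    "like": [(2, "P"), (5, "V")],
    "an": [(1, "Det")],
    "arrow": [(8, "N")],
}

def cky(words):
    """Fill the chart bottom-up; chart[i, j][X] = (best weight, backpointer)."""
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans come straight from the lexicon.
    for i, w in enumerate(words):
        cell = chart[i, i + 1]
        for wt, cat in LEXICON.get(w, []):
            if cat not in cell or wt < cell[cat][0]:
                cell[cat] = (wt, w)
    # Wider spans combine two adjacent, already-filled sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i, j]
            for k in range(i + 1, j):  # split point
                for wt, parent, left, right in BINARY:
                    if left in chart[i, k] and right in chart[k, j]:
                        total = wt + chart[i, k][left][0] + chart[k, j][right][0]
                        # Keep only the lightest entry per category.
                        if parent not in cell or total < cell[parent][0]:
                            cell[parent] = (total, (k, left, right))
    return chart

def tree(chart, cat, i, j):
    """Follow backpointers to rebuild the best tree for `cat` over span (i, j)."""
    _, back = chart[i, j][cat]
    if isinstance(back, str):  # lexical entry: backpointer is the word itself
        return (cat, back)
    k, left, right = back
    return (cat, tree(chart, left, i, k), tree(chart, right, k, j))

words = "time flies like an arrow".split()
chart = cky(words)
best_weight = chart[0, len(words)]["S"][0]
print(best_weight)
print(tree(chart, "S", 0, len(words)))
```

With this grammar two analyses of the full sentence tie for the lowest S weight (one built with S → NP VP, one with S → S PP); the sketch keeps whichever it finds first because it replaces an entry only when the new weight is strictly lower. The three nested loops over span width, start position, and split point are where the cubic dependence on sentence length comes from.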

  16–38. [Chart-filling animation for “Time flies like an arrow”, one step per slide; the chart images did not survive extraction. Each of these slides repeats the phrase-structure rules from slide 15.]

  39–43. Follow backpointers … • [Tree built up one step per slide: S; then S → NP VP; then VP → VP PP; then PP → P NP; then NP → Det N]

  44–45. Which entries do we need?

  46. Not worth keeping …

  47. … since it just breeds worse options

  48. Keep only best-in-class! • (Slide callout: “inferior stock”)

  49. Keep only best-in-class! (and backpointers so you can recover parse)
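The "best-in-class" rule can be sketched as a single update step on one chart cell: a new entry for a category is kept only if it is lighter than the entry already stored, and the backpointer is stored alongside the weight. The function name `add_entry`, the cell layout, and the weights and backpointers below are hypothetical.

```python
def add_entry(cell, category, weight, backpointer):
    """Store (weight, backpointer) for `category` only if it beats the current best.

    Returns True if the cell was updated.
    """
    if category not in cell or weight < cell[category][0]:
        cell[category] = (weight, backpointer)
        return True
    return False

cell = {}
add_entry(cell, "S", 27, ("Vst", "NP"))  # first S found for this span
add_entry(cell, "S", 22, ("NP", "VP"))   # lighter S replaces it
add_entry(cell, "S", 27, ("NP", "VP"))   # heavier S is discarded
print(cell)
```

A heavier entry can be dropped safely because any larger tree built on it could be rebuilt more cheaply on the lighter entry of the same category and span.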

  50. Computational complexity of parsing • This approach has good space complexity • O(GN²), where G is the number of categories in the grammar • What is the time complexity of the algorithm? • It’s cubic in N…why? • What about time complexity in G? • First, a clarification is in order • CFG rules can have right-hand sides of arbitrary length: X → α • But CKY works only with right-hand sides of max length 2 • So we need to convert the CFG for use with CKY
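One common way to do this conversion is right-branching binarization, sketched below. The slide does not say which conversion the lecture uses, so this is an assumed technique; the `@Parent|rest` naming of intermediate categories and the example rule are also hypothetical. The original weight is placed on the top rule and the new intermediate rules get weight 0, so every tree keeps its total weight.

```python
def binarize(rules):
    """Rewrite weighted rules so that no right-hand side is longer than 2.

    rules: list of (weight, parent, rhs) with rhs a tuple of symbols.
    Rules whose rhs already has length <= 2 are passed through unchanged.
    """
    out = []
    for weight, parent, rhs in rules:
        rhs = tuple(rhs)
        head = parent
        while len(rhs) > 2:
            # Intermediate category standing for everything after the first symbol.
            rest = "@%s|%s" % (parent, "_".join(rhs[1:]))
            out.append((weight, head, (rhs[0], rest)))
            head, rhs, weight = rest, rhs[1:], 0
        out.append((weight, head, rhs))
    return out

# Hypothetical long rule: VP -> V NP PP PP with weight 2.
for rule in binarize([(2, "VP", ("V", "NP", "PP", "PP"))]):
    print(rule)
```

This sketch only shortens long right-hand sides; it does not remove unary or empty rules, which a full conversion for CKY may also need to handle.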
