
CS 479, section 1: Natural Language Processing



Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #29: PCFG Transformations. Thanks to Dan Klein of UC Berkeley for many of the materials used in this lecture.

  2. Announcements • Homework 0.3: due Monday • Final Project choice: send in email Monday (was today) • Final Project – three options: propose your own (possibly as a team), or pick Project #4 or Project #5 (no proposal – just decide) • Project #4 help session: Tuesday at 4pm in CS Conf. Room

  3. Parsing Outline • Introduction to parsing natural language; Probabilistic Context-Free Grammars (PCFGs) • Independence Assumptions • Parsing Algorithm: Probabilistic CKY • PCFG Transformations • Markov grammars and other generative models • (Extra) Agenda-Based Parsing • Dependency Parsing

  4. How to use Unary Productions
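
The worked example on this slide is a figure that didn't survive the transcript. As a reminder of the standard trick, here is a minimal sketch of how unary productions are typically folded into a PCKY chart cell: after the binary combinations fill the cell, apply unary rules repeatedly until no score improves. The data shapes here (a dict cell and `(parent, child, prob)` triples) are illustrative assumptions, not the project's API.

```python
def apply_unaries(cell, unary_rules):
    """After filling a chart cell via binary rules, repeatedly apply
    unary rules A -> B until no probability improves (unary closure).
    `cell` maps nonterminal -> best probability for this span;
    `unary_rules` is a list of (parent, child, prob) triples."""
    improved = True
    while improved:
        improved = False
        for parent, child, rule_prob in unary_rules:
            if child in cell:
                candidate = rule_prob * cell[child]
                if candidate > cell.get(parent, 0.0):
                    cell[parent] = candidate   # found a better derivation
                    improved = True
    return cell
```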

  5. Quick quiz • Is PCKY greedy? • What is the space requirement (worst case) for PCKY?
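
(Not on the slide: the hint figure didn't survive the transcript. For self-checking, one way to work out the second question: the chart keeps a best score for each non-terminal over each span of an $n$-word sentence,

$$\#\text{spans} = \frac{n(n+1)}{2} \quad\Rightarrow\quad \text{space} = O\!\left(n^2\,|N|\right),$$

where $|N|$ is the number of non-terminals. And PCKY is exhaustive dynamic programming over all spans, not greedy.)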

  6. Objectives • Explore how to improve a Treebank PCFG by changing the independence assumptions while remaining in the PCFG formalism • Introduce binarization and Markovization; evaluate! • Further guidance for Project #4

  7. Independence Assumptions • How can we overcome the inappropriately strong independence assumptions in a Treebank PCFG? • Constraint: stay in the PCFG framework! • We can relax independence assumptions by encoding dependencies into the non-terminal symbols.

  8. Encode Dependencies in Non-terminals • Parent annotation [Johnson 98] • Re-annotate the Treebank using information already in the Treebank! • What are the most useful features to encode? • Annotations split the non-terminals into sub-categories. • Conditioning on history: P(NP^S → PRP) = P(NP → PRP | S) = P(PRP | Grandparent=S, Parent=NP) • Should make you think “Markov order 2” • Packing two labels into one state! – the same optimization we made in the tagger
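
A minimal sketch of parent annotation, assuming trees are encoded as nested `(label, children...)` tuples with words as plain strings (an illustrative encoding, not the project's actual tree class):

```python
def parent_annotate(tree, parent_label=None):
    """Re-annotate a tree so each non-terminal records its parent,
    e.g. an NP under S becomes NP^S (Johnson 98 style)."""
    if isinstance(tree, str):          # a word: leave terminals alone
        return tree
    label, *children = tree
    new_label = f"{label}^{parent_label}" if parent_label else label
    return (new_label, *[parent_annotate(c, label) for c in children])

# e.g. ("S", ("NP", "I"), ("VP", "slept"))
#  ->  ("S", ("NP^S", "I"), ("VP^S", "slept"))
```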

  9. Another Example: Marking Possessive NPs • Percolating important info. upwards: P(NP-POS → NNP POS) • This isn’t history conditioning: P(NP-POS → NNP POS) = P(NNP POS | NP-POS) • Making information from inside available to influence outside by pretending it came from outside. • Feature grammars vs. annotation • Can think of a symbol like NP^NP-POS as NP [parent:NP, +POS] • Many grammar formalisms do this: GPSG, HPSG, LFG, …

  10. Evaluation: Vertical Markovization • Vertical Markov order: rewrites depend on the past k ancestor nodes (generalizes parent annotation). [Figure: example trees annotated at vertical order 1 vs. order 2]

  11. Binarization of N-ary Rules • Often we want to write non-binary grammar rules such as: VP → VBD NP PP PP • We also observe these sorts of constituents in the Treebank. • But PCKY requires Chomsky Normal Form. • We can still work with these rules by introducing new intermediate symbols into our grammar, binarizing top-down (as it were) from the left: VP → VBD @VP→VBD; @VP→VBD → NP @VP→VBD_NP; @VP→VBD_NP → PP PP
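
A sketch of this left binarization over the same tuple encoding; the @-symbol naming scheme is one plausible convention, not necessarily the project's:

```python
def binarize(tree):
    """Left-to-right, top-down binarization: an n-ary node
    X -> c1 c2 ... cn is replaced by a chain of binary rules with
    intermediate symbols like @X->c1 and @X->c1_c2."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [binarize(c) for c in children]
    while len(children) > 2:
        # Fold the last two children under a new intermediate symbol
        # named after the siblings already generated to their left.
        seen = "_".join(c if isinstance(c, str) else c[0]
                        for c in children[:-2])
        inter = (f"@{label}->{seen}", *children[-2:])
        children = children[:-2] + [inter]
    return (label, *children)

# ("VP", VBD, NP, PP, PP) ->
# ("VP", VBD, ("@VP->VBD", NP, ("@VP->VBD_NP", PP, PP)))
```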

  12. Binarization • The intermediate symbols (e.g., @VP→VBD_NP) may look like rules, but they’re just names!

  13. Horizontal Markovization [Figure: binarized intermediate symbols (order ∞) vs. horizontal Markov order 1]

  14. Horizontal Markovization • “Merged states”: in other words, horizontal Markovization collapses distinctions among intermediate symbols that share the same recent sibling history. [Figure: binarized (order ∞) vs. order 1, with merged states highlighted]
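
A sketch of the merging, assuming the @-naming convention from the binarization sketch above:

```python
def horizontal_markovize(symbol, h=1):
    """Merge intermediate symbols that agree on their last h generated
    siblings, forgetting older sibling history. With h=1, '@VP->VBD_NP'
    and '@VP->JJ_NP' both collapse to '@VP->..._NP'."""
    if not symbol.startswith("@"):
        return symbol                    # only intermediates get merged
    head, _, history = symbol.partition("->")
    if h == 0:
        return f"{head}->..."            # forget all sibling history
    siblings = history.split("_")
    if len(siblings) <= h:
        return symbol                    # nothing to forget yet
    kept = "_".join(siblings[-h:])       # keep only the last h siblings
    return f"{head}->..._{kept}"
```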

  15. Generalization • Which of these grammars can parse the following? [Figure: an example constituent; candidate grammars: fully binarized (order ∞) vs. horizontal order 1]

  16. Evaluation: Horizontal Markovization [Figure: F1 and grammar size for horizontal order ∞ vs. order 1]

  17. Evaluation: Vertical and Horizontal • Examples: • Raw treebank: v=1, h=∞ • Johnson 98: v=2, h=∞ • Collins 99: v=2, h=2 • K&M starting point: v=2v, h=2v • Best F1: v=3, h=2v

  18. Note: Evaluation • After parsing with an annotated grammar, the annotations are then stripped for evaluation against the Treebank.

  19. Project #4: Train / Learn • Pipeline: S-expression Reader → Tree → Vertical markovization (annotate) → Binarization → Horizontal markovization → Binary, Annotated Tree → Count local trees → Rule counts in Grammar, Lexicon → PCFG in CNF • Legend (from the diagram): some boxes you implement; for others you invoke provided helper classes.
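
Putting the pieces together, a hedged sketch of the training half of this pipeline, reusing the parent_annotate and binarize sketches from earlier slides; the project's provided reader and helper classes are not modeled here:

```python
from collections import Counter

def local_trees(tree):
    """Yield one (lhs, rhs-labels) pair per local tree, i.e. per rule use."""
    if isinstance(tree, str):
        return
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        yield from local_trees(child)

def train(trees):
    """Count local trees over the annotated, binarized treebank and
    normalize by left-hand side to get PCFG rule probabilities."""
    counts = Counter()
    for tree in trees:
        counts.update(local_trees(binarize(parent_annotate(tree))))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
```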

  20. Project #4: Parse / Test • Pipeline: Sentence → parse (PCKY) → Binary Parse Tree (possibly incomplete) → fit → Binary, Annotated Parse Tree → De-binarize → De-annotate → Parse Tree → S-expression → Metric (F1) • Legend (from the diagram): some boxes you implement; for others you invoke provided helper classes.
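
The de-binarize and de-annotate steps can be sketched the same way: splice out the @ intermediates and strip the ^ annotations, again assuming the tuple encoding and naming conventions used above:

```python
def debinarize(tree):
    """Splice out intermediate '@...' nodes so their children re-attach
    to the real parent, undoing binarization."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    flat = []
    for c in children:
        c = debinarize(c)
        if not isinstance(c, str) and c[0].startswith("@"):
            flat.extend(c[1:])      # promote the intermediate's children
        else:
            flat.append(c)
    return (label, *flat)

def deannotate(tree):
    """Strip annotations (everything after '^') before F1 evaluation."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return (label.split("^")[0], *[deannotate(c) for c in children])
```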

  21. Plan B: Parse Tree Fitting What if you run PCKY and don’t find a valid parse (nothing is in the top-left corner)?

  22. Plan B: Parse Tree Fitting What if you run PCKY and don’t find a valid parse (nothing is in the top-left corner)? Make something up to ensure you have a valid (binary) tree. That way you’ll get partial credit for the pieces you do have right. ?A and ?B represent non-terminals that you’ll need to specify.
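
A minimal sketch of one possible fitting strategy (the slide's figure isn't in the transcript, so the roles assigned to ?A and ?B here are guesses, and the `chunks` span map is an assumed representation): greedily cover the sentence with whatever constituents the chart did complete, wrap uncovered words, and glue everything into a right-branching binary tree.

```python
def fit_fallback(words, chunks):
    """Plan B: when the full-sentence cell is empty, build *some* valid
    binary tree for partial credit. `chunks` maps (start, end) spans to
    completed subtrees. '?A' glues pieces together and '?B' tags
    otherwise-uncovered words -- placeholders you'll need to specify."""
    pieces, i = [], 0
    while i < len(words):
        ends = [e for (s, e) in chunks if s == i and e > i]
        if ends:                          # take the longest recovered chunk
            end = max(ends)
            pieces.append(chunks[(i, end)])
            i = end
        else:                             # no chunk starts here: bare word
            pieces.append(("?B", words[i]))
            i += 1
    tree = pieces[-1]
    for piece in reversed(pieces[:-1]):   # right-branching binary glue
        tree = ("?A", piece, tree)
    return tree
```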

  23. Next • Leaving the PCFG framework: • Markov Grammars
