Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05
Outline • Lexicalized CFG (Recap) • Hw5 and Project 2 • Parsing evaluation measures: ParseVal • Collins' parser • TAG • Parsing summary
Lexicalized CFG • Lexicalized rules: every nonterminal carries its head word, e.g., VP(likes) → V(likes) NP(her) • This creates a sparse data problem, so rule generation is decomposed: • First generate the head • Then generate the unlexicalized rule
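As a sketch of one such decomposition (the notation here is assumed, not taken from the slides): generate the unlexicalized rule conditioned on the parent label and its head word, then generate the head word of each non-head child:

P(A(h) → B_1(b_1) … H(h) … B_n(b_n)) ≈ P(B_1 … H … B_n | A, h) · ∏_{i ≠ head} P(b_i | B_i, A, h)

The head child H shares the parent's head word h, so only the non-head children need new head words.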
An example • he likes her
Building a statistical tool • Design a model: • Objective function: generative model vs. discriminative model • Decomposition: independence assumptions • The types of parameters and the parameter size • Training: estimate model parameters • Supervised vs. unsupervised • Smoothing methods • Decoding: find the most likely structure given the trained model and the input
Team Project 1 (Hw5) • Form a team: programming language, schedule, expertise, etc. • Understand the lexicalized model • Design the training algorithm • Work out the decoding (parsing) algorithm: augment the CYK algorithm (a sketch of the base algorithm follows below) • Illustrate the algorithms with a real example.
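As a starting point for the decoding step, here is a minimal probabilistic CYK sketch for a plain PCFG in Chomsky normal form (the grammar encoding, function names, and toy probabilities are mine, purely for illustration; the lexicalized augmentation is what Hw5 asks you to design):

```python
from collections import defaultdict

def cyk_parse(words, lexical, binary):
    """Probabilistic CYK for a PCFG in Chomsky normal form.

    lexical: dict mapping word -> list of (nonterminal, prob)
    binary:  list of (parent, left_child, right_child, prob)
    Returns the best probability for each (start, end, label) span,
    plus backpointers for recovering the best tree.
    """
    n = len(words)
    best = defaultdict(float)   # (i, j, label) -> best probability
    back = {}                   # (i, j, label) -> (split, left, right)

    # Initialize spans of length 1 with the lexical rules.
    for i, w in enumerate(words):
        for label, p in lexical.get(w, []):
            best[(i, i + 1, label)] = p

    # Fill longer spans bottom-up, trying every split point.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right, p in binary:
                    score = p * best[(i, k, left)] * best[(k, j, right)]
                    if score > best[(i, j, parent)]:
                        best[(i, j, parent)] = score
                        back[(i, j, parent)] = (k, left, right)
    return best, back

# Toy grammar (probabilities made up for illustration).
lexical = {"he": [("NP", 1.0)], "her": [("NP", 1.0)], "likes": [("V", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("VP", "V", "NP", 1.0)]
best, back = cyk_parse("he likes her".split(), lexical, binary)
print(best[(0, 3, "S")])  # probability of the best S over the whole sentence
```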
Team Project 2 • Task: parse real data with a real grammar extracted from a treebank. • Parser: PCFG or lexicalized PCFG • Training data: English Penn Treebank Sections 02-21 • Development data: Section 00
Team Project 2 (cont) • Hw6: extract a PCFG from the treebank • Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance • Hw8: improve parsing results • Hw10: write a report and give a presentation
Evaluation of parsers: ParseVal • Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard • Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output • Labeled F-measure: 2PR/(P+R), the harmonic mean of labeled precision (P) and labeled recall (R) • Complete match: % of sents where recall and precision are 100% • Average crossing: # of crossing brackets per sent • No crossing: % of sents which have no crossing brackets.
An example Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))) Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))
ParseVal measures • Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) • System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) • Recall=4/4, Prec=4/5, crossing=0
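A small sketch of how these numbers fall out of the span sets (variable names are illustrative):

```python
# Labeled constituents as (label, start, end) spans from the example above.
gold = {("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)}
system = {("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)}

correct = gold & system
recall = len(correct) / len(gold)        # 4/4 = 1.0
precision = len(correct) / len(system)   # 4/5 = 0.8
f1 = 2 * precision * recall / (precision + recall)
print(recall, precision, f1)
```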
A different annotation Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope))))) Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))
ParseVal measures (cont) • Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6,6) • System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6) • Recall=4/6, Prec=4/6, crossing=1
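And a sketch of the crossing-bracket count for this second example; a system bracket crosses a gold bracket when the two overlap without either containing the other (spans are inclusive word indices, names are mine):

```python
def crosses(a, b):
    """True if spans a=(s1,e1) and b=(s2,e2) overlap without nesting."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 <= e1 < e2 or s2 < s1 <= e2 < e1

gold_spans = [(1, 6), (2, 3), (3, 3), (4, 6), (5, 6), (6, 6)]
system_spans = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6)]

crossing = sum(any(crosses(s, g) for g in gold_spans) for s in system_spans)
print(crossing)  # 1: the system's (3, 6) crosses the gold (2, 3)
```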
EVALB • A tool that calculates the ParseVal measures • To run it: evalb -p parameter_file gold_file system_output • A copy is available in my dropbox • You will need it for Team Project 2
Summary of Parsing evaluation measures • ParseVal is the most widely used: F-measure is the most important measure • The results depend on the annotation style • EVALB is a tool that calculates the ParseVal measures • Other measures are used too: e.g., accuracy of dependency links
History-based models • A history-based approach maps (T, S) into a decision sequence d_1, …, d_n • Probability of tree T for sentence S is: P(T, S) = ∏_{i=1..n} P(d_i | d_1, …, d_{i-1})
History-based models (cont) • PCFGs can be viewed as history-based models • There are other history-based models • Magerman's parser (1995) • Collins' parsers (1996, 1997, …) • Charniak's parsers (1996, 1997, …) • Ratnaparkhi's parser (1997)
Collins’ models • Model 1: Generative model of (Collins, 1996) • Model 2: Add complement/adjunct distinction • Model 3: Add wh-movement
Model 1 • First generate the head constituent label • Then generate the left and right dependents outward from the head, stopping with a STOP symbol on each side (see the sketch below)
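For a rule P(h) → L_n(l_n) … L_1(l_1) H(h) R_1(r_1) … R_m(r_m), Collins (1997) Model 1 decomposes the rule probability roughly as follows (a sketch; the distance features of the actual model are omitted here for readability):

P_h(H | P, h) · ∏_{i=1..n+1} P_l(L_i(l_i) | P, H, h) · ∏_{j=1..m+1} P_r(R_j(r_j) | P, H, h)

where L_{n+1} = R_{m+1} = STOP, so the model itself decides when to stop generating dependents on each side.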
An example Sentence: Last week Marks bought Brooks.
Model 2 • Generate a head label H • Choose left and right subcategorization frames • Generate left and right arguments • Generate left and right modifiers (a sketch follows below)
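A simplified sketch of the Model 2 decomposition (again omitting distance features): the left and right subcat frames LC and RC are generated once, and each dependent is conditioned on the remaining frame:

P_h(H | P, h) · P_lc(LC | P, H, h) · P_rc(RC | P, H, h) · ∏_i P_l(L_i(l_i) | P, H, h, LC) · ∏_j P_r(R_j(r_j) | P, H, h, RC)

Each argument that is generated is removed from the remaining frame, and STOP can only be generated once the frame is empty, which is how the complement/adjunct distinction constrains the parse.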
Model 3 • Add trace and wh-movement • Given that the LHS of a rule has a gap, there are three ways to pass down the gap • Head: S(+gap) → NP VP(+gap) • Left: S(+gap) → NP(+gap) VP • Right: SBAR(that)(+gap) → WHNP(that) S(+gap)
TAG • TAG basics • TAG extensions: • Lexicalized TAG (LTAG) • Synchronous TAG (STAG) • Multi-component TAG (MCTAG) • …
TAG basics • A tree-rewriting formalism (Joshi et al., 1975) • It can generate mildly context-sensitive languages • The primitive elements of a TAG are elementary trees • Elementary trees are combined by two operations: substitution and adjoining • TAG has been used in • Parsing, semantics, discourse, etc. • Machine translation, summarization, generation, etc.
Two types of elementary trees • Initial tree: anchored by the verb "draft" (S → NP VP, VP → V NP), with NP substitution nodes • Auxiliary tree: anchored by the adverb "still" (VP → ADVP VP*, where VP* is the foot node)
Adjoining operation: an auxiliary tree with root Y and foot node Y* is inserted at a node labeled Y in another tree; the subtree originally at that node moves down to the foot node
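A minimal sketch of the two operations on simple tuple-encoded trees (the representation and function names are mine, purely for illustration):

```python
# Trees as (label, children) pairs; a label ending in "*" marks the foot node.

def substitute(tree, site_label, initial_tree):
    """Replace leaf substitution nodes labeled site_label with an initial tree.
    (For simplicity this replaces every such leaf; a real implementation
    would address substitution sites individually.)"""
    label, children = tree
    if label == site_label and not children:
        return initial_tree
    return (label, [substitute(c, site_label, initial_tree) for c in children])

def adjoin(tree, site_label, aux_tree):
    """Insert an auxiliary tree at a node labeled site_label; the subtree
    rooted there moves down to the auxiliary tree's foot node."""
    label, children = tree
    if label == site_label:
        return _plug_foot(aux_tree, tree)
    return (label, [adjoin(c, site_label, aux_tree) for c in children])

def _plug_foot(aux_tree, subtree):
    """Replace the foot node (label ending in '*') with the displaced subtree."""
    label, children = aux_tree
    if label.endswith("*"):
        return subtree
    return (label, [_plug_foot(c, subtree) for c in children])

# The "draft" initial tree and "still" auxiliary tree from the slides.
draft = ("S", [("NP", []),
               ("VP", [("V", [("draft", [])]), ("NP", [])])])
still = ("VP", [("ADVP", [("ADV", [("still", [])])]), ("VP*", [])])

# Adjoin "still" at the VP node: S -> NP (VP (ADVP still) (VP draft NP)).
print(adjoin(draft, "VP", still))
```

In a real TAG parser, substitution and adjoining sites are addressed by node position rather than by label, and the derivation tree records each operation.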
Derivation tree • Elementary trees combine to form a derived tree; the derivation tree records which elementary trees were used, and where they were substituted or adjoined [tree diagrams omitted]
Derived tree vs. derivation tree • The mapping between the two is not 1-to-1 • Finding the best derivation is not the same as finding the best derived tree
Wh-movement • Example: What do they draft? [tree diagrams omitted]
Long-distance wh-movement • Example: What does John think they draft? [tree diagrams omitted]
Long-distance wh-movement (cont) • Example: Who did you have dinner with? [tree diagrams omitted]
TAG extensions • Lexicalized TAG (LTAG) • Synchronous TAG (STAG) • Multi-component TAG (MCTAG) • …
STAG • The primitive elements in STAG are elementary tree pairs • Used for machine translation (MT)
Summary of TAG • A formalism beyond CFG • Primitive elements are trees, not rules • Extended domain of locality • Two operations: substitution and adjoining • Parsing algorithms: CYK-style TAG parsing runs in O(n^6) time • Statistical parsers for TAG • Algorithms for extracting TAGs from treebanks