Parsing with PCFG Ling 571 Fei Xia Week 3: 10/11-10/13/05
Outline • Misc • CYK algorithm • Converting CFG into CNF • PCFG • Lexicalized PCFG
Misc • Quiz 1: 15 pts, due 10/13 • Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u • Treehouse weekly meeting: • Time: every Wed 2:30-3:30pm, tomorrow is the 1st meeting • Location: EE1 025 (Campus map 12-N, South of MGH) • Mailing list: cl-announce@u • Others: • Pongo policies • Machines: LLC, Parrington, Treehouse • Linux commands: ssh, sftp, … • Catalyst tools: ESubmit, EPost, …
Parsing algorithms • Top-down • Bottom-up • Top-down with bottom-up filtering • Earley algorithm • CYK algorithm • ....
CYK algorithm • Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm) • Requires the CFG to be in Chomsky Normal Form (CNF). • Bottom-up chart parsing algorithm using DP. • Fill in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_i … w_j • Complexity: O(N³ · |G|), where N is the sentence length and |G| is the number of grammar rules
Chomsky normal form (CNF) • Definition of CNF: • A → B C • A → a • S → ε, where A, B, C are non-terminals; a is a terminal; S is the start symbol; B and C are not the start symbol. • For every CFG, there is a CFG in CNF that is weakly equivalent.
CYK algorithm
• For every rule A → w_i, add A to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule
          then add A to Cell[begin][end]
CYK algorithm (another way)
• For every rule A → w_i, add it to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule in the grammar
          then add A → B C to Cell[begin][end] and remember m
An example
Rules:
VP → V NP     V → book
VP → VP PP    N → book/flight/cards
NP → Det N    Det → that/the
NP → NP PP    P → with
PP → P NP
Parse “book that flight”: chart C1[begin][end], indexed by begin = 1..3 and end = 1..3
Parse “book that flight”: chart C2[begin][span], indexed by begin = 1..3 and span = 1..3
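A minimal sketch of this recognizer in Python (not from the slides; the dictionary layout and 1-based cell indexing are my own choices), using the example rules above to parse “book that flight”:

```python
# Cell[(begin, end)] holds the set of non-terminals that can derive the
# words begin..end (1-based, inclusive).

lexical = {          # A -> w_i rules
    "book":   {"V", "N"},
    "flight": {"N"},
    "cards":  {"N"},
    "that":   {"Det"},
    "the":    {"Det"},
    "with":   {"P"},
}
binary = [           # A -> B C rules
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]

def cyk(words):
    n = len(words)
    cell = {(i, j): set() for i in range(1, n + 1) for j in range(1, n + 1)}
    # For every rule A -> w_i, add A to Cell[i][i]
    for i, w in enumerate(words, start=1):
        cell[(i, i)] |= lexical.get(w, set())
    # Fill longer spans bottom-up
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C) in binary:
                    if B in cell[(begin, m)] and C in cell[(m + 1, end)]:
                        cell[(begin, end)].add(A)
    return cell

chart = cyk("book that flight".split())
print(chart[(1, 3)])   # {'VP'}: a VP covers the whole string
```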
Data structures for the chart
Summary of CYK algorithm • Bottom-up using DP • Requires the CFG to be in CNF • A very efficient algorithm • Easy to extend
Chomsky normal form (CNF) • Definition of CNF: • A → B C, • A → a, • S → ε, where A, B, C are non-terminals, a is a terminal, S is the start symbol, and B, C are not start symbols. • For every CFG, there is a CFG in CNF that is weakly equivalent.
Converting CFG to CNF • (1) Add a new start symbol S0 and a rule S0 → S (so the start symbol will not appear on the rhs of any rule) • (2) Eliminate ε-rules: remove a rule A → ε; for each rule B → αAβ, add B → αβ (with that occurrence of A deleted); for each rule B → A, add B → ε, unless that rule has been previously eliminated.
Conversion (cont) • (3) Remove unit rules: remove A → B; for each rule B → u, add A → u, unless the latter rule was previously removed. • (4) Replace a rule A → u1 u2 … uk where k > 2 with A → u1 A1, A1 → u2 A2, …, A(k-2) → u(k-1) uk; replace any terminal u_i in a rule of rhs length ≥ 2 with a new symbol U_i and add a new rule U_i → u_i.
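To make step (4) concrete, here is a small Python sketch (the is_terminal convention and the X*/T_* symbol names are my own, not from the slides; steps (1)–(3) are assumed to have run already):

```python
# Sketch of CNF step (4): binarize long rules and lift terminals out of
# long right-hand sides.

def is_terminal(sym):
    return sym.islower()              # toy convention: lowercase = terminal

def cnf_step4(rules):
    """rules: list of (lhs, [rhs symbols]).  Returns an equivalent list whose
    right-hand sides are a single terminal or exactly two non-terminals."""
    out, counter = [], 0

    def fresh():
        nonlocal counter
        counter += 1
        return "X%d" % counter

    for lhs, rhs in rules:
        if len(rhs) == 1:             # A -> a (unit rules already removed)
            out.append((lhs, rhs))
            continue
        # Replace any terminal u_i with a new symbol U_i and add U_i -> u_i
        rhs2 = []
        for sym in rhs:
            if is_terminal(sym):
                out.append(("T_" + sym, [sym]))
                rhs2.append("T_" + sym)
            else:
                rhs2.append(sym)
        # Replace A -> u1 u2 ... uk (k > 2) with a chain of binary rules
        left = lhs
        while len(rhs2) > 2:
            new = fresh()
            out.append((left, [rhs2[0], new]))
            left, rhs2 = new, rhs2[1:]
        out.append((left, rhs2))
    return out

print(cnf_step4([("VP", ["V", "NP", "PP"]), ("NP", ["Det", "N"])]))
```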
Removing ε-rules • Remove B → ε • Remove A → ε
Removing unit rules • (worked example; the specific rules removed were shown as figures)
Removing unit rules (cont) • (continuation of the worked example)
Summary of CFG parsing • Simple top-down and bottom-up parsing generate useless trees. • Top-down with bottom-up filtering has three problems. • Solution: use DP: • Earley algorithm • CYK algorithm
PCFG • PCFG is an extension of CFG. • A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P: Pr(A → β), also written P(A → β | A). • Given a non-terminal A, Σ_β Pr(A → β) = 1
A PCFG
S → NP VP        0.8      N → Mary      0.01
S → Aux NP VP    0.15     N → book      0.02
S → VP           0.05     V → bought    0.02
VP → V           0.35     Det → a       0.04
VP → V NP        0.45     NP → N        0.8
VP → VP PP       0.20     NP → Det N    0.2
…
Using probabilities • To estimate the prob of a sentence and its parse trees. • Useful in disambiguation. • The prob of a tree: P(T) = ∏_{n ∈ T} Pr(r(n)), where n is a node in T and r(n) is the rule used to expand n in T.
Computing P(T)
S → NP VP        0.8      N → Mary      0.01
S → Aux NP VP    0.15     N → book      0.02
S → VP           0.05     V → bought    0.02
VP → V           0.35     Det → a       0.04
VP → V NP        0.45     NP → N        0.8
VP → VP PP       0.20     NP → Det N    0.2
The sentence is “Mary bought a book”.
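As a worked check (my own example, assuming the natural tree S → NP VP, with NP → N → Mary on the left and VP → V NP covering “bought a book” on the right), P(T) is the product of the probabilities of the rules used:

```python
# P(T) for "Mary bought a book" with the tree
#   (S (NP (N Mary)) (VP (V bought) (NP (Det a) (N book))))
rules_used = [
    ("S  -> NP VP",  0.8),
    ("NP -> N",      0.8),
    ("N  -> Mary",   0.01),
    ("VP -> V NP",   0.45),
    ("V  -> bought", 0.02),
    ("NP -> Det N",  0.2),
    ("Det -> a",     0.04),
    ("N  -> book",   0.02),
]
p = 1.0
for _, prob in rules_used:
    p *= prob
print(p)   # ≈ 9.2e-09
```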
The most likely tree • P(T, S) = P(T) × P(S | T) = P(T), where T is a parse tree and S is a sentence (P(S | T) = 1 when T yields S) • The best parse tree for a sentence S: T* = argmax_T P(T | S) = argmax_T P(T, S) = argmax_T P(T)
Find the most likely tree Given a PCFG and a sentence S, how do we find the best parse tree for S? One algorithm: CYK
CYK algorithm for CFG
• For every rule A → w_i, set P[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] and P[m+1][end][C] are true and A → B C is a rule
          then set P[begin][end][A] = true
CYK algorithm for CFG (another implementation)
• For every rule A → w_i, add A to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule
          then add A to Cell[begin][end]
Variables for CFG and PCFG • CFG: P[i][j][A] records whether there is a parse tree whose root is A and which covers w_i … w_j • PCFG: P[i][j][A] is the prob of the most likely parse tree whose root is A and which covers w_i … w_j
CYK algorithm for PCFG
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
          then set P[begin][end][A] to that value and remember (m, B, C)
A CFG
Rules:
VP → V NP     V → book
VP → VP PP    N → book/flight/cards
NP → Det N    Det → that/the
NP → NP PP    P → with
PP → P NP
Parse “book that flight” (chart indexed by begin = 1..3 and end = 1..3)
A PCFG
Rules:
VP → V NP    0.4      V → book     0.001
VP → VP PP   0.2      N → book     0.01
NP → Det N   0.3      Det → that   0.1
NP → NP PP   0.2      P → with     0.2
PP → P NP    1.0      N → flight   0.02
Parse “book that flight” (chart indexed by begin = 1..3 and end = 1..3, now with probabilities)
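A minimal Python sketch of the probabilistic CYK above (not from the slides; the table layout and the names best/back are my own), run on this PCFG and “book that flight”:

```python
# best[(i, j)][A] is the prob of the most likely tree rooted in A covering
# words i..j; back[(i, j)][A] stores the split point and children so the
# tree can be recovered.

lexical = {  # Pr(A -> w)
    ("V", "book"): 0.001, ("N", "book"): 0.01, ("N", "flight"): 0.02,
    ("Det", "that"): 0.1, ("P", "with"): 0.2,
}
binary = [   # (A, B, C, Pr(A -> B C))
    ("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
    ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

def viterbi_cyk(words):
    n = len(words)
    best = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    back = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i)][A] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C, p) in binary:
                    if B in best[(begin, m)] and C in best[(m + 1, end)]:
                        val = p * best[(begin, m)][B] * best[(m + 1, end)][C]
                        if val > best[(begin, end)].get(A, 0.0):
                            best[(begin, end)][A] = val
                            back[(begin, end)][A] = (m, B, C)
    return best, back

best, back = viterbi_cyk("book that flight".split())
print(best[(1, 3)])   # {'VP': 2.4e-07}, up to float rounding
```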
N-best parse trees • Best parse tree: T* = argmax_T P(T) • N-best parse trees: the N parse trees for S with the highest P(T)
CYK algorithm for N-best
• For every rule A → w_i, add Pr(A → w_i) to P[i][i][A]
  (each entry P[begin][end][A] now holds up to N probabilities in decreasing order, with backpointers in B[begin][end][A])
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          for each i-th entry in P[begin][m][B] and j-th entry in P[m+1][end][C]:
            val = Pr(A → B C) × P[begin][m][B][i] × P[m+1][end][C][j]
            if val > one of the probs in P[begin][end][A]
            then remove the last element in P[begin][end][A] and insert val into the array,
                 and remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
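A small sketch of the N-best cell update (the names N_BEST and update_cell are illustrative, not from the slides): each chart entry keeps at most N (probability, backpointer) pairs in decreasing order of probability.

```python
# Sketch of the N-best cell update described above.

N_BEST = 3

def update_cell(entry, val, backpointer):
    """entry: list of (prob, backpointer), length <= N_BEST, sorted descending."""
    if len(entry) < N_BEST or val > entry[-1][0]:
        entry.append((val, backpointer))
        entry.sort(key=lambda x: -x[0])
        del entry[N_BEST:]            # drop anything beyond the top N

cell = []
update_cell(cell, 2.4e-7, (1, "V", "NP", 0, 0))   # backpointer (m, B, C, i, j)
update_cell(cell, 1.0e-8, (2, "VP", "PP", 0, 0))
print(cell)
```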
PCFG for Language Modeling (LM) • N-gram LM: Pr(S) = ∏_i P(w_i | w_{i-n+1} … w_{i-1}) • Syntax-based LM: Pr(S) = Σ_T P(T, S) = Σ_T P(T), summing over all parse trees T of S
Calculating Pr(S) • Parsing: the prob of the most likely parse tree: max_T P(T) • LM: the sum over all parse trees: Pr(S) = Σ_T P(T)
CYK for finding the most likely parse tree
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
          then set P[begin][end][A] to that value
CYK for calculating LM
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          P[begin][end][A] += P[begin][m][B] × P[m+1][end][C] × Pr(A → B C)
• Pr(S) = P[1][N][S], the sum over all parse trees
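The only change from the Viterbi sketch above is that the max is replaced by a sum over split points and rules; a self-contained sketch with the same toy PCFG (the table layout and names are again my own):

```python
# Inside-probability variant of the CYK update: sum instead of max.

lexical = {
    ("V", "book"): 0.001, ("N", "book"): 0.01, ("N", "flight"): 0.02,
    ("Det", "that"): 0.1, ("P", "with"): 0.2,
}
binary = [
    ("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
    ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

def inside_cyk(words):
    n = len(words)
    inside = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                inside[(i, i)][A] = inside[(i, i)].get(A, 0.0) + p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C, p) in binary:
                    if B in inside[(begin, m)] and C in inside[(m + 1, end)]:
                        inside[(begin, end)][A] = (
                            inside[(begin, end)].get(A, 0.0)
                            + p * inside[(begin, m)][B] * inside[(m + 1, end)][C]
                        )
    return inside

# Pr(S) would be inside[(1, n)][S] for the start symbol; this toy grammar
# has no S rules, so we just inspect the totals for the whole span.
print(inside_cyk("book that flight".split())[(1, 3)])
```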
Learning PCFG Probabilities • Given a treebank (i.e., a set of trees), use MLE: Pr(A → β) = Count(A → β) / Count(A) • Without treebanks: the inside-outside algorithm
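A minimal sketch of MLE estimation from a toy treebank (the nested-tuple tree representation and the helper names are my own, not from the slides):

```python
# MLE rule probabilities: Pr(A -> beta) = Count(A -> beta) / Count(A).
# Trees are nested tuples like ("S", ("NP", ...), ("VP", ...)).
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                       # lexical rule A -> w
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_rules(child, rule_counts, lhs_counts)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

def mle(treebank):
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        count_rules(tree, rule_counts, lhs_counts)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

toy_treebank = [
    ("S", ("NP", ("N", "Mary")),
          ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "book")))),
]
for rule, p in mle(toy_treebank).items():
    print(rule, p)
```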
Q&A • PCFG • CYK algorithm
Problems of PCFG • Lack of sensitivity to structural dependency (e.g., where a constituent sits in the tree, such as subject vs. object position) • Lack of sensitivity to lexical dependency (e.g., which words head the constituents, as in PP-attachment decisions)