Parsing with PCFG Ling 571 Fei Xia Week 3: 10/11-10/13/05
Outline • Misc • CYK algorithm • Converting CFG into CNF • PCFG • Lexicalized PCFG
Misc • Quiz 1: 15 pts, due 10/13 • Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u • Treehouse weekly meeting: • Time: every Wed 2:30-3:30pm, tomorrow is the 1st meeting • Location: EE1 025 (Campus map 12-N, South of MGH) • Mailing list: cl-announce@u • Others: • Pongo policies • Machines: LLC, Parrington, Treehouse • Linux commands: ssh, sftp, … • Catalyst tools: ESubmit, EPost, …
Parsing algorithms • Top-down • Bottom-up • Top-down with bottom-up filtering • Earley algorithm • CYK algorithm • ....
CYK algorithm • Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm) • Requires the CFG to be in Chomsky Normal Form (CNF). • Bottom-up chart parsing algorithm using DP. • Fill in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_i … w_j • Complexity: O(N³ · |G|), where N is the sentence length and |G| is the number of grammar rules
Chomsky normal form (CNF) • Definition of CNF: • A → B C • A → a • S → ε, where A, B, C are non-terminals; a is a terminal; S is the start symbol; B and C are not the start symbol. • For every CFG, there is a CFG in CNF that is weakly equivalent.
CYK algorithm
• For every rule A → w_i, add A to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule
          then add A to Cell[begin][end]
CYK algorithm (another way)
• For every rule A → w_i, add it to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule in the grammar
          then add A → B C to Cell[begin][end] and remember m
An example
Rules:
VP → V NP     V → book
VP → VP PP    N → book/flight/cards
NP → Det N    Det → that/the
NP → NP PP    P → with
PP → P NP
Parse “book that flight”: chart C1[begin][end], indexed by begin = 1..3 and end = 1..3
Parse “book that flight”: chart C2[begin][span], indexed by begin = 1..3 and span = 1..3
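A minimal sketch of this recognizer in Python (not from the slides; the dictionary layout and 1-based cell indexing are my own choices), using the example rules above to parse “book that flight”:

```python
# Cell[(begin, end)] holds the set of non-terminals that can derive the
# words begin..end (1-based, inclusive).

lexical = {          # A -> w_i rules
    "book":   {"V", "N"},
    "flight": {"N"},
    "cards":  {"N"},
    "that":   {"Det"},
    "the":    {"Det"},
    "with":   {"P"},
}
binary = [           # A -> B C rules
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]

def cyk(words):
    n = len(words)
    cell = {(i, j): set() for i in range(1, n + 1) for j in range(1, n + 1)}
    # For every rule A -> w_i, add A to Cell[i][i]
    for i, w in enumerate(words, start=1):
        cell[(i, i)] |= lexical.get(w, set())
    # Fill longer spans bottom-up
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C) in binary:
                    if B in cell[(begin, m)] and C in cell[(m + 1, end)]:
                        cell[(begin, end)].add(A)
    return cell

chart = cyk("book that flight".split())
print(chart[(1, 3)])   # {'VP'}: a VP covers the whole string
```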
Data structures for the chart
Summary of CYK algorithm • Bottom-up using DP • Requires the CFG to be in CNF • A very efficient algorithm • Easy to extend
Chomsky normal form (CNF) • Definition of CNF: • A → B C, • A → a, • S → ε, where A, B, C are non-terminals, a is a terminal, S is the start symbol, and B, C are not start symbols. • For every CFG, there is a CFG in CNF that is weakly equivalent.
Converting CFG to CNF • (1) Add a new start symbol S0 and a rule S0 → S (so the start symbol will not appear on the rhs of any rule) • (2) Eliminate ε-rules: remove a rule A → ε; for each rule B → αAβ, add B → αβ (with that occurrence of A deleted); for each rule B → A, add B → ε, unless that rule has been previously eliminated.
Conversion (cont) • (3) Remove unit rules: remove A → B; for each rule B → u, add A → u, unless the latter rule was previously removed. • (4) Replace a rule A → u1 u2 … uk where k > 2 with A → u1 A1, A1 → u2 A2, …, A(k-2) → u(k-1) uk; replace any terminal u_i in a rule of rhs length ≥ 2 with a new symbol U_i and add a new rule U_i → u_i.
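To make step (4) concrete, here is a small Python sketch (the is_terminal convention and the X*/T_* symbol names are my own, not from the slides; steps (1)–(3) are assumed to have run already):

```python
# Sketch of CNF step (4): binarize long rules and lift terminals out of
# long right-hand sides.

def is_terminal(sym):
    return sym.islower()              # toy convention: lowercase = terminal

def cnf_step4(rules):
    """rules: list of (lhs, [rhs symbols]).  Returns an equivalent list whose
    right-hand sides are a single terminal or exactly two non-terminals."""
    out, counter = [], 0

    def fresh():
        nonlocal counter
        counter += 1
        return "X%d" % counter

    for lhs, rhs in rules:
        if len(rhs) == 1:             # A -> a (unit rules already removed)
            out.append((lhs, rhs))
            continue
        # Replace any terminal u_i with a new symbol U_i and add U_i -> u_i
        rhs2 = []
        for sym in rhs:
            if is_terminal(sym):
                out.append(("T_" + sym, [sym]))
                rhs2.append("T_" + sym)
            else:
                rhs2.append(sym)
        # Replace A -> u1 u2 ... uk (k > 2) with a chain of binary rules
        left = lhs
        while len(rhs2) > 2:
            new = fresh()
            out.append((left, [rhs2[0], new]))
            left, rhs2 = new, rhs2[1:]
        out.append((left, rhs2))
    return out

print(cnf_step4([("VP", ["V", "NP", "PP"]), ("NP", ["Det", "N"])]))
```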
Removing ε-rules • Remove B → ε • Remove A → ε
Removing unit rules • (worked example; the specific rules removed were shown as figures)
Removing unit rules (cont) • (continuation of the worked example)
Summary of CFG parsing • Simple top-down and bottom-up parsing generate useless trees. • Top-down with bottom-up filtering has three problems. • Solution: use DP: • Earley algorithm • CYK algorithm
PCFG • PCFG is an extension of CFG. • A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P: Pr(A → β), also written P(A → β | A). • Given a non-terminal A, Σ_β Pr(A → β) = 1
A PCFG
S → NP VP        0.8      N → Mary      0.01
S → Aux NP VP    0.15     N → book      0.02
S → VP           0.05     V → bought    0.02
VP → V           0.35     Det → a       0.04
VP → V NP        0.45     NP → N        0.8
VP → VP PP       0.20     NP → Det N    0.2
…
Using probabilities • To estimate the prob of a sentence and its parse trees. • Useful in disambiguation. • The prob of a tree: P(T) = ∏_{n ∈ T} Pr(r(n)), where n is a node in T and r(n) is the rule used to expand n in T.
Computing P(T)
S → NP VP        0.8      N → Mary      0.01
S → Aux NP VP    0.15     N → book      0.02
S → VP           0.05     V → bought    0.02
VP → V           0.35     Det → a       0.04
VP → V NP        0.45     NP → N        0.8
VP → VP PP       0.20     NP → Det N    0.2
The sentence is “Mary bought a book”.
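As a worked check (my own example, assuming the natural tree S → NP VP, with NP → N → Mary on the left and VP → V NP covering “bought a book” on the right), P(T) is the product of the probabilities of the rules used:

```python
# P(T) for "Mary bought a book" with the tree
#   (S (NP (N Mary)) (VP (V bought) (NP (Det a) (N book))))
rules_used = [
    ("S  -> NP VP",  0.8),
    ("NP -> N",      0.8),
    ("N  -> Mary",   0.01),
    ("VP -> V NP",   0.45),
    ("V  -> bought", 0.02),
    ("NP -> Det N",  0.2),
    ("Det -> a",     0.04),
    ("N  -> book",   0.02),
]
p = 1.0
for _, prob in rules_used:
    p *= prob
print(p)   # ≈ 9.2e-09
```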
The most likely tree • P(T, S) = P(T) × P(S | T) = P(T), where T is a parse tree and S is a sentence (P(S | T) = 1 when T yields S) • The best parse tree for a sentence S: T* = argmax_T P(T | S) = argmax_T P(T, S) = argmax_T P(T)
Find the most likely tree Given a PCFG and a sentence S, how do we find the best parse tree for S? One algorithm: CYK
CYK algorithm for CFG
• For every rule A → w_i, set P[i][i][A] = true
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] and P[m+1][end][C] are true and A → B C is a rule
          then set P[begin][end][A] = true
CYK algorithm for CFG (another implementation)
• For every rule A → w_i, add A to Cell[i][i]
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule
          then add A to Cell[begin][end]
Variables for CFG and PCFG • CFG: P[i][j][A] records whether there is a parse tree whose root is A and which covers w_i … w_j • PCFG: P[i][j][A] is the prob of the most likely parse tree whose root is A and which covers w_i … w_j
CYK algorithm for PCFG
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
          then set P[begin][end][A] to that value and remember (m, B, C)
A CFG
Rules:
VP → V NP     V → book
VP → VP PP    N → book/flight/cards
NP → Det N    Det → that/the
NP → NP PP    P → with
PP → P NP
Parse “book that flight” (chart indexed by begin = 1..3 and end = 1..3)
A PCFG
Rules:
VP → V NP    0.4      V → book     0.001
VP → VP PP   0.2      N → book     0.01
NP → Det N   0.3      Det → that   0.1
NP → NP PP   0.2      P → with     0.2
PP → P NP    1.0      N → flight   0.02
Parse “book that flight” (chart indexed by begin = 1..3 and end = 1..3, now with probabilities)
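A minimal Python sketch of the probabilistic CYK above (not from the slides; the table layout and the names best/back are my own), run on this PCFG and “book that flight”:

```python
# best[(i, j)][A] is the prob of the most likely tree rooted in A covering
# words i..j; back[(i, j)][A] stores the split point and children so the
# tree can be recovered.

lexical = {  # Pr(A -> w)
    ("V", "book"): 0.001, ("N", "book"): 0.01, ("N", "flight"): 0.02,
    ("Det", "that"): 0.1, ("P", "with"): 0.2,
}
binary = [   # (A, B, C, Pr(A -> B C))
    ("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
    ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

def viterbi_cyk(words):
    n = len(words)
    best = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    back = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i)][A] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C, p) in binary:
                    if B in best[(begin, m)] and C in best[(m + 1, end)]:
                        val = p * best[(begin, m)][B] * best[(m + 1, end)][C]
                        if val > best[(begin, end)].get(A, 0.0):
                            best[(begin, end)][A] = val
                            back[(begin, end)][A] = (m, B, C)
    return best, back

best, back = viterbi_cyk("book that flight".split())
print(best[(1, 3)])   # {'VP': 2.4e-07}, up to float rounding
```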
N-best parse trees • Best parse tree: T* = argmax_T P(T) • N-best parse trees: the N parse trees for S with the highest P(T)
CYK algorithm for N-best
• For every rule A → w_i, add Pr(A → w_i) to P[i][i][A]
  (each entry P[begin][end][A] now holds up to N probabilities in decreasing order, with backpointers in B[begin][end][A])
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          for each i-th entry in P[begin][m][B] and j-th entry in P[m+1][end][C]:
            val = Pr(A → B C) × P[begin][m][B][i] × P[m+1][end][C][j]
            if val > one of the probs in P[begin][end][A]
            then remove the last element in P[begin][end][A] and insert val into the array,
                 and remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
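A small sketch of the N-best cell update (the names N_BEST and update_cell are illustrative, not from the slides): each chart entry keeps at most N (probability, backpointer) pairs in decreasing order of probability.

```python
# Sketch of the N-best cell update described above.

N_BEST = 3

def update_cell(entry, val, backpointer):
    """entry: list of (prob, backpointer), length <= N_BEST, sorted descending."""
    if len(entry) < N_BEST or val > entry[-1][0]:
        entry.append((val, backpointer))
        entry.sort(key=lambda x: -x[0])
        del entry[N_BEST:]            # drop anything beyond the top N

cell = []
update_cell(cell, 2.4e-7, (1, "V", "NP", 0, 0))   # backpointer (m, B, C, i, j)
update_cell(cell, 1.0e-8, (2, "VP", "PP", 0, 0))
print(cell)
```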
PCFG for Language Modeling (LM) • N-gram LM: Pr(S) = ∏_i P(w_i | w_{i-n+1} … w_{i-1}) • Syntax-based LM: Pr(S) = Σ_T P(T, S) = Σ_T P(T), summing over all parse trees T of S
Calculating Pr(S) • Parsing: the prob of the most likely parse tree: max_T P(T) • LM: the sum over all parse trees: Pr(S) = Σ_T P(T)
CYK for finding the most likely parse tree
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
          then set P[begin][end][A] to that value
CYK for calculating LM
• For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
• For span = 2 to N
    for begin = 1 to N - span + 1
      end = begin + span – 1
      for m = begin to end – 1
        for all non-terminals A, B, C:
          P[begin][end][A] += P[begin][m][B] × P[m+1][end][C] × Pr(A → B C)
• Pr(S) = P[1][N][S], the sum over all parse trees
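The only change from the Viterbi sketch above is that the max is replaced by a sum over split points and rules; a self-contained sketch with the same toy PCFG (the table layout and names are again my own):

```python
# Inside-probability variant of the CYK update: sum instead of max.

lexical = {
    ("V", "book"): 0.001, ("N", "book"): 0.01, ("N", "flight"): 0.02,
    ("Det", "that"): 0.1, ("P", "with"): 0.2,
}
binary = [
    ("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
    ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
    ("PP", "P", "NP", 1.0),
]

def inside_cyk(words):
    n = len(words)
    inside = {(i, j): {} for i in range(1, n + 1) for j in range(1, n + 1)}
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                inside[(i, i)][A] = inside[(i, i)].get(A, 0.0) + p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (A, B, C, p) in binary:
                    if B in inside[(begin, m)] and C in inside[(m + 1, end)]:
                        inside[(begin, end)][A] = (
                            inside[(begin, end)].get(A, 0.0)
                            + p * inside[(begin, m)][B] * inside[(m + 1, end)][C]
                        )
    return inside

# Pr(S) would be inside[(1, n)][S] for the start symbol; this toy grammar
# has no S rules, so we just inspect the totals for the whole span.
print(inside_cyk("book that flight".split())[(1, 3)])
```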
Learning PCFG Probabilities • Given a treebank (i.e., a set of trees), use MLE: Pr(A → β) = Count(A → β) / Count(A) • Without treebanks: the inside-outside algorithm
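A minimal sketch of MLE estimation from a toy treebank (the nested-tuple tree representation and the helper names are my own, not from the slides):

```python
# MLE rule probabilities: Pr(A -> beta) = Count(A -> beta) / Count(A).
# Trees are nested tuples like ("S", ("NP", ...), ("VP", ...)).
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                       # lexical rule A -> w
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_rules(child, rule_counts, lhs_counts)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

def mle(treebank):
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        count_rules(tree, rule_counts, lhs_counts)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

toy_treebank = [
    ("S", ("NP", ("N", "Mary")),
          ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "book")))),
]
for rule, p in mle(toy_treebank).items():
    print(rule, p)
```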
Q&A • PCFG • CYK algorithm
Problems of PCFG • Lack of sensitivity to structural dependency (e.g., where a constituent sits in the tree, such as subject vs. object position) • Lack of sensitivity to lexical dependency (e.g., which words head the constituents, as in PP-attachment decisions)