6.863J Natural Language Processing, Lecture 8: Not an Earley finish. Instructor: Robert C. Berwick, berwick@csail.mit.edu
The Menu Bar
• Administrivia
• Agenda: Earley’s algorithm; time complexity; parsing strategies (Earley algorithm); what do people do?
6.863J/9.611J SP05
Example Grammar
  S -> NP VP
  NP -> Name
  NP -> Det N
  NP -> NP PP
  VP -> V NP
  VP -> VP PP
  PP -> P NP
  N -> pjs
  N -> elephant
  V -> shot
  P -> in
  Det -> an
  Det -> my
  Name -> I
Example sentence: I shot an elephant in my pjs
Tags: Name V Det N P Det N
Marxist analysis
---- Rules -----
Start -> S
S -> NP VP
NP -> Det N
NP -> NP PP
NP -> Name
PP -> P NP
VP -> VP PP
VP -> V NP
VP -> V
---- Lexicon -----
an Det
my Det
shot V
shot N
elephant N
pajamas N
I Name
in P
Sentence: I shot an elephant in my pajamas
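A minimal sketch of how the rules and lexicon on this slide might be written down as Python data. The names `RULES`, `LEXICON`, and `tag_options` are hypothetical, chosen only for illustration; nothing here is taken from the course code.

```python
# Rules from the slide, as (left-hand side, right-hand side) pairs.
RULES = [
    ("Start", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("NP", ("Name",)),
    ("PP", ("P", "NP")),
    ("VP", ("VP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]

# Lexicon: each word maps to the set of parts of speech it can take.
# "shot" is listed twice on the slide (V and N), so it gets two tags.
LEXICON = {
    "an": {"Det"},
    "my": {"Det"},
    "shot": {"V", "N"},
    "elephant": {"N"},
    "pajamas": {"N"},
    "I": {"Name"},
    "in": {"P"},
}


def tag_options(sentence):
    """Return, for each word, the sorted list of tags the lexicon allows."""
    return [sorted(LEXICON[word]) for word in sentence.split()]


options = tag_options("I shot an elephant in my pajamas")
print(options[1])  # the tags available for "shot"
```

Keeping the lexicon separate from the rules mirrors the slide's layout and lets the scanner look up a word's tags directly.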
State set simulation of nondeterministic machine
• State set S0 = set of all states (‘edges’) we can be in after reading 0 words/POS
• March along, constructing each state set Si+1 from the previous state set Si
State-set construction
Initialize: S0 (initial state set) = initial state edge [Start -> • S, 0, 0], plus the ε-closure of this set under predict and complete
Loop: for word i = 1, …, n, compute Si from Si-1 (using scan, predict, complete): scan; then predict, complete
Final: is a final edge in Sn? That is, is [Start -> S •, 0, n] ∈ Sn?
The basic state representation: an item triple [NP -> Det • N, 0, 2], made up of the dotted rule, the start position (left edge), and the progress position (right edge)
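One way to picture the item triple in code is sketched below. This is an illustration only: the class name `Item` and its methods are hypothetical, and the dotted rule is stored as a left-hand side, a right-hand side, and a dot index rather than as a single object.

```python
from typing import NamedTuple, Optional, Tuple


class Item(NamedTuple):
    lhs: str               # left-hand side of the rule, e.g. "NP"
    rhs: Tuple[str, ...]   # right-hand side, e.g. ("Det", "N")
    dot: int               # number of right-hand-side symbols already matched
    start: int             # start position (left edge)
    end: int               # progress position (right edge)

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def advance(self, new_end: int) -> "Item":
        """Move the dot one symbol to the right and update the right edge."""
        return self._replace(dot=self.dot + 1, end=new_end)

    def show(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"[{self.lhs} -> {' '.join(symbols)}, {self.start}, {self.end}]"


item = Item("NP", ("Det", "N"), dot=0, start=0, end=0)
item = item.advance(1)   # the dot moves over Det
print(item.show())       # [NP -> Det • N, 0, 1]
```

Because the tuple is immutable and hashable, items of this shape can be stored in sets, which makes duplicate checks cheap.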
Another way to view it: scanning “the guy” (Det N) moves the dot one word at a time: [NP -> • Det N, 0, 0], scan, [NP -> Det • N, 0, 1], scan, [NP -> Det N •, 0, 2]
Earley Parser
The chart represents ambiguity by multiple back links in a matrix (figure axes: start position of edge, end position of edge)
Example: Top-down init w/ chart
[S -> • NP VP, 0, 0]
Positions: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7
We are constructing state set S0
In picture form
[S -> • NP VP, 0, 0]
[NP -> • Det N, 0, 0] (from top-down rule, or predict)
[NP -> • Name, 0, 0]
[NP -> • NP PP, 0, 0]
+ all POS expansions
State set S0 now done
Construct S1 from S0: scan to next word… follow the bouncing dot…
S0: [S -> • NP VP, 0, 0], [NP -> • Det N, 0, 0], [NP -> • Name, 0, 0], [NP -> • NP PP, 0, 0]
Scanning “I” in “I shot an elephant in my pjs” adds [NP -> Name •, 0, 1]
The Fundamental Rule (“complete”) applies…
• As time goes by…
• Actually, as NP goes by…
• We can also extend the length of all the other edges that had an NP with a dot before it…
• That is,
In picture form
[NP -> Name •, 0, 1]
[S -> NP • VP, 0, 1] complete
[NP -> NP • PP, 0, 1] complete
[VP -> • V NP, 1, 1] predict
[VP -> • VP PP, 1, 1] predict
[PP -> • P NP, 1, 1] predict
State set S1 now done
Scan Verb - extend edge
[VP -> V • NP, 1, 2]
Other edges in the figure: VP -> • VP PP; NP -> N •; S -> NP • VP
What next? … Predict NP: add all edges corresponding to expansion of NP
Picture: Complete combines edges (the “fundamental rule”)
Edges over “I shot an elephant in my pjs” in the figure: S -> NP VP •; VP -> V NP •; VP -> VP • PP; NP -> D N •; NP -> N •; NP -> • NP PP; S -> NP • VP
State set construction – cols in chart
How does each of the 3 ops change Si?
• Scan: (jump over a token)
  • Before: [A -> α • t β, k, i-1] in state set Si-1 & word i = t
  • Result: add [A -> α t • β, k, i] to state set Si
  • (Do this for all items [triples] in state set Si-1)
• Predict (Push): (encounter nonterminal)
  • Before: [A -> α • B β, k, i], B = a nonterminal, in Si
  • After: add all new triples of form [B -> • γ, i, i] to state set Si
• Complete (Pop): (finish w/ nonterminal)
  • Before: Si contains a triple of form [B -> γ •, k, i]
  • After: go to state set Sk and, for all triples of form [A -> α • B β, j, k], add triples [A -> α B • β, j, i] to state set Si
The main deal
Input x = x1 … xn
• S0 = {[Start -> • S, 0, 0]}
• For 0 ≤ i ≤ n do: process each item s ∈ Si in order by applying to it the single applicable operation among:
  (a) Predictor (adds new items to Si)
  (b) Completer (adds new items to Si)
  (c) Scanner (adds new items to Si+1)
• If Si+1 = ∅, reject the input
• If i = n and Sn = {[Start -> S •, 0, n], …} then accept the input; else reject
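The loop above can be sketched as a small recognizer. This is an illustrative sketch under stated assumptions, not the course implementation: the names `RULES`, `LEXICON`, and `recognize` are hypothetical, the grammar is the "Marxist analysis" one with parts of speech looked up in a lexicon, and the grammar is assumed to have no empty rules (so a completed item always started in an earlier state set).

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")), ("NP", ("NP", "PP")), ("NP", ("Name",)),
    ("PP", ("P", "NP")),
    ("VP", ("VP", "PP")), ("VP", ("V", "NP")), ("VP", ("V",)),
]
LEXICON = {
    "an": {"Det"}, "my": {"Det"}, "shot": {"V", "N"}, "elephant": {"N"},
    "pajamas": {"N"}, "I": {"Name"}, "in": {"P"},
}


def recognize(words, rules=RULES, lexicon=LEXICON, start="S"):
    """Return True if the words can be derived from the start symbol."""
    by_lhs = {}
    for lhs, rhs in rules:
        by_lhs.setdefault(lhs, []).append(tuple(rhs))
    n = len(words)
    # An item is (lhs, rhs, dot, start position); its end position is the
    # index i of the state set chart[i] that holds it.
    chart = [[] for _ in range(n + 1)]

    def add(i, item):
        if item not in chart[i]:          # never add duplicates
            chart[i].append(item)

    add(0, ("Start", (start,), 0, 0))
    for i in range(n + 1):
        pos = 0
        while pos < len(chart[i]):        # chart[i] may grow as we process it
            lhs, rhs, dot, k = chart[i][pos]
            pos += 1
            if dot == len(rhs):           # Completer: advance the customers in S_k
                for l2, r2, d2, j in list(chart[k]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, j))
            elif rhs[dot] in by_lhs:      # Predictor: expand the nonterminal
                for expansion in by_lhs[rhs[dot]]:
                    add(i, (rhs[dot], expansion, 0, i))
            elif i < n and rhs[dot] in lexicon.get(words[i], ()):
                add(i + 1, (lhs, rhs, dot + 1, k))   # Scanner: jump over a word
        if i < n and not chart[i + 1]:
            return False                  # S_{i+1} is empty: reject
    return ("Start", (start,), 1, 0) in chart[n]


print(recognize("I shot an elephant in my pajamas".split()))  # True
print(recognize("I shot an".split()))                          # False
```

Each state set is a list processed in order, so items added by the predictor or completer are picked up later in the same pass; the membership check in `add` is what keeps left-recursive rules from looping.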
Earley’s Algorithm: Predictor
Predictor(A -> α • B β, [i, j])
Input: edge A -> α • B β spanning [i, j]
For each rule: B -> γ
Add: B -> • γ spanning [j, j]
Predictor (wishor)
• Predict (Push):
  • Before: [A -> α • B β, k, i], B = nonterminal, in Si
  • After: add all new edges of form [B -> • γ, i, i] to state set Si
• Cries out its need for a phrase of type B
Earley’s Algorithm: Scanner
Scanner(A -> α • B β, [i, j])
Input: edge A -> α • B β spanning [i, j]
For each rule: B -> w, where w is the next input word
Add edge: A -> α B • β spanning [i, j+1]
Scan – formally (“Find a word/POS”)
• Scan: (jump over a token)
  • Before: [A -> α • t β, k, i] in state set Si & word i+1 = t
  • Result: add [A -> α t • β, k, i+1] to state set Si+1
Earley’s Algorithm: Completer
Completer(B -> γ •, [i, j])
Input: edge B -> γ • spanning [i, j]
For each edge: A -> α • B β spanning [k, i]
Add: A -> α B • β spanning [k, j]
More precisely
• Complete (Pop): (finish w/ phrase)
  • Before: Si contains an edge e of form [B -> γ •, k, i]
  • After: go back to state set Sk and, for all edges of form [A -> α • B β, j, k], add edges e’ = [A -> α B • β, j, i] to state set Si
“The fundamental rule”: glues smaller trees into larger ones
VP -> V • NP [1, 2] (“shot”: start pos = 1, len 1)
+ NP -> d n • [2, 4] (“an elephant”: start = 2, len = 2)
= VP -> V NP • [1, 4] (start pos = 1, len 3)
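The gluing step on this slide can be written as a single function. A minimal sketch, assuming edges are plain tuples of the form (lhs, rhs, dot, start, end); the function name `combine` is hypothetical.

```python
def combine(active, complete):
    """Apply the fundamental rule to one active edge and one complete edge.

    Each edge is (lhs, rhs, dot, start, end). Returns the extended active
    edge, or None if the two edges do not fit together.
    """
    a_lhs, a_rhs, a_dot, a_start, a_end = active
    c_lhs, c_rhs, c_dot, c_start, c_end = complete
    if c_dot != len(c_rhs):
        return None                      # second edge is not finished
    if a_dot >= len(a_rhs) or a_rhs[a_dot] != c_lhs:
        return None                      # active edge is not waiting for c_lhs
    if a_end != c_start:
        return None                      # edges are not adjacent
    # Move the dot over the finished phrase and extend the right edge.
    return (a_lhs, a_rhs, a_dot + 1, a_start, c_end)


vp = ("VP", ("V", "NP"), 1, 1, 2)        # VP -> V • NP, "shot"
np = ("NP", ("d", "n"), 2, 2, 4)         # NP -> d n •, "an elephant"
print(combine(vp, np))                   # ('VP', ('V', 'NP'), 2, 1, 4)
```

The result keeps the active edge's start position and takes the complete edge's end position, which is exactly the index bookkeeping shown in the slide's example.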
Earley’s Algorithm: Rules (Initialization, Predictor, Scanner, Completer)
State set construction – cols in chart
Take note of the start-stop indices (example: “the dog” spans k to k+2)
Initially: S [i, k]
Predict: NP [k, k] (predict start: k)
Scan: NP [k, k+1]
Scan: NP [k, k+2]
Complete: NP [k, k+2]
After NP: S [i, k+2]
Indices: the right-hand edge is ‘you are here’
• Predict: does not increment: NP[k,k]
• Scan: does increment, by 1, the right-hand edge: NP[k,k] to NP[k,k+1] to NP[k,k+2]
• Complete: increments the right-hand edge of an item from a previous state set: S[j,k] to S[j,k+2]
Example Grammar
  S -> NP VP
  NP -> Name
  NP -> Det N
  NP -> NP PP
  VP -> V NP
  VP -> VP PP
  PP -> P NP
  N -> pjs
  N -> elephant
  V -> shot
  P -> in
  Det -> an
  Det -> my
  Name -> I
Example sentence: I shot an elephant in my pjs
Tags: Name V Det N P Det N
Initialize. Remember this stands for (0, Start -> • S)
Predict the kind of S we are looking for. Remember this stands for (0, S -> • NP VP)
predict the kind of NP we are looking for (actually we’ll look for 3 kinds: any of the 3 will do)
predict the kind of NP we’re looking for, but we were already looking for these, so don’t add duplicates! Note that this happened when we were processing a left-recursive rule.
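The duplicate check is what lets prediction terminate on a left-recursive rule such as NP -> NP PP. The sketch below illustrates this with a hypothetical helper, `predict_closure`, that collects every item predicted at position i starting from one symbol; the tuple layout (lhs, rhs, dot, start, end) is an assumption made for the example.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Name",)), ("NP", ("Det", "N")), ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")), ("VP", ("VP", "PP")),
    ("PP", ("P", "NP")),
]


def predict_closure(rules, symbol, i):
    """All items [B -> • γ, i, i] reachable by repeated prediction from symbol."""
    items = []
    seen = set()
    agenda = [symbol]
    while agenda:
        sym = agenda.pop()
        for lhs, rhs in rules:
            if lhs != sym:
                continue
            item = (lhs, rhs, 0, i, i)
            if item in seen:
                continue                 # already looking for this: skip it
            seen.add(item)
            items.append(item)
            agenda.append(rhs[0])        # predict the symbol after the new dot
    return items


predicted = predict_closure(RULES, "S", 0)
for lhs, rhs, dot, start, end in predicted:
    print(f"[{lhs} -> • {' '.join(rhs)}, {start}, {end}]")
```

A symbol is pushed on the agenda only when a new item is added, and there are finitely many possible items, so the loop stops even though NP keeps asking for NP.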
attach the newly created NP (which starts at 0) to its customers (incomplete constituents that end at 0 and have NP after the dot)