6.863J Natural Language Processing Lecture 8: Not an Earley finish

Instructor: Robert C. Berwick, berwick@csail.mit.edu

The Menu Bar
• Administrivia:
• Agenda: Earley’s algorithm; time complexity
• Parsing strategies: Earley algorithm
• What do people do?
6.863J/9.611J SP05

Example Grammar S  NP VP NP  Name NP  Det N N  pjs NP  NP PP N  elephant VP  V NP V  shot VP  VP PP P  in PP  P NP Det  an Det  my Name  I I shot an elephant in my pjs Name V Det N P Det N 6•863J/9•611J SP05

Marxist analysis
---- Rules -----
Start -> S
S -> NP VP
NP -> Det N
NP -> NP PP
NP -> Name
PP -> P NP
VP -> VP PP
VP -> V NP
VP -> V
---- Lexicon -----
an Det
my Det
shot V
shot N
elephant N
pajamas N
I Name
in P
Input: I shot an elephant in my pajamas

State-set simulation of a nondeterministic machine
• State set S0 = the set of all states (‘edges’) we can be in after reading 0 words/POS
• March along, constructing each state set Si+1 from the previous state set Si

State-set construction Initialize: S0initial state set= initial state edge [Start S , 0, n]  e-closure of this set under predict, complete Loop: For word i=1,…,n Si+1 computed from Si (using scan, predict, complete) scan; then predict, complete Final: Is a final edge in Sn? [Start S , 0, n] Sn ? 6•863J/9•611J SP05

The basic state representation: an item triple [NP → Det • N, 0, 1]
• Dotted rule: NP → Det • N
• Start position (left edge): 0
• Progress position (right edge): 1
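One way to hold such a triple in code is sketched below in Python. The class name `Item` and its field names are our own, hypothetical choices, not the lecture's: the dot is stored as the number of right-hand-side symbols already recognized, and the item is frozen so that it is hashable and can go into a set, which the "no duplicates" check relies on.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen => hashable, so items can live in sets
class Item:
    lhs: str               # left-hand side of the dotted rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # how many rhs symbols are already recognized
    start: int             # start position (left edge)
    end: int               # progress position (right edge)

    def next_symbol(self) -> Optional[str]:
        """The symbol just after the dot, or None if the item is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        symbols = self.rhs[: self.dot] + ("•",) + self.rhs[self.dot :]
        return f"[{self.lhs} → {' '.join(symbols)}, {self.start}, {self.end}]"


item = Item("NP", ("Det", "N"), 1, 0, 1)
print(item)  # [NP → Det • N, 0, 1]
```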

Another way to view it: two scans over “the guy” (Det N)
[NP → • Det N, 0, 0]
Scan “the” → [NP → Det • N, 0, 1]
Scan “guy” → [NP → Det N •, 0, 2]
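The two scans can be traced with a minimal Python sketch. It assumes, as the slide does, that the part-of-speech tags are already known, and it uses a hypothetical 5-tuple (lhs, rhs, dot, start, end) for an item. Only the dot and the right edge move; the start never changes.

```python
from typing import List, Tuple

# Hypothetical item encoding: (lhs, rhs, dot, start, end)
Item = Tuple[str, Tuple[str, ...], int, int, int]


def scan(item: Item, tag: str) -> Item:
    """Jump the dot over the next symbol if it matches the next word's POS tag."""
    lhs, rhs, dot, start, end = item
    if dot >= len(rhs) or rhs[dot] != tag:
        raise ValueError(f"cannot scan {tag!r} with {item}")
    # The start (left edge) stays put; the right edge advances by one word.
    return (lhs, rhs, dot + 1, start, end + 1)


item: Item = ("NP", ("Det", "N"), 0, 0, 0)  # [NP → • Det N, 0, 0]
trace: List[Item] = [item]
for tag in ["Det", "N"]:  # "the guy" is tagged Det N
    item = scan(item, tag)
    trace.append(item)
# trace ends with ("NP", ("Det", "N"), 2, 0, 2), i.e. [NP → Det N •, 0, 2]
```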

Earley Parser

The chart is a matrix indexed by the start position and the end position of each edge; it represents ambiguity by multiple back links.
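To see how multiple back links encode ambiguity, here is a hand-built Python sketch for the example sentence. The dictionary `CHART` and the function `count_trees` are hypothetical illustrations: `CHART` maps each completed edge (category, start, end) to the list of child-edge sequences that built it, and it is filled in by hand with only the edges needed here, not produced by a parser. The VP from 1 to 7 has two back links (the PP attaches either to the VP or to the NP), so the sentence gets two trees.

```python
from math import prod
from typing import Dict, List, Tuple

Edge = Tuple[str, int, int]  # (category, start, end) of a completed constituent

# Hand-built back links for "0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7".
# Each value lists the alternative child sequences that built that edge;
# preterminals (Name, V, Det, N, P) are leaves and have no entry.
CHART: Dict[Edge, List[List[Edge]]] = {
    ("S", 0, 7): [[("NP", 0, 1), ("VP", 1, 7)]],
    ("NP", 0, 1): [[("Name", 0, 1)]],
    ("VP", 1, 7): [
        [("VP", 1, 4), ("PP", 4, 7)],  # PP modifies the shooting
        [("V", 1, 2), ("NP", 2, 7)],   # PP modifies the elephant
    ],
    ("VP", 1, 4): [[("V", 1, 2), ("NP", 2, 4)]],
    ("NP", 2, 7): [[("NP", 2, 4), ("PP", 4, 7)]],
    ("NP", 2, 4): [[("Det", 2, 3), ("N", 3, 4)]],
    ("PP", 4, 7): [[("P", 4, 5), ("NP", 5, 7)]],
    ("NP", 5, 7): [[("Det", 5, 6), ("N", 6, 7)]],
}


def count_trees(edge: Edge) -> int:
    """Trees under an edge: sum over its back links of the product over the children."""
    links = CHART.get(edge)
    if links is None:  # a preterminal covering a single word
        return 1
    return sum(prod(count_trees(child) for child in link) for link in links)


print(count_trees(("S", 0, 7)))  # 2
```

Storing alternatives as a list per edge, rather than copying whole trees, is what keeps the chart compact even when the number of trees grows quickly.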

Example: Top-down init with chart
We are constructing state set S0:
[S → • NP VP, 0, 0]
Input: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7

In picture form [S •NP VP, 0, 0] [NP • D N, 0, 0] (from td rule, or predict) [NP  • Name, 0, 0] [NP  • NP PP, 0, 0] + all POS expansions 0 I 1 2 shot 3 an 4 elephant 5 in 6 my 7 pjs State set S0 now done 6•863J/9•611J SP05

Construct S1 from S0: scan to the next word… follow the bouncing dot…
[S → • NP VP, 0, 0]
[NP → • Det N, 0, 0]
[NP → • Name, 0, 0]
[NP → • NP PP, 0, 0]
Scanning “I” (a Name) gives [NP → Name •, 0, 1]

The Fundamental Rule (“complete”) applies…
• As time goes by…
• Actually, as NP goes by…
• We can also extend all the other edges that had the dot just before an NP…
• That is,

In picture form [S  NP • VP, 0, 1] complete [NP  NP • PP, 0, 1] complete [VP  • V NP, 1, 1] predict [VP  • VP PP, 1, 1] predict predict [PP  • P NP, 1, 1] 0 I 1 2 shot 3 an 4 elephant 5 in 6 my 7 pjs [NP  Name •, 0, 1] State set S1 now done 6•863J/9•611J SP05

Scan Verb - extend edge [VP  V • NP, 1,2] VP • VP PP I shot an elephant in my pjs NP  N • What next? … Predict NP, add all edges corresponding to expansion of NP S  NP • VP 6•863J/9•611J SP05

Picture: Complete combines edges (the “fundamental rule”)
(Figure: chart over “I shot an elephant in my pjs” with edges S → NP • VP, NP → N •, NP → D N •, NP → • NP PP, VP → V NP •, VP → VP • PP, S → NP VP •.)

State set construction: columns in the chart

How does each of the 3 ops change Si?
• Scan (jump over a token):
  • Before: [A → α • t β, k, i-1] in state set Si-1, and word i = t
  • Result: add [A → α t • β, k, i] to state set Si
  • (Do this for all items [triples] in state set Si-1)
• Predict (Push; encounter a nonterminal):
  • Before: [A → α • B β, k, i], B a nonterminal, in Si; then
  • After: add all new triples of the form [B → • γ, i, i] to state set Si
• Complete (Pop; finish with a nonterminal):
  • Before: if Si contains a triple of the form [B → γ •, k, i], then
  • After: go to state set Sk and, for all items of the form [A → α • B β, j, k], add triples [A → α B • β, j, i] to state set Si
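The three operations can be written as small functions over state sets. This Python sketch rests on assumptions of ours: terminals are words, so lexical rules such as Det → an are ordinary rules; an item in Si is stored as (lhs, rhs, dot, start), with its right edge i implied by the state set holding it; and the names `scan`, `predict`, and `complete` are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]
# An item [A → α • β, k, i] sits in state set Si as (A, α+β, len(α), k);
# its right edge i is implied by which state set holds it.
Item = Tuple[str, Rule, int, int]


def scan(S_prev: List[Item], word: str) -> List[Item]:
    """[A → α • t β, k, i-1] in Si-1 and word i = t  =>  [A → α t • β, k, i] in Si."""
    return [(a, rhs, dot + 1, k) for (a, rhs, dot, k) in S_prev
            if dot < len(rhs) and rhs[dot] == word]


def predict(item: Item, i: int, grammar: Grammar) -> List[Item]:
    """[A → α • B β, k, i] in Si  =>  [B → • γ, i, i] in Si for every rule B → γ."""
    _, rhs, dot, _ = item
    b = rhs[dot]
    return [(b, gamma, 0, i) for gamma in grammar.get(b, [])]


def complete(item: Item, chart: List[List[Item]]) -> List[Item]:
    """[B → γ •, k, i] in Si  =>  for each [A → α • B β, j, k] in Sk, [A → α B • β, j, i]."""
    b, _, _, k = item
    return [(a, rhs, d + 1, j) for (a, rhs, d, j) in chart[k]
            if d < len(rhs) and rhs[d] == b]


# Usage on "an elephant" with a toy grammar (words are terminals here).
grammar: Grammar = {"NP": [("Det", "N")], "Det": [("an",)], "N": [("elephant",)]}
S0: List[Item] = [("NP", ("Det", "N"), 0, 0)]  # [NP → • Det N, 0, 0]
S0 += predict(S0[0], 0, grammar)                 # adds [Det → • an, 0, 0]
S1 = scan(S0, "an")                              # [Det → an •, 0, 1]
S1 += complete(S1[0], [S0, S1])                  # adds [NP → Det • N, 0, 1]
```

Note that predict and complete only ever add to the current state set, while scan is the one operation that moves on to the next.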

The main deal: input x = x1 … xn
• S0 = {[Start → • S, 0, 0]}
• For 0 ≤ i ≤ n do:
  Process each item s ∈ Si in order by applying to it the single applicable operation among:
  (a) Predictor (adds new items to Si)
  (b) Completer (adds new items to Si)
  (c) Scanner (adds new items to Si+1)
• If i < n and Si+1 = ∅, reject the input
• If i = n and [Start → S •, 0, n] ∈ Sn, then accept the input; else reject
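Putting the pieces together, a compact recognizer in Python might look like the following. It is a sketch, not the course's reference code: it assumes the example grammar with words as terminals, no ε-rules, and a hypothetical function name `earley_recognize`. It answers only yes or no; recovering trees would need the back links described for the chart.

```python
from typing import Dict, List, Sequence, Set, Tuple

Rule = Tuple[str, ...]
Grammar = Dict[str, List[Rule]]
# An item [A → α • β, k, i] lives in state set Si as (A, α+β, len(α), k).
Item = Tuple[str, Rule, int, int]

# The example grammar; words are terminals, so lexical rules are ordinary rules.
GRAMMAR: Grammar = {
    "Start": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "N": [("pjs",), ("elephant",)],
    "V": [("shot",)],
    "P": [("in",)],
    "Det": [("an",), ("my",)],
    "Name": [("I",)],
}


def earley_recognize(words: Sequence[str], grammar: Grammar = GRAMMAR,
                     start: str = "Start") -> bool:
    n = len(words)
    chart: List[List[Item]] = [[] for _ in range(n + 1)]  # S0 ... Sn
    seen: List[Set[Item]] = [set() for _ in range(n + 1)]

    def add(i: int, item: Item) -> None:
        if item not in seen[i]:  # never add a duplicate (this tames left recursion)
            seen[i].add(item)
            chart[i].append(item)

    for rhs in grammar[start]:  # S0 = {[Start → • S, 0, 0]}
        add(0, (start, rhs, 0, 0))

    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # Si keeps growing while we process it
            lhs, rhs, dot, k = chart[i][j]
            j += 1
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predictor: add [B → • γ, i, i] for every rule B → γ
                for gamma in grammar[rhs[dot]]:
                    add(i, (rhs[dot], gamma, 0, i))
            elif dot < len(rhs):
                # Scanner: the symbol after the dot is a word; compare it with
                # words[i] (word i+1 in the slides' 1-based numbering)
                if i < n and words[i] == rhs[dot]:
                    add(i + 1, (lhs, rhs, dot + 1, k))
            else:
                # Completer: advance every item in Sk that was waiting for lhs.
                # With no ε-rules, k < i, so Sk is not the set being extended.
                for a, beta, d, s in chart[k]:
                    if d < len(beta) and beta[d] == lhs:
                        add(i, (a, beta, d + 1, s))
        if i < n and not chart[i + 1]:
            return False  # nothing survived the scan: reject early
    return any(lhs == start and dot == len(rhs) and k == 0
               for lhs, rhs, dot, k in chart[n])
```

For example, `earley_recognize("I shot an elephant in my pjs".split())` should return True, while `earley_recognize(["I", "shot"])` should return False, since this grammar has no VP → V rule.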

Earley’s Algorithm: Predictor • Predictor(AB, [i,j]) Example For each rule: Add: A aBb i j B g Bg i A B a b B g Input Rule 6•863J/9•611J SP05

Predictor (wishor)
• Predict (Push):
  • Before: [A → α • B β, k, i], B a nonterminal, in Si; then
  • After: add all new edges of the form [B → • γ, i, i] to state set Si
• Cries out its need for a phrase of type B

Earley’s Algorithm: Scanner • Scanner(AB, [i,j]) Example For each rule: Add edge: A aBb A aBb Bw i j+1 i j A A a b B a b B w Input Rule 6•863J/9•611J SP05

Scan, formally (“find a word/POS”)
• Scan (jump over a token):
  • Before: [A → α • t β, k, i] in state set Si, and word i+1 = t
  • Result: add [A → α t • β, k, i+1] to state set Si+1

Earley’s Algorithm: Completer • Completer(B, [i,j]) Example For each edge Add: B g i j A aBb A a Bb k j k i B A A g a b B a b B g Rule Input 6•863J/9•611J SP05

More precisely
• Complete (Pop; finish with a phrase):
  • Before: if Si contains an edge e of the form [B → γ •, k, i], then go back to state set Sk and, for all items of the form [A → α • B β, j, k], add edges E’ = [A → α B • β, j, i] to state set Si

“The fundamental rule” glues smaller trees into larger ones:
• [VP → V • NP, 1, 2]: “shot”, start pos = 1, length 1
• [NP → Det N •, 2, 4]: “an elephant”, start pos = 2, length 2
• ⇒ [VP → V NP •, 1, 4]: start pos = 1, length 3
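The slide's arithmetic can be checked with a small Python sketch of the fundamental rule on edges that carry both their start and end positions. The 5-tuple encoding and the name `fundamental_rule` are hypothetical. The active edge must be waiting for the category of the complete edge, and the two edges must meet at the same position.

```python
from typing import Tuple

# Hypothetical edge encoding with explicit span: (lhs, rhs, dot, start, end)
Edge = Tuple[str, Tuple[str, ...], int, int, int]


def fundamental_rule(active: Edge, passive: Edge) -> Edge:
    """[A → α • B β, i, j] + [B → γ •, j, k]  =>  [A → α B • β, i, k]."""
    a, rhs, dot, i, j = active
    b, gamma, gdot, j2, k = passive
    if gdot != len(gamma):
        raise ValueError("the passive edge must be complete")
    if dot >= len(rhs) or rhs[dot] != b:
        raise ValueError("the active edge must be waiting for the passive edge's category")
    if j != j2:
        raise ValueError("the two edges must meet at the same position")
    return (a, rhs, dot + 1, i, k)


vp_edge: Edge = ("VP", ("V", "NP"), 1, 1, 2)   # "shot": start 1, length 1
np_edge: Edge = ("NP", ("Det", "N"), 2, 2, 4)  # "an elephant": start 2, length 2
result = fundamental_rule(vp_edge, np_edge)
# result == ("VP", ("V", "NP"), 2, 1, 4): start 1, length 3
```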

Earley’s Algorithm: Rules
• Initialization
• Predictor
• Scanner
• Completer


Take note of the start-stop indices (figure: an S spanning i to k predicts an NP at k over “the dog”, positions k to k+2):
• Initially: S [i, k]
• Predict: NP [k, k]
• Scan: NP [k, k+1]
• Scan: NP [k, k+2]
• Complete: NP [k, k+2]
• After NP: S [i, k+2]

Indices: the right-hand edge is ‘you are here’
• Predict: does not increment: NP[k,k]
• Scan: increments the right-hand edge by 1: NP[k,k] → NP[k,k+1] → NP[k,k+2]
• Complete: advances the right-hand edge of an item from a previous state set by the length of the completed phrase: S[j,k] → S[j,k+2]

Example Grammar S  NP VP NP  Name NP  Det N N  pjs NP  NP PP N  elephant VP  V NP V  shot VP  VP PP P  in PP  P NP Det  an Det  my Name  I I shot an elephant in my pjs Name V Det N P Det N 6•863J/9•611J SP05

Remember this stands for (0, Start  • S) Initialize

Remember this stands for (0, S • NP VP) predict the kind of S we are looking for

predict the kind of NP we are looking for (actually we’ll look for 3 kinds: any of the 3 will do)

predict the kind of Det we are looking for (2 kinds)

predict the kind of NP we’re looking for, but we were already looking for these, so don’t add duplicates! Note that this happened when we were processing a left-recursive rule.
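The duplicate check is what keeps prediction finite on a left-recursive rule such as NP → NP PP. Below is a minimal Python sketch of prediction at a single position; the function `predict_closure` and the toy grammar fragment are assumptions for illustration. It keeps a `seen` set of predicted rules; without that set, predicting NP would re-predict NP forever.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, ...]
# Toy fragment of the example grammar; NP → NP PP is left-recursive.
TOY_GRAMMAR: Dict[str, List[Rule]] = {
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "Name": [("I",)],
    "Det": [("an",), ("my",)],
}


def predict_closure(goal: str, grammar: Dict[str, List[Rule]]) -> List[Tuple[str, Rule]]:
    """All rules B → • γ predicted at one position when we start looking for `goal`."""
    predicted: List[Tuple[str, Rule]] = []
    seen: Set[Tuple[str, Rule]] = set()
    agenda = deque([goal])
    while agenda:
        b = agenda.popleft()
        for gamma in grammar.get(b, []):  # words have no rules, so nothing is predicted
            if (b, gamma) in seen:  # already looking for this: don't add a duplicate
                continue
            seen.add((b, gamma))
            predicted.append((b, gamma))
            if gamma:  # the dot is at the far left, so predict gamma's first symbol
                agenda.append(gamma[0])
    return predicted


predicted = predict_closure("S", TOY_GRAMMAR)
# NP → • NP PP appears exactly once, even though it predicts NP again.
```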

scan: the desired word is in the input!

scan: failure

attach the newly created NP (which starts at 0) to its customers (incomplete constituents that end at 0 and have NP after the dot)

predict

scan: success!

scan: failure

complete

predict
