Context-free Parsing: Earley’s algorithm. A very efficient parallel top-down parsing algorithm, O(N³) where N is the length of the input. Published in Comm. ACM, 1970. Never parses the same substring twice.
Context-free Parsing Earley’s algorithm
Earley’s algorithm
• Very efficient parallel top-down parsing algorithm: O(N³), where N is the length of the input
• Published in Comm. ACM, 1970
• Never parses the same substring twice
• Data structure is an array(0..N) of entries, where each entry is a “dotted rule”: a copy of a grammar rule with a dot and a pointer
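One way to picture a “dotted rule” is as a small immutable record. The sketch below is only an illustration: the class name `DottedRule` and its field and method names are assumptions, not something the slides define. The `dot` field counts how many daughters have been recognised so far, and `start` plays the role of the pointer (the table index where the rule began).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DottedRule:
    """Hypothetical encoding of one table entry: a rule, a dot and a pointer."""
    mother: str                 # left-hand side, e.g. "NP"
    daughters: Tuple[str, ...]  # right-hand side, e.g. ("det", "n")
    dot: int                    # number of daughters already recognised
    start: int                  # pointer: index of the table where the rule began

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None if the dot is at the end."""
        return self.daughters[self.dot] if self.dot < len(self.daughters) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.daughters)

    def advance(self) -> "DottedRule":
        """Copy of this rule with the dot moved one place to the right."""
        return DottedRule(self.mother, self.daughters, self.dot + 1, self.start)

    def __str__(self) -> str:
        symbols = list(self.daughters)
        symbols.insert(self.dot, "dot")
        return f"{self.mother} ({' '.join(symbols)}) {self.start}"


rule = DottedRule("NP", ("det", "n"), dot=0, start=0)
print(rule)            # the rule as it would appear in table(0)
print(rule.advance())  # the same rule after one daughter has been recognised
```

Because the record is frozen and hashable, entries can be stored in a set, which makes the “never add an identical rule” requirement cheap to enforce.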
The algorithm
0. Copy the grammar into table(0), inserting a dot after the mother symbol of each rule (that is, before its first daughter) and adding the pointer 0 at the end; set n = 1.
1. (Scan) Look up the nth word in the lexicon. For each of its categories k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.
2. (Complete) Look for any rule r in table(n) ending in a dot and a number x. For each such rule, locate in table(x) all rules having r’s mother just after the dot. Copy them into table(n), moving the dot one place to the right.
3. (Predict) For each rule in table(n) which has a non-terminal just after the dot, copy all rules for that non-terminal from table(0), changing the 0 to n.
4. Repeat steps 2 and 3 on any rules just added, without adding any identical rules.
5. Repeat step 4 until no new entries have been added.
6. Increment n := n+1; if n ≤ N, repeat from step 1 (the Nth word must also be scanned, since success is tested in table(N)).
Success if table(N) contains a rule “S … dot 0”.
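The steps above can be sketched as a small recogniser. This is a minimal illustration under stated assumptions, not a definitive implementation: entries are plain tuples `(mother, daughters, dot, start)`, the names `earley_tables` and `accepts` are made up for this example, the grammar is assumed to contain no empty rules, and entries do not carry the partial trees shown on the slides. As a consequence, two analyses of the same span collapse into a single entry, so the sketch answers “is the sentence grammatical?” rather than “how many parses are there?”.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("NP", ("det", "n", "PP")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
    ("VP", ("v", "NP", "PP")),
    ("PP", ("prep", "NP")),
]
LEXICON = {
    "the": {"det"}, "man": {"n"}, "shot": {"v", "n"}, "an": {"det"},
    "elephant": {"n"}, "in": {"prep"}, "his": {"det"}, "pyjamas": {"n"},
}


def earley_tables(words, grammar, lexicon):
    """Build table(0)..table(N) as sets of (mother, daughters, dot, start)."""
    nonterminals = {mother for mother, _ in grammar}
    tables = [set() for _ in range(len(words) + 1)]
    # Step 0: copy the grammar into table(0), dot at the far left, pointer 0.
    tables[0] = {(m, tuple(ds), 0, 0) for m, ds in grammar}

    for n in range(1, len(words) + 1):
        # Step 1 (Scan): advance the dot over the categories of the nth word.
        categories = lexicon.get(words[n - 1], set())
        agenda = [(m, ds, dot + 1, x) for (m, ds, dot, x) in tables[n - 1]
                  if dot < len(ds) and ds[dot] in categories]
        # Steps 2-5: Complete and Predict until nothing new is added.
        while agenda:
            entry = agenda.pop()
            if entry in tables[n]:
                continue  # never add an identical rule twice
            tables[n].add(entry)
            m, ds, dot, x = entry
            if dot == len(ds):
                # Complete: advance rules in table(x) that were waiting for m.
                for (m2, ds2, dot2, x2) in tables[x]:
                    if dot2 < len(ds2) and ds2[dot2] == m:
                        agenda.append((m2, ds2, dot2 + 1, x2))
            elif ds[dot] in nonterminals:
                # Predict: copy the rules for that non-terminal, pointer n.
                for m2, ds2 in grammar:
                    if m2 == ds[dot]:
                        agenda.append((m2, tuple(ds2), 0, n))
    return tables


def accepts(words, grammar, lexicon, start_symbol="S"):
    """Success test: a complete start-symbol rule with pointer 0 in table(N)."""
    final = earley_tables(words, grammar, lexicon)[len(words)]
    return any(m == start_symbol and dot == len(ds) and x == 0
               for (m, ds, dot, x) in final)


print(accepts("the man shot an elephant in his pyjamas".split(), GRAMMAR, LEXICON))
```

The agenda (a simple work list) replaces the explicit “repeat until no new entries” wording of steps 4 and 5: every entry is processed exactly once, and duplicates are dropped when they are popped.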
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
N = 8, n = 1

Grammar
S (NP VP)
NP (det n)
NP (det n PP)
VP (v)
VP (v NP)
VP (v NP PP)
PP (prep NP)

Lexicon
the = det
man = n
shot = v, n
an = det
elephant = n
in = prep
his = det
pyjamas = n

Step 0: Copy the grammar into table(0), inserting a dot after the mother symbol (before the first daughter) and adding 0 at the end; set n = 1.

Table(0)
S (dot NP VP) 0
NP (dot det n) 0
NP (dot det n PP) 0
VP (dot v) 0
VP (dot v NP) 0
VP (dot v NP PP) 0
PP (dot prep NP) 0
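The example grammar, lexicon and initial table can be written down as plain data. The following is a sketch only; the names `GRAMMAR`, `LEXICON`, `initial_table` and `show` are assumptions introduced for illustration, and each entry is a tuple `(mother, daughters, dot, start)`.

```python
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("NP", ("det", "n", "PP")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
    ("VP", ("v", "NP", "PP")),
    ("PP", ("prep", "NP")),
]

# "shot" is ambiguous between verb and noun, so each word maps to a set.
LEXICON = {
    "the": {"det"}, "man": {"n"}, "shot": {"v", "n"}, "an": {"det"},
    "elephant": {"n"}, "in": {"prep"}, "his": {"det"}, "pyjamas": {"n"},
}


def initial_table(grammar):
    """Step 0: every rule with the dot at the far left and pointer 0."""
    return [(mother, daughters, 0, 0) for mother, daughters in grammar]


def show(entry):
    """Render an entry in the slides' notation, e.g. 'NP (dot det n) 0'."""
    mother, daughters, dot, start = entry
    symbols = list(daughters[:dot]) + ["dot"] + list(daughters[dot:])
    return f"{mother} ({' '.join(symbols)}) {start}"


for entry in initial_table(GRAMMAR):
    print(show(entry))
```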
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
the = det
N = 8, n = 1

Table(0)
S (dot NP VP) 0
NP (dot det n) 0
NP (dot det n PP) 0
VP (dot v) 0
VP (dot v NP) 0
VP (dot v NP PP) 0
PP (dot prep NP) 0

(Scan) Look up nth word in lexicon. For each entry k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.

Table(1)
NP (det(the) dot n) 0
NP (det(the) dot n PP) 0

(Complete) Look for any rule r in table(n) ending in a dot and a number x … Does not apply.
(Predict) For all rules in table(n) which have a non-terminal just after the dot … Does not apply: n is a lexical category, not a non-terminal.
Repeat steps on any rules just added, without adding any identical rules. Does not apply.
Increment n := n+1 (n = 1 → 2); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
man = n
N = 8, n = 2

Table(1)
NP (det(the) dot n) 0
NP (det(the) dot n PP) 0

(Scan) Look up nth word in lexicon. For each entry k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.

Table(2)
NP (det(the) n(man)) dot 0
NP (det(the) n(man) dot PP) 0

(Complete) Look for any rule r in table(n) ending in a dot and a number x. For each rule, locate in table(x) all rules having r’s mother just after the dot. Copy into table(n), moving the dot one place to the right. Added to Table(2):
S (NP (det(the) n(man)) dot VP) 0

(Predict) For all rules in table(n) which have a non-terminal just after the dot, copy all such rules from table(0), changing the 0 to n. Added to Table(2):
VP (dot v) 2
VP (dot v NP) 2
VP (dot v NP PP) 2
PP (dot prep NP) 2

Table(0), for reference
S (dot NP VP) 0
NP (dot det n) 0
NP (dot det n PP) 0
VP (dot v) 0
VP (dot v NP) 0
VP (dot v NP PP) 0
PP (dot prep NP) 0

Repeat steps on any rules just added, without adding any identical rules. Does not apply.
Increment n := n+1 (n = 2 → 3); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
shot = v, n
N = 8, n = 3

Table(2)
NP (det(the) n(man)) dot 0
NP (det(the) n(man) dot PP) 0
S (NP (det(the) n(man)) dot VP) 0
VP (dot v) 2
VP (dot v NP) 2
VP (dot v NP PP) 2
PP (dot prep NP) 2

(Scan) Look up nth word in lexicon. For each entry k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.

Table(3)
VP (v(shot)) dot 2
VP (v(shot) dot NP) 2
VP (v(shot) dot NP PP) 2

(Complete) Look for any rule r in table(n) ending in a dot and a number x. For each rule, locate in table(x) all rules having r’s mother just after the dot. Copy into table(n), moving the dot one place to the right. Added to Table(3):
S (NP (det(the) n(man)) VP (v(shot))) dot 0

(Predict) For all rules in table(n) which have a non-terminal just after the dot, copy all such rules from table(0), changing the 0 to n. Added to Table(3):
NP (dot det n) 3
NP (dot det n PP) 3

Repeat steps on any rules just added, without adding any identical rules. Does not apply.
Increment n := n+1 (n = 3 → 4); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
an = det
N = 8, n = 4

Table(3)
VP (v(shot)) dot 2
VP (v(shot) dot NP) 2
VP (v(shot) dot NP PP) 2
S (NP (det(the) n(man)) VP (v(shot))) dot 0
NP (dot det n) 3
NP (dot det n PP) 3

(Scan) Look up nth word in lexicon. For each entry k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.

Table(4)
NP (det(an) dot n) 3
NP (det(an) dot n PP) 3

(Complete) Look for any rule r in table(n) ending in a dot and a number x. Does not apply.
(Predict) For all rules in table(n) which have a non-terminal just after the dot … Does not apply.
Repeat steps on any rules just added, without adding any identical rules. Does not apply.
Increment n := n+1 (n = 4 → 5); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
elephant = n
N = 8, n = 5

Table(4)
NP (det(an) dot n) 3
NP (det(an) dot n PP) 3

(Scan) Look up nth word in lexicon. For each entry k, find all rules in table(n–1) which have k just after the dot. Copy each such rule into table(n), moving the dot one place to the right.

Table(5)
NP (det(an) n(elephant)) dot 3
NP (det(an) n(elephant) dot PP) 3

(Complete) Look for any rule r in table(n) ending in a dot and a number x. For each rule, locate in table(x) all rules having r’s mother just after the dot. Copy into table(n), moving the dot one place to the right. The NP rule is complete with pointer 3, so look at Table(3) for rules “… dot NP”. Added to Table(5):
VP (v(shot) NP (det(an) n(elephant))) dot 2
VP (v(shot) NP (det(an) n(elephant)) dot PP) 2

(Predict) For all rules in table(n) which have a non-terminal just after the dot, copy all such rules from table(0), changing the 0 to n. Added to Table(5):
PP (dot prep NP) 5

Repeat steps on any rules just added, without adding any identical rules. The VP rule is complete: look at Table(2) for rules “… dot VP”. Added to Table(5):
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)))) dot 0

Table(3), for reference
VP (v(shot)) dot 2
VP (v(shot) dot NP) 2
VP (v(shot) dot NP PP) 2
S (NP (det(the) n(man)) VP (v(shot))) dot 0
NP (dot det n) 3
NP (dot det n PP) 3

Table(2), for reference
NP (det(the) n(man)) dot 0
NP (det(the) n(man) dot PP) 0
S (NP (det(the) n(man)) dot VP) 0
VP (dot v) 2
VP (dot v NP) 2
VP (dot v NP PP) 2
PP (dot prep NP) 2

Table(0), for reference
S (dot NP VP) 0
NP (dot det n) 0
NP (dot det n PP) 0
VP (dot v) 0
VP (dot v NP) 0
VP (dot v NP PP) 0
PP (dot prep NP) 0

Increment n := n+1 (n = 5 → 6); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
in = prep
N = 8, n = 6

Table(5)
NP (det(an) n(elephant)) dot 3
NP (det(an) n(elephant) dot PP) 3
VP (v(shot) NP (det(an) n(elephant))) dot 2
VP (v(shot) NP (det(an) n(elephant)) dot PP) 2
PP (dot prep NP) 5
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)))) dot 0

Scan
Table(6)
PP (prep(in) dot NP) 5

Predict. Added to Table(6):
NP (dot det n) 6
NP (dot det n PP) 6

Complete: none.
Increment n := n+1 (n = 6 → 7); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
his = det
N = 8, n = 7

Table(6)
PP (prep(in) dot NP) 5
NP (dot det n) 6
NP (dot det n PP) 6

Scan
Table(7)
NP (det(his) dot n) 6
NP (det(his) dot n PP) 6

Predict: none.
Complete: none.
Increment n := n+1 (n = 7 → 8); if n ≤ N, repeat from step 1.
Input: the(1) man(2) shot(3) an(4) elephant(5) in(6) his(7) pyjamas(8)
pyjamas = n
N = 8, n = 8

Table(7)
NP (det(his) dot n) 6
NP (det(his) dot n PP) 6

Scan
Table(8)
NP (det(his) n(pyjamas)) dot 6
NP (det(his) n(pyjamas) dot PP) 6

Predict. Added to Table(8):
PP (dot prep NP) 8

Complete: look at table(6).
Table(6)
PP (prep(in) dot NP) 5
NP (dot det n) 6
NP (dot det n PP) 6
Added to Table(8):
PP (prep(in) NP (det(his) n(pyjamas))) dot 5

Repeat. Predict: none. Complete: look at table(5).
Table(5)
NP (det(an) n(elephant)) dot 3
NP (det(an) n(elephant) dot PP) 3
VP (v(shot) NP (det(an) n(elephant))) dot 2
VP (v(shot) NP (det(an) n(elephant)) dot PP) 2
PP (dot prep NP) 5
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)))) dot 0
Added to Table(8):
NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))) dot 3
VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas)))) dot 2

Repeat. Predict: none. Complete: look at table(3) and at table(2).
Table(3) (extract)
VP (v(shot) dot NP) 2
VP (v(shot) dot NP PP) 2
Table(2) (extract)
S (NP (det(the) n(man)) dot VP) 0
Added to Table(8):
VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas))))) dot 2
VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))) dot PP) 2
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas))))) dot 0

Repeat. Predict: none. Complete: look at table(2) again.
Table(2) (extract)
S (NP (det(the) n(man)) dot VP) 0
Added to Table(8):
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))))) dot 0

n = N: the last word has been processed, so stop.
Success if there is in table(N) a rule “S … dot 0”.

Table(8)
NP (det(his) n(pyjamas)) dot 6
NP (det(his) n(pyjamas) dot PP) 6
PP (dot prep NP) 8
PP (prep(in) NP (det(his) n(pyjamas))) dot 5
NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))) dot 3
VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas)))) dot 2
VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas))))) dot 2
VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))) dot PP) 2
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas))))) dot 0
S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))))) dot 0

In fact there are two, as expected: one attaches the PP to the VP, the other attaches it to the NP.
Also, look what happens if we take only the complete rules:
Complete rules only (the first number is the table the rule is in, i.e. where it ends; the last number is where it starts)

Positions: 0 1 2 3 4 5 6 7 8

2  NP (det(the) n(man)) dot 0
3  VP (v(shot)) dot 2
3  S (NP (det(the) n(man)) VP (v(shot))) dot 0
5  NP (det(an) n(elephant)) dot 3
5  VP (v(shot) NP (det(an) n(elephant))) dot 2
5  S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)))) dot 0
8  NP (det(his) n(pyjamas)) dot 6
8  PP (prep(in) NP (det(his) n(pyjamas))) dot 5
8  NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))) dot 3
8  VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas)))) dot 2
8  VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas))))) dot 2
8  S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant)) PP (prep(in) NP (det(his) n(pyjamas))))) dot 0
8  S (NP (det(the) n(man)) VP (v(shot) NP (det(an) n(elephant) PP (prep(in) NP (det(his) n(pyjamas)))))) dot 0

[Figure: the same thirteen complete constituents drawn as spans over positions 0 to 8, widest at the top: the two S analyses of the whole sentence, then the two VPs from 2 to 8, S from 0 to 5, NP from 3 to 8, S from 0 to 3, VP from 2 to 5, PP from 5 to 8, and the NPs from 3 to 5, 0 to 2 and 6 to 8, and VP from 2 to 3.]
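The complete rules listed above amount to a table of recognised constituents, each with a start and an end position. As a rough sketch of that view, the code below draws each one as a bracketed span under the words it covers. The `(end, category, start)` triples are copied from the list above; the names `COMPLETE` and `chart_lines`, and the text layout, are assumptions made for illustration.

```python
WORDS = "the man shot an elephant in his pyjamas".split()

# (end, category, start) for each complete rule in the list above.
COMPLETE = [
    (2, "NP", 0), (3, "VP", 2), (3, "S", 0),
    (5, "NP", 3), (5, "VP", 2), (5, "S", 0),
    (8, "NP", 6), (8, "PP", 5), (8, "NP", 3),
    (8, "VP", 2), (8, "VP", 2), (8, "S", 0), (8, "S", 0),
]


def chart_lines(words, complete):
    """Return text lines: the words, then one bracketed span per complete rule."""
    cell = max(len(w) for w in words) + 1  # column width per word
    lines = ["".join(w.ljust(cell) for w in words)]
    # Widest spans first, then left to right.
    for end, cat, start in sorted(complete, key=lambda e: (e[2] - e[0], e[2])):
        body = cat.center((end - start) * cell - 2, "-")
        lines.append(" " * (start * cell) + "[" + body + "]")
    return lines


print("\n".join(chart_lines(WORDS, COMPLETE)))
```

The two spans labelled S that cover the whole input correspond to the two analyses of the sentence; the shorter S spans show that “the man shot” and “the man shot an elephant” are sentences in their own right under this grammar.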