LING 438/538 Computational Linguistics. Sandiway Fong Lecture 24: 11/16. Administrivia. Current Reading Chapter 10: Parsing with Context-Free Grammars Lecture schedule Tuesday 21st November Homework #6: Context-free Grammars and Parsing due Tuesday 28th Thanksgiving break
LING 438/538 Computational Linguistics Sandiway Fong Lecture 24: 11/16
Administrivia • Current Reading • Chapter 10: Parsing with Context-Free Grammars • Lecture schedule • Tuesday 21st November • Homework #6: Context-free Grammars and Parsing • due Tuesday 28th • Thanksgiving break • Tuesday 28th November • Thursday 30th November • Homework #7: Machine Translation • due December 7th • 538 Presentations • Tuesday 5th December • Homework #7: Machine Translation • 538 Presentations
Administrivia • 538 Presentations • details • select a chapter • prepare a PowerPoint (or PDF) presentation • your goal is to present a clear explanation and summary of the assumptions, ideas and techniques used • you do not have to cover the entire chapter • you could choose to cover one or two sub-sections in depth • or survey the entire chapter • offer a critique of the methods • e.g. disadvantages • e.g. are there better methods out there now? (Google is your friend) • or present examples on which the method may not work well • graded on how well you do this • you have 10 minutes for the presentation • answer questions
Administrivia • Your choice • (pick any chapter from 11 to 20, inclusive) • send me an email with your top 3 choices • first come, first served • per order of emails as received in my mailbox • due date • November 29th midnight • slides are due in my mailbox • (no advantage with respect to slides for the presentation date) • (pick your presentation date) • November 30th • December 5th
Administrivia • 10 Chapters Available • there are more than 10 of you taking 538 • chapters may be split into two presentations • 11: Features and Unification • 12: Lexicalized and Probabilistic Parsing • 13: Language and Complexity • 14: Representing Meaning • 15: Semantic Analysis • 16: Lexical Semantics • 17: Word Sense Disambiguation and Information Retrieval • 18: Discourse • 19: Dialogue and Conversational Agents • 20: Natural Language Generation
Administrivia • Homework 5 • due tonight (revised due date) • corpus.txt.zip • (first version I put up on the website was somehow truncated, • 2nd version is of the correct length, • data between <text> ... </text> • there is still some noise in the data: • e.g. & and tables • ignore them)
Today’s Topic • Chapter 10: • Parsing with Context-Free Grammars
Top-Down Parsing • we already know one top-down parsing algorithm • DCG rule system starting at the top node • using the Prolog computation rule • always try the first matching rule • expand x --> y, z. • top-down: x then y and z • left-to-right: do y then z • depth-first: expands DCG rules for y before tackling z • problems • left-recursion • gives termination problems • no bottom-up filtering • inefficient • left-corner idea
Top-Down Parsing • Prolog computation rule • is equivalent to the stack-based algorithm shown in the textbook • (figure 10.6) • Prolog advantage • we don’t need to implement this explicitly • this is default Prolog strategy
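To make the stack-based view concrete, here is a minimal sketch of a top-down, depth-first recognizer that keeps an explicit stack of categories still to be found and always works on the topmost one. It is not the textbook's Figure 10.6 algorithm. The predicate names recognise/2, rule/2 and lex/2 and the cut-down grammar (no left-recursive rules) are assumptions made for this example, and the choice between alternatives is still left to Prolog backtracking.

```prolog
% rule(Category, RHS): phrase structure rules (assumed encoding)
rule(s, [np, vp]).
rule(s, [aux, np, vp]).
rule(s, [vp]).
rule(np, [det, nominal]).
rule(np, [propernoun]).
rule(nominal, [noun]).
rule(nominal, [noun, nominal]).
rule(vp, [verb]).
rule(vp, [verb, np]).

% lex(Category, Word): lexicon (assumed encoding)
lex(det, this).   lex(det, that).     lex(det, a).
lex(noun, flight). lex(noun, meal).   lex(noun, book).
lex(verb, include). lex(verb, book).  lex(verb, prefer).
lex(aux, does).
lex(propernoun, houston).

% recognise(+Stack, +Input): Stack lists the categories still to be found.
% Done when both the stack and the input are empty.
recognise([], []).
% Expand the topmost category with a rule (top-down, leftmost first).
recognise([Cat|Stack], Input) :-
    rule(Cat, RHS),
    append(RHS, Stack, NewStack),
    recognise(NewStack, Input).
% Match a lexical category against the next input word.
recognise([Cat|Stack], [Word|Rest]) :-
    lex(Cat, Word),
    recognise(Stack, Rest).
```

A query such as `?- recognise([s], [does,this,flight,include,a,meal]).` starts with the start symbol on the stack. Adding a left-recursive rule to rule/2 would make this sketch loop, just like the DCG version.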
Top-Down Parsing • assume grammar
Top-Down Parsing (mismatch) • example • does this flight include a meal?
Top-Down Parsing • example • does this flight include a meal?
Top-Down Parsing • Left Recursion • example • np(np(D,N)) --> det(D), nominal(N). • det(d(a)) --> [a]. • det(d(NP,'\'s')) --> np(NP), ['\'s']. • nominal(nom(N)) --> noun(N). • noun(n(man)) --> [man]. • query: NP a man • ?- np(X,[a,man],[]). • X = np(d(a),nom(n(man))) ? ; (goes into a loop) • Prolog interruption (h for help)? a • % Execution aborted • query: NP a man’s man • ?- np(X,[a,man,'\'s',man],[]). • X = np(d(np(d(a),nom(n(man))),'\'s'),nom(n(man))) ? ; • Prolog interruption (h for help)? a • % Execution aborted • Sentential forms: np, det nominal, np ’s nominal
Top-Down Parsing • Left Recursion • example • np(np(D,N)) --> det(D), nominal(N). • det(d(a)) --> [a]. • det(d(NP,'\'s')) --> np(NP), ['\'s']. % left recursive rule • nominal(nom(N)) --> noun(N). • noun(n(man)) --> [man]. • query (trace) • | ?- np(X,[a,man],[]). • 1 1 Call: np(_410,[a,man],[]) ? • 2 2 Call: det(_998,[a,man],_993) ? s • ? 2 2 Exit: det(d(a),[a,man],[man]) ? • 3 2 Call: nominal(_999,[man],[]) ? • 4 3 Call: noun(_2183,[man],[]) ? s • 4 3 Exit: noun(n(man),[man],[]) ? • 3 2 Exit: nominal(nom(n(man)),[man],[]) ? • ? 1 1 Exit: np(np(d(a),nom(n(man))),[a,man],[]) ? • X = np(d(a),nom(n(man))) ? ;
Top-Down Parsing • Left Recursion • example • np(np(D,N)) --> det(D), nominal(N). • det(d(a)) --> [a]. • det(d(NP,'\'s')) --> np(NP), ['\'s']. • nominal(nom(N)) --> noun(N). • noun(n(man)) --> [man]. • query (trace) • 1 1 Redo: np(np(d(a),nom(n(man))),[a,man],[]) ? • 2 2 Redo: det(d(a),[a,man],[man]) ? • 5 3 Call: np(_1417,[a,man],_1412) ? • 6 4 Call: det(_1818,[a,man],_1813) ? s • ? 6 4 Exit: det(d(a),[a,man],[man]) ? • 7 4 Call: nominal(_1819,[man],_1412) ? • 8 5 Call: noun(_3003,[man],_1412) ? s • 8 5 Exit: noun(n(man),[man],[]) ? • 7 4 Exit: nominal(nom(n(man)),[man],[]) ? • ? 5 3 Exit: np(np(d(a),nom(n(man))),[a,man],[]) ? • 9 3 Call: 'C'([],'\'s',_993) ? DCG rules translated into Prolog: det(d(a), A, B) :- 'C'(A, a, B). det(d(A,'\'s'), B, C) :- np(A, B, D), 'C'(D, '\'s', C). requires a following ‘s
Top-Down Parsing • strategy? one possibility • lookahead: • before committing to recursion • det(d(NP,'\'s')) --> np(NP), ['\'s']. • det(d(NP,'\'s')) --> checkinputaheadfor('\'s'), np(NP), ['\'s']. • Left Recursion • example • np(np(D,N)) --> det(D), nominal(N). • det(d(a)) --> [a]. • det(d(NP,'\'s')) --> np(NP), ['\'s']. • nominal(nom(N)) --> noun(N). • noun(n(man)) --> [man]. • query (trace) • 9 3 Call: 'C'([],'\'s',_993) ? • 9 3 Fail: 'C'([],'\'s',_993) ? • 5 3 Redo: np(np(d(a),nom(n(man))),[a,man],[]) ? • 6 4 Redo: det(d(a),[a,man],[man]) ? • 10 5 Call: np(_2237,[a,man],_2232) ? • 11 6 Call: det(_2638,[a,man],_2633) ? s • ? 11 6 Exit: det(d(a),[a,man],[man]) ? • 12 6 Call: nominal(_2639,[man],_2232) ? s • 12 6 Exit: nominal(nom(n(man)),[man],[]) ? • ? 10 5 Exit: np(np(d(a),nom(n(man))),[a,man],[]) ? • 13 5 Call: 'C'([],'\'s',_1813) ? now requires two following ‘s
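The lookahead idea can be tried out with the sketch below. The slide names checkinputaheadfor but does not define it; the definition here (succeed, consuming nothing, if the word occurs anywhere in the remaining input) is an assumption made for illustration.

```prolog
% Assumed definition: succeed without consuming input if Word occurs
% somewhere in the remaining input list.
checkinputaheadfor(Word, Input, Input) :-
    memberchk(Word, Input).

np(np(D,N)) --> det(D), nominal(N).
det(d(a)) --> [a].
% Only commit to the left-recursive rule if a 's lies ahead.
det(d(NP,'\'s')) --> checkinputaheadfor('\'s'), np(NP), ['\'s'].
nominal(nom(N)) --> noun(N).
noun(n(man)) --> [man].
```

With this check, `?- np(X,[a,man],[]).` returns its one answer and then fails cleanly on backtracking instead of looping. It is only a partial remedy: for `[a,man,'\'s',man]` the first answer is found, but asking for further answers can still recurse without end, because the check looks for a single 's while each level of recursion requires one more.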
Top-Down Parsing • back to textbook example • assume grammar
Top-Down Parsing • example • does this flight include a meal? • query • ?- s(X,[does,this,flight,include,a,meal],[]). • X = s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal)))))
Top-Down Parsing • Prolog grammar • s(s(NP,VP)) --> np(NP), vp(VP). • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). • s(s(VP)) --> vp(VP). • np(np(D,N)) --> det(D), nominal(N). • nominal(nom(N)) --> noun(N). • nominal(nom(N1,N)) --> noun(N1), nominal(N). • np(np(PN)) --> propernoun(PN). • vp(vp(V)) --> verb(V). • vp(vp(V,NP)) --> verb(V),np(NP). • det(det(that)) --> [that]. • det(det(this)) --> [this]. • det(det(a)) --> [a]. • noun(noun(book)) --> [book]. • noun(noun(flight)) --> [flight]. • noun(noun(meal)) --> [meal]. • noun(noun(money)) --> [money]. • verb(verb(book)) --> [book]. • verb(verb(include)) --> [include]. • verb(verb(prefer)) --> [prefer]. • aux(aux(does)) --> [does]. • preposition(prep(from)) --> [from]. • preposition(prep(to)) --> [to]. • preposition(prep(on)) --> [on]. • propernoun(propn(houston)) --> [houston]. • propernoun(propn(twa)) --> [twa]. • nominal(nom(N,PP)) --> nominal(N), pp(PP). • pp(pp(P,NP)) --> preposition(P), np(NP).
Top-Down Parsing • example • does this flight include a meal? • gain in efficiency • avoid computing the first row of Figure 10.7
Top-Down Parsing • no bottom-up filtering • left-corner idea • eliminate unnecessary top-down search • reduce the number of choice points (amount of branching) • example • does this flight include a meal? • computation: • s --> np, vp. • s --> aux, np, vp. • s --> vp. • left-corner idea rules out 1 and 3
Left Corner Parsing • need bottom-up filtering • filter top-down rule expansion using bottom-up information • current input is the bottom-up information • left-corner idea • example • s(s(NP,VP)) --> np(NP), vp(VP). • what terminals can be used to begin this phrase? • answer: whatever can begin NP • np(np(D,N)) --> det(D), nominal(N). • np(np(PN)) --> propernoun(PN). • answer: whatever can begin Det or ProperNoun • det(det(that)) --> [that]. • det(det(this)) --> [this]. • det(det(a)) --> [a]. • propernoun(propn(houston)) --> [houston]. • propernoun(propn(twa)) --> [twa]. • answer: • {that,this,a,houston,twa} • “Left Corner” (tree): s over np vp; np over det nominal, or propernoun
Left Corner Parsing • example • does this flight include a meal? • computation • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa} • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC:{does} • s(s(VP)) --> vp(VP). LC:{book,include,prefer} • only rule 2 is compatible with the input • match first input terminal against left-corner (LC) set for each possible matching rule • left-corner idea prunes away or rules out options 1 and 3
Left Corner Parsing • DCG Rules • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa} • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC: {does} • s(s(VP)) --> vp(VP). LC: {book,include,prefer} • left-corner database facts • % lc(rule#,[word|_],[word|_]). • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]). • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]). • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]). • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]). • lc(1,[twa|L],[twa|L]). • rewrite DCG rules to check input against lc • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP). • s(s(VP)) --> lc(3), vp(VP).
Left Corner Parsing • left-corner database facts • % lc(rule#,[word|_],[word|_]). • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]). • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]). • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]). • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]). • lc(1,[twa|L],[twa|L]). • rewrite DCG rules to check input against lc/3 • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP). • s(s(VP)) --> lc(3), vp(VP). • DCG rules are translated into underlying Prolog rules: • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D). • s(s(A,B,C), D, E) :- lc(2, D, F), aux(A, F, G), np(B, G, H), vp(C, H, E). • s(s(A), B, C) :- lc(3, B, D), vp(A, D, C).
Left Corner Parsing • Summary: • Given a context-free DCG • Generate left-corner database facts • lc(rule#,[word|_],[word|_]). • Rewrite DCG rules to check input against lc • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • DCG rules are translated into underlying Prolog rules: • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D). • This process can be done automatically (by program) • Note: • not all rules need be rewritten • lexicon rules are direct left-corner rules • no filtering is necessary • det(det(a)) --> [a]. • noun(noun(book)) --> [book]. • i.e. no need to call lc as in • det(det(a)) --> lc(11), [a]. • noun(noun(book)) --> lc(12), [book].
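One way the left-corner sets might be generated by program is sketched below. The encoding of the grammar as rule/3 and lex/2 facts, the predicate names first_word/3 and lc_set/2, and the number 10 given to the left-recursive nominal rule are all assumptions made for this example; they are not the program used to produce the lecture's lc facts.

```prolog
% rule(Id, LHS, RHS): numbered phrase structure rules (assumed encoding)
rule(1, s, [np, vp]).
rule(2, s, [aux, np, vp]).
rule(3, s, [vp]).
rule(4, np, [det, nominal]).
rule(5, nominal, [noun]).
rule(6, nominal, [noun, nominal]).
rule(7, np, [propernoun]).
rule(8, vp, [verb]).
rule(9, vp, [verb, np]).
rule(10, nominal, [nominal, pp]).   % left recursive; Id is illustrative

% lex(Category, Word): lexicon (assumed encoding)
lex(det, that).  lex(det, this).  lex(det, a).
lex(noun, book). lex(noun, flight). lex(noun, meal). lex(noun, money).
lex(verb, book). lex(verb, include). lex(verb, prefer).
lex(aux, does).
lex(propernoun, houston). lex(propernoun, twa).

% first_word(+Cat, +Seen, -Word): Word can begin a phrase of category Cat.
% Seen records categories already being expanded, so that left-recursive
% rules do not send the search into a loop.
first_word(Cat, _, Word) :-
    lex(Cat, Word).
first_word(Cat, Seen, Word) :-
    \+ memberchk(Cat, Seen),
    rule(_, Cat, [First|_]),
    first_word(First, [Cat|Seen], Word).

% lc_set(+Id, -Words): sorted, duplicate-free left-corner set of rule Id.
lc_set(Id, Words) :-
    rule(Id, _, [First|_]),
    findall(W, first_word(First, [], W), Ws),
    sort(Ws, Words).
```

Each word W in the set for rule Id could then be written out as a fact `lc(Id,[W|L],[W|L])`. Because sort/2 removes duplicates, a word reachable through two rules (for example through both vp rules) appears only once.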
Left Corner Parsing • DCG rules with left-corner checks • s(s(_549,_550))-->lc(1),np(_549),vp(_550). • s(s(_554,_555,_556))-->lc(2),aux(_554),np(_555),vp(_556). • s(s(_544))-->lc(3),vp(_544). • np(np(_549,_550))-->lc(4),det(_549),nominal(_550). • nominal(nom(_544))-->lc(5),noun(_544). • nominal(nom(_549,_550))-->lc(6),noun(_549),nominal(_550). • np(np(_544))-->lc(7),propernoun(_544). • vp(vp(_544))-->lc(8),verb(_544). • vp(vp(_549,_550))-->lc(9),verb(_549),np(_550). • nominal(nom(_549,_550))-->lc(5),nominal(_549),pp(_550). • pp(pp(_549,_550))-->lc(27),preposition(_549),np(_550). • lexicon rules • det(det(that)) --> [that]. • det(det(this)) --> [this]. • det(det(a)) --> [a]. • noun(noun(book)) --> [book]. • noun(noun(flight)) --> [flight]. • noun(noun(meal)) --> [meal]. • noun(noun(money)) --> [money]. • verb(verb(book)) --> [book]. • verb(verb(include)) --> [include]. • verb(verb(prefer)) --> [prefer]. • aux(aux(does)) --> [does]. • preposition(prep(from)) --> [from]. • preposition(prep(to)) --> [to]. • preposition(prep(on)) --> [on]. • propernoun(propn(houston)) --> [houston]. • propernoun(propn(twa)) --> [twa]. • left-corner database facts • lc(1, [that|A], [that|A]). lc(1, [this|A], [this|A]). lc(1, [a|A], [a|A]). lc(1, [houston|A], [houston|A]). lc(1, [twa|A], [twa|A]). • lc(2, [does|A], [does|A]). • lc(3, [book|A], [book|A]). lc(3, [include|A], [include|A]). lc(3, [prefer|A], [prefer|A]). lc(3, [book|A], [book|A]). lc(3, [include|A], [include|A]). lc(3, [prefer|A], [prefer|A]). • lc(4, [that|A], [that|A]). lc(4, [this|A], [this|A]). lc(4, [a|A], [a|A]). • lc(5, [book|A], [book|A]). lc(5, [flight|A], [flight|A]). lc(5, [meal|A], [meal|A]). lc(5, [money|A], [money|A]). • lc(6, [book|A], [book|A]). lc(6, [flight|A], [flight|A]). lc(6, [meal|A], [meal|A]). lc(6, [money|A], [money|A]). • lc(7, [houston|A], [houston|A]). lc(7, [twa|A], [twa|A]). • lc(8, [book|A], [book|A]). lc(8, [include|A], [include|A]). lc(8, [prefer|A], [prefer|A]). • lc(9, [book|A], [book|A]). lc(9, [include|A], [include|A]). lc(9, [prefer|A], [prefer|A]). • lc(27, [from|A], [from|A]). lc(27, [to|A], [to|A]). lc(27, [on|A], [on|A]).
Left Corner Parsing • Prolog query: • ?- s(X,[does,this,flight,include,a,meal],[]). • 1 1 Call: s(_430,[does,this,flight,include,a,meal],[]) ? • 2 2 Call: lc(1,[does,this,flight,include,a,meal],_1100) ? • 2 2 Fail: lc(1,[does,this,flight,include,a,meal],_1100) ? • 3 2 Call: lc(2,[does,this,flight,include,a,meal],_1107) ? • 3 2 Exit: lc(2,[does,this,flight,include,a,meal],[does,this,flight,include,a,meal]) ? • 4 2 Call: aux(_1112,[does,this,flight,include,a,meal],_1100) ? • 5 3 Call: 'C'([does,this,flight,include,a,meal],does,_1100) ? s • 5 3 Exit: 'C'([does,this,flight,include,a,meal],does,[this,flight,include,a,meal]) ? • 4 2 Exit: aux(aux(does),[does,this,flight,include,a,meal],[this,flight,include,a,meal]) ? • 6 2 Call: np(_1113,[this,flight,include,a,meal],_1093) ? • 7 3 Call: lc(4,[this,flight,include,a,meal],_3790) ? • ? 7 3 Exit: lc(4,[this,flight,include,a,meal],[this,flight,include,a,meal]) ? • 8 3 Call: det(_3795,[this,flight,include,a,meal],_3783) ? s • ? 8 3 Exit: det(det(this),[this,flight,include,a,meal],[flight,include,a,meal]) ? • 9 3 Call: nominal(_3796,[flight,include,a,meal],_1093) ? • 10 4 Call: lc(5,[flight,include,a,meal],_5740) ? s • ? 10 4 Exit: lc(5,[flight,include,a,meal],[flight,include,a,meal]) ? • 11 4 Call: noun(_5745,[flight,include,a,meal],_1093) ? s • ? 11 4 Exit: noun(noun(flight),[flight,include,a,meal],[include,a,meal]) ? • ? 9 3 Exit: nominal(nom(noun(flight)),[flight,include,a,meal],[include,a,meal]) ? • ? 6 2 Exit: np(np(det(this),nom(noun(flight))),[this,flight,include,a,meal],[include,a,meal]) ?
Left Corner Parsing • Prolog query (contd.): • 12 2 Call: vp(_1114,[include,a,meal],[]) ? • 13 3 Call: lc(8,[include,a,meal],_8441) ? s • ? 13 3 Exit: lc(8,[include,a,meal],[include,a,meal]) ? • 14 3 Call: verb(_8446,[include,a,meal],[]) ? s • 14 3 Fail: verb(_8446,[include,a,meal],[]) ? • 13 3 Redo: lc(8,[include,a,meal],[include,a,meal]) ? s • 13 3 Fail: lc(8,[include,a,meal],_8441) ? • 15 3 Call: lc(9,[include,a,meal],_8448) ? s • ? 15 3 Exit: lc(9,[include,a,meal],[include,a,meal]) ? • 16 3 Call: verb(_8453,[include,a,meal],_8441) ? s • ? 16 3 Exit: verb(verb(include),[include,a,meal],[a,meal]) ? • 17 3 Call: np(_8454,[a,meal],[]) ? • 18 4 Call: lc(4,[a,meal],_10423) ? s • 18 4 Exit: lc(4,[a,meal],[a,meal]) ? • 19 4 Call: det(_10428,[a,meal],_10416) ? s • 19 4 Exit: det(det(a),[a,meal],[meal]) ? • 20 4 Call: nominal(_10429,[meal],[]) ? • 21 5 Call: lc(5,[meal],_12385) ? s • ? 21 5 Exit: lc(5,[meal],[meal]) ? • 22 5 Call: noun(_12390,[meal],[]) ? s • ? 22 5 Exit: noun(noun(meal),[meal],[]) ? • ? 20 4 Exit: nominal(nom(noun(meal)),[meal],[]) ? • ? 17 3 Exit: np(np(det(a),nom(noun(meal))),[a,meal],[]) ? • ? 12 2 Exit: vp(vp(verb(include),np(det(a),nom(noun(meal)))),[include,a,meal],[]) ? • ? 1 1 Exit: s(s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal))))),[does,this,flight,include,a,meal],[]) ? • X = s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal))))) ?
Bottom-Up Parsing • LR(0) parsing • An example of bottom-up tabular parsing • Similar to the top-down Earley algorithm described in the textbook in that it uses the idea of dotted rules
Tabular Parsing • e.g. LR(k) (Knuth, 1965) • invented for efficient parsing of programming languages • disadvantage: a potentially huge number of states can be generated when the number of rules in the grammar is large • can be applied to natural languages (Tomita 1985) • tables encode the grammar • grammar rules are compiled • no longer interpret the grammar rules directly • Parser = Table + Push-down Stack • table entries contain instruction(s) that tell what to do at a given state … possibly factoring in lookahead • stack data structure deals with maintaining the history of computation and recursion
Tabular Parsing • Shift-Reduce Parsing • example • LR(0) • left to right • bottom-up • (0) no lookahead (input word) • LR actions • Shift: read an input word • i.e. advance current input word pointer to the next word • Reduce: complete a nonterminal • i.e. complete parsing a grammar rule • Accept: complete the parse • i.e. start symbol (e.g. S) derives the terminal string
Tabular Parsing • LR(0) Parsing • L(G) = LR(0) • i.e. the language generated by grammar G is LR(0) if there is a unique instruction per state (or no instruction = error state) • the LR(0) languages are a proper subset of the context-free languages • note • human language tends to be ambiguous • there are likely to be multiple or conflicting actions per state • can let Prolog’s computation rule handle it • i.e. use Prolog backtracking
Tabular Parsing • Dotted Rule Notation • “dot” used to indicate the progress of a parse through a phrase structure rule • examples • vp --> v . np means we’ve seen v and predict np • np --> . d np means we’re predicting a d (followed by np) • vp --> vp pp. means we’ve completed a vp
Tabular Parsing • state • a set of dotted rules encodes the state of the parse • kernel • vp --> v . np • vp --> v . • completion (of predict) • np --> . d n • np --> . n • np --> . np cp
Tabular Parsing • advance the dot • example: (d is next in the input) • vp --> v . np • vp --> v . (eliminated) • np --> d . n • np --> . n (eliminated) • np --> . np cp
Tabular Parsing • Dotted rules • example • State 0: • s -> . np vp • np -> . d n • np -> . n • np -> . np pp • possible actions • shift d and go to new state • shift n and go to new state
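The two operations on dotted rules, completing a state by prediction and advancing the dot, can be sketched as follows. The representation item(LHS, Seen, ToSee) for a dotted rule LHS -> Seen . ToSee and the predicate names closure/2 and advance/3 are assumptions made for this example; the grammar is the small one used for State 0.

```prolog
% rule(LHS, RHS): grammar rules (assumed encoding)
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [n]).
rule(np, [np, pp]).

% closure(+Items, -State): keep adding a predicted item X -> . RHS for any
% nonterminal X that appears right after a dot, until nothing new is added.
closure(Items, State) :-
    (   member(item(_, _, [Next|_]), Items),
        rule(Next, RHS),
        New = item(Next, [], RHS),
        \+ memberchk(New, Items)
    ->  append(Items, [New], Items1),
        closure(Items1, State)
    ;   State = Items
    ).

% advance(+State, +Symbol, -NextState): move the dot over Symbol in every
% item that expects it, drop the other items, then complete the new state.
advance(State, Symbol, NextState) :-
    findall(item(LHS, Seen1, Rest),
            ( member(item(LHS, Seen, [Symbol|Rest]), State),
              append(Seen, [Symbol], Seen1) ),
            Kernel),
    Kernel \== [],
    closure(Kernel, NextState).
```

Starting from `closure([item(s,[],[np,vp])], S0)`, the state S0 contains the four dotted rules of State 0, and `advance(S0, d, S1)` yields the single item np -> d . n. A full LR(0) table builder would repeat advance/3 over every symbol and number the resulting states, which this sketch does not do.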
Tabular Parsing • State 0: Shift D or N • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 1: NP -> D . N • State 2: NP -> N .
Tabular Parsing • State 1: Shift N • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 1: NP -> D . N • State 2: NP -> N . • State 3: NP -> D N .
Tabular Parsing • Shift • take input word, and • place on stack • Input: [D a] [N man] [V hit] … • Stack: (empty)
Tabular Parsing • Shift • take input word, and • place on stack • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 1: NP -> D . N • State 2: NP -> N . • State 3: NP -> D N . • arcs: shift d, shift n • Input: [V hit] … • Stack: [N man] [D a] • state 3
Tabular Parsing • State 2: Reduce NP -> N • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 1: NP -> D . N • State 2: NP -> N . • State 3: NP -> D N .
Tabular Parsing • Reduce NP -> N . • pop [N milk] off the stack, and • replace with [NP [N milk]] on stack • Input: [V is] … • Stack: [N milk] • State 2
Tabular Parsing • Reduce NP -> N . • pop [N milk] off the stack, and • replace with [NP [N milk]] on stack • Input: [V is] … • Stack: [NP [N milk]] • State 2
Tabular Parsing • State 3: Reduce NP -> D N . • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 1: NP -> D . N • State 2: NP -> N . • State 3: NP -> D N .
Tabular Parsing • Reduce NP -> D N . • pop [N man] and [D a] off the stack • replace with [NP[D a][N man]] • Input: [V hit] … • Stack: [N man] [D a] • State 3
Tabular Parsing • Reduce NP -> D N . • pop [N man] and [D a] off the stack • replace with [NP[D a][N man]] • Input: [V hit] … • Stack: [NP[D a][N man]] • State 3
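The shift and reduce stack operations can be tried out with the sketch below. It is a plain backtracking shift-reduce parser with no LR table and no states: whenever both a reduce and a shift are possible, Prolog backtracking explores the alternatives. The predicate names parse/2, sr/3, reduce/2 and cats_trees/3, the Category-Tree pairs kept on the stack, and the small grammar and lexicon are assumptions made for this example.

```prolog
% rule(LHS, RHS) and lex(Category, Word): assumed toy grammar and lexicon
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [n]).
rule(vp, [v, np]).
rule(vp, [v]).

lex(d, a).
lex(n, man).
lex(n, milk).
lex(v, hit).

% parse(+Words, -Tree): start with an empty stack.
parse(Words, Tree) :-
    sr([], Words, Tree).

% sr(+Stack, +Input, -Tree): Stack holds Category-Tree pairs, newest first.
% Accept: input used up and a single s left on the stack.
sr([s-Tree], [], Tree).
% Reduce: replace the top of the stack by the rule's left-hand side.
sr(Stack, Input, Tree) :-
    reduce(Stack, Stack1),
    sr(Stack1, Input, Tree).
% Shift: move the next input word onto the stack.
sr(Stack, [Word|Words], Tree) :-
    lex(Cat, Word),
    Node =.. [Cat, Word],
    sr([Cat-Node|Stack], Words, Tree).

% reduce(+Stack, -NewStack): pop as many items as the rule's right-hand
% side has symbols, check their categories, and push the new constituent.
reduce(Stack, [LHS-Node|Rest]) :-
    rule(LHS, RHS),
    length(RHS, N),
    length(Top, N),
    append(Top, Rest, Stack),
    reverse(Top, Items),
    cats_trees(Items, RHS, Kids),
    Node =.. [LHS|Kids].

% cats_trees(+Pairs, ?Cats, -Trees): split Category-Tree pairs.
cats_trees([], [], []).
cats_trees([C-T|Items], [C|Cs], [T|Ts]) :-
    cats_trees(Items, Cs, Ts).
```

For example, `?- parse([a,man,hit,milk], T).` shifts a and man, reduces them by np -> d n, and eventually accepts with a tree rooted in s. An LR(0) parser would replace the blind search in sr/3 with a table lookup on the current state.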