
LING/C SC/PSYC 438/538



  1. LING/C SC/PSYC 438/538 Lecture 28 Sandiway Fong

  2. Administrivia • Reminders • Homework 5 • due next Monday • 538 Presentations • Presentations: Next Wednesday

  3. 538 Presentations

  4. Parsing Methods • Algorithms: • in your homework, you’ve used Prolog’s default computation rule to parse context-free grammars • This strategy is known as a top-down, depth-first search strategy • There are many other methods …

  5. Top-Down Parsing • we already know one top-down parsing algorithm • DCG rule system starting at the top node • using the Prolog computation rule • always try the first matching rule • expand x --> y, z. • top-down: x then y and z • left-to-right: do y then z • depth-first: expands DCG rules for y before tackling z • problems • left-recursion • gives termination problems • no bottom-up filtering • inefficient • left-corner idea

  6. Top-Down Parsing • Prolog computation rule • is equivalent to the stack-based algorithm shown in the textbook • (section 13.1.1 of 2nd edition) • Prolog advantage • we don’t need to implement this explicitly • this is default Prolog strategy

  7. Top-Down Parsing • assume grammar

  8. Top-Down Parsing • example • does this flight include a meal? mismatch

  9. Top-Down Parsing • example • does this flight include a meal?

  10. Top-Down Parsing • example • does this flight include a meal? • query ?-s(X,[does,this,flight,include,a,meal],[]). X = s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal)))))

  11. Top-Down Parsing • Prolog grammar
      s(s(NP,VP)) --> np(NP), vp(VP).
      s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP).
      s(s(VP)) --> vp(VP).
      np(np(D,N)) --> det(D), nominal(N).
      nominal(nom(N)) --> noun(N).
      nominal(nom(N1,N)) --> noun(N1), nominal(N).
      np(np(PN)) --> propernoun(PN).
      vp(vp(V)) --> verb(V).
      vp(vp(V,NP)) --> verb(V),np(NP).
      det(det(that)) --> [that].
      det(det(this)) --> [this].
      det(det(a)) --> [a].
      noun(noun(book)) --> [book].
      noun(noun(flight)) --> [flight].
      noun(noun(meal)) --> [meal].
      noun(noun(money)) --> [money].
      verb(verb(book)) --> [book].
      verb(verb(include)) --> [include].
      verb(verb(prefer)) --> [prefer].
      aux(aux(does)) --> [does].
      preposition(prep(from)) --> [from].
      preposition(prep(to)) --> [to].
      preposition(prep(on)) --> [on].
      propernoun(propn(houston)) --> [houston].
      propernoun(propn(twa)) --> [twa].
      nominal(nom(N,PP)) --> nominal(N), pp(PP).
      pp(pp(P,NP)) --> preposition(P), np(NP).

  12. Top-Down Parsing • example • does this flight include a meal? • gain in efficiency • avoid computing the first row of Figure 10.7

  13. Top-Down Parsing • no bottom-up filtering • left-corner idea • eliminate unnecessary top-down search • reduce the number of choice points (amount of branching) • example • does this flight include a meal? • computation: • s --> np, vp. • s --> aux, np, vp. • s --> vp. • left-corner idea rules out 1 and 3

  14. Left Corner Parsing • need bottom-up filtering • filter top-down rule expansion using bottom-up information • current input is the bottom-up information • left-corner idea • example • s(s(NP,VP)) --> np(NP), vp(VP). • what terminals can be used to begin this phrase? • answer: whatever can begin NP • np(np(D,N)) --> det(D), nominal(N). • np(np(PN)) --> propernoun(PN). • answer: whatever can begin Det or ProperNoun • det(det(that)) --> [that]. • det(det(this)) --> [this]. • det(det(a)) --> [a]. • propernoun(propn(houston)) --> [houston]. • propernoun(propn(twa)) --> [twa]. • answer: • {that,this,a,houston,twa} • [Figure, “Left Corner”: s branches into np and vp; np branches into det and nominal, or into propernoun]
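The "what can begin this phrase?" reasoning above can be mechanized. The following is a minimal sketch, assuming the lecture grammar is re-encoded as plain facts: `rule/2`, `lex/2`, `first_word/2,3` and `lc_words/2` are hypothetical names introduced only for this illustration, not part of the lecture code. The `Seen` list stops the search from looping on the left-recursive `nominal` rule.

```prolog
:- use_module(library(lists)).

% Assumed fact encoding of the slide grammar (the slides use DCG rules).
rule(s, [np, vp]).
rule(s, [aux, np, vp]).
rule(s, [vp]).
rule(np, [det, nominal]).
rule(np, [propernoun]).
rule(nominal, [noun]).
rule(nominal, [noun, nominal]).
rule(nominal, [nominal, pp]).          % left-recursive rule
rule(vp, [verb]).
rule(vp, [verb, np]).
rule(pp, [preposition, np]).

lex(det, that).   lex(det, this).    lex(det, a).
lex(noun, book).  lex(noun, flight). lex(noun, meal).  lex(noun, money).
lex(verb, book).  lex(verb, include). lex(verb, prefer).
lex(aux, does).
lex(preposition, from). lex(preposition, to). lex(preposition, on).
lex(propernoun, houston). lex(propernoun, twa).

% first_word(+Cat, -Word): Word can begin a phrase of category Cat.
first_word(Cat, Word) :-
    first_word(Cat, Word, [Cat]).

% Either Cat is a lexical category, or we descend into the leftmost
% symbol of one of its rules, skipping categories already on the path.
first_word(Cat, Word, _) :-
    lex(Cat, Word).
first_word(Cat, Word, Seen) :-
    rule(Cat, [First|_]),
    \+ memberchk(First, Seen),
    first_word(First, Word, [First|Seen]).

% lc_words(+RHS, -Words): sorted left-corner set of a rule body.
lc_words([First|_], Words) :-
    setof(W, first_word(First, W), Words).
```

For example, `?- lc_words([np, vp], Ws).` yields the same five words as the slide, in sorted order because `setof/3` sorts its result: `Ws = [a, houston, that, this, twa]`.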

  15. Left Corner Parsing • example • does this flight include a meal? • computation • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa} • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC:{does} • s(s(VP)) --> vp(VP). LC:{book,include,prefer} • only rule 2 is compatible with the input • match first input terminal against left-corner (LC) set for each possible matching rule • left-corner idea prunes away or rules out options 1 and 3
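The pruning step can be sketched directly from the three LC sets listed on this slide. `lc_set/2` and `compatible/2` are hypothetical helper names used only for this illustration; the sets themselves are copied from the slide.

```prolog
:- use_module(library(lists)).

% Left-corner sets for the three s rules, copied from the slide.
lc_set(1, [that, this, a, houston, twa]).   % s --> np, vp.
lc_set(2, [does]).                          % s --> aux, np, vp.
lc_set(3, [book, include, prefer]).         % s --> vp.

% compatible(+Input, ?RuleNo): the first input word belongs to the
% left-corner set of rule RuleNo, so that rule is worth expanding.
compatible([Word|_], RuleNo) :-
    lc_set(RuleNo, Set),
    memberchk(Word, Set).
```

With the example sentence, `?- findall(N, compatible([does,this,flight,include,a,meal], N), Ns).` gives `Ns = [2]`, i.e. only rule 2 survives the filter.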

  16. Left Corner Parsing • DCG Rules • s(s(NP,VP)) --> np(NP), vp(VP). LC: {that,this,a,houston,twa} • s(s(Aux,NP,VP)) --> aux(Aux), np(NP), vp(VP). LC: {does} • s(s(VP)) --> vp(VP). LC: {book,include,prefer} • left-corner database facts • % lc(rule#,[word|_],[word|_]). • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]). • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]). • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]). • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]). • lc(1,[twa|L],[twa|L]). • rewrite Prolog rules to check input against lc • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP). • s(s(VP)) --> lc(3), vp(VP).

  17. Left Corner Parsing • left-corner database facts • % lc(rule#,[word|_],[word|_]). • lc(1,[that|L],[that|L]). lc(2,[does|L],[does|L]). • lc(1,[this|L],[this|L]). lc(3,[book|L],[book|L]). • lc(1,[a|L],[a|L]). lc(3,[include|L],[include|L]). • lc(1,[houston|L],[houston|L]). lc(3,[prefer|L],[prefer|L]). • lc(1,[twa|L],[twa|L]). • rewrite DCG rules to check input against lc/3 • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • s(s(Aux,NP,VP)) --> lc(2), aux(Aux), np(NP), vp(VP). • s(s(VP)) --> lc(3), vp(VP). • DCG rules are translated into underlying Prolog rules: • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D). • s(s(A,B,C), D, E) :- lc(2, D, F), aux(A, F, G), np(B, G, H), vp(C, H, E). • s(s(A), B, C) :- lc(3, B, D), vp(A, D, C).

  18. Left Corner Parsing • Summary: • Given a context-free DCG • Generate left-corner database facts • lc(rule#,[word|_],[word|_]). • Rewrite DCG rules to check input against lc • s(s(NP,VP)) --> lc(1), np(NP), vp(VP). • DCG rules are translated into underlying Prolog rules: • s(s(A,B), C, D) :- lc(1, C, E), np(A, E, F), vp(B, F, D). • This process can be done automatically (by program) • Note: • not all rules need be rewritten • lexicon rules are direct left-corner rules • no filtering is necessary • det(det(a)) --> [a]. • noun(noun(book)) --> [book]. • i.e. no need to call lc as in • det(det(a)) --> lc(11), [a]. • noun(noun(book)) --> lc(12), [book].
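One way the rewriting step might be automated is sketched below. This is not the program used in the lecture: `add_filter/3`, `lc_fact/3` and `install/3` are hypothetical names, and the sketch assumes a Prolog system whose `expand_term/2` performs the usual DCG translation (as SWI-Prolog and SICStus do).

```prolog
:- use_module(library(lists)).

% add_filter(+N, +Rule0, -Rule): put an lc(N) check at the front of a
% DCG rule body, e.g. (s --> np, vp) becomes (s --> lc(N), np, vp).
add_filter(N, (Head --> Body), (Head --> lc(N), Body)).

% lc_fact(+N, +Words, -Fact): enumerate one lc/3 fact per left-corner
% word. The fact checks the first input word without consuming it.
lc_fact(N, Words, lc(N, [W|Rest], [W|Rest])) :-
    member(W, Words).

% install(+N, +Rule0, +Words): assert the filtered rule (translated to
% an ordinary clause) together with its lc/3 facts.
install(N, Rule0, Words) :-
    add_filter(N, Rule0, Rule),
    expand_term(Rule, Clause),      % assumed: DCG rule -> Prolog clause
    assertz(Clause),
    forall(lc_fact(N, Words, Fact), assertz(Fact)).
```

For instance, `?- add_filter(2, (s(s(A,NP,VP)) --> aux(A), np(NP), vp(VP)), R).` returns the rule with `lc(2)` prepended, and `install/3` would then add it to the database along with `lc(2,[does|L],[does|L])` when given the word list `[does]`. The left-corner word list itself has to be supplied by the caller in this sketch.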

  19. Left Corner Parsing
      s(s(_549,_550))-->lc(1),np(_549),vp(_550).
      s(s(_554,_555,_556))-->lc(2),aux(_554),np(_555),vp(_556).
      s(s(_544))-->lc(3),vp(_544).
      np(np(_549,_550))-->lc(4),det(_549),nominal(_550).
      nominal(nom(_544))-->lc(5),noun(_544).
      nominal(nom(_549,_550))-->lc(6),noun(_549),nominal(_550).
      np(np(_544))-->lc(7),propernoun(_544).
      vp(vp(_544))-->lc(8),verb(_544).
      vp(vp(_549,_550))-->lc(9),verb(_549),np(_550).
      nominal(nom(_549,_550))-->lc(5),nominal(_549),pp(_550).
      pp(pp(_549,_550))-->lc(27),preposition(_549),np(_550).
      det(det(that)) --> [that].
      det(det(this)) --> [this].
      det(det(a)) --> [a].
      noun(noun(book)) --> [book].
      noun(noun(flight)) --> [flight].
      noun(noun(meal)) --> [meal].
      noun(noun(money)) --> [money].
      verb(verb(book)) --> [book].
      verb(verb(include)) --> [include].
      verb(verb(prefer)) --> [prefer].
      aux(aux(does)) --> [does].
      preposition(prep(from)) --> [from].
      preposition(prep(to)) --> [to].
      preposition(prep(on)) --> [on].
      propernoun(propn(houston)) --> [houston].
      propernoun(propn(twa)) --> [twa].
      lc(1, [that|A], [that|A]).
      lc(1, [this|A], [this|A]).
      lc(1, [a|A], [a|A]).
      lc(1, [houston|A], [houston|A]).
      lc(1, [twa|A], [twa|A]).
      lc(2, [does|A], [does|A]).
      lc(3, [book|A], [book|A]).
      lc(3, [include|A], [include|A]).
      lc(3, [prefer|A], [prefer|A]).
      lc(3, [book|A], [book|A]).
      lc(3, [include|A], [include|A]).
      lc(3, [prefer|A], [prefer|A]).
      lc(4, [that|A], [that|A]).
      lc(4, [this|A], [this|A]).
      lc(4, [a|A], [a|A]).
      lc(5, [book|A], [book|A]).
      lc(5, [flight|A], [flight|A]).
      lc(5, [meal|A], [meal|A]).
      lc(5, [money|A], [money|A]).
      lc(6, [book|A], [book|A]).
      lc(6, [flight|A], [flight|A]).
      lc(6, [meal|A], [meal|A]).
      lc(6, [money|A], [money|A]).
      lc(7, [houston|A], [houston|A]).
      lc(7, [twa|A], [twa|A]).
      lc(8, [book|A], [book|A]).
      lc(8, [include|A], [include|A]).
      lc(8, [prefer|A], [prefer|A]).
      lc(9, [book|A], [book|A]).
      lc(9, [include|A], [include|A]).
      lc(9, [prefer|A], [prefer|A]).
      lc(27, [from|A], [from|A]).
      lc(27, [to|A], [to|A]).
      lc(27, [on|A], [on|A]).

  20. Left Corner Parsing • Prolog query:
      ?- s(X,[does,this,flight,include,a,meal],[]).
        1 1 Call: s(_430,[does,this,flight,include,a,meal],[]) ?
        2 2 Call: lc(1,[does,this,flight,include,a,meal],_1100) ?
        2 2 Fail: lc(1,[does,this,flight,include,a,meal],_1100) ?
        3 2 Call: lc(2,[does,this,flight,include,a,meal],_1107) ?
        3 2 Exit: lc(2,[does,this,flight,include,a,meal],[does,this,flight,include,a,meal]) ?
        4 2 Call: aux(_1112,[does,this,flight,include,a,meal],_1100) ?
        5 3 Call: 'C'([does,this,flight,include,a,meal],does,_1100) ? s
        5 3 Exit: 'C'([does,this,flight,include,a,meal],does,[this,flight,include,a,meal]) ?
        4 2 Exit: aux(aux(does),[does,this,flight,include,a,meal],[this,flight,include,a,meal]) ?
        6 2 Call: np(_1113,[this,flight,include,a,meal],_1093) ?
        7 3 Call: lc(4,[this,flight,include,a,meal],_3790) ?
      ? 7 3 Exit: lc(4,[this,flight,include,a,meal],[this,flight,include,a,meal]) ?
        8 3 Call: det(_3795,[this,flight,include,a,meal],_3783) ? s
      ? 8 3 Exit: det(det(this),[this,flight,include,a,meal],[flight,include,a,meal]) ?
        9 3 Call: nominal(_3796,[flight,include,a,meal],_1093) ?
        10 4 Call: lc(5,[flight,include,a,meal],_5740) ? s
      ? 10 4 Exit: lc(5,[flight,include,a,meal],[flight,include,a,meal]) ?
        11 4 Call: noun(_5745,[flight,include,a,meal],_1093) ? s
      ? 11 4 Exit: noun(noun(flight),[flight,include,a,meal],[include,a,meal]) ?
      ? 9 3 Exit: nominal(nom(noun(flight)),[flight,include,a,meal],[include,a,meal]) ?
      ? 6 2 Exit: np(np(det(this),nom(noun(flight))),[this,flight,include,a,meal],[include,a,meal]) ?

  21. Left Corner Parsing • Prolog query (contd.):
        12 2 Call: vp(_1114,[include,a,meal],[]) ?
        13 3 Call: lc(8,[include,a,meal],_8441) ? s
      ? 13 3 Exit: lc(8,[include,a,meal],[include,a,meal]) ?
        14 3 Call: verb(_8446,[include,a,meal],[]) ? s
        14 3 Fail: verb(_8446,[include,a,meal],[]) ?
        13 3 Redo: lc(8,[include,a,meal],[include,a,meal]) ? s
        13 3 Fail: lc(8,[include,a,meal],_8441) ?
        15 3 Call: lc(9,[include,a,meal],_8448) ? s
      ? 15 3 Exit: lc(9,[include,a,meal],[include,a,meal]) ?
        16 3 Call: verb(_8453,[include,a,meal],_8441) ? s
      ? 16 3 Exit: verb(verb(include),[include,a,meal],[a,meal]) ?
        17 3 Call: np(_8454,[a,meal],[]) ?
        18 4 Call: lc(4,[a,meal],_10423) ? s
        18 4 Exit: lc(4,[a,meal],[a,meal]) ?
        19 4 Call: det(_10428,[a,meal],_10416) ? s
        19 4 Exit: det(det(a),[a,meal],[meal]) ?
        20 4 Call: nominal(_10429,[meal],[]) ?
        21 5 Call: lc(5,[meal],_12385) ? s
      ? 21 5 Exit: lc(5,[meal],[meal]) ?
        22 5 Call: noun(_12390,[meal],[]) ? s
      ? 22 5 Exit: noun(noun(meal),[meal],[]) ?
      ? 20 4 Exit: nominal(nom(noun(meal)),[meal],[]) ?
      ? 17 3 Exit: np(np(det(a),nom(noun(meal))),[a,meal],[]) ?
      ? 12 2 Exit: vp(vp(verb(include),np(det(a),nom(noun(meal)))),[include,a,meal],[]) ?
      ? 1 1 Exit: s(s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal))))),[does,this,flight,include,a,meal],[]) ?
      X = s(aux(does),np(det(this),nom(noun(flight))),vp(verb(include),np(det(a),nom(noun(meal))))) ?

  22. Bottom-Up Parsing • LR(0) parsing • An example of bottom-up tabular parsing • Similar to the top-down Earley algorithm described in the textbook in that it uses the idea of dotted rules

  23. Tabular Parsing • e.g. LR(k) (Knuth, 1965) • invented for efficient parsing of programming languages • disadvantage: a potentially huge number of states can be generated when the number of rules in the grammar is large • can be applied to natural languages (Tomita 1985) • build a Finite State Automaton (FSA) from the grammar rules, then add a stack • tables encode the grammar (FSA) • grammar rules are compiled • no longer interpret the grammar rules directly • Parser = Table + Push-down Stack • table entries contain instruction(s) that tell what to do at a given state … possibly factoring in lookahead • stack data structure deals with maintaining the history of computation and recursion

  24. Tabular Parsing • Shift-Reduce Parsing • example • LR(0) • left to right • bottom-up • (0) no lookahead (input word) • LR actions • Shift: read an input word • i.e. advance current input word pointer to the next word • Reduce: complete a nonterminal • i.e. complete parsing a grammar rule • Accept: complete the parse • i.e. start symbol (e.g. S) derives the terminal string

  25. Tabular Parsing • LR(0) Parsing • L(G) = LR(0) • i.e. the language generated by grammar G is LR(0) if there is a unique instruction per state (or no instruction = error state) • the LR(0) languages are a proper subset of the context-free languages • note • human language tends to be ambiguous • there are likely to be multiple or conflicting actions per state • can let Prolog’s computation rule handle it • i.e. use Prolog backtracking

  26. Tabular Parsing • Dotted Rule Notation • “dot” used to indicate the progress of a parse through a phrase structure rule • examples • vp --> v . np means we’ve seen v and predict np • np --> . d np means we’re predicting a d (followed by np) • vp --> vp pp . means we’ve completed a vp • state • a set of dotted rules encodes the state of the parse • kernel • vp --> v . np • vp --> v . • completion (of predict NP) • np --> . d n • np --> . n • np --> . np cp

  27. Tabular Parsing • compute possible states by advancing the dot • example: • (Assume d is next in the input) • vp --> v . np • vp --> v . (eliminated) • np --> d . n • np --> . n (eliminated) • np --> . np cp

  28. Tabular Parsing • Dotted rules • example • State 0: • s -> . np vp • np -> . d n • np -> . n • np -> . np pp • possible actions • shift d and go to new state • shift n and go to new state • Creating new states • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • shift d: State 1: NP -> D . N • shift n: State 2: NP -> N .

  29. Tabular Parsing • State 1: Shift N, goto State 3 • [Figure: State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP; State 1: NP -> D . N; State 2: NP -> N .; State 3: NP -> D N .]

  30. Tabular Parsing • Shift • take input word, and • place on stack • Input: [V hit] … • Stack: [N man] [D a] • state 3 • [Figure: State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP; shift d: State 1: NP -> D . N; shift n: State 2: NP -> N .; State 3: NP -> D N .]

  31. Tabular Parsing • State 2: Reduce action NP -> N . • [Figure: State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP; State 1: NP -> D . N; State 2: NP -> N .; State 3: NP -> D N .]

  32. Tabular Parsing • Reduce NP -> N . • pop [N milk] off the stack, and • replace with [NP [N milk]] on stack • Input: [V is] … • Stack: [N milk], replaced by [NP milk] • State 2

  33. Tabular Parsing • State 3: Reduce NP -> D N . • [Figure: State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP; State 1: NP -> D . N; State 2: NP -> N .; State 3: NP -> D N .]

  34. Tabular Parsing • Reduce NP -> D N . • pop [N man] and [D a] off the stack • replace with [NP[D a][N man]] • Input: [V hit] … • Stack: [N man] [D a], replaced by [NP[D a][N man]] • State 3

  35. Tabular Parsing • State 0: Transition NP • State 0: S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP • State 4: S -> NP . VP, NP -> NP . PP, VP -> . V NP, VP -> . V, VP -> . VP PP, PP -> . P NP • State 2: NP -> N .

  36. Tabular Parsing • for both states 2 and 3 • NP -> N . (reduce NP -> N) • NP -> D N . (reduce NP -> D N) • after Reduce NP operation • Goto state 4 • notes: • states are unique • grammar is finite • procedure generating states must terminate since the number of possible dotted rules is finite
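State construction by advancing the dot can be sketched in a few lines. This is an illustrative sketch only, not the lecture's table generator: `rule/2`, `item/3`, `closure/2`, `advance/3` and `state0/1` are hypothetical names, and the grammar is the one read off the state diagrams on the preceding slides. A dotted rule `LHS --> Seen . ToDo` is represented as `item(LHS, Seen, ToDo)`.

```prolog
:- use_module(library(lists)).

% Grammar read off the state diagrams (assumed fact encoding).
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [n]).
rule(np, [np, pp]).
rule(vp, [v, np]).
rule(vp, [v]).
rule(vp, [vp, pp]).
rule(pp, [p, np]).

% closure(+Items, -Closed): whenever the dot is in front of a
% nonterminal, add that nonterminal's rules with the dot at the start.
% Terminates because only finitely many dotted rules exist.
closure(Items, Closed) :-
    (   member(item(_, _, [Next|_]), Items),
        rule(Next, RHS),
        New = item(Next, [], RHS),
        \+ memberchk(New, Items)
    ->  append(Items, [New], Items1),
        closure(Items1, Closed)
    ;   Closed = Items
    ).

% advance(+State, +Symbol, -NewState): move the dot over Symbol in every
% item that allows it (the kernel), then close the result.
advance(State, Sym, NewState) :-
    findall(item(L, Seen1, Rest),
            ( member(item(L, Seen, [Sym|Rest]), State),
              append(Seen, [Sym], Seen1) ),
            Kernel),
    Kernel \== [],
    closure(Kernel, NewState).

% state0(-State): initial state, starting from S -> . NP VP.
state0(State) :-
    closure([item(s, [], [np, vp])], State).
```

Under these assumptions, `?- state0(S0), advance(S0, d, S1).` gives `S1 = [item(np,[d],[n])]`, which corresponds to State 1 (NP -> D . N), and `advance(S0, np, S4)` produces the six dotted rules of State 4.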

  37. Tabular Parsing

  38. Tabular Parsing • Observations • table is sparse • example • State 0, Input: [V ..] • parse fails immediately • in a given state, input may be irrelevant • example • State 2 (there is no shift operation) • there may be action conflicts • example • State 1: shift D, shift N • more interesting cases • shift-reduce and reduce-reduce conflicts

  39. Tabular Parsing • finishing up • an extra initial rule is usually added to the grammar • SS --> S . $ • SS = start symbol • $ = end of sentence marker • input: • milk is good for you $ • accept action • discard $ from input • return element at the top of stack as the parse tree

  40. LR Parsing in Prolog • Recap • finite state machine • each state represents a set of dotted rules • example • S --> . NP VP • NP --> . D N • NP --> . N • NP --> . NP PP • we transition, i.e. move, from state to state by advancing the “dot” over terminal and nonterminal symbols

  41. Build Actions • two main actions • Shift • move a word from the input onto the stack • Example: • NP --> .D N • Reduce • build a new constituent • Example: • NP --> D N.

  42. Parser • Example: • ?- parse([john,saw,the,man,with,a,telescope],X). • X = s(np(n(john)),vp(v(saw),np(np(d(the),n(man)),pp(p(with),np(d(a),n(telescope)))))) ; • X = s(np(n(john)),vp(vp(v(saw),np(d(the),n(man))),pp(p(with),np(d(a),n(telescope))))) ; • no
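The two parses above can be reproduced with a naive shift-reduce parser that relies on Prolog backtracking to resolve action conflicts. This is a sketch under stated assumptions, not the table-driven LR(0) parser of the lecture: there is no compiled table or state machine, the `rule/2` and `lex/2` facts are an assumed encoding of the toy grammar, and `sr/3`, `pop_rhs/4` and `pop/5` are hypothetical helper names. The `parse/2` name mirrors the query on the slide.

```prolog
:- use_module(library(lists)).

rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [n]).
rule(np, [np, pp]).
rule(vp, [v, np]).
rule(vp, [v]).
rule(vp, [vp, pp]).
rule(pp, [p, np]).

lex(n, john). lex(n, man). lex(n, telescope).
lex(v, saw).  lex(d, the). lex(d, a).  lex(p, with).

parse(Words, Tree) :-
    sr(Words, [], Tree).

% sr(+Input, +Stack, -Tree): Stack holds subtrees, most recent first.
sr([], [Tree], Tree) :-                  % accept: a single s remains
    functor(Tree, s, _).
sr(Input, Stack, Tree) :-                % reduce: build a constituent
    rule(LHS, RHS),
    pop_rhs(RHS, Stack, Rest, Kids),
    Node =.. [LHS|Kids],
    sr(Input, [Node|Rest], Tree).
sr([Word|Words], Stack, Tree) :-         % shift: move a word onto the stack
    lex(Cat, Word),
    Leaf =.. [Cat, Word],
    sr(Words, [Leaf|Stack], Tree).

% pop_rhs(+RHS, +Stack, -Rest, -Kids): the top of Stack matches RHS
% (in reverse, since the last symbol parsed is on top).
pop_rhs(RHS, Stack, Rest, Kids) :-
    reverse(RHS, Rev),
    pop(Rev, Stack, Rest, [], Kids).

pop([], Rest, Rest, Kids, Kids).
pop([Cat|Cats], [T|Stack], Rest, Acc, Kids) :-
    functor(T, Cat, _),
    pop(Cats, Stack, Rest, [T|Acc], Kids).
```

Running `?- parse([john,saw,the,man,with,a,telescope],X).` and asking for further solutions enumerates the NP-attachment and VP-attachment trees shown on the slide; the order in which they appear depends on the clause order chosen here.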

  43. LR(0) Goto Table

  44. LR(0) Action Table S = shift, R = reduce, A = accept Empty cells = error states Multiple actions = machine conflict Prolog’s computation rule: backtrack

  45. LR(0) Conflict Statistics • Toy grammar • 14 states • 6 states • with 2 competing actions • states 11,10,8: • shift-reduce conflict • 1 state • with 3 competing actions • State 7: • shift(d) shift(n) reduce(vp->v)

  46. LR Parsing • in fact • LR-parsers are generally acknowledged to be the fastest parsers • using lookahead (current terminal symbol) • and when combined with the chart technique (memorizing subphrases in a table - dynamic programming) • textbook • Earley’s algorithm (13.4.2) • uses chart • but builds dotted-rule configurations dynamically at parse-time • instead of ahead of time (so slower than LR)
